As we now take the breadcrumbs spinlock within the interrupt handler, we
wish to minimise its hold time. During the interrupt we do not care
about the state of the full rbtree, only that of the first element, so
we can guard that with a separate lock.
v2: Rename first_wait to irq_wait to make it clearer that it is guarded
by irq_lock.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170303190824.1330-1-chris@chris-wilson.co.uk
Start computing the vlv/chv watermarks the atomic way, from the
.compute_pipe_wm() hook. We'll recompute the actual watermarks
for only planes that are part of the state, the other planes will
keep their watermark from the last time it was computed.
And the actual watermark programming will happen from the
.initial_watermarks() hook. For now we'll just compute the
optimal watermarks, and we'll hook up the intermediate
watermarks properly later.
The DSPARB registers responsible for the FIFO paritioning are
double buffered, so they will be programming from
intel_begin_crtc_commit().
v2: s/noninverted/raw/ for consistency with other platforms
s/vlv_plane_wm_set/vlv_raw_plane_wm_set/ for clarity
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170302171508.1666-8-ville.syrjala@linux.intel.com
It is preferred to pass pipe_config to functions instead of accessing
crtc->config directly. Follow suit and pass pipe_config to the fdi link
train functions.
v2: Add const; s/pipe_config/crtc_state/ (Ville)
Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170302125857.14665-5-ander.conselvan.de.oliveira@intel.com
Useful for double checking that the device is powered up when it hung,
include both the status of the power management and our rpm wakelock.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170302151544.16915-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Whilst investigating some mysterious failures with hangcheck not running
during gem_busy/basic-hang-default, the question is why did we decide to
cancel the retire_work (which queues the hangcheck)? That decision is
based around GT activity, so include that information in the debug
report.
v2: Include the GT awake status in the error state
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170302150356.9713-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Listen for PMIC bus access notifications and get FORCEWAKE_ALL while
the bus is accessed to avoid needing to do any forcewakes, which need
PMIC bus access, while the PMIC bus is busy:
This fixes errors like these showing up in dmesg, usually followed
by a gfx or system freeze:
[drm:fw_domains_get [i915]] *ERROR* render: timed out waiting for forcewake ack request.
[drm:fw_domains_get [i915]] *MEDIA* render: timed out waiting for forcewake ack request.
i2c_designware 808622C1:06: punit semaphore timed out, resetting
i2c_designware 808622C1:06: PUNIT SEM: 2
i2c_designware 808622C1:06: couldn't acquire bus ownership
Downside of this approach is that it causes wakeups whenever the PMIC
bus is accessed. Unfortunately we cannot simply wait for the PMIC bus
to go idle when we hit a race, as forcewakes may be done from interrupt
handlers where we cannot sleep to wait for the i2c PMIC bus access to
finish.
Note that the notifications and thus the wakeups will only happen on
baytrail / cherrytrail devices using PMICs with a shared i2c bus for
P-Unit and host PMIC access (i2c busses with a _SEM method in their
APCI node), e.g. an axp288 PMIC.
I plan to write some patches for drivers accessing the PMIC bus to
limit their bus accesses to a bare minimum (e.g. cache registers, do not
update battery level more often then 4 times a minute), to limit the
amount of wakeups.
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=155241
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Tested-by: tagorereddy <tagore.chandan@gmail.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
[danvet: Wiggle in conflicts.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Rename intel_uncore_early_sanitize to intel_uncore_resume, dropping the
(always true) restore_forcewake argument and add a new intel_uncore_resume
function to replace the intel_uncore_forcewake_reset(dev_priv, false)
calls done from the suspend / runtime_suspend functions and make
intel_uncore_forcewake_reset private.
This is a preparation patch for adding PMIC bus access notifier support.
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=155241
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Tested-by: tagorereddy <tagore.chandan@gmail.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20170210102802.20898-12-hdegoede@redhat.com
As execlists and other non-semaphore multi-engine devices coordinate
between engines using interrupts, we can shave off a few 10s of
microsecond of scheduling latency by doing the fence signaling from the
interrupt as opposed to a RT kthread. (Realistically the delay adds
about 1% to an individual cross-engine workload.) We only signal the
first fence in order to limit the amount of work we move into the
interrupt handler. We also have to remember that our breadcrumbs may be
unordered with respect to the interrupt and so we still require the
waiter process to perform some heavyweight coherency fixups, as well as
traversing the tree of waiters.
v2: No need for early exit in irq handler - it breaks the flow between
patches and prevents the tracepoint
v3: Restore rcu hold across irq signaling of request
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170227205850.2828-2-chris@chris-wilson.co.uk
According to bspec, the DDI IO power domains should be enabled after
enabling the DPLL and mapping it to the DDI. The current order doesn't
seem to create problems with Skylake and Kabylake, but causes enable
timeouts in Geminilake.
v2: Rebase.
- Take power domain references before sanitizing encoders. (Imre)
- Add comment to get_encoder_power_domains() defition. (Ander)
v3: Don't put the domain if called with HSW/BDW's analog encoder. (CI)
v4: Put IO power domain before unmapping DPLL. (Imre)
- Change return type of intel_ddi_get_power_domains() to u64. (Imre)
Cc: David Weinehall <david.weinehall@linux.intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com> # v1
Reviewed-by: Imre Deak <imre.deak@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170224141959.5955-1-ander.conselvan.de.oliveira@intel.com
->fault(), ->page_mkwrite(), and ->pfn_mkwrite() calls do not need to
take a vma and vmf parameter when the vma already resides in vmf.
Remove the vma parameter to simplify things.
[arnd@arndb.de: fix ARM build]
Link: http://lkml.kernel.org/r/20170125223558.1451224-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/148521301778.19116.10840599906674778980.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch makes the I915_PARAM_HAS_EXEC_CONSTANTS getparam return 0
(indicating the optional feature is not supported), and makes execbuf
always return -EINVAL if the flags are used.
Apparently, no userspace ever shipped which used this optional feature:
I checked the git history of Mesa, xf86-video-intel, libva, and Beignet,
and there were zero commits showing a use of these flags. Kernel commit
72bfa19c8d apparently introduced the feature prematurely. According
to Chris, the intention was to use this in cairo-drm, but "the use was
broken for gen6", so I don't think it ever happened.
'relative_constants_mode' has always been tracked per-device, but this
has actually been wrong ever since hardware contexts were introduced, as
the INSTPM register is saved (and automatically restored) as part of the
render ring context. The software per-device value could therefore get
out of sync with the hardware per-context value. This meant that using
them is actually unsafe: a client which tried to use them could damage
the state of other clients, causing the GPU to interpret their BO
offsets as absolute pointers, leading to bogus memory reads.
These flags were also never ported to execlist mode, making them no-ops
on Gen9+ (which requires execlists), and Gen8 in the default mode.
On Gen8+, userspace can write these registers directly, achieving the
same effect. On Gen6-7.5, it likely makes sense to extend the command
parser to support them. I don't think anyone wants this on Gen4-5.
Based on a patch by Dave Gordon.
v3: Return -ENODEV for the getparam, as this is what we do for other
obsolete features. Suggested by Chris Wilson.
Cc: stable@vger.kernel.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92448
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170215093446.21291-1-kenneth@whitecape.org
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJYr5aeAAoJEAx081l5xIa+ZK4P/RD3XUsduYqziVFCRQ2n0X8r
+D92F4peTnSeSq7ZcZvprv+fezUGAHbfsWFs8feYCI5quUO6pEQSPwN+wyGazUi0
4hUVB/K9Iq7U/Bj7Z/SmsU3NuWJnkNqbmvSFvUdqYK9D/kl+Tnllzap2N4cTzjwu
GZOObz4n85cx94NqC3qw+7/ptL1X2MhXa+z0MzbkKyas84Bko1LwCSHRHsDKUnJc
IcSpOcYZ6pSRMIsKH4Kd79Go4vWm7djXT9XL3PwDk2NcXXUOuR+cfdHqYchYaM/O
iD2hvaSywBcflxSAml5x6vlXraoRd91ZZulgOObXtFfnUXdZB81TVq4uv6LU4Bx3
jLFixUZuk/TJT+W/8N10l7M6yMIFaTpNoNMc5n4IF5RNNyWba4BKnrI+f+lQiOpY
mmjIaidb0t5BICnJzCD264RhCEXmP0HaDV+iQQV6y6jJRXfd1bgnOXLKP73JekzB
TsbDshCoE7UO0dJ7n0LFpXSTQDTYzlazoEp14f2kFBxir5/l7r67nUlnDTvUQfuN
tSRvpN/s0wqvH3o7zhmpHxyJ/ZasPMQjNCFAuUEbx8L5SKXsua0FubIzN4aVpilb
XvfdFRWM/lkOT/q+8cGI/TcE3YTqEmALmGxdV/akbdNCiCg6aClyCLRE/DZhgmSQ
UMFjr9wlHl5Qo/OqLKj0
=Yjfg
-----END PGP SIGNATURE-----
Merge tag 'drm-for-v4.11-less-shouty' of git://people.freedesktop.org/~airlied/linux
Pull drm updates from Dave Airlie:
"This is the main drm pull request for v4.11.
Nothing too major, the tinydrm and mmu-less support should make
writing smaller drivers easier for some of the simpler platforms, and
there are a bunch of documentation updates.
Intel grew displayport MST audio support which is hopefully useful to
people, and FBC is on by default for GEN9+ (so people know where to
look for regressions). AMDGPU has a lot of fixes that would like new
firmware files installed for some GPUs.
Other than that it's pretty scattered all over.
I may have a follow up pull request as I know BenH has a bunch of AST
rework and fixes and I'd like to get those in once they've been tested
by AST, and I've got at least one pull request I'm just trying to get
the author to fix up.
Core:
- drm_mm reworked
- Connector list locking and iterators
- Documentation updates
- Format handling rework
- MMU-less support for fbdev helpers
- drm_crtc_from_index helper
- Core CRC API
- Remove drm_framebuffer_unregister_private
- Debugfs cleanup
- EDID/Infoframe fixes
- Release callback
- Tinydrm support (smaller drivers for simple hw)
panel:
- Add support for some new simple panels
i915:
- FBC by default for gen9+
- Shared dpll cleanups and docs
- GEN8 powerdomain cleanup
- DMC support on GLK
- DP MST audio support
- HuC loading support
- GVT init ordering fixes
- GVT IOMMU workaround fix
amdgpu/radeon:
- Power/clockgating improvements
- Preliminary SR-IOV support
- TTM buffer priority and eviction fixes
- SI DPM quirks removed due to firmware fixes
- Powerplay improvements
- VCE/UVD powergating fixes
- Cleanup SI GFX code to match CI/VI
- Support for > 2 displays on 3/5 crtc asics
- SI headless fixes
nouveau:
- Rework securre boot code in prep for GP10x secure boot
- Channel recovery improvements
- Initial power budget code
- MMU rework preperation
vmwgfx:
- Bunch of fixes and cleanups
exynos:
- Runtime PM support for MIC driver
- Cleanups to use atomic helpers
- UHD Support for TM2/TM2E boards
- Trigger mode fix for Rinato board
etnaviv:
- Shader performance fix
- Command stream validator fixes
- Command buffer suballocator
rockchip:
- CDN DisplayPort support
- IOMMU support for arm64 platform
imx-drm:
- Fix i.MX5 TV encoder probing
- Remove lower fb size limits
msm:
- Support for HW cursor on MDP5 devices
- DSI encoder cleanup
- GPU DT bindings cleanup
sti:
- stih410 cleanups
- Create fbdev at binding
- HQVDP fixes
- Remove stih416 chip functionality
- DVI/HDMI mode selection fixes
- FPS statistic reporting
omapdrm:
- IRQ code cleanup
dwi-hdmi bridge:
- Cleanups and fixes
adv-bridge:
- Updates for nexus
sii8520 bridge:
- Add interlace mode support
- Rework HDMI and lots of fixes
qxl:
- probing/teardown cleanups
ZTE drm:
- HDMI audio via SPDIF interface
- Video Layer overlay plane support
- Add TV encoder output device
atmel-hlcdc:
- Rework fbdev creation logic
tegra:
- OF node fix
fsl-dcu:
- Minor fixes
mali-dp:
- Assorted fixes
sunxi:
- Minor fix"
[ This was the "fixed" pull, that still had build warnings due to people
not even having build tested the result. I'm not a happy camper
I've fixed the things I noticed up in this merge. - Linus ]
* tag 'drm-for-v4.11-less-shouty' of git://people.freedesktop.org/~airlied/linux: (1177 commits)
lib/Kconfig: make PRIME_NUMBERS not user selectable
drm/tinydrm: helpers: Properly fix backlight dependency
drm/tinydrm: mipi-dbi: Fix field width specifier warning
drm/tinydrm: mipi-dbi: Silence: ‘cmd’ may be used uninitialized
drm/sti: fix build warnings in sti_drv.c and sti_vtg.c files
drm/amd/powerplay: fix PSI feature on Polars12
drm/amdgpu: refuse to reserve io mem for split VRAM buffers
drm/ttm: fix use-after-free races in vm fault handling
drm/tinydrm: Add support for Multi-Inno MI0283QT display
dt-bindings: Add Multi-Inno MI0283QT binding
dt-bindings: display/panel: Add common rotation property
of: Add vendor prefix for Multi-Inno
drm/tinydrm: Add MIPI DBI support
drm/tinydrm: Add helper functions
drm: Add DRM support for tiny LCD displays
drm/amd/amdgpu: post card if there is real hw resetting performed
drm/nouveau/tmr: provide backtrace when a timeout is hit
drm/nouveau/pci/g92: Fix rearm
drm/nouveau/drm/therm/fan: add a fallback if no fan control is specified in the vbios
drm/nouveau/hwmon: expose power_max and power_crit
..
A request is assigned a global seqno only when it is on the hardware
execution queue. The global seqno can be used to maintain a list of
requests on the same engine in retirement order, for example for
constructing a priority queue for waiting. Prior to its execution, or
if it is subsequently removed in the event of preemption, its global
seqno is zero. As both insertion and removal from the execution queue
may operate in IRQ context, it is not guarded by the usual struct_mutex
BKL. Instead those relying on the global seqno must be prepared for its
value to change between reads. Only when the request is complete can
the global seqno be stable (due to the memory barriers on submitting
the commands to the hardware to write the breadcrumb, if the HWS shows
that it has passed the global seqno and the global seqno is unchanged
after the read, it is indeed complete).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170223074422.4125-9-chris@chris-wilson.co.uk
When dma_fence_signal() is called, it sets a flag to indicate the fence
is complete. Before the dma_fence is signaled, the seqno check will
first be passed. During an unlocked check (such as inside a waiter), it
is possible for the fence to be signaled even though the seqno has been
reset (by engine wraparound). In this case the waiter will be kicked,
but for an extra layer of protection we can check the persistent
signaled bit from the fence.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170223074422.4125-2-chris@chris-wilson.co.uk
Geminilake has a third sprite plane (or fourth universal plane) that is
independent from the cursor. Make sure that for_each_plane_id_on_crtc()
is aware of that extra plane so that the watermark code takes it into
account.
Fixes: e9c9882556 ("drm/i915/glk: Configure number of sprite planes properly")
Cc: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: intel-gfx@lists.freedesktop.org
Cc: <drm-intel-fixes@lists.freedesktop.org>
Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170223071600.14356-2-ander.conselvan.de.oliveira@intel.com
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJYoM2fAAoJEHm+PkMAQRiGr9MH/izEAMri7rJ0QMc3ejt+WmD0
8pkZw3+MVn71z6cIEgpzk4QkEWJd5rfhkETCeCp7qQ9V6cDW1FDE9+0OmPjiphDt
nnzKs7t7skEBwH5Mq5xygmIfkv+Z0QGHZ20gfQWY3F56Uxo+ARF88OBHBLKhqx3v
98C7YbMFLKBslKClA78NUEIdx0UfBaRqerlERx0Lfl9aoOrbBS6WI3iuREiylpih
9o7HTrwaGKkU4Kd6NdgJP2EyWPsd1LGalxBBjeDSpm5uokX6ALTdNXDZqcQscHjE
RmTqJTGRdhSThXOpNnvUJvk9L442yuNRrVme/IqLpxMdHPyjaXR3FGSIDb2SfjY=
=VMy8
-----END PGP SIGNATURE-----
Merge tag 'v4.10-rc8' into drm-next
Linux 4.10-rc8
Backmerge Linus rc8 to fix some conflicts, but also
to avoid pulling it in via a fixes pull from someone.
Flushing the cachelines for an object is slow, can be as much as 100ms
for a large framebuffer. We currently do this under the struct_mutex BKL
on execution or on pageflip. But now with the ability to add fences to
obj->resv for both flips and execbuf (and we naturally wait on the fence
before CPU access), we can move the clflush operation to a workqueue and
signal a fence for completion, thereby doing the work asynchronously and
not blocking the driver or its clients.
v2: Introduce i915_gem_clflush.h and use a new name, split out some
extras into separate patches.
Suggested-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170222114049.28456-5-chris@chris-wilson.co.uk
So far the sync_hw hook wasn't called for power wells not belonging to
any power domain, that is the GEN9 PW1 and MISC_IO power wells. This
wasn't a problem so far since the goal of the sync_hw hook - to clear
the corresponding BIOS request bit - was guaranteed by clearing the
whole BIOS request register elsewhere. This will change with the next
patch, so fix up the inconsistency.
While at it clean up the power well iterator helpers and move them to
the rest of iterators.
v2:
- Clean up the power well iterator helpers. (Ander)
- Move the helpers to i915_drv.h.
Cc: Ander Conselvan de Oliveira <conselvan2@gmail.com>
Cc: David Weinehall <david.weinehall@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1487345986-26511-3-git-send-email-imre.deak@intel.com
We flush the entire page every time we update a few bytes, making the
update of a page table many, many times slower than is required. If we
create a WC map of the page for our updates, we can avoid the clflush
but incur additional cost for creating the pagetable. We amoritize that
cost by reusing page vmappings, and only changing the page protection in
batches.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Once upon a time before we had automated GPU state capture upon hangs,
we had intel_gpu_dump. Now we come almost full circle and reinstate that
view of the current GPU queues and registers by using the error capture
facility to snapshot the GPU state when debugfs/.../i915_gpu_info is
opened - which should provided useful debugging to both the error
capture routines (without having to cause a hang and avoid the error
state being eaten by igt) and generally.
v2: Rename drm_i915_error_state to i915_gpu_state to alleviate some name
collisions between the error state dump and inspecting the gpu state.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170214164611.11381-1-chris@chris-wilson.co.uk
It is possible whilst allocating the page-directory tree for a ppgtt
bind that the shrinker may run and reap unused parts of the tree. If the
shrinker happens to remove a chunk of the tree that the
allocate_va_range has already processed, we may then try to insert into
the dangling tree. This test uses the fault-injection framework to force
the shrinker to be invoked before we allocate new pages, i.e. new chunks
of the PD tree.
References: https://bugs.freedesktop.org/show_bug.cgi?id=99295
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
This adds a file in i915's debugfs directory that allows userspace to
manually control HPD storm detection. This is mainly for hotplugging
tests, where we might want to test HPD storm functionality or disable
storm detection to speed up hotplugging tests without breaking anything.
Changes since v1:
- Make HPD storm interval configurable
- Misc code cleanup
Signed-off-by: Lyude <lyude@redhat.com>
Acked-by: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Tomeu Vizoso <tomeu@tomeuvizoso.net>
There are currently 30 power domains, which puts us pretty close to the
limit with 32 bit masks. Prepare for the future and increase the limit
to 64 bit.
v2: Rebase
v3: s/unsigned long long/u64/ (Joonas)
Allow the 64th bit of the mask to be used. (Joonas)
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170209093121.24410-1-ander.conselvan.de.oliveira@intel.com
Currently we do a reset prepare/finish around the call to reset the GPU,
but it looks like we need a later stage after the hw has been
reinitialised to allow GEM to restart itself. Start by splitting the 2
GEM phases into 3:
prepare - before the reset, check if GEM recovered, then stop GEM
reset - after the reset, update GEM bookkeeping
finish - after the re-initialisation following the reset, restart GEM
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170208143033.11651-2-chris@chris-wilson.co.uk
With the cdclk state, all the .modeset_commit_cdclk() hooks are
now pointless wrappers. Let's replace them with just a .set_cdclk()
function pointer. However let's wrap that in a small helper that
does the state comparison and prints a unified debug message across
all platforms. We didn't even have the debug print on all platforms
previously. This reduces the clutter in intel_atomic_commit_tail() a
little bit.
v2: Wrap .set_cdclk() in intel_set_cdclk()
v3: Add kernel-docs
v4: Deal with IS_GEN9_BC()
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170126195201.32638-1-ville.syrjala@linux.intel.com
The current dev_cdclk vs. cdclk vs. atomic_cdclk_freq is quite a mess.
So here I'm introducing the "actual" and "logical" naming for our
cdclk state. "actual" is what we'll bash into the hardware and "logical"
is what everyone should use for state computaion/checking and whatnot.
We'll track both using the intel_cdclk_state as both will need other
differing parameters than just the actual cdclk frequency.
While doing that we can at the same time unify the appearance of the
.modeset_calc_cdclk() implementations a little bit.
v2: Commit dev_priv->cdclk.actual since that already has the
new state by the time .modeset_commit_cdclk() is called.
v3: s/locical/logical/ and improve the docs a bit
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170120182205.8141-9-ville.syrjala@linux.intel.com
Introduce intel_cdclk state which for now will track the cdclk
frequency, the vco frequency and the reference frequency (not sure we
want the last one, but I put it there anyway). We'll also make the
.get_cdclk() function fill out this state structure rather than
just returning the current cdclk frequency.
One immediate benefit is that calling .get_cdclk() will no longer
clobber state stored under dev_priv unless ex[plicitly told to do
so. Previously it clobbered the vco and reference clocks stored
there on some platforms.
We'll expand the use of this structure to actually precomputing the
state and whatnot later.
v2: Constify intel_cdclk_state_compare()
v3: Document intel_cdclk_state_compare()
v4: Deal with i945gm_get_cdclk()
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170207183345.19763-1-ville.syrjala@linux.intel.com
Rename the .get_display_clock_speed() hook to .get_cdclk().
.get_cdclk() is more specific (which clock) and it's much
shorter.
v2: Deal with IS_GEN9_BC()
v3: Deal with i945gm_get_display_clock_speed()
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170207183146.19420-1-ville.syrjala@linux.intel.com
They include useful material such as what mode the VM address space is
running in, what submission mode, extra quirks, etc.
v2: Undef the right macro, use type specific pretty printers
v3: Use strcmp(TYPENAME) rather than creating per-type pretty printers
v4: Use __always_inline to force GCC to eliminate the calls to strcmp and
generate the right call to seq_printf for each parameter.
v5: With the strcmp elimination, we can now use BUILD_BUG to ensure
there are no unhandled types, also use __builtin_strcmp to make it look
even more magic.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Mika Kuoppala <mika.kuoppala@intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170206213608.31328-3-chris@chris-wilson.co.uk
The LPE audio configuration depends on the pipe, thus we need to pass
the currently used pipe. It's now embedded in struct
intel_hdmi_lpe_audio_eld as well as port id.
Acked-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
If DisplayPort is detected, pass flag and link rate to audio driver
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Acked-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Chris Wilson wants the new fence tracepoint added in
commit 8c96c67801
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Jan 24 11:57:58 2017 +0000
dma/fence: Export enable-signaling tracepoint for emission by drivers
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
The conversion of stolen to use phys_addr_t (from essentially u32)
sparked an interesting discussion. We treat stolen memory as only
accessible from the GPU (the DMA device) - an attempt to use it from the
CPU will generate a MCE on gen6 onwards, although it is in theory a
physical address that can be dereferenced from the CPU as demonstrated
by earlier generations. As such, using phys_addr_t has the wrong
connotations and as we pass the address into the DMA device via
dma_addr_t (through the scatterlists used to program the GTT entries),
we should treat it as dma_addr_t throughout.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170127165531.28135-1-chris@chris-wilson.co.uk
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Include extra information such as the user_handle and hw_id so that
userspace can identify which of their contexts hung, useful if they are
performing self-diagnositics.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170129092433.10483-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
- cleanups&fixes for dw-hdmi bride driver (Laurent)
- updates for adv bridge driver (John Stultz) for nexus
- drm_crtc_from_index helper rollout (Shawn Guo)
- removing drm_framebuffer_unregister_private from drivers&core
- target_vblank (Andrey Grodzovsky)
- misc tiny stuff
* tag 'drm-misc-next-2017-01-23' of git://anongit.freedesktop.org/git/drm-misc: (49 commits)
drm: qxl: Open code teardown function for qxl
drm: qxl: Open code probing sequence for qxl
drm/bridge: adv7511: Re-write the i2c address before EDID probing
drm/bridge: adv7511: Reuse __adv7511_power_on/off() when probing EDID
drm/bridge: adv7511: Rework adv7511_power_on/off() so they can be reused internally
drm/bridge: adv7511: Enable HPD interrupts to support hotplug and improve monitor detection
drm/bridge: adv7511: Switch to using drm_kms_helper_hotplug_event()
drm/bridge: adv7511: Use work_struct to defer hotplug handing to out of irq context
drm: vc4: use crtc helper drm_crtc_from_index()
drm: tegra: use crtc helper drm_crtc_from_index()
drm: nouveau: use crtc helper drm_crtc_from_index()
drm: mediatek: use crtc helper drm_crtc_from_index()
drm: kirin: use crtc helper drm_crtc_from_index()
drm: exynos: use crtc helper drm_crtc_from_index()
dt-bindings: display: dw-hdmi: Clean up DT bindings documentation
drm: bridge: dw-hdmi: Assert SVSRET before resetting the PHY
drm: bridge: dw-hdmi: Fix the name of the PHY reset macros
drm: bridge: dw-hdmi: Define and use macros for PHY register addresses
drm: bridge: dw-hdmi: Detect PHY type at runtime
drm: bridge: dw-hdmi: Handle overflow workaround based on device version
...
The write to the punit may fail, so propagate the error code back to its
callers. Of particular interest are the RPS writes, so add appropriate
user error codes and logging.
v2: Add DEBUG for failed frequency changes during RPS.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170126101919.13211-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Notifiations like mode change, hot plug and edid to
the audio driver are added. This is inturn used by the
audio driver for its functionality.
A new interface file capturing the notifications needed by the
audio driver is added
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Jerome Anand <jerome.anand@intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Enable support for HDMI LPE audio mode on Baytrail and
Cherrytrail when HDaudio controller is not detected
Setup minimum required resources during i915_driver_load:
1. Create a platform device to share MMIO/IRQ resources
2. Make the platform device child of i915 device for runtime PM.
3. Create IRQ chip to forward HDMI LPE audio irqs.
HDMI LPE audio driver (a standalone sound driver) probes the
LPE audio device and creates a new sound card.
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Jerome Anand <jerome.anand@intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Along with GLK it was introduced the .is_lp and IS_GEN9_LP.
So, following the same simplification standard we can
put Skylake and Kabylake under the same bucket for most
of the things.
So let's add the IS_GEN9_BC for "Big Core" (non Atom based
platforms).
The i915_drv.c was let out of this patch on purpose
because that is really a decision per platform, just like
other cases where IS_KABYLAKE is different from IS_SKYLAKE.
v2: fix conflict with IS_LP and 3 new cases for this
big core bucket:
- intel_ddi.c: intel_ddi_get_link_dpll
- intel_fbc.c: find_compression_threshold
- i915_gem_gtt.c: gtt_write_workarounds
Cc: Anusha Srivatsa <anusha.srivatsa@intel.com>
Cc: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1485196357-30599-2-git-send-email-rodrigo.vivi@intel.com
In the next patch, we will use the irq_posted technique for another
engine interrupt, rather than use two members for the atomic updates, we
can use two bits of one instead. First, we need to update the
breadcrumbs to use the new common engine->irq_posted.
v2: Use set_bit() rather than __set_bit() to ensure atomicity with
respect to other bits in the mask
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170124151805.26146-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
The GPU may be in an unknown state following resume and module load. The
previous occupant may have left contexts loaded, or other dangerous
state, which can cause an immediate GPU hang for us. The only save
course of action is to reset the GPU prior to using it - similarly to
how we reset the GPU prior to unload (before a second user may be
affected by our leftover state).
We need to reset the GPU very early in our load/resume sequence so that
any stale HW pointers are revoked prior to any resource allocations we
make (that may conflict).
A reset should only be a couple of milliseconds on a slow device, a cost
we should easily be able to absorb into our initialisation times.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170124110135.6418-2-chris@chris-wilson.co.uk
In order to reset the GPU early on in the module load sequence, we need
to allocate the basic engine structs (to populate the mmio offsets etc).
Currently, the engine initialisation allocates both the base struct and
also allocate auxiliary objects, which depend upon state setup quite
late in the load sequence. We split off the allocation callback for
later and allow ourselves to allocate the engine structs themselves
early.
v2: Different paint for the unwind following error.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170124110135.6418-1-chris@chris-wilson.co.uk
Whilst writing testcases to exercise the VMA API, some oddities came to
light, such as i915_gem_obj_lookup_or_create(). Joonas suggested
i915_vma_instance() as a neat replacement, so rename them, move them to
i915_vma.c and add some kerneldoc as a sugary bonus.
s/i915_gem_obj_to_vma/i915_vma_lookup/
s/i915_gem_obj_lookup_or_create_vma/i915_vma_instance/
Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170116152131.18089-2-chris@chris-wilson.co.uk
With atomic plane states we are able to track an allocation right from
preparation, during use and through to the final free after being
swapped out for a new plane. We can couple the VMA we pin for the
framebuffer (and its rotation) to this lifetime and avoid all the clumsy
lookups in between.
v2: Remove residual vma on plane cleanup (Chris)
v3: Add a description for the vma destruction in
intel_plane_destroy_state (Maarten)
References: https://bugs.freedesktop.org/show_bug.cgi?id=98829
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170116152131.18089-1-chris@chris-wilson.co.uk
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
The HuC loading process is similar to GuC. The intel_uc_fw_fetch()
is used for both cases.
HuC loading needs to be before GuC loading. The WOPCM setting must
be done early before loading any of them.
v2: rebased on-top of drm-intel-nightly.
removed if(HAS_GUC()) before the guc call. (D.Gordon)
update huc_version number of format.
v3: rebased to drm-intel-nightly, changed the file name format to
match the one in the huc package.
Changed dev->dev_private to to_i915()
v4: moved function back to where it was.
change wait_for_atomic to wait_for.
v5: rebased. Changed the year in the copyright message to reflect
the right year.Correct the comments,remove the unwanted WARN message,
replace drm_gem_object_unreference() with i915_gem_object_put().Make the
prototypes in intel_huc.h non-extern.
v6: rebased. Update the file construction done by HuC. It is similar to
GuC.Adopted the approach used in-
https://patchwork.freedesktop.org/patch/104355/ <Tvrtko Ursulin>
v7: Change dev to dev_priv in macro definition.
Corrected comments.
v8: rebased on top of drm-tip. Updated functions intel_huc_load(),
intel_huc_init() and intel_uc_fw_fetch() to accept dev_priv instead of
dev. Moved contents of intel_huc.h to intel_uc.h.
v9: change SKL_FW_ to SKL_HUC_FW_. Add intel_ prefix to guc_wopcm_size().
Remove unwanted checks in intel_uc.h. Rename huc_fw in struct intel_huc to
simply fw to avoid redundency.
v10: rebased. Correct comments. Make intel_huc_fini() accept dev_priv
instead of dev like intel_huc_init() and intel_huc_load().Move definition
to i915_guc_reg.h from intel_uc.h. Clean DMA_CTRL bits after HuC DMA
transfer in huc_ucode_xfer() instead of guc_ucode_xfer(). Add suitable
WARNs to give extra info.
v11: rebased. Add proper bias for HuC and make sure there are
asserts on failure by using guc_ggtt_offset_vma(). Introduce
intel_huc.c and remove intel_huc_loader.c since it has functions that
do more than just loading.Correct year in copyright.
v12: remove invalidates that are not required anymore.
Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Tested-by: Xiang Haihao <haihao.xiang@intel.com>
Signed-off-by: Anusha Srivatsa <anusha.srivatsa@intel.com>
Signed-off-by: Alex Dai <yu.dai@intel.com>
Signed-off-by: Peter Antoine <peter.antoine@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1484755558-1234-1-git-send-email-anusha.srivatsa@intel.com
If we can't recover the GPU after the reset, mark it as wedged to cancel
the outstanding tasks and to prevent new users from trying to use the
broken GPU.
v2: Check the same ring is hung again before declaring the reset broken.
v3: use engine_stalled (Mika)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1484668747-9120-6-git-send-email-mika.kuoppala@intel.com
As per edp1.4 spec , alpm is required for psr2 operation as it's
used for all psr2 main link power down management and alpm enable
bit must be set for psr2 operation.
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Jim Bride <jim.bride@linux.intel.com>
Signed-off-by: vathsala nagaraju <vathsala.nagaraju@intel.com>
Signed-off-by: Patil Deepti <deepti.patil@intel.com>
Reviewed-by: Jim Bride <jim.bride@linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1483356663-32668-6-git-send-email-vathsala.nagaraju@intel.com
The internal object is a collection of struct pages and so is
intrinsically linked to the available physical memory on the machine,
and not an arbitrary type from the uabi. Use phys_addr_t so the link
between size and memory consumption is clear, and then double check that
we don't overflow the maximum object size.
v2: Also assert that size is not zero - a mistake I made a few times
while writing selftests.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170112130431.1844-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
The core provides now an ABI to userspace for generation of frame CRCs,
so implement the ->set_crc_source() callback and reuse as much code as
possible with the previous ABI implementation.
When handling the pageflip interrupt, we skip 1 or 2 frames depending on
the HW because they contain wrong values. For the legacy ABI for
generating frame CRCs, this was done in userspace but now that we have a
generic ABI it's better if it's not exposed by the kernel.
v2:
- Leave the legacy implementation in place as the ABI implementation
in the core is incompatible with it.
v3:
- Use the "cooked" vblank counter so we have a whole 32 bits.
- Make sure we don't mess with the state of the legacy CRC capture
ABI implementation.
v4:
- Keep use of get_vblank_counter as in the legacy code, will be
changed in a followup commit.
v5:
- Skip first frame or two as it's known that they contain wrong
data.
- A few fixes suggested by Emil Velikov.
v6:
- Rework programming of the HW registers to preserve previous
behavior.
v7:
- Address whitespace issue.
- Added a comment on why in the implementation of the new ABI we
skip the 1st or 2nd frames.
v9:
- Add stub for intel_crtc_set_crc_source.
v12:
- Rebased.
- Remove stub for intel_crtc_set_crc_source and instead set the
callback to NULL (Jani Nikula).
v15:
- Rebased.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Robert Foss <robert.foss@collabora.com>
irq
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20170110134305.26326-2-tomeu.vizoso@collabora.com
Continue to clean up drmP.h by moving the cache flushing functions into
it's own header file.
Compile-tested only
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20170109215649.6860-2-krisman@collabora.co.uk
Rename i915_gem_get_ggtt_size() and i915_gem_get_ggtt_alignment() to
i915_gem_fence_size() and i915_gem_fence_alignment() respectively to
better match usage. Similarly move the pair of functions into
i915_gem_tiling.c next to the fence restrictions.
Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170109161613.11881-6-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
The fence size/alignment is a combination of the vma size plus object
tiling parameters. Those parameters are rarely changed, making the fence
size/alignemnt roughly constant for the lifetime of the VMA. We can
simplify subsequent calculations by precalculating the size/alignment
required for GGTT vma taking fencing into account (with an update if we
do change the tiling or stride).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170109161613.11881-4-chris@chris-wilson.co.uk
Ensure the view occupies the full tile row so that reads/writes into the
VMA do not escape (via fenced detiling) into neighbouring objects - we
will pad the object with scratch pages to satisfy the fence. This
applies the lazy-tiling we employed on gen2/3 to gen4+.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170109161613.11881-2-chris@chris-wilson.co.uk
Though we know the hw is limited to keeping stolen memory inside the
first 4GiB, it is clearer to the reader that we are handling physical
address if we use phys_addr_t to refer to the base of stolen memory.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170106152013.24684-2-chris@chris-wilson.co.uk
In order to silence sparse:
../drivers/gpu/drm/i915/i915_gpu_error.c:200:39: warning: Using plain integer as NULL pointer
add a helper to check whether we have sse4.1 and that the desired
alignment is valid for acceleration.
v2: Explain the macros and split the two use cases between
i915_has_memcpy_from_wc() and i915_can_memcpy_from_wc().
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170106152013.24684-1-chris@chris-wilson.co.uk
Now that we have split out a header file for simple macros (that maybe
we can promote into a core header), move a few macros across from
i915_drv.h
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170105164148.26875-1-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
In order to defeat some circular dependencies between headers to allow use
of e.g. range_overflows() in a header, move the simple independent macros
into their own header.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170105153023.30575-4-chris@chris-wilson.co.uk
The fence registers are clobbered by a GPU reset. If there is concurrent
user access to a fenced region via a GTT mmaping, the access will not be
fenced during the reset (until we restore the fences afterwards). In order
to prevent invalid access during the reset, before we clobber the fences
first we must invalidate the GTT mmapings. Access to the mmap will then
be forced to fault in the page, and in handling the fault, i915_gem_fault()
will take the struct_mutex and wait upon the reset to complete.
v2: Fix up commentary.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99274
Testcase: igt/gem_mmap_gtt/hang
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170104145110.1486-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Directly merge drm-misc into drm-intel since Dave is on vacation and
we need the various drm-misc patches (fb format rework, drm mm fixes,
selftest framework and others). Also pulled back -rc2 in first to
resync with drm-intel-fixes and make sure I can reuse the exact rerere
solutions from drm-tip for safety, and because I'm lazy.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
In future patches, we require greater flexibility in describing
the number of scalers available on each CRTC. To ease that transition
we move the current assignment to intel_device_info.
Scaler structure initialisation is done if scaler is available on the CRTC.
Gen9 check is not required as on depending upon numbers of scalers we
initialize scalers or return without doing anything in skl_init_scalers.
v3: Changed skl_init_scaler to intel_crtc_init_scalers
v2: Added Chris's comments.
Signed-off-by: Nabendu Maiti <nabendu.bikash.maiti@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v2)
Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com>
Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1480398794-22741-1-git-send-email-nabendu.bikash.maiti@intel.com
The read of the page pin count and the bind count are unordered,
presenting races in the assert and it firing off incorrectly. Prevent
this by restricting the assert to the vma bind/unbind routines where we
have local cpu ordering between the two.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161231112012.29263-1-chris@chris-wilson.co.uk
GuC will validate the ring offset and fail if it is in the
[0, GUC_WOPCM_TOP) range. The bias is conditionally applied only
if GuC loading is enabled (we can't check for guc submission enabled as
in other cases because HuC loading requires this fix).
Note that the default context is processed before enable_guc_loading is
sanitized, so we might still apply the bias to its ring even if it is
not needed.
v2: compute the value during ctx init and pass it to
intel_ring_pin (Chris), updated commit message
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Cc: Anusha Srivatsa <anusha.srivatsa@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1482537382-28584-1-git-send-email-daniele.ceraolospurio@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The idle work handler is self-arming - if it detects that it needs to
run again it will queue itself from its work handler. Take greater care
when trying to drain the idle work, and double check that it is flushed.
The free worker has a similar issue where it is armed by an RCU task
which may be running concurrently with us.
This should hopefully help with the sporadic WARN_ON(dev_priv->gt.awake)
from i915_gem_suspend.
v2: Reuse drain_freed_objects.
v3: Don't try to flush the freed objects from the shrinker, as it may be
underneath the struct_mutex already.
v4: do while and comment upon the excess rcu_barrier in drain_freed_objects
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161223145804.6605-2-chris@chris-wilson.co.uk
There is at least one APL based system using port A in DP mode
(connecting to an on-board DP->VGA adaptor). Atm we'll configure port A
unconditionally as eDP which is incorrect in this case. Fix this by
relying on the VBT DDI port 'internal port' flag instead on all ports on
DDI platforms. For now chicken out from using VBT for port A before
GEN9.
v2:
- Move the DDI port info lookup to intel_bios_is_port_edp() (David, Jani)
- Use the DDI port info on all DDI platforms starting from port B.
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: David Weinehall <david.weinehall@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> (v1)
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1482315444-24750-1-git-send-email-imre.deak@intel.com
commit 848496e590
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date: Wed Jul 13 16:32:03 2016 +0300
drm/i915: Wait up to 3ms for the pcu to ack the cdclk change request on SKL
increased the timeout to match the spec, but we still see a timeout on
at least one SKL. A CDCLK change request following the failed one will
succeed nevertheless.
I could reproduce this problem easily by running kms_pipe_crc_basic in a
loop. In all failure cases _wait_for() was pre-empted for >3ms and so in
the worst case - when the pre-emption happened right after calculating
timeout__ in _wait_for() - we called skl_cdclk_wait_for_pcu_ready() only
once which failed and so _wait_for() timed out. As opposed to this the
spec says to keep retrying the request for at most a 3ms period.
To fix this send the first request explicitly to guarantee that there is
3ms between the first and last request. Though this matches the spec, I
noticed that in rare cases this can still time out if we sent only a few
requests (in the worst case 2) _and_ PCODE is busy for some reason even
after a previous request and a 3ms delay. To work around this retry the
polling with pre-emption disabled to maximize the number of requests.
Also increase the timeout to 10ms to account for interrupts that could
reduce the number of requests. With this change I couldn't trigger
the problem.
v2:
- Use 1ms poll period instead of 10us. (Chris)
v3:
- Poll with pre-emption disabled to increase the number of request
attempts. (Ville, Chris)
- Factor out a helper to poll, it's also needed by the next patch.
v4:
- Pass reply_mask, reply to skl_pcode_request(), instead of assuming the
reply is generic. (Ville)
v5:
- List the request specific timeout values as code comment. (Ville)
v6:
- Try the poll first with preemption enabled.
- Add code comment about first request being queued by PCODE. (Art)
- Add timeout_base_ms argument. (Ville)
v7:
- Clarify code comment about first queued request. (Chris)
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Art Runyan <arthur.j.runyan@intel.com>
Cc: <stable@vger.kernel.org> # v4.2- : 3b2c171 : drm/i915: Wait up to 3ms
Cc: <stable@vger.kernel.org> # v4.2-
Fixes: 5d96d8afcf ("drm/i915/skl: Deinit/init the display at suspend/resume")
Reference: https://bugs.freedesktop.org/show_bug.cgi?id=97929
Testcase: igt/kms_pipe_crc_basic/suspend-read-crc-pipe-B
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1480955258-26311-1-git-send-email-imre.deak@intel.com
(cherry picked from commit a0b8a1fe34)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Valleyview/Baytrail (gen7_lp) and Cherryview/Braswell (gen8_lp)
are both Atom platforms like Broxton/Apollolake and Geminilake.
So let's expand this is_lp back to these platforms and
create the IS_LP(dev_priv) so we can start simplifying a bit
our if/else for platform lists.
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1482096988-400-1-git-send-email-rodrigo.vivi@intel.com
The kref_put_mutex() returns with the mutex held after freeing the
object - so we must remember to drop it...
Fixes: 69df05e11a ("drm/i915: Simplify releasing context reference")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161219101357.28140-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
A few users only take the struct_mutex in order to release a reference
to a context. We can expose a kref_put_mutex() wrapper in order to
simplify these users, and optimise taking of the mutex to the final
unref.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161218153724.8439-4-chris@chris-wilson.co.uk
The requests conversion introduced a nasty bug where we could generate a
new request in the middle of constructing a request if we needed to idle
the system in order to evict space for a context. The request to idle
would be executed (and waited upon) before the current one, creating a
minor havoc in the seqno accounting, as we will consider the current
request to already be completed (prior to deferred seqno assignment) but
ring->last_retired_head would have been updated and still could allow
us to overwrite the current request before execution.
We also employed two different mechanisms to track the active context
until it was switched out. The legacy method allowed for waiting upon an
active context (it could forcibly evict any vma, including context's),
but the execlists method took a step backwards by pinning the vma for
the entire active lifespan of the context (the only way to evict was to
idle the entire GPU, not individual contexts). However, to circumvent
the tricky issue of locking (i.e. we cannot take struct_mutex at the
time of i915_gem_request_submit(), where we would want to move the
previous context onto the active tracker and unpin it), we take the
execlists approach and keep the contexts pinned until retirement.
The benefit of the execlists approach, more important for execlists than
legacy, was the reduction in work in pinning the context for each
request - as the context was kept pinned until idle, it could short
circuit the pinning for all active contexts.
We introduce new engine vfuncs to pin and unpin the context
respectively. The context is pinned at the start of the request, and
only unpinned when the following request is retired (this ensures that
the context is idle and coherent in main memory before we unpin it). We
move the engine->last_context tracking into the retirement itself
(rather than during request submission) in order to allow the submission
to be reordered or unwound without undue difficultly.
And finally an ulterior motive for unifying context handling was to
prepare for mock requests.
v2: Rename to last_retired_context, split out legacy_context tracking
for MI_SET_CONTEXT.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161218153724.8439-3-chris@chris-wilson.co.uk
In a number places we hand-roll the overflow sanity check for ranges, so
roll that into single macro, conceived by Chris, along with its typed
variant.
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161213203222.32564-3-matthew.auld@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Currently the backlight controller is taken as 0. It needs to derive
value from the VBT. Adding the necessary changes.
v2 by Jani:
- drop obsolete comments, drop redundant initialization (Bob)
- merge debug logging into one
Signed-off-by: Uma Shankar <uma.shankar@intel.com>
Signed-off-by: Vidya Srinivas <vidya.srinivas@intel.com>
Reviewed-by: Bob Paauwe <bob.j.paauwe@intel.com>
Tested-by: Bob Paauwe <bob.j.paauwe@intel.com>
Tested-by: Mika Kahola <mika.kahola@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1481189178-426-1-git-send-email-jani.nikula@intel.com
This adds a 'Perf' section to i915.rst with the following sub sections:
- Overview
- Comparison with Core Perf
- i915 Driver Entry Points
- i915 Perf Stream
- i915 Perf Observation Architecture Stream
- All i915 Perf Internals
v2:
section headers in i915.rst (Daniel Vetter)
missing symbol docs + other fixups (Matthew Auld)
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161207214033.3581-1-robert@sixbynine.org
commit 848496e590
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date: Wed Jul 13 16:32:03 2016 +0300
drm/i915: Wait up to 3ms for the pcu to ack the cdclk change request on SKL
increased the timeout to match the spec, but we still see a timeout on
at least one SKL. A CDCLK change request following the failed one will
succeed nevertheless.
I could reproduce this problem easily by running kms_pipe_crc_basic in a
loop. In all failure cases _wait_for() was pre-empted for >3ms and so in
the worst case - when the pre-emption happened right after calculating
timeout__ in _wait_for() - we called skl_cdclk_wait_for_pcu_ready() only
once which failed and so _wait_for() timed out. As opposed to this the
spec says to keep retrying the request for at most a 3ms period.
To fix this send the first request explicitly to guarantee that there is
3ms between the first and last request. Though this matches the spec, I
noticed that in rare cases this can still time out if we sent only a few
requests (in the worst case 2) _and_ PCODE is busy for some reason even
after a previous request and a 3ms delay. To work around this retry the
polling with pre-emption disabled to maximize the number of requests.
Also increase the timeout to 10ms to account for interrupts that could
reduce the number of requests. With this change I couldn't trigger
the problem.
v2:
- Use 1ms poll period instead of 10us. (Chris)
v3:
- Poll with pre-emption disabled to increase the number of request
attempts. (Ville, Chris)
- Factor out a helper to poll, it's also needed by the next patch.
v4:
- Pass reply_mask, reply to skl_pcode_request(), instead of assuming the
reply is generic. (Ville)
v5:
- List the request specific timeout values as code comment. (Ville)
v6:
- Try the poll first with preemption enabled.
- Add code comment about first request being queued by PCODE. (Art)
- Add timeout_base_ms argument. (Ville)
v7:
- Clarify code comment about first queued request. (Chris)
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Art Runyan <arthur.j.runyan@intel.com>
Cc: <stable@vger.kernel.org> # v4.2- : 3b2c171 : drm/i915: Wait up to 3ms
Cc: <stable@vger.kernel.org> # v4.2-
Fixes: 5d96d8afcf ("drm/i915/skl: Deinit/init the display at suspend/resume")
Reference: https://bugs.freedesktop.org/show_bug.cgi?id=97929
Testcase: igt/kms_pipe_crc_basic/suspend-read-crc-pipe-B
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1480955258-26311-1-git-send-email-imre.deak@intel.com
Pineview deserves to use its own platform enum (which was already added,
unused, previously). IS_G33() no longer matches Pineview, and gets
replaced by IS_G33() || IS_PINEVIEW() or equivalent. Pineview is no
longer an outlier among platform definitions.
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1481143689-19672-1-git-send-email-jani.nikula@intel.com
This patch changes Watermak calculation to fixed point calculation.
Problem with current calculation is during plane_blocks_per_line
calculation we divide intermediate blocks with min_scanlines and
takes floor of the result because of integer operation.
hence we end-up assigning less blocks than required. Which leads to
flickers.
Changes since V1:
- Add fixed point data type as per Paulo's review
Changes since V2:
- use fixed_point instead of fp_16_16
Changes since V3:
- rebase
Changes since V4 (from Paulo):
- My original renaming suggestion was misunderstood, so implement it
- Simplify fixed_16_16_to_u32 implementation
- Fix indentation
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Mahesh Kumar <mahesh1.kumar@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161201154940.24446-6-mahesh1.kumar@intel.com
Display Workarounds #1135
If IPC is enabled in BXT, display underruns are observed.
WA: The Line Time programmed in the WM_LINETIME register should be
half of the actual calculated Line Time.
Programmed Line Time = 1/2*Calculated Line Time
Changes since V1:
- Add Workaround number in commit & code
Changes since V2 (from Paulo):
- Bikeshed white space and make the WA tag look like the others
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Mahesh Kumar <mahesh1.kumar@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161201154940.24446-3-mahesh1.kumar@intel.com
Each DSPARB register can house bits for two separate pipes, hence
we must protect the registers during reprogramming so that parallel
FIFO reconfigurations happening simultaneosly on multiple pipes won't
corrupt each others values.
We'll use a new spinlock for this instead of the wm_mutex since we'll
have to move the DSPARB programming to happen from the vblank evade
critical section, and we can't use mutexes in there.
v2: Document why we use a spinlock instead of a mutex (Maarten)
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1480947208-18468-1-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
The platform flags in device info are (mostly) mutually
exclusive. Replace the flags with an enum. Add the platform enum also
for platforms that previously didn't have a flag, and give them codename
logging in dmesg.
Pineview remains an exception, the platform being G33 for that.
v2: Sort enum by gen and date
v3: rebase on geminilake enabling
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1480596595-3278-1-git-send-email-jani.nikula@intel.com
Instead of being hidden in sanitize_enable_ppgtt.
It also seems to be the place to do so nowadays.
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Soft-pinning depends upon being able to check for availabilty of an
interval and evict overlapping object from a drm_mm range manager very
quickly. Currently it uses a linear list, and so performance is dire and
not suitable as a general replacement. Worse, the current code will oops
if it tries to evict an active buffer.
It also helps if the routine reports the correct error codes as expected
by its callers and emits a tracepoint upon use.
For posterity since the wrong patch was pushed (i.e. that missed these
key points and had known bugs), this is the changelog that should have
been on commit 506a8e87d8 ("drm/i915: Add soft-pinning API for
execbuffer"):
Userspace can pass in an offset that it presumes the object is located
at. The kernel will then do its utmost to fit the object into that
location. The assumption is that userspace is handling its own object
locations (for example along with full-ppgtt) and that the kernel will
rarely have to make space for the user's requests.
This extends the DRM_IOCTL_I915_GEM_EXECBUFFER2 to do the following:
* if the user supplies a virtual address via the execobject->offset
*and* sets the EXEC_OBJECT_PINNED flag in execobject->flags, then
that object is placed at that offset in the address space selected
by the context specifier in execbuffer.
* the location must be aligned to the GTT page size, 4096 bytes
* as the object is placed exactly as specified, it may be used by this
execbuffer call without relocations pointing to it
It may fail to do so if:
* EINVAL is returned if the object does not have a 4096 byte aligned
address
* the object conflicts with another pinned object (either pinned by
hardware in that address space, e.g. scanouts in the aliasing ppgtt)
or within the same batch.
EBUSY is returned if the location is pinned by hardware
EINVAL is returned if the location is already in use by the batch
* EINVAL is returned if the object conflicts with its own alignment (as meets
the hardware requirements) or if the placement of the object does not fit
within the address space
All other execbuffer errors apply.
Presence of this execbuf extension may be queried by passing
I915_PARAM_HAS_EXEC_SOFTPIN to DRM_IOCTL_I915_GETPARAM and checking for
a reported value of 1 (or greater).
v2: Combine the hole/adjusted-hole ENOSPC checks
v3: More color, more splitting, more blurb.
Fixes: 506a8e87d8 ("drm/i915: Add soft-pinning API for execbuffer")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161205142941.21965-2-chris@chris-wilson.co.uk
We need to distinguish between full i915_vma structs and simple
drm_mm_nodes when considering eviction (i.e. we must be careful not to
treat a mere drm_mm_node as a much larger i915_vma causing memory
corruption, if we are lucky). To do this, color these not-a-vma with -1
(I915_COLOR_UNEVICTABLE).
v2...v200: New name for -1.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161205142941.21965-1-chris@chris-wilson.co.uk
Before we attempt to turn any planes on or off we must first exit
csxr. That's due to cxsr effectively making the plane enable bits
read-only. Currently we achieve that with a vblank wait right after
toggling the cxsr enable bit. We do the vblank wait even if cxsr was
already off, which seems wasteful, so let's try to only do it when
absolutely necessary.
We could start tracking the cxsr state fully somewhere, but for now
it seems easiest to just have intel_set_memory_cxsr() return the
previous cxsr state.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1480354637-14209-11-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Store the vlv/chv watermark values in straight up arrays indexed by
enum plane_id. Avoids a lot of useless checks for the plane type when
we don't have to think which structure member we need to access.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1480354637-14209-7-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
dev_priv is more appropriate since it is used much more in these.
v2: Commit message and keep the local pdev variable. (Joonas Lahtinen)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Since it does not need dev at all.
Also change the stored pointer in struct i915_error_state_file_priv
to i915.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Simplify the code by passing the right argument in.
v2: Commit message. (Joonas Lahtinen)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
They are only used in i915_drv.c so a forward declaration is enough.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Simplifies the code to pass the right parameter in.
v2: Commit message. (Joonas Lahtinen)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Like GEM init, GUC init, MOCS init and context creation.
Enables them to lose dev_priv locals.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Makes all GEM object constructors consistent.
v2: Fix compilation in GVT code.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (v1)
Where it is more appropriate and also to be consistent with
the direction of the driver.
v2: Leave out object alloc/free inlining. (Joonas Lahtinen)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Broxton and Geminilake are both gen9lp platforms. To avoid adding
IS_GEMINILAKE() checks everywhere alongside the IS_BROXTON() ones, add a
IS_GEN9_LP() macro.
v2: Rename macro parameter to dev_priv. (Joonas)
Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Geminilake is an Intel® Processor containing Intel® HD Graphics
following Broxton.
Let's start by adding the platform definition. PCI IDs and plaform
specific code will follow.
v2: Rebase (don't allow dev to be used with the new macro).
v3: Update ddb size. (Matt)
Rebase on s/preliminary_hw/alpha/
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1479133526-32389-1-git-send-email-ander.conselvan.de.oliveira@intel.com
GuC is not the only one micro controller we have.
There are also HuC and DMC.
Making the file more general will help with code organization.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Signed-off-by: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1480096777-12573-2-git-send-email-arkadiusz.hiler@intel.com
The check in __intel_uncore_early_sanitize() to disable decoupled mmio
would disable it for every platform that is not broxton. While that's
not a problem now since only broxton supports that, simply setting
.has_decoupled_mmio in a new platform's device info wouldn't suffice. So
avoid future confusion and change the workaround to only change the
value of has_decoupled_mmio for broxton.
v2: git add compile fix. (Ander)
Cc: Praveen Paneri <praveen.paneri@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1479993807-29353-1-git-send-email-ander.conselvan.de.oliveira@intel.com
Pass dev_priv to intel_setup_outputs() and functions called by it, since
those are all intel i915 specific functions. Also, in the majority of
the functions dev_priv is used more often than dev. In the rare cases
where there are a few calls back into drm core, a local dev variable was
added.
v2: Don't convert dev to &dev_priv->drm in intel_dsi_init. (Ville)
Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1479910904-11005-1-git-send-email-ander.conselvan.de.oliveira@intel.com
As i915.enable_cmd_parser is an unsafe option, make it read-only at
runtime. Now that it is constant, we can use the value determined during
initialisation as to whether we need the cmdparser at execbuffer time.
v2: Remove the inline for its single user, it is clear enough (and
shorter) without!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161124125851.6615-1-chris@chris-wilson.co.uk
No sense in keeping the cmd_descriptor and cmd_table structs in
i915_drv.h, now that they are no longer referenced externally.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1479942147-9837-1-git-send-email-matthew.auld@intel.com
A modeset on one pipe can update dev_priv->atomic_cdclk_freq without
actually touching the hardware, in which case we won't force a modeset
on all the pipes, and thus won't lock any of the other pipes either.
That means a parallel plane update on another pipe could be looking at
a stale dev_priv->atomic_cdcdlk_freq and thus fail to notice when the
plane configuration is invalid, or potentially reject a valid update.
To overcome this we must protect writes to atomic_cdclk_freq with
all the crtc locks, and thus for reads any single crtc lock will
be sufficient protection.
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1479141311-11904-3-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Add a mask of which planes are available for each pipe. This doesn't
quite work for old platforms with dynamic plane<->pipe assignment, but
as we don't support that sort of stuff (yet) we can get away with it.
The main use I have for this is the for_each_plane_id_on_crtc() macro
for iterating over all possible planes on the crtc. I suppose we could
not add the mask, and instead iterate by comparing intel_plane->pipe
but then we'd need a local intel_plane variable which is just
unnecessary clutter in some cases. But I'm not hung up on this, so if
people prefer the other option I could be convinced to use it.
v2: Use BIT() in the iterator macro too (Paulo)
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1479830524-7882-3-git-send-email-ville.syrjala@linux.intel.com
As I told people in [1] we really should not be confusing enum plane
as a per-pipe plane identifier. Looks like that happened nonetheless, so
let's fix it up by splitting the two into two enums.
We'll also want something we just directly pass to various register
offset macros and whatnot on SKL+. So let's make this new thing work for that.
Currently we pass intel_plane->plane for the "sprites" and just a
hardcoded zero for the "primary" planes. We want to get rid of that
hardocoding so that we can share the same code for all planes (apart
from the legacy cursor of course).
[1] https://lists.freedesktop.org/archives/intel-gfx/2015-September/076082.html
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1479830524-7882-2-git-send-email-ville.syrjala@linux.intel.com
Consistent with the kernel.perf_event_paranoid sysctl option that can
allow non-root users to access system wide cpu metrics, this can
optionally allow non-root users to access system wide OA counter metrics
from Gen graphics hardware.
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Sourab Gupta <sourab.gupta@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161107194957.3385-9-robert@sixbynine.org
Each metric set is given a sysfs entry like:
/sys/class/drm/card0/metrics/<guid>/id
This allows userspace to enumerate the specific sets that are available
for the current system. The 'id' file contains an unsigned integer that
can be used to open the associated metric set via
DRM_IOCTL_I915_PERF_OPEN. The <guid> is a globally unique ID for a
specific OA unit register configuration that can be reliably used by
userspace as a key to lookup corresponding counter meta data and
normalization equations.
The guid registry is currently maintained as part of gputop along with
the XML metric set descriptions and code generation scripts, ref:
https://github.com/rib/gputop
> gputop-data/guids.xml
> scripts/update-guids.py
> gputop-data/oa-*.xml
> scripts/i915-perf-kernelgen.py
$ make -C gputop-data -f Makefile.xml SYSFS=1 WHITELIST=RenderBasic
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Sourab Gupta <sourab.gupta@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161107194957.3385-8-robert@sixbynine.org
Gen graphics hardware can be set up to periodically write snapshots of
performance counters into a circular buffer via its Observation
Architecture and this patch exposes that capability to userspace via the
i915 perf interface.
v2:
Make sure to initialize ->specific_ctx_id when opening, without
relying on _pin_notify hook, in case ctx already pinned.
v3:
Revert back to pinning ctx upfront when opening stream, removing
need to hook in to pinning and to update OACONTROL on the fly.
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Sourab Gupta <sourab.gupta@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161107194957.3385-7-robert@sixbynine.org
Adds a static OA unit, MUX + B Counter configuration for basic render
metrics on Haswell. This is auto generated from an XML
description of metric sets, currently maintained in gputop, ref:
https://github.com/rib/gputop
> gputop-data/oa-*.xml
> scripts/i915-perf-kernelgen.py
$ make -C gputop-data -f Makefile.xml SYSFS=0 WHITELIST=RenderBasic
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Sourab Gupta <sourab.gupta@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161107194957.3385-6-robert@sixbynine.org
Adds base i915 perf infrastructure for Gen performance metrics.
This adds a DRM_IOCTL_I915_PERF_OPEN ioctl that takes an array of uint64
properties to configure a stream of metrics and returns a new fd usable
with standard VFS system calls including read() to read typed and sized
records; ioctl() to enable or disable capture and poll() to wait for
data.
A stream is opened something like:
uint64_t properties[] = {
/* Single context sampling */
DRM_I915_PERF_PROP_CTX_HANDLE, ctx_handle,
/* Include OA reports in samples */
DRM_I915_PERF_PROP_SAMPLE_OA, true,
/* OA unit configuration */
DRM_I915_PERF_PROP_OA_METRICS_SET, metrics_set_id,
DRM_I915_PERF_PROP_OA_FORMAT, report_format,
DRM_I915_PERF_PROP_OA_EXPONENT, period_exponent,
};
struct drm_i915_perf_open_param parm = {
.flags = I915_PERF_FLAG_FD_CLOEXEC |
I915_PERF_FLAG_FD_NONBLOCK |
I915_PERF_FLAG_DISABLED,
.properties_ptr = (uint64_t)properties,
.num_properties = sizeof(properties) / 16,
};
int fd = drmIoctl(drm_fd, DRM_IOCTL_I915_PERF_OPEN, ¶m);
Records read all start with a common { type, size } header with
DRM_I915_PERF_RECORD_SAMPLE being of most interest. Sample records
contain an extensible number of fields and it's the
DRM_I915_PERF_PROP_SAMPLE_xyz properties given when opening that
determine what's included in every sample.
No specific streams are supported yet so any attempt to open a stream
will return an error.
v2:
use i915_gem_context_get() - Chris Wilson
v3:
update read() interface to avoid passing state struct - Chris Wilson
fix some rebase fallout, with i915-perf init/deinit
v4:
s/DRM_IORW/DRM_IOW/ - Emil Velikov
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Sourab Gupta <sourab.gupta@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161107194957.3385-2-robert@sixbynine.org
If we have a bad client submitting unfavourably across different
contexts, creating new ones, the per context scoring of badness
doesn't remove the root cause, the offending client.
To counter, keep track of per client context bans. Deny access if
client is responsible for more than 3 context bans in
it's lifetime.
v2: move ban check to context create ioctl (Chris)
v3: add commentary about hangs needed to reach client ban (Chris)
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Now when driver has per context scoring of 'hanging badness'
and also subsequent hangs during short windows are allowed,
if there is progress made in between, it does not make sense
to expose a ban timing window as a context parameter anymore.
Let the scoring be the sole indicator for ban policy and substitute
ban period context parameter as a boolean to get/set context
bannable property.
v2: allow non root to opt into being banned (Chris)
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
As hangcheck score was removed, the active decay of score
was removed also. This removed feature for hangcheck to detect
if the gpu client was accidentally or maliciously causing intermittent
hangs. Reinstate the scoring as a per context property, so that if
one context starts to act unfavourably, ban it.
v2: ban_period_secs as a gate to score check (Chris)
v3: decay in proper spot. scores as tunables (Chris)
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Hangcheck state accumulation has gained more steps
along the years, like head movement and more recently the
subunit inactivity check. As the subunit sampling is only
done if the previous state check showed inactivity, we
have added more stages (and time) to reach a hang verdict.
Asymmetric engine states led to different actual weight of
'one hangcheck unit' and it was demonstrated in some
hangs that due to difference in stages, simpler engines
were accused falsely of a hang as their scoring was much
more quicker to accumulate above the hang treshold.
To completely decouple the hangcheck guilty score
from the hangcheck period, convert hangcheck score to a
rough period of inactivity measurement. As these are
tracked as jiffies, they are meaningful also across
reset boundaries. This makes finding a guilty engine
more accurate across multi engine activity scenarios,
especially across asymmetric engines.
We lose the ability to detect cross batch malicious attempts
to hinder the progress. Plan is to move this functionality
to be part of context banning which is more natural fit,
later in the series.
v2: use time_before macros (Chris)
reinstate the pardoning of moving engine after hc (Chris)
v3: avoid global state for per engine stall detection (Chris)
v4: take timeline last retirement into account (Chris)
v5: do debug print on pardoning, split out retirement timestamp (Chris)
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
And a little bit of function prototype changes.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
And a little bit of cascaded function prototype changes.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
And a little bit of cascaded function prototype changes.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Plus a small cascade of function prototype changes.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Tvrtko needs
commit b3c11ac267
Author: Eric Engestrom <eric@engestrom.ch>
Date: Sat Nov 12 01:12:56 2016 +0000
drm: move allocation out of drm_get_format_name()
to be able to apply his patches without conflicts.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Decoupled MMIO is an alternative way to access forcewake domain
registers, which requires less cycles for a single read/write and
avoids frequent software forcewake.
This certainly gives advantage over the forcewake as this new
mechanism “decouples” CPU cycles and allow them to complete even
when GT is in a CPD (frequency change) or C6 state.
This can co-exist with forcewake and we will continue to use forcewake
as appropriate. E.g. 64-bit register writes to avoid writing 2 dwords
separately and land into funny situations.
v2:
- Moved platform check out of the function and got rid of duplicate
functions to find out decoupled power domain (Chris)
- Added a check for forcewake already held and skipped decoupled
access (Chris)
- Skipped writing 64 bit registers through decoupled MMIO (Chris)
v3:
- Improved commit message with more info on decoupled mmio (Tvrtko)
- Changed decoupled operation to enum and used u32 instead of
uint_32 data type for register offset (Tvrtko)
- Moved HAS_DECOUPLED_MMIO to device info (Tvrtko)
- Added lookup table for converting fw_engine to pd_engine (Tvrtko)
- Improved __gen9_decoupled_read and __gen9_decoupled_write
routines (Tvrtko)
v4:
- Fixed alignment and variable names (Chris)
- Write GEN9_DECOUPLED_REG0_DW1 register in just one go (Zhe Wang)
v5:
- Changed HAS_DECOUPLED_MMIO() argument name to dev_priv (Tvrtko)
- Sanitize info->had_decoupled_mmio at init (Chris)
Signed-off-by: Zhe Wang <zhe1.wang@intel.com>
Signed-off-by: Praveen Paneri <praveen.paneri@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1479230360-22395-1-git-send-email-praveen.paneri@intel.com
The watermark updates for SKL style watermarks are no longer done
in the plane callbacks, but are now called in a separate watermark
update function that's called during the same vblank evasion,
before the plane updates.
This also gets rid of the global skl_results, which was required for
keeping track of the current atomic commit.
Changes since v1:
- Move line unwrap to correct patch. (Lyude)
- Make sure we don't regress ILK watermarks. (Matt)
- Rephrase commit message. (Matt)
Changes since v2:
- Fix disable watermark check to use the correct way to determine single
step watermark support.
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lyude <cpaul@redhat.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1478609742-13603-3-git-send-email-maarten.lankhorst@linux.intel.com
[mlankhorst: Small whitespace fix in skl_initial_wm]
Allow the driver to write watermarks during atomic evasion.
This will make it possible to write the watermarks in a cleaner
way on gen9+.
intel_atomic_state is not used here yet, but will be used when
we program all watermarks as a separate step during evasion.
This also writes linetime all the time, while before it was only
done during plane updates. This looks like this could be a bugfix,
but I'm not sure what it affects.
Changes since v1:
- Add comment about atomic evasion to commit message.
- Unwrap I915_WRITE call. (Lyude)
Changes since v2:
- Rename atomic_evade_watermarks to atomic_update_watermarks. (Ville)
- Add line wraps where appropriate, fix grammar in commit message. (Matt)
Changes since v3:
- Actually fix commit message. (Matt)
- Line wrap calls to watermark update functions. (Matt)
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1478609742-13603-2-git-send-email-maarten.lankhorst@linux.intel.com
Boost the priority of any rendering required to show the next pageflip
as we want to avoid missing the vblank by being delayed by invisible
workload. We prioritise avoiding jank and jitter in the GUI over
starving background tasks.
v2: Descend dma_fence_array when boosting priorities.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-10-chris@chris-wilson.co.uk
In order to support userspace defining different levels of importance to
different contexts, and in particular the preferred order of execution,
store a priority value on each context. By default, the kernel's
context, which is used for idling and other background tasks, is given
minimum priority (i.e. all user contexts will execute before the kernel).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-9-chris@chris-wilson.co.uk
The scheduler needs to know the dependencies of each request for the
lifetime of the request, as it may choose to reschedule the requests at
any time and must ensure the dependency tree is not broken. This is in
additional to using the fence to only allow execution after all
dependencies have been completed.
One option was to extend the fence to support the bidirectional
dependency tracking required by the scheduler. However the mismatch in
lifetimes between the submit fence and the request essentially meant
that we had to build a completely separate struct (and we could not
simply reuse the existing waitqueue in the fence for one half of the
dependency tracking). The extra dependency tracking simply did not mesh
well with the fence, and keeping it separate both keeps the fence
implementation simpler and allows us to extend the dependency tracking
into a priority tree (whilst maintaining support for reordering the
tree).
To avoid the additional allocations and list manipulations, the use of
the priotree is disabled when there are no schedulers to use it.
v2: Create a dedicated slab for i915_dependency.
Rename the lists.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-7-chris@chris-wilson.co.uk
The term "preliminary hardware support" has always caused confusion both
among users and developers. It has always been about preliminary driver
support for new hardware, and not so much about preliminary hardware. Of
course, initially both the software and hardware are in early stages,
but the distinction becomes more clear when the user picks up production
hardware and an older kernel to go with it, with just the early support
we had for the hardware at the time the kernel was released. The user
has to specifically enable the alpha quality *driver* support for the
hardware in that specific kernel version.
Rename preliminary_hw_support to alpha_support to emphasize that the
module parameter, config option, and flag are about software, not about
hardware. Improve the language in help texts and debug logging as well.
This appears to be a good time to do the change, as there are currently
no platforms with preliminary^W alpha support.
Cc: Rob Clark <robdclark@gmail.com>
Cc: Dave Airlie <airlied@gmail.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1477909108-18696-1-git-send-email-jani.nikula@intel.com
A small selection of macros which can only accept dev_priv from
now on and a resulting trickle of fixups.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
A small selection of macros which can only accept dev_priv from
now on and a resulting trickle of fixups.
v2: Keep original order. (Ville Syrjala)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
A small selection of macros which can only accept dev_priv from
now on and a resulting trickle of fixups.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
As a side product, had to split two other files;
- i915_gem_fence_reg.h
- i915_gem_object.h (only parts that needed immediate untanglement)
I tried to move code in as big chunks as possible, to make review
easier. i915_vma_compare was moved to a header temporarily.
v2:
- Use i915_gem_fence_reg.{c,h}
v3:
- Rebased
v4:
- Fix building when DEBUG_GEM is enabled by reordering a bit.
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1478861034-30643-1-git-send-email-joonas.lahtinen@linux.intel.com
- better atomic state debugging from Rob
- fence prep from gustavo
- sumits flushed out his backlog of pending dma-buf/fence patches from
various people
- drm_mm leak debugging plus trying to appease Kconfig (Chris)
- a few misc things all over
* tag 'topic/drm-misc-2016-11-10' of git://anongit.freedesktop.org/drm-intel: (35 commits)
drm: Make DRM_DEBUG_MM depend on STACKTRACE_SUPPORT
drm/i915: Restrict DRM_DEBUG_MM automatic selection
drm: Restrict stackdepot usage to builtin drm.ko
drm/msm: module param to dump state on error irq
drm/msm/mdp5: add atomic_print_state support
drm/atomic: add debugfs file to dump out atomic state
drm/atomic: add new drm_debug bit to dump atomic state
drm: add helpers to go from plane state to drm_rect
drm: add helper for printing to log or seq_file
drm: helper macros to print composite types
reservation: revert "wait only with non-zero timeout specified (v3)" v2
drm/ttm: fix ttm_bo_wait
dma-buf/fence: revert "don't wait when specified timeout is zero" (v2)
dma-buf/fence: make timeout handling in fence_default_wait consistent (v2)
drm/amdgpu: add the interface of waiting multiple fences (v4)
dma-buf: return index of the first signaled fence (v2)
MAINTAINERS: update Sync File Framework files
dma-buf/sw_sync: put fence reference from the fence creation
dma-buf/sw_sync: mark sync_timeline_create() static
drm: Add stackdepot include for DRM_DEBUG_MM
...
We always flush the chipset prior to executing with the GPU, so we can
skip the flush during ordinary domain management.
This should help mitigate some of the potential performance regressions,
but likely trivial, from doing the flush unconditionally before execbuf
introduced in commit dcd79934b0 ("drm/i915: Unconditionally flush any
chipset buffers before execbuf")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20161106130001.9509-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Move has_64bit_reloc into dev_priv->info. This will make it visible
in the feature listing debug output.
v2:
- Keep the struct member to keep GCC fragile but happy (Chris)
v3:
- More detailed commit message (Chris)
- Include forgotten CHV and BXT (Chris)
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1478162386-5018-1-git-send-email-joonas.lahtinen@linux.intel.com
Create new file for hangcheck specific code, intel_hangcheck.c,
and move all related code in it.
v2: s/intel_engine_hangcheck/intel_engine (Chris)
No functional changes.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1478018583-5816-1-git-send-email-mika.kuoppala@intel.com
Commit 1bec9b0bda ("drm/i915/shrinker: Only shmemfs objects
are backed by swap") stopped considering the userptr objects
in shrinker callbacks.
Restore that so idle userptr objects can be discarded in order
to free up memory.
One change further to what was introduced in 1bec9b0bda is
to start considering userptr objects in oom but that should
also be a correct thing to do.
v2: Introduce I915_GEM_OBJECT_IS_SHRINKABLE. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 1bec9b0bda ("drm/i915/shrinker: Only shmemfs objects are backed by swap")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: <stable@vger.kernel.org>
Link: http://patchwork.freedesktop.org/patch/msgid/1478011450-6634-1-git-send-email-tvrtko.ursulin@linux.intel.com
For legacy contexts we employ an optimisation to only flush the context
when binding into the global GTT. This avoids stalling on the GPU when
reloading an active context. Wrap this detail up into a helper and
export it for a potential third user. (Longer term, context pinning
needs to be reworked as the current handling of switch context pins too
late and so risks eviction and corrupting the request. Plans, plans,
plans.)
v2: Expand the comment explaining the optimisation for avoiding the
stall on active contexts.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20161030132820.32163-1-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
The shrinker may appear to recurse into obj->mm.lock as the shrinker may
be called from a direct reclaim path whilst handling get_pages. We
filter out recursing on the same obj->mm.lock by inspecting
obj->mm.pages, but we do want to take the lock on a second object in
order to reap their pages. lockdep spots the recursion on the same
lockclass and needs annotation to avoid a false positive. To keep the
two paths distinct, create an enum to indicate which subclass of
obj->mm.lock we are using. This removes the false positive and avoids
masking real bugs.
Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161101121134.27504-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
With full-ppgtt one of the main bottlenecks is the lookup of the VMA
underneath the object. For execbuf there is merit in having a very fast
direct lookup of ctx:handle to the vma using a hashtree, but that still
leaves a large number of other lookups. One way to speed up the lookup
would be to use a rhashtable, but that requires extra allocations and
may exhibit poor worse case behaviour. An alternative is to use an
embedded rbtree, i.e. no extra allocations and deterministic behaviour,
but at the slight cost of O(lgN) lookups (instead of O(1) for
rhashtable). The major of such tree will be very shallow and so not much
slower, and still scales much, much better than the current unsorted
list.
v2: Bump vma_compare() to return a long, as we return the result of
comparing two pointers.
References: https://bugs.freedesktop.org/show_bug.cgi?id=87726
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161101115400.15647-1-chris@chris-wilson.co.uk
If we have a tiled object and an unknown CPU swizzle pattern, we pin the
pages to prevent the object from being swapped out (and us corrupting
the contents as we do not know the access pattern and so cannot convert
it to linear and back to tiled on reuse). This requires us to remember
to drop the extra pinning when freeing the object, or else we trigger
warnings about the pin leak. In commit fbbd37b36f ("drm/i915: Move
object release to a freelist + worker"), the object free path was
deferred to a worker, but the unpinning of the quirk, along with marking
the object as reclaimable, was left on the immediate path (so that if
required we could reclaim the pages under memory pressure as early as
possible). However, this split introduced a bug where the pages were no
longer being unpinned if they were marked as unneeded.
[ 231.800401] WARNING: CPU: 1 PID: 90 at drivers/gpu/drm/i915/i915_gem.c:4275 __i915_gem_free_objects+0x326/0x3c0 [i915]
[ 231.800403] WARN_ON(i915_gem_object_has_pinned_pages(obj))
[ 231.800405] Modules linked in:
[ 231.800406] snd_hda_intel i915 snd_hda_codec_generic mei_me snd_hda_codec coretemp snd_hwdep mei lpc_ich snd_hda_core snd_pcm e1000e ptp pps_core [last unloaded: i915]
[ 231.800426] CPU: 1 PID: 90 Comm: kworker/1:4 Tainted: G U 4.9.0-rc2-CI-CI_DRM_1780+ #1
[ 231.800428] Hardware name: LENOVO 7465CTO/7465CTO, BIOS 6DET44WW (2.08 ) 04/22/2009
[ 231.800456] Workqueue: events __i915_gem_free_work [i915]
[ 231.800459] ffffc9000034fc80 ffffffff8142dd65 ffffc9000034fcd0 0000000000000000
[ 231.800465] ffffc9000034fcc0 ffffffff8107e4e6 000010b300000001 0000000000001000
[ 231.800469] ffff88011d3db740 ffff880130ef0000 0000000000000000 ffff880130ef5ea0
[ 231.800474] Call Trace:
[ 231.800479] [<ffffffff8142dd65>] dump_stack+0x67/0x92
[ 231.800484] [<ffffffff8107e4e6>] __warn+0xc6/0xe0
[ 231.800487] [<ffffffff8107e54a>] warn_slowpath_fmt+0x4a/0x50
[ 231.800491] [<ffffffff811d12ac>] ? kmem_cache_free+0x2dc/0x340
[ 231.800520] [<ffffffffa009ef36>] __i915_gem_free_objects+0x326/0x3c0 [i915]
[ 231.800548] [<ffffffffa009effe>] __i915_gem_free_work+0x2e/0x50 [i915]
[ 231.800552] [<ffffffff8109c27c>] process_one_work+0x1ec/0x6b0
[ 231.800555] [<ffffffff8109c1f6>] ? process_one_work+0x166/0x6b0
[ 231.800558] [<ffffffff8109c789>] worker_thread+0x49/0x490
[ 231.800561] [<ffffffff8109c740>] ? process_one_work+0x6b0/0x6b0
[ 231.800563] [<ffffffff8109c740>] ? process_one_work+0x6b0/0x6b0
[ 231.800566] [<ffffffff810a2aab>] kthread+0xeb/0x110
[ 231.800569] [<ffffffff810a29c0>] ? kthread_park+0x60/0x60
[ 231.800573] [<ffffffff818164a7>] ret_from_fork+0x27/0x40
Moving to a separate flag for tracking the quirked pin is overkill for
the bug (since we only have to interchange the two tests in
i915_gem_free_object) but it does reduce a complicated test on all
objects and provide a sanitycheck for uncommon code paths.
Fixes: fbbd37b36f ("drm/i915: Move object release to a freelist + worker")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161101100317.11129-2-chris@chris-wilson.co.uk
Due to the plane->index not getting readjusted in drm_plane_cleanup(),
we can't continue initialization of some plane/crtc init fails.
Well, we sort of could I suppose if we left all initialized planes on
the list, but that would expose those planes to userspace as well.
But for crtcs the situation is even worse since we assume that
pipe==crtc index occasionally, so we can't really deal with a partially
initialize set of crtcs.
So seems safest to just abort the entire thing if anything goes wrong.
All the failure paths here are kmalloc()s anyway, so it seems unlikely
we'd get very far if these start failing.
v2: Add (enum plane) case to silence gcc
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1477411083-19255-4-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
With the infrastructure converted over to tracking multiple timelines in
the GEM API whilst preserving the efficiency of using a single execution
timeline internally, we can now assign a separate timeline to every
context with full-ppgtt.
v2: Add a comment to indicate the xfer between timelines upon submission.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-35-chris@chris-wilson.co.uk
A restriction on our global seqno is that they cannot wrap, and that we
cannot use the value 0. This allows us to detect when a request has not
yet been submitted, its global seqno is still 0, and ensures that
hardware semaphores are monotonic as required by older hardware. To
meet these restrictions when we defer the assignment of the global
seqno, we must check that we have an available slot in the global seqno
space during request construction. If that test fails, we wait for all
requests to be completed and reset the hardware back to 0.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-33-chris@chris-wilson.co.uk
This will be used for communicating issues with this context to
userspace, so we want to identify the parent process and the individual
context. Note that the name isn't quite unique, it makes the presumption
of there only being a single device fd per process.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-31-chris@chris-wilson.co.uk
Currently we try to reduce the number of synchronisations (now the
number of requests we need to wait upon) by noting that if we have
earlier waited upon a request, all subsequent requests in the timeline
will be after the wait. This only applies to requests in this timeline,
as other timelines will not be ordered by that waiter.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-30-chris@chris-wilson.co.uk
Though we will have multiple timelines, we still have a single timeline
of execution. This we can use to provide an execution and retirement order
of requests. This keeps tracking execution of requests simple, and vital
for preserving a single waiter (i.e. so that we can order the waiters so
that only the earliest to wakeup need be woken). To accomplish this we
distinguish the seqno used to order requests per-context (external) and
that used internally for execution.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-26-chris@chris-wilson.co.uk
Our timelines are more than just a seqno. They also provide an ordered
list of requests to be executed. Due to the restriction of handling
individual address spaces, we are limited to a timeline per address
space but we use a fence context per engine within.
Our first step to introducing independent timelines per context (i.e. to
allow each context to have a queue of requests to execute that have a
defined set of dependencies on other requests) is to provide a timeline
abstraction for the global execution queue.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-23-chris@chris-wilson.co.uk
In preparation to support many distinct timelines, we need to expand the
activity tracking on the GEM object to handle more than just a request
per engine. We already use the struct reservation_object on the dma-buf
to handle many fence contexts, so integrating that into the GEM object
itself is the preferred solution. (For example, we can now share the same
reservation_object between every consumer/producer using this buffer and
skip the manual import/export via dma-buf.)
v2: Reimplement busy-ioctl (by walking the reservation object), postpone
the ABI change for another day. Similarly use the reservation object to
find the last_write request (if active and from i915) for choosing
display CS flips.
Caveats:
* busy-ioctl: busy-ioctl only reports on the native fences, it will not
warn of stalls (in set-domain-ioctl, pread/pwrite etc) if the object is
being rendered to by external fences. It also will not report the same
busy state as wait-ioctl (or polling on the dma-buf) in the same
circumstances. On the plus side, it does retain reporting of which
*i915* engines are engaged with this object.
* non-blocking atomic modesets take a step backwards as the wait for
render completion blocks the ioctl. This is fixed in a subsequent
patch to use a fence instead for awaiting on the rendering, see
"drm/i915: Restore nonblocking awaits for modesetting"
* dynamic array manipulation for shared-fences in reservation is slower
than the previous lockless static assignment (e.g. gem_exec_lut_handle
runtime on ivb goes from 42s to 66s), mainly due to atomic operations
(maintaining the fence refcounts).
* loss of object-level retirement callbacks, emulated by VMA retirement
tracking.
* minor loss of object-level last activity information from debugfs,
could be replaced with per-vma information if desired
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-21-chris@chris-wilson.co.uk
Having moved the locked phase of freeing an object to a separate worker,
we can now declare to the core that we only need the unlocked variant of
driver->gem_free_object, and can use the simple unreference internally.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-20-chris@chris-wilson.co.uk
We want to hide the latency of releasing objects and their backing
storage from the submission, so we move the actual free to a worker.
This allows us to switch to struct_mutex freeing of the object in the
next patch.
Furthermore, if we know that the object we are dereferencing remains valid
for the duration of our access, we can forgo the usual synchronisation
barriers and atomic reference counting. To ensure this we defer freeing
an object til after an RCU grace period, such that any lookup of the
object within an RCU read critical section will remain valid until
after we exit that critical section. We also employ this delay for
rate-limiting the serialisation on reallocation - we have to slow down
object creation in order to prevent resource starvation (in particular,
files).
v2: Return early in i915_gem_tiling() ioctl to skip over superfluous
work on error.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-19-chris@chris-wilson.co.uk
Break the allocation of the backing storage away from struct_mutex into
a per-object lock. This allows parallel page allocation, provided we can
do so outside of struct_mutex (i.e. set-domain-ioctl, pwrite, GTT
fault), i.e. before execbuf! The increased cost of the atomic counters
are hidden behind i915_vma_pin() for the typical case of execbuf, i.e.
as the object is typically bound between execbufs, the page_pin_count is
static. The cost will be felt around set-domain and pwrite, but offset
by the improvement from reduced struct_mutex contention.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-14-chris@chris-wilson.co.uk
The plan is to move obj->pages out from under the struct_mutex into its
own per-object lock. We need to prune any assumption of the struct_mutex
from the get_pages/put_pages backends, and to make it easier we pass
around the sg_table to operate on rather than indirectly via the obj.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-13-chris@chris-wilson.co.uk
The plan is to make obtaining the backing storage for the object avoid
struct_mutex (i.e. use its own locking). The first step is to update the
API so that normal users only call pin/unpin whilst working on the
backing storage.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-12-chris@chris-wilson.co.uk
A while ago we switched from a contiguous array of pages into an sglist,
for that was both more convenient for mapping to hardware and avoided
the requirement for a vmalloc array of pages on every object. However,
certain GEM API calls (like pwrite, pread as well as performing
relocations) do desire access to individual struct pages. A quick hack
was to introduce a cache of the last access such that finding the
following page was quick - this works so long as the caller desired
sequential access. Walking backwards, or multiple callers, still hits a
slow linear search for each page. One solution is to store each
successful lookup in a radix tree.
v2: Rewrite building the radixtree for clarity, hopefully.
v3: Rearrange execbuf to avoid calling i915_gem_object_get_sg() from
within an atomic section and so relax the allocation context to a simple
GFP_KERNEL and mutex.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-10-chris@chris-wilson.co.uk
Quite a few of our objects used for internal hardware programming do not
benefit from being swappable or from being zero initialised. As such
they do not benefit from using a shmemfs backing storage and since they
are internal and never directly exposed to the user, we do not need to
worry about providing a filp. For these we can use an
drm_i915_gem_object wrapper around a sg_table of plain struct page. They
are not swap backed and not automatically pinned. If they are reaped
by the shrinker, the pages are released and the contents discarded. For
the internal use case, this is fine as for example, ringbuffers are
pinned from being written by a request to be read by the hardware. Once
they are idle, they can be discarded entirely. As such they are a good
match for execlist ringbuffers and a small variety of other internal
objects.
In the first iteration, this is limited to the scratch batch buffers we
use (for command parsing and state initialisation).
v2: Allocate physically contiguous pages, where possible.
v3: Reduce maximum order on subsequent requests following an allocation
failure.
v4: Fix up mismatch between swiotlb segment size and page count (it
counts in 2k units, not 4k pages)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-7-chris@chris-wilson.co.uk
We only need the active reference to keep the object alive after the
handle has been deleted (so as to prevent a synchronous gem_close). Why
then pay the price of a kref on every execbuf when we can insert that
final active ref just in time for the handle deletion?
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-6-chris@chris-wilson.co.uk
Our low-level wait routine has evolved from our generic wait interface
that handled unlocked, RPS boosting, waits with time tracking. If we
push our GEM fence tracking to use reservation_objects (required for
handling multiple timelines), we lose the ability to pass the required
information down to i915_wait_request(). However, if we push the extra
functionality from i915_wait_request() to the individual callsites
(i915_gem_object_wait_rendering and i915_gem_wait_ioctl) that make use
of those extras, we can both simplify our low level wait and prepare for
extending the GEM interface for use of reservation_objects.
v2: Rewrite i915_wait_request() kerneldocs
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-4-chris@chris-wilson.co.uk
We will need to wait on DMA completion (as signaled via struct fence)
before executing our i915_gem_request. Therefore we want to expose a
method for adding the await on the fence itself to the request.
v2: Add a comment detailing a failure to handle a signal-on-any
fence-array.
v3: Pretend that magic numbers don't exist.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-1-chris@chris-wilson.co.uk
This macro's name is a bit misleading; it doesn't actually iterate over
all planes since it omits the cursor plane. Its only uses are in gen9
code which is using it to iterate over the universal planes (which we
treat as primary+sprites); in these cases the legacy cursor registers
are programmed independently if necessary. The macro's iterator value
(0 for primary plane, spritenum+1 for each secondary plane) also isn't
meaningful outside the gen9 context where the hardware considers them to
all be "universal" planes that follow this numbering.
This is just a renaming/clarification patch with no functional change.
However it will make the subsequent patches more clear.
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1477522291-10874-2-git-send-email-matthew.d.roper@intel.com
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
The port registers related to the phys in broxton map to different
channels and specific phys. Make that mapping explicit.
v2: Pass enum dpio_phy to macros instead of mmio base. (Imre)
v3: Fix typo in macros. (Imre)
v4: Also change variables from u32 to enum dpio_phy. (Imre)
Remove leftovers from previous version. (Imre)
v5: Actually git add the changes.
Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1476863940-6019-1-git-send-email-ander.conselvan.de.oliveira@intel.com
Comment mentioned use of intel_uncore_forcewake_irq{unlock, lock}
functions which are nonexistent (and never were).
The description was also incomplete and could cause confusion. Updated
comment is more elaborate on usage and caveats.
v2: mention __locked variant of intel_uncore_forcewake_{get,put} instead
of plain ones
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilsono.c.uk>
[Mika: removed two superfluous lines on comment noted by Chris]
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1477399682-3133-1-git-send-email-arkadiusz.hiler@intel.com
As well as knowing when the error occurred, it is more interesting to me
to know how long after booting the error occurred, and for good measure
record the time since last hw initialisation.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161025121602.1457-1-chris@chris-wilson.co.uk
Added the dump of GuC log buffer to i915 error state, as the contents of
GuC log buffer would also be useful to determine that why the GPU reset
was triggered.
v2:
- For uniformity use existing helper function print_error_obj() to
dump out contents of GuC log buffer, pretty printing is better left
to userspace. (Chris)
- Skip the dumping of GuC log buffer when logging is disabled as it
won't be of any use.
- Rebase.
v3: Rebase.
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
There are certain types of interrupts which Host can receive from GuC.
GuC ukernel sends an interrupt to Host for certain events, like for
example retrieve/consume the logs generated by ukernel.
This patch adds support to receive interrupts from GuC but currently
enables & partially handles only the interrupt sent by GuC ukernel.
Future patches will add support for handling other interrupt types.
v2:
- Use common low level routines for PM IER/IIR programming (Chris)
- Rename interrupt functions to gen9_xxx from gen8_xxx (Chris)
- Replace disabling of wake ref asserts with rpm get/put (Chris)
v3:
- Update comments for more clarity. (Tvrtko)
- Remove the masking of GuC interrupt, which was kept masked till the
start of bottom half, its not really needed as there is only a
single instance of work item & wq is ordered. (Tvrtko)
v4:
- Rebase.
- Rename guc_events to pm_guc_events so as to be indicative of the
register/control block it is associated with. (Chris)
- Add handling for back to back log buffer flush interrupts.
v5:
- Move the read & clearing of register, containing Guc2Host message
bits, outside the irq spinlock. (Tvrtko)
v6:
- Move the log buffer flush interrupt related stuff to the following
patch so as to do only generic bits in this patch. (Tvrtko)
- Rebase.
v7:
- Remove the interrupts_enabled check from gen9_guc_irq_handler, want to
process that last interrupt also before disabling the interrupt, sync
against the work queued by irq handler will be done by caller disabling
the interrupt.
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
So far PM IER/IIR/IMR registers were being used only for Turbo related
interrupts. But interrupts coming from GuC also use the same set.
As a precursor to supporting GuC interrupts, added new low level routines
so as to allow sharing the programming of PM IER/IIR/IMR registers between
Turbo & GuC.
Also similar to PM IMR, maintaining a bitmask for PM IER register, to allow
easy sharing of it between Turbo & GuC without involving a rmw operation.
v2:
- For appropriateness & avoid any ambiguity, rename old functions
enable/disable pm_irq to mask/unmask pm_irq and rename new functions
enable/disable pm_interrupts to enable/disable pm_irq. (Tvrtko)
- Use u32 in place of uint32_t. (Tvrtko)
v3:
- Rename the fields pm_irq_mask & pm_ier_mask and do some cleanup. (Chris)
- Rebase.
v4: Fix the inadvertent disabling of User interrupt for VECS ring causing
failure for certain IGTs.
v5: Use dev_priv with HAS_VEBOX macro. (Tvrtko)
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
At the moment, we have dependency on the RPM as a barrier itself in both
i915_gem_release_all_mmaps() and i915_gem_restore_fences().
i915_gem_restore_fences() is also called along !runtime pm paths, but we
can move the markup of lost fences alongside releasing the mmaps into a
common i915_gem_runtime_suspend(). This has the advantage of locating
all the tricky barrier dependencies into one location.
v2: Just mark the fence as invalid (fence->dirty) so that upon waking we
will be sure to clear the fence after use, or restore it to the correct
value before use. This makes sure that if the fence is left intact
across the sleep, we do not leave it pointing to a region of GTT for the
next unsuspecting user.
Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Imre Deak <imre.deak@linux.intel.com>
Reviewed-by: Imre Deak <imre.deak@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161024124218.18252-5-chris@chris-wilson.co.uk
We only used the RPM sequence checking inside the lowlevel GTT
accessors, when we had to rely on callers taking the wakeref on our
behalf. Now that we take the RPM wakeref inside the GTT management
routines themselves, we can forgo the sanitycheck of the callers.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161024124218.18252-4-chris@chris-wilson.co.uk
Now that we have reduced the access to the list to either (a) under the
struct_mutex whilst holding the RPM wakeref (so that concurrent writers to
the list are serialised by struct_mutex) and (b) under the atomic
runtime suspend (which cannot run concurrently with any other accessor due
to the atomic nature of the runtime suspend) we can remove the extra
locking around the list itself.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161024124218.18252-3-chris@chris-wilson.co.uk
We want to decouple RPM and struct_mutex, but currently RPM has to walk
the list of bound objects and remove userspace mmapping before we
suspend (otherwise userspace may continue to access the GTT whilst it is
powered down). This currently requires the struct_mutex to walk the
bound_list, but if we move that to a separate list and lock we can take
the first step towards removing the struct_mutex.
v2: Split runtime suspend unmapping vs regular unmapping, to make the
locking (and barriers) clearer. Add the object to the userfault_list
prior to inserting the first PTE, the race between add/revoke depends
upon struct_mutex for regular unmappings and rpm for runtime-suspend.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> #v1
Link: http://patchwork.freedesktop.org/patch/msgid/20161024124218.18252-1-chris@chris-wilson.co.uk
gvt-next-fix-2016-10-20
This contains fix for first pull request.
- clean up header mess between i915 core and gvt
- new MAINTAINERS item
- new kernel-doc section
- fix compiling warnings
- gvt gem fix series from Chris
- fix for i915 intel_engine_cs change
- some sparse fixes from Changbin
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
If the kernel is old, more than a few releases old, chances are that the
user is using an old kernel for a good reason, despite there being GPU
hangs. After 180days since driver release stop suggesting that they
should send those reports upstream.
[Since Daniel acked this I expect he will pick up the dim patch to
automatically update the DRIVER_TIMESTAMP everytime we tag a new
release.]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161014134428.29582-1-chris@chris-wilson.co.uk
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
i915 core should only call functions and structures exposed through
intel_gvt.h. Remove internal gvt.h and i915_pvinfo.h.
Change for internal intel_gvt structure as private handler which
not requires to expose gvt internal structure for i915 core.
v2: Fix per Chris's comment
- carefully handle dev_priv->gvt assignment
- add necessary bracket for macro helper
- forward declartion struct intel_gvt
- keep free operation within same file handling alloc
v3: fix use after free and remove intel_gvt.initialized
v4: change to_gvt() to an inline
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Now that we've make skl_wm_levels make a little more sense, we can
remove all of the redundant wm information. Up until now we'd been
storing two copies of all of the skl watermarks: one being the
skl_pipe_wm structs, the other being the global wm struct in
drm_i915_private containing the raw register values. This is confusing
and problematic, since it means we're prone to accidentally letting the
two copies go out of sync. So, get rid of all of the functions
responsible for computing the register values and just use a single
helper, skl_write_wm_level(), to convert and write the new watermarks on
the fly.
Changes since v1:
- Fixup skl_write_wm_level()
- Fixup skl_wm_level_from_reg_val()
- Don't forget to copy *active to intel_crtc->wm.active.skl
Changes since v2:
- Fix usage of wrong cstate
Changes since v3 (by Paulo):
- Rebase
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> (v2)
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Lyude <cpaul@redhat.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1476814189-6062-1-git-send-email-paulo.r.zanoni@intel.com
In many places, we try to count pages using a 32 bit integer. That
implies if we are asked to create an object larger than 43bits, we will
subtly crash much later. Catch this on the boundary, and add a warning
to remind ourselves later on our exabyte systems.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161018120251.25043-2-chris@chris-wilson.co.uk
Internally we allow for using more objects than a single process can
allocate, i.e. we allow for a 64bit GPU address space even on a 32bit
system. Using size_t may oveerflow.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161018120251.25043-1-chris@chris-wilson.co.uk
Many GEN9 boards come with on-board lspcon cards.
Fot these boards, VBT configuration should properly point out
if a particular port contains lspcon device, so that driver can
initialize it properly.
This patch adds a utility function, which checks the VBT flag
for lspcon bit, and tells us if a port is configured to have a
lspcon device or not.
V2: Fixed review comments from Ville
- Do not forget PORT_D while checking lspcon for GEN9
V3: Addressed review comments from Rodrigo
- Create a HAS_LSPCON() macro for better use case handling.
- Do not dump warnings for non-gen-9 platforms, it will be noise.
V4: Rebase
V5: Rebase
V6: Pass dev_priv to HAS_LSPCON() macro
Signed-off-by: Shashank Sharma <shashank.sharma@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1476455212-27893-4-git-send-email-shashank.sharma@intel.com
Having skl_wm_level contain all of the watermarks for each plane is
annoying since it prevents us from having any sort of object to
represent a single watermark level, something we take advantage of in
the next commit to cut down on all of the copy paste code in here.
Changes since v1:
- Style nitpicks
- Fix accidental usage of i vs. PLANE_CURSOR
- Split out skl_pipe_wm_active_state simplification into separate patch
Signed-off-by: Lyude <cpaul@redhat.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Next part of cleaning up the watermark code for skl. This is easy, since
it seems that we never actually needed to keep track of the linetime in
the skl_wm_values struct anyway.
Signed-off-by: Lyude <cpaul@redhat.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
First part of cleaning up all of the skl watermark code. This moves the
structures for storing the ddb allocations of each pipe into
intel_crtc_state, along with moving the structures for storing the
current ddb allocations active on hardware into intel_crtc.
Changes since v1:
- Don't replace alloc->start = alloc->end = 0;
Signed-off-by: Lyude <cpaul@redhat.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Saves 944 bytes of .rodata strings and 128 bytes of .text.
v2: Add parantheses around dev_priv. (Ville Syrjala)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Jani Nikula <jani.nikula@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Saves 864 bytes of .rodata strings and ~100 of .text.
v2: Add parantheses around dev_priv. (Ville Syrjala)
v3: Rebase.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Jani Nikula <jani.nikula@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Saves 1392 bytes of .rodata strings.
Also change a few function/macro prototypes in i915_gem_gtt.c
from dev to dev_priv where it made more sense to do so.
v2: Add parantheses around dev_priv. (Ville Syrjala)
v3: Mention function prototype changes. (David Weinehall)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: David Weinehall <david.weinehall@linux.intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Jani Nikula <jani.nikula@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
Saves 1016 bytes of .rodata strings and couple hundred of .text.
v2: Add parantheses around dev_priv. (Ville Syrjala)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Jani Nikula <jani.nikula@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
This saves 3248 bytes of .rodata strings.
v2: Add parantheses around dev_priv. (Ville Syrjala)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Jani Nikula <jani.nikula@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
With the possibility of addition of many more number of rings in future,
the drm_i915_private structure could bloat as an array, of type
intel_engine_cs, is embedded inside it.
struct intel_engine_cs engine[I915_NUM_ENGINES];
Though this is still fine as generally there is only a single instance of
drm_i915_private structure used, but not all of the possible rings would be
enabled or active on most of the platforms. Some memory can be saved by
allocating intel_engine_cs structure only for the enabled/active engines.
Currently the engine/ring ID is kept static and dev_priv->engine[] is simply
indexed using the enums defined in intel_engine_id.
To save memory and continue using the static engine/ring IDs, 'engine' is
defined as an array of pointers.
struct intel_engine_cs *engine[I915_NUM_ENGINES];
dev_priv->engine[engine_ID] will be NULL for disabled engine instances.
There is a text size reduction of 928 bytes, from 1028200 to 1027272, for
i915.o file (but for i915.ko file text size remain same as 1193131 bytes).
v2:
- Remove the engine iterator field added in drm_i915_private structure,
instead pass a local iterator variable to the for_each_engine**
macros. (Chris)
- Do away with intel_engine_initialized() and instead directly use the
NULL pointer check on engine pointer. (Chris)
v3:
- Remove for_each_engine_id() macro, as the updated macro for_each_engine()
can be used in place of it. (Chris)
- Protect the access to Render engine Fault register with a NULL check, as
engine specific init is done later in Driver load sequence.
v4:
- Use !!dev_priv->engine[VCS] style for the engine check in getparam. (Chris)
- Kill the superfluous init_engine_lists().
v5:
- Cleanup the intel_engines_init() & intel_engines_setup(), with respect to
allocation of intel_engine_cs structure. (Chris)
v6:
- Rebase.
v7:
- Optimize the for_each_engine_masked() macro. (Chris)
- Change the type of 'iter' local variable to enum intel_engine_id. (Chris)
- Rebase.
v8: Rebase.
v9: Rebase.
v10:
- For index calculation use engine ID instead of pointer based arithmetic in
intel_engine_sync_index() as engine pointers are not contiguous now (Chris)
- For appropriateness, rename local enum variable 'iter' to 'id'. (Joonas)
- Use for_each_engine macro for cleanup in intel_engines_init() and remove
check for NULL engine pointer in cleanup() routines. (Joonas)
v11: Rebase.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1476378888-7372-1-git-send-email-akash.goel@intel.com
Mika wanted to know what requests were pending at the time of a hang as
we now track which requests we have submitted to the hardware.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161013101815.26978-1-chris@chris-wilson.co.uk
Our error states are quickly growing, pinning kernel memory with them.
The majority of the space is taken up by the error objects. These
compress well using zlib and without decode are mostly meaningless, so
encoding them does not hinder quickly parsing the error state for
familiarity.
v2: Make the zlib dependency optional
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161012090522.367-6-chris@chris-wilson.co.uk
The error state is purposefully racy as we expect it to be called at any
time and so have avoided any locking whilst capturing the crash dump.
However, with multi-engine GPUs and multiple CPUs, those races can
manifest into OOPSes as we attempt to chase dangling pointers freed on
other CPUs. Under discussion are lots of ways to slow down normal
operation in order to protect the post-mortem error capture, but what it
we take the opposite approach and freeze the machine whilst the error
capture runs (note the GPU may still running, but as long as we don't
process any of the results the driver's bookkeeping will be static).
Note that by of itself, this is not a complete fix. It also depends on
the compiler barriers in list_add/list_del to prevent traversing the
lists into the void. We also depend that we only require state from
carefully controlled sources - i.e. all the state we require for
post-mortem debugging should be reachable from the request itself so
that we only have to worry about retrieving the request carefully. Once
we have the request, we know that all pointers from it are intact.
v2: Avoid drm_clflush_pages() inside stop_machine() as it may use
stop_machine() itself for its wbinvd fallback.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161012090522.367-3-chris@chris-wilson.co.uk
We currently capture the GPU state after we detect a hang. This is vital
for us to both triage and debug hangs in the wild (post-mortem
debugging). However, it comes at the cost of running some potentially
dangerous code (since it has to make very few assumption about the state
of the driver) that is quite resource intensive.
This patch introduces both a method to disable error capture at runtime
(for users who hit bugs at runtime and need a workaround) and to disable
error capture at compiletime (for realtime users who want to minimise
any possible latency, and never require error capture, saving ~30k of
code). The cost is that we now have to be wary of (and test!) a kconfig
flag and a module parameter. The effect of the module parameter is easy
to verify through code inspection and runtime testing, but a kconfig flag
needs regular compile checking.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Jani Nikula <jani.nikula@linux.intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch
Link: http://patchwork.freedesktop.org/patch/msgid/20161012090522.367-2-chris@chris-wilson.co.uk
In the next patch, I want to conditionally compile i915_gpu_error.c and
that requires moving the functions used by debug out of
i915_gpu_error.c!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161012090522.367-1-chris@chris-wilson.co.uk
Get rid of SEP_SEMICOLON and SEP_BLANK in DEV_INFO_FOR_EACH_FLAG.
Consolidate the debug output so that instead of one huge line with
"cap1,cap2,capN" each capability is split to own line and displayed
as "capN: [yes|no]" to make the dumps more historically informative.
v2:
- Do not break auto-indent by keeping semicolon after macro (Jani)
- Consolidate and use yesno() in all locations (Chris)
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Include the position of the active request in the ring, and display that
alongside the current RING registers (on a GPU hang).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161004201132.21801-6-chris@chris-wilson.co.uk
If we store this in the uncore structure we are on a good way to
show more commonality between the per-platform implementations.
v2: Constify table pointer and correct coding style. (Chris Wilson)
v3: Rebase.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
There are current places in the code, and there will be more in the
future, which iterate the forcewake domains to find out which ones
are currently active.
To save them from doing this iteration, we can cheaply keep a mask
of active domains in dev_priv->uncore.fw_domains_active.
This has no cost in terms of object size, even manages to shrink it
overall by 368 bytes on my config.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: "Paneri, Praveen" <praveen.paneri@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
The plan is to introduce intel_has_sagv() and then use it to discover
which platforms actually support it.
I thought about keeping the functions with their current skl names,
but found two problems: (i) skl_has_sagv() would become a very
confusing name, and (ii) intel_atomic_commit_tail() doesn't seem to be
calling any functions whose name start with a platform name, so the
"intel_" naming scheme seems make more sense than the "firstplatorm_"
naming scheme here.
Cc: stable@vger.kernel.org
Reviewed-by: Lyude <cpaul@redhat.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1474578035-424-2-git-send-email-paulo.r.zanoni@intel.com
Ever since I started working on FBC I was already aware that FBC can
really amplify the FIFO underrun symptoms. On systems where FIFO
underruns were harmless error messages, enabling FBC would cause the
underruns to give black screens.
We recently tried to enable FBC on Haswell and got reports of a system
that would hang after some hours of uptime, and the first bad commit
was the one that enabled FBC. We also observed that this system had
FIFO underrun error messages on its dmesg. Although we don't have any
evidence that fixing the underruns would solve the bug and make FBC
work properly on this machine, IMHO it's better if we minimize the
amount of possible problems by just giving up FBC whenever we detect
an underrun.
v2: New version, different implementation and commit message.
v3: Clarify the fact that we run from an IRQ handler (Chris).
v4: Also add the underrun_detected check at can_choose() to avoid
misleading dmesg messages (DK).
v5: Fix Engrish, use READ_ONCE on the unlocked read (Chris).
Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
Cc: Lyude <cpaul@redhat.com>
Cc: stevenhoneyman@gmail.com <stevenhoneyman@gmail.com>
Cc: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1473773937-19758-1-git-send-email-paulo.r.zanoni@intel.com
DP MST provides the capability to send multiple video and audio streams
through a single port. This requires the API's between i915 and audio
drivers to distinguish between multiple audio capable displays that can be
connected to a port. Currently only the port identity is shared in the
APIs. This patch adds support for MST with an additional parameter
'int pipe'. The existing parameter 'port' does not change it's meaning.
pipe =
MST : display pipe that the stream originates from
Non-MST : -1
Affected APIs:
struct i915_audio_component_ops
- int (*sync_audio_rate)(struct device *, int port, int rate);
+ int (*sync_audio_rate)(struct device *, int port, int pipe,
+ int rate);
- int (*get_eld)(struct device *, int port, bool *enabled,
- unsigned char *buf, int max_bytes);
+ int (*get_eld)(struct device *, int port, int pipe,
+ bool *enabled, unsigned char *buf, int max_bytes);
struct i915_audio_component_audio_ops
- void (*pin_eld_notify)(void *audio_ptr, int port);
+ void (*pin_eld_notify)(void *audio_ptr, int port, int pipe);
This patch makes dummy changes in the audio drivers (thanks Libin) for
build to succeed. The audio side drivers will send the right 'pipe' values
for MST in patches that will follow.
v2:
Renamed the new API parameter from 'dev_id' to 'pipe'. (Jim, Ville)
Included Asoc driver API compatibility changes from Jeeja.
Added WARN_ON() for invalid pipe in get_saved_encoder(). (Takashi)
Added comment for av_enc_map[] definition. (Takashi)
v3:
Fixed logic error introduced while renaming 'dev_id' as 'pipe' (Ville)
Renamed get_saved_encoder() to get_saved_enc() to reduce line length
v4:
Rebased.
Parameter check for pipe < -1 values in get_saved_enc() (Ville)
Switched to for_each_pipe() in get_saved_enc() (Ville)
Renamed 'pipe' to 'dev_id' in audio side code (Takashi)
v5:
Included a comment for the dev_id arg. (Libin)
Signed-off-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Reviewed-by: Takashi Iwai <tiwai@suse.de>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1474488168-2343-1-git-send-email-dhinakaran.pandiyan@intel.com
Fix sparse warnings:
drivers/gpu/drm/i915/i915_drv.c:1179:5: warning: symbol
'i915_driver_load' was not declared. Should it be static?
drivers/gpu/drm/i915/i915_drv.c:1267:6: warning: symbol
'i915_driver_unload' was not declared. Should it be static?
drivers/gpu/drm/i915/i915_drv.c:2444:25: warning: symbol 'i915_pm_ops'
was not declared. Should it be static?
Fixes: 42f5551d27 ("drm/i915: Split out the PCI driver interface to i915_pci.c")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1473946137-1931-3-git-send-email-jani.nikula@intel.com
Storing the port enum in intel_encoder makes it convenient to know the
port attached to an encoder. Moving the port information up from
intel_digital_port to intel_encoder avoids unecessary intel_digital_port
access and handles MST encoders cleanly without requiring conditional
checks for them (thanks danvet).
v2:
Renamed the port enum member from 'attached_port' to 'port' (danvet)
Fixed missing initialization of port in intel_sdvo.c (danvet)
v3:
Fixed missing initialization of port in intel_crt.c (Ville)
v4:
Storing port for DVO encoders too.
Signed-off-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Lyude <cpaul@redhat.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1474334681-22690-3-git-send-email-dhinakaran.pandiyan@intel.com
Consolidate the instdone logic so we can get a bit fancier. This patch also
removes the duplicated print of INSTDONE[0].
v2: (Imre)
- Rebased on top of hangcheck INSTDONE changes.
- Move all INSTDONE registers into a single struct, store it within the
engine error struct during error capturing.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1474379673-28326-1-git-send-email-imre.deak@intel.com
Adding the ddb size into the devide info will avoid
platform checks while computing wm.
v2: Added comment and WARN_ON if ddb size is zero.(Jani)
v3: Added WARN_ON at the right place.(Jani)
Suggested-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Signed-off-by: Deepak M <m.deepak@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1473931870-7724-1-git-send-email-m.deepak@intel.com
No functional changes; just renaming a bit, tweaking a datatype,
prettifying layout, and adding comments, in particular in the
GuC setup code that touches this data.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1473711577-11454-2-git-send-email-david.s.gordon@intel.com
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We are about to specialize object synchronisation to enable nonblocking
execbuf submission. First we make a copy of the current object
synchronisation for execbuffer. The general i915_gem_object_sync() will
be removed following the removal of CS flips in the near future.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: John Harrison <john.c.harrison@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-16-chris@chris-wilson.co.uk
Update reset path in preparation for engine reset which requires
identification of incomplete requests and associated context and fixing
their state so that engine can resume correctly after reset.
The request that caused the hang will be skipped and head is reset to the
start of breadcrumb. This allows us to resume from where we left-off.
Since this request didn't complete normally we also need to cleanup elsp
queue manually. This is vital if we employ nonblocking request
submission where we may have a web of dependencies upon the hung request
and so advancing the seqno manually is no longer trivial.
ABI: gem_reset_stats / DRM_IOCTL_I915_GET_RESET_STATS
We change the way we count pending batches. Only the active context
involved in the reset is marked as either innocent or guilty, and not
mark the entire world as pending. By inspection this only affects
igt/gem_reset_stats (which assumes implementation details) and not
piglit.
ARB_robustness gives this guide on how we expect the user of this
interface to behave:
* Provide a mechanism for an OpenGL application to learn about
graphics resets that affect the context. When a graphics reset
occurs, the OpenGL context becomes unusable and the application
must create a new context to continue operation. Detecting a
graphics reset happens through an inexpensive query.
And with regards to the actual meaning of the reset values:
Certain events can result in a reset of the GL context. Such a reset
causes all context state to be lost. Recovery from such events
requires recreation of all objects in the affected context. The
current status of the graphics reset state is returned by
enum GetGraphicsResetStatusARB();
The symbolic constant returned indicates if the GL context has been
in a reset state at any point since the last call to
GetGraphicsResetStatusARB. NO_ERROR indicates that the GL context
has not been in a reset state since the last call.
GUILTY_CONTEXT_RESET_ARB indicates that a reset has been detected
that is attributable to the current GL context.
INNOCENT_CONTEXT_RESET_ARB indicates a reset has been detected that
is not attributable to the current GL context.
UNKNOWN_CONTEXT_RESET_ARB indicates a detected graphics reset whose
cause is unknown.
The language here is explicit in that we must mark up the guilty batch,
but is loose enough for us to relax the innocent (i.e. pending)
accounting as only the active batches are involved with the reset.
In the future, we are looking towards single engine resetting (with
minimal locking), where it seems inappropriate to mark the entire world
as innocent since the reset occurred on a different engine. Reducing the
information available means we only have to encounter the pain once, and
also reduces the information leaking from one context to another.
v2: Legacy ringbuffer submission required a reset following hibernation,
or else we restore stale values to the RING_HEAD and walked over
stolen garbage.
v3: GuC requires replaying the requests after a reset.
v4: Restore engine IRQ after reset (so waiters will be woken!)
Rearm hangcheck if resetting with a waiter.
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-13-chris@chris-wilson.co.uk
Since we have a cooperative mode now with a direct reset, we can avoid
the contention on struct_mutex and instead try then sleep on the
I915_RESET_IN_PROGRESS bit. If the mutex is held and that bit is
cleared, all is fine. Otherwise, we sleep for a bit and try again. In
the worst case we sleep for an extra second waiting for the mutex to be
released (no one touching the GPU is allowed the struct_mutex whilst the
I915_RESET_IN_PROGRESS bit is set). But when we have a direct reset,
this allows us to clean up the reset worker faster.
v2: Remember to call wake_up_bit() after changing (for the faster wakeup
as promised)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-12-chris@chris-wilson.co.uk
If a waiter is holding the struct_mutex, then the reset worker cannot
reset the GPU until the waiter returns. We do not want to return -EAGAIN
form i915_wait_request as that breaks delicate operations like
i915_vma_unbind() which often cannot be restarted easily, and returning
-EIO is just as useless (and has in the past proven dangerous). The
remaining WARN_ON(i915_wait_request) serve as a valuable reminder that
handling errors from an indefinite wait are tricky.
We can keep the current semantic that knowing after a reset is complete,
so is the request, by performing the reset ourselves if we hold the
mutex.
uevent emission is still handled by the reset worker, so it may appear
slightly out of order with respect to the actual reset (and concurrent
use of the device).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-11-chris@chris-wilson.co.uk
In preparation for introducing a per-engine reset, we can first separate
the mixing of the reset state from the global reset counter.
The loss of atomicity in updating the reset state poses a small problem
for handling the waiters. For requests, this is solved by advancing the
seqno so that a waiter waking up after the reset knows the request is
complete. For pending flips, we still rely on the increment of the
global reset epoch (as well as the reset-in-progress flag) to signify
when the hardware was reset.
The advantage, now that we do not inspect the reset state during reset
itself i.e. we no longer emit requests during reset, is that we can use
the atomic updates of the state flags to ensure that only one reset
worker is active.
v2: Mika spotted that I transformed the i915_gem_wait_for_error() wakeup
into a waiter wakeup.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470414607-32453-6-git-send-email-arun.siluvery@linux.intel.com
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-7-chris@chris-wilson.co.uk
Moving all GPU features to the platform definition allows for
- standard place when adding new features from new platform
- possible to see supported features when dumping struct
definitions
Signed-off-by: Carlos Santa <carlos.santa@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Make the .hws_needs_physical the exception by switching the flag
on earlier platforms since they are fewer to support. Remove the flag on
later GPUs hardware since they all use GTT hws by default.
Switch the logic as well in the driver to reflect this change
Signed-off-by: Carlos Santa <carlos.santa@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Moving all GPU features to the platform definition allows for
- standard place when adding new features from new platforms
- possible to see supported features when dumping struct
definitions
Signed-off-by: Carlos Santa <carlos.santa@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Moving all GPU features to the platform definition allows for
- standard place when adding new features from new platforms
- possible to see supported features when dumping struct
definitions
Signed-off-by: Carlos Santa <carlos.santa@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Moving all GPU features to the platform definition allows for
- standard place when adding new features from new platforms
- possible to see supported features when dumping struct
definitions
Signed-off-by: Carlos Santa <carlos.santa@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Moving all GPU features to the platform definition allows for
- standard place when adding new features from new platforms
- possible to see supported features when dumping struct
definitions
Signed-off-by: Carlos Santa <carlos.santa@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Moving all GPU features to the platform struct definition allows for
- standard place when adding new features from new platforms
- possible to see supported features when dumping struct
definitions
Signed-off-by: Carlos Santa <carlos.santa@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Moving all GPU features to the platform struct definition allows for
- standard place when adding new features from new platforms
- possible to see supported features when dumping struct
definitions
Signed-off-by: Carlos Santa <carlos.santa@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Moving all GPU features to the platform struct definition allows for
- standard place when adding new features from new platforms
- possible to see supported features when dumping struct
definitions
Signed-off-by: Carlos Santa <carlos.santa@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Moving all GPU features to the platform struct definition allows for
- standard place when adding new features from new platforms
- possible to see supported features when dumping struct
definitions
Signed-off-by: Carlos Santa <carlos.santa@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Moving all GPU features to the platform struct definition allows for
- standard place when adding new features from new platforms
- possible to see supported features when dumping struct
definitions
Signed-off-by: Carlos Santa <carlos.santa@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Moving all GPU features to the platform struct definition allows for
- standard place when adding new features from new platforms
- possible to see supported features when dumping struct
definitions
Signed-off-by: Carlos Santa <carlos.santa@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
No need for HAS_CORE_RING_FREQ as that flag is actually the same as
.has_llc. Feedback from V. Syrjala.
Signed-off-by: Carlos Santa <carlos.santa@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Moving all GPU features to the platform struct definition allows for
- standard place when adding new features from new platforms
- possible to see supported features when dumping struct
definitions
Signed-off-by: Carlos Santa <carlos.santa@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[patch series] Moving all GPU features to the platform struct definition
allows for
- standard place when adding new features from new platforms
- possible to see supported features when dumping struct definition
Signed-off-by: Carlos Santa <carlos.santa@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
We don't have safe 64-bit mmio writes as they are really split into
2x32-bit writes. This tearing is dangerous as the hardware *will*
operate on the intermediate value, requiring great care when assigning.
(See, for example, i965_write_fence_reg.) As such we don't currently use
them and strongly advise not to us them. Go one step further and remove
the 64-bit write vfuncs.
v2: Add some more details to the comment about why WRITE64 is absent,
and why you need to think twice before using READ64.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160906144538.4204-1-chris@chris-wilson.co.uk
In an upcoming patch we'll need the actual mask of subslices in addition
to their count, so convert the subslice_per_slice field to a mask.
Also we can easily calculate subslice_total from the other fields, so
instead of storing a cached version of this, add a helper to calculate
it.
v2:
- Use hweight8() on u8 typed vars instead of hweight32(). (Ben)
Reviewed-by: Robert Bragg <robert@sixbynine.org> (v1)
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com> (v1)
Tested-by: Ben Widawsky <benjamin.widawsky@intel.com> (v1)
Signed-off-by: Imre Deak <imre.deak@intel.com>
In an upcoming patch we'll need the actual mask of slices in addition to
their count, so replace the count field with a mask.
v2:
- Use hweight8() on u8 typed vars instead of hweight32(). (Ben)
Reviewed-by: Robert Bragg <robert@sixbynine.org> (v1)
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com> (v1)
Tested-by: Ben Widawsky <benjamin.widawsky@intel.com> (v1)
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1472659987-10417-5-git-send-email-imre.deak@intel.com
Move all slice/subslice/eu related properties to the sseu_dev_info
struct.
No functional change.
v2:
- s/info/sseu/ based on the new struct name. (Ben)
Reviewed-by: Robert Bragg <robert@sixbynine.org> (v1)
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com> (v1)
Tested-by: Ben Widawsky <benjamin.widawsky@intel.com> (v1)
Signed-off-by: Imre Deak <imre.deak@intel.com>
The data in this struct is provided both by getting the
slice/subslice/eu features available on a given platform and the actual
runtime state of these same features which depends on the HW's current
power saving state.
Atm members of this struct are duplicated in sseu_dev_status and
intel_device_info. For clarity and code reuse we can share one struct
for both of the above purposes. This patch only moves the struct to the
header file, the next patch will convert users of intel_device_info to
use this struct too.
Instead of unsigned int u8 is used now, which is big enough and is used
anyway in intel_device_info.
No functional change.
v2:
- s/stat/sseu/ based on the new struct name (Ben)
Reviewed-by: Robert Bragg <robert@sixbynine.org> (v1)
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com> (v1)
Tested-by: Ben Widawsky <benjamin.widawsky@intel.com> (v1)
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1472659987-10417-2-git-send-email-imre.deak@intel.com
Use atomic type and operands for dev_priv->mm.bsd_engine_dispatch_index
to avoid one struct_mutex locking scenario.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Zhao Yakui <yakui.zhao@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1472731101-21982-1-git-send-email-joonas.lahtinen@linux.intel.com
The mentioned commit changes intel_display_crc_init to take a dev_priv,
but forgets to change the stub.
Cc: David Weinehall <david.weinehall@linux.intel.com>
Fixes: 36cdd0138b ("drm/i915: debugfs spring cleaning")
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reported-and-by: Kim Lidström <kim@dxtr.im>
Link: http://patchwork.freedesktop.org/patch/msgid/1472116022-17598-1-git-send-email-maarten.lankhorst@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
Rather than walk the full array of engines checking whether each is in
the mask in turn, we can use the mask to jump to the right engines. This
should quicker for a sparse array of engines or mask, whilst generating
smaller code:
text data bss dec hex filename
1251010 4579 800 1256389 132bc5 drivers/gpu/drm/i915/i915.ko
1250530 4579 800 1255909 1329e5 drivers/gpu/drm/i915/i915.ko
The downside is that we have to pass in a temporary, alas no C99
iterators yet.
[P.S. Joonas doesn't like having to pass extra temporaries into the
macro, and even less that I called them tmp. As yet, we haven't found a
macro that avoids passing in a temporary that is smaller. We probably
will get C99 iterators first!]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160827075401.16470-2-chris@chris-wilson.co.uk
Now that we have working partial VMA and faulting support for all
objects, including fence support, advertise to userspace that it can
take advantage of unlimited GGTT mmaps.
v2: Make room in the kerneldoc for a more detailed explanation of the
limitations of the GTT mmap interface.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20160825180519.11341-1-chris@chris-wilson.co.uk
Since we have to write ddb allocations at the same time as we do other
plane updates, we're going to need to be able to control the order in
which we execute modesets on each pipe. The easiest way to do this is to
just factor this section of intel_atomic_commit_tail()
(intel_atomic_commit() for stable branches) into it's own function, and
add an appropriate display function hook for it.
Based off of Matt Rope's suggestions
Changes since v1:
- Drop pipe_config->base.active check in intel_update_crtcs() since we
check that before calling the function
Signed-off-by: Lyude <cpaul@redhat.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
[omitting CC for stable, since this patch will need to be changed for
such backports first]
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Lyude <cpaul@redhat.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471961565-28540-1-git-send-email-cpaul@redhat.com
This is required for supporting nonblocking modesets. Iterating over
the connector list will no longer be allowed when we don't hold
connection_mutex, so we have to use the atomic state.
Fix disable_noatomic by populating the minimal state required to
disable a connector.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470755054-32699-3-git-send-email-maarten.lankhorst@linux.intel.com
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Just like with sysfs, we do some major overhaul.
Pass dev_priv instead of dev to all feature macros (IS_, HAS_,
INTEL_, etc.). This has the side effect that a bunch of functions
now get dev_priv passed instead of dev.
All calls to INTEL_INFO()->gen have been replaced with
INTEL_GEN().
We want access to to_i915(node->minor->dev) in a lot of places,
so add the node_to_i915() helper to accommodate for this.
Finally, we have quite a few cases where we get a void * pointer,
and need to cast it to drm_device *, only to run to_i915() on it.
Add cast_to_i915() to do this.
v2: Don't introduce extra dev (Chris)
v3: Make pipe_crc_info have a pointer to drm_i915_private instead of
drm_device. This saves a bit of space, since we never use
drm_device anywhere in these functions.
Also some minor fixup that I missed in the previous version.
v4: Changed the code a bit so that dev_priv is passed directly
to various functions, thus removing the need for the
cast_to_i915() helper. Also did some additional cleanup.
v5: Additional cleanup of newly introduced changes.
v6: Rebase again because of conflict.
Signed-off-by: David Weinehall <david.weinehall@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160822105931.pcbe2lpsgzckzboa@boom
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Various cleanup for i915_sysfs.c; we now use dev_priv whenever
possible. The kdev_to_drm_minor() helper function has been
replaced by one that converts from struct device *
to struct drm_i915_private *.
We already have a seemingly identical helper (kdev_to_i915())
in i915_drv.h. But that one cannot be used here.
Unlike the version in i915_drv.h, this helper
reaches i915 through drm_minor.
v2: Rename kdev_to_i915_dm() to kdev_minor_to_i915() (Chris)
Signed-off-by: David Weinehall <david.weinehall@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160822103245.24069-4-david.weinehall@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We currently have a mix of struct device *device, struct device *kdev,
and struct device *dev (the latter forcing us to refer to
struct drm_device as something else than the normal dev).
To simplify things, always use kdev when referring to struct device.
v2: Replace the dev_to_drm_minor() macro with the inline function
kdev_to_drm_minor().
Signed-off-by: David Weinehall <david.weinehall@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160822103245.24069-3-david.weinehall@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Since the watermark calculations for Skylake are still broken, we're apt
to hitting underruns very easily under multi-monitor configurations.
While it would be lovely if this was fixed, it's not. Another problem
that's been coming from this however, is the mysterious issue of
underruns causing full system hangs. An easy way to reproduce this with
a skylake system:
- Get a laptop with a skylake GPU, and hook up two external monitors to
it
- Move the cursor from the built-in LCD to one of the external displays
as quickly as you can
- You'll get a few pipe underruns, and eventually the entire system will
just freeze.
After doing a lot of investigation and reading through the bspec, I
found the existence of the SAGV, which is responsible for adjusting the
system agent voltage and clock frequencies depending on how much power
we need. According to the bspec:
"The display engine access to system memory is blocked during the
adjustment time. SAGV defaults to enabled. Software must use the
GT-driver pcode mailbox to disable SAGV when the display engine is not
able to tolerate the blocking time."
The rest of the bspec goes on to explain that software can simply leave
the SAGV enabled, and disable it when we use interlaced pipes/have more
then one pipe active.
Sure enough, with this patchset the system hangs resulting from pipe
underruns on Skylake have completely vanished on my T460s. Additionally,
the bspec mentions turning off the SAGV with more then one pipe enabled
as a workaround for display underruns. While this patch doesn't entirely
fix that, it looks like it does improve the situation a little bit so
it's likely this is going to be required to make watermarks on Skylake
fully functional.
This will still need additional work in the future: we shouldn't be
enabling the SAGV if any of the currently enabled planes can't enable WM
levels that introduce latencies >= 30 µs.
Changes since v11:
- Add skl_can_enable_sagv()
- Make sure we don't enable SAGV when not all planes can enable
watermarks >= the SAGV engine block time. I was originally going to
save this for later, but I recently managed to run into a machine
that was having problems with a single pipe configuration + SAGV.
- Make comparisons to I915_SKL_SAGV_NOT_CONTROLLED explicit
- Change I915_SAGV_DYNAMIC_FREQ to I915_SAGV_ENABLE
- Move printks outside of mutexes
- Don't print error messages twice
Changes since v10:
- Apparently sandybridge_pcode_read actually writes values and reads
them back, despite it's misleading function name. This means we've
been doing this mostly wrong and have been writing garbage to the
SAGV control. Because of this, we no longer attempt to read the SAGV
status during initialization (since there are no helpers for this).
- mlankhorst noticed that this patch was breaking on some very early
pre-release Skylake machines, which apparently don't allow you to
disable the SAGV. To prevent machines from failing tests due to SAGV
errors, if the first time we try to control the SAGV results in the
mailbox indicating an invalid command, we just disable future attempts
to control the SAGV state by setting dev_priv->skl_sagv_status to
I915_SKL_SAGV_NOT_CONTROLLED and make a note of it in dmesg.
- Move mutex_unlock() a little higher in skl_enable_sagv(). This
doesn't actually fix anything, but lets us release the lock a little
sooner since we're finished with it.
Changes since v9:
- Only enable/disable sagv on Skylake
Changes since v8:
- Add intel_state->modeset guard to the conditional for
skl_enable_sagv()
Changes since v7:
- Remove GEN9_SAGV_LOW_FREQ, replace with GEN9_SAGV_IS_ENABLED (that's
all we use it for anyway)
- Use GEN9_SAGV_IS_ENABLED instead of 0x1 for clarification
- Fix a styling error that snuck past me
Changes since v6:
- Protect skl_enable_sagv() with intel_state->modeset conditional in
intel_atomic_commit_tail()
Changes since v5:
- Don't use is_power_of_2. Makes things confusing
- Don't use the old state to figure out whether or not to
enable/disable the sagv, use the new one
- Split the loop in skl_disable_sagv into it's own function
- Move skl_sagv_enable/disable() calls into intel_atomic_commit_tail()
Changes since v4:
- Use is_power_of_2 against active_crtcs to check whether we have > 1
pipe enabled
- Fix skl_sagv_get_hw_state(): (temp & 0x1) indicates disabled, 0x0
enabled
- Call skl_sagv_enable/disable() from pre/post-plane updates
Changes since v3:
- Use time_before() to compare timeout to jiffies
Changes since v2:
- Really apply minor style nitpicks to patch this time
Changes since v1:
- Added comments about this probably being one of the requirements to
fixing Skylake's watermark issues
- Minor style nitpicks from Matt Roper
- Disable these functions on Broxton, since it doesn't have an SAGV
Signed-off-by: Lyude <cpaul@redhat.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471463761-26796-3-git-send-email-cpaul@redhat.com
[mlankhorst: ENOSYS -> ENXIO, whitespace fixes]
Very old numbers indicate this is a 66% improvement when remapping the
entire object for fence contention - due to the elimination of
track_pfn_insert and its strcmp.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Testcase: igt/gem_fence_upload/performance
Testcase: igt/gem_mmap_gtt
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160819155428.1670-6-chris@chris-wilson.co.uk
In the recent patch
bc3d674 drm/i915: Allow userspace to request no-error-capture upon ...
the final version moved the flags and the associated #defines around
so they were adjacent; unfortunately, they ended up between a comment
and the thing (hw_id) to which the comment applies :(
So this patch reshuffles the comment and subject back together.
Also, as we're touching 'hw_id', let's change it from just 'unsigned'
to a fully-specified 'unsigned int', because some code checking tools
(including checkpatch) object to plain 'unsigned'.
Fixes: bc3d674462 ("drm/i915: Allow userspace to request no-error-capture...")
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1471616622-6919-1-git-send-email-david.s.gordon@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If the developer adds a register in the wrong order, we BUG during boot.
That makes development and testing very difficult. Let's be a bit more
friendly and disable the command parser with a big warning if the tables
are invalid.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-31-chris@chris-wilson.co.uk
If FBC is set on a framebuffer that is unmapped, all GTT faults will be
from a partial mapping. Writes by the user through the partial VMA are
then untracked by the FBC and so we must use the ORIGIN_CPU when flushing
the I915_GEM_DOMAIN_GTT.
v2: Keep ORIGIN_CPU for set-to-domain(.write=CPU)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: "Zanoni, Paulo R" <paulo.r.zanoni@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-25-chris@chris-wilson.co.uk
In order to handle tiled partial GTT mmappings, we need to associate the
fence with an individual vma.
v2: A couple of silly drops replaced spotted by Joonas
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-21-chris@chris-wilson.co.uk
Our current practice is to only name the actual list (here
dev_priv->fence_list) using "list", and elements upon that list are
referred to as "link". Further, the lru nature is of the list and not of
the node and including in the name does not disambiguate the link from
anything else.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-20-chris@chris-wilson.co.uk
By moving map-and-fenceable tracking from the object to the VMA, we gain
fine-grained tracking and the ability to track individual fences on the VMA
(subsequent patch).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-16-chris@chris-wilson.co.uk
This is a companion to i915_gem_obj_prepare_shmem_read() that prepares
the backing storage for direct writes. It first serialises with the GPU,
pins the backing storage and then indicates what clfushes are required in
order for the writes to be coherent.
Whilst here, fix support for ancient CPUs without clflush for which we
cannot do the GTT+clflush tricks.
v2: Add i915_gem_obj_finish_shmem_access() for symmetry
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-8-chris@chris-wilson.co.uk
Since vfree() now likes to WARN when passed a non-page-aligned pointer,
we need to discard the low bits to comply with it.
Fixes: d31d7cb146 ("drm/i915: Support for creating write combined type vmaps")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-3-chris@chris-wilson.co.uk
If userspace is asynchronously streaming into the batch or other
execobjects, we may not flush those writes along with a change in cache
domain (as there is no change). Therefore those writes may end up in
internal chipset buffers and not visible to the GPU upon execution. We
must issue a flush command or otherwise we encounter incoherency in the
batchbuffers and the GPU executing invalid commands (i.e. hanging) quite
regularly.
v2: Throw a paranoid wmb() into the general flush so that we remain
consistent with before.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90841
Fixes: 1816f92363 ("drm/i915: Support creation of unbound wc user...")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Akash Goel <akash.goel@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Tested-by: Matti Hämäläinen <ccr@tnsp.org>
Cc: stable@vger.kernel.org
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-1-chris@chris-wilson.co.uk
It is useful when looking at captured error states to check the recorded
BBADDR register (the address of the last batchbuffer instruction loaded)
against the expected offset of the batch buffer, and so do a quick check
that (a) the capture is true or (b) HEAD hasn't wandered off into the
badlands.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-30-git-send-email-chris@chris-wilson.co.uk
Since contexts are not currently shared between userspace processes, we
have an exact correspondence between context creator and guilty batch
submitter. Therefore we can save some per-batch work by inspecting the
context->pid upon error instead. Note that we take the context's
creator's pid rather than the file's pid in order to better track fd
passed over sockets.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-29-git-send-email-chris@chris-wilson.co.uk
This little helper only exists to safely discard the upper unused 32bits
of the general 64-bit VMA address - as we know that all Global GTT
currently are less than 4GiB in size and so that the upper bits must be
zero. In many places, we use a u32 for the global GTT offset and we want
to document where we are discarding the full VMA offset.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-28-git-send-email-chris@chris-wilson.co.uk
Treat the VMA as the primary struct responsible for tracking bindings
into the GPU's VM. That is we want to treat the VMA returned after we
pin an object into the VM as the cookie we hold and eventually release
when unpinning. Doing so eliminates the ambiguity in pinning the object
and then searching for the relevant pin later.
v2: Joonas' stylistic nitpicks, a fun rebase.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-27-git-send-email-chris@chris-wilson.co.uk
When working with contexts, we most frequently want the GGTT VMA for the
context state, first and foremost. Since the object is available via the
VMA, we need only then store the VMA.
v2: Formatting tweaks to debugfs output, restored some comments removed
in the next patch
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-15-git-send-email-chris@chris-wilson.co.uk
The VMA are unreferenced, they belong to the object and live until they
are closed. However, if we want to use the VMA as a cookie and use it to
keep the object alive, we want to hold onto a reference to the object
for the lifetime of the VMA cookie. To facilitate this, add a couple of
simple wrappers for managing the reference count on the object owning the
VMA.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-11-git-send-email-chris@chris-wilson.co.uk
A simple little macro to clear a pointer and return the old value. This
is useful for writing
value = *ptr;
if (!value)
return;
*ptr = 0;
...
free(value);
in a slightly more concise form:
value = fetch_and_zero(ptr);
if (!value)
return;
...
free(value);
with the idea that this establishes a pattern that may be extended for
atomic use (using xchg or cmpxchg) i.e. atomic_fetch_and_zero() and
similar to llist.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-10-git-send-email-chris@chris-wilson.co.uk
No longer is knowing how much of the GTT (both mappable aperture and
beyond) relevant, and the output clutters the real information - that is
how many objects are allocated and bound (and by who) so that we can
quickly grasp if there is a leak.
v2: Relent, and rename pinned to indicate display only. Since the
display objects are semi-static and are of variable size, they are the
interesting objects to watch over time for aperture leaking. The other
pins are either static (such as the scratch page) or very short lived
(such as execbuf) and not part of the precious GGTT.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-6-git-send-email-chris@chris-wilson.co.uk
When capturing the error state, we do not need to know about every
address space - just those that are related to the error. We know which
context is active at the time, therefore we know which VM are implicated
in the error. We can then restrict the VM which we report to the
relevant subset.
v2: s/i/count_active/ (and similar)
Rewrite label generation for "Buffers"
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-2-git-send-email-chris@chris-wilson.co.uk
This patch provides the infrastructure for performing a 16-byte aligned
read from WC memory using non-temporal instructions introduced with sse4.1.
Using movntdqa we can bypass the CPU caches and read directly from memory
and ignoring the page attributes set on the CPU PTE i.e. negating the
impact of an otherwise UC access. Copying using movntdqa from WC is almost
as fast as reading from WB memory, modulo the possibility of both hitting
the CPU cache or leaving the data in the CPU cache for the next consumer.
(The CPU cache itself my be flushed for the region of the movntdqa and on
later access the movntdqa reads from a separate internal buffer for the
cacheline.) The write back to the memory is however cached.
This will be used in later patches to accelerate accessing WC memory.
v2: Report whether the accelerated copy is successful/possible.
v3: Function alignment override was only necessary when using the
function target("sse4.1") - which is not necessary for emitting movntdqa
from __asm__.
v4: Improve notes on CPU cache behaviour vs non-temporal stores.
v5: Fix byte offsets for unrolled moves.
v6: Find all remaining typos of "movntqda", use kernel_fpu_begin.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Akash Goel <akash.goel@intel.com>
Cc: Damien Lespiau <damien.lespiau@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471001999-17787-2-git-send-email-chris@chris-wilson.co.uk
vmaps has a provision for controlling the page protection bits, with which
we can use to control the mapping type, e.g. WB, WC, UC or even WT.
To allow the caller to choose their mapping type, we add a parameter to
i915_gem_object_pin_map - but we still only allow one vmap to be cached
per object. If the object is currently not pinned, then we recreate the
previous vmap with the new access type, but if it was pinned we report an
error. This effectively limits the access via i915_gem_object_pin_map to a
single mapping type for the lifetime of the object. Not usually a problem,
but something to be aware of when setting up the object's vmap.
We will want to vary the access type to enable WC mappings of ringbuffer
and context objects on !llc platforms, as well as other objects where we
need coherent access to the GPU's pages without going through the GTT
v2: Remove the redundant braces around pin count check and fix the marker
in documentation (Chris)
v3:
- Add a new enum for the vmalloc mapping type & pass that as an argument to
i915_object_pin_map. (Tvrtko)
- Use PAGE_MASK to extract or filter the mapping type info and remove a
superfluous BUG_ON.(Tvrtko)
v4:
- Rename the enums and clean up the pin_map function. (Chris)
v5: Drop the VM_NO_GUARD, minor cosmetics.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471001999-17787-1-git-send-email-chris@chris-wilson.co.uk
Until now code was calling hweight32 to figure out the
number from device_info->ring_mask at runtime. Instead
we can cache it at engine init time and use directly.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1470842530-35854-1-git-send-email-tvrtko.ursulin@linux.intel.com
In the preceding patches we made sure that:
- the LVDS encoder takes care of reiniting both the LVDS register
and its PPS
- the eDP encoder takes care of reiniting its PPS
- the PPS register unlocking workaround is applied explicitly whenever
the PPS context is lost
Based on the above we can safely remove the opaque LVDS and PPS save /
restore from generic code.
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470827254-21954-6-git-send-email-imre.deak@intel.com
The PPS registers are pretty much the same everywhere, the differences
being:
- Register fields appearing, disappearing from one platform to the
next: panel-reset-on-powerdown, backlight-on, panel-port,
register-unlock
- Different register base addresses
- Different number of PPS instances: 2 on VLV/CHV/BXT, 1 everywhere
else.
We can merge the separate set of PPS definitions by extending the PPS
instance argument to all platforms and using instance 0 on platforms
with a single instance. This means we'll need to calculate the register
addresses dynamically based on the given platform and PPS instance.
v2:
- Simplify if ladder in intel_pps_get_registers(). (Ville)
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470827254-21954-1-git-send-email-imre.deak@intel.com
The bottom-half we use for processing the breadcrumb interrupt is a
task, which is an RCU protected struct. When accessing this struct, we
need to be holding the RCU read lock to prevent it disappearing beneath
us. We can use the RCU annotation to mark our irq_seqno_bh pointer as
being under RCU guard and then use the RCU accessors to both provide
correct ordering of access through the pointer.
Most notably, this fixes the access from hard irq context to use the RCU
read lock, which both Daniel and Tvrtko complained about.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1470761272-1245-3-git-send-email-chris@chris-wilson.co.uk
This function would call drm_modeset_lock_all, while the suspend/resume
functions already have their own locking. Fix this by factoring out
__intel_display_resume, and calling the atomic helpers for duplicating
atomic state and disabling all crtc's during suspend.
Changes since v1:
- Deal with -EDEADLK right after lock_all and clean up calls
to hw readout.
- Always take all modeset locks so updates during gpu reset are blocked.
Changes since v2:
- Fix deadlock in intel_update_primary_planes.
- Move WARN_ON(EDEADLK) to __intel_display_resume.
- pctx -> ctx
- only call __intel_display_resume on success in intel_display_resume.
Changes since v3:
- Rebase on top of dev_priv -> dev change.
- Use drm_modeset_lock_all_ctx instead of drm_modeset_lock_all.
Changes since v4 [by vsyrjala]:
- Deal with skip_intermediate_wm
- Update comment w.r.t. mode_config.mutex vs. ->detect()
- Rebase due to INTEL_GEN() etc.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Fixes: e2c8b8701e ("drm/i915: Use atomic helpers for suspend, v2.")
Cc: stable@vger.kernel.org
Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470428910-12125-2-git-send-email-ville.syrjala@linux.intel.com
In the previous commit, we moved the obj->tiling_mode out of a bitfield
and into its own integer so that we could safely use READ_ONCE(). Let us
now repair some of that damage by sharing the tiling_mode with its
companion, the fence stride.
v2: New magic
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-18-git-send-email-chris@chris-wilson.co.uk
Since we are not concerned with userspace racing itself with set-tiling
(the order is indeterminant even if we take a lock), then we can safely
read back the single obj->tiling_mode and do the static lookup of
swizzle mode without having to take a lock.
get-tiling is reasonably frequent due to the back-channel passing around
of tiling parameters in DRI2/DRI3.
v2: Make tiling_mode a full unsigned int so that we can trivially use it
with READ_ONCE(). Separating it out into manual control over the flags
field was too noisy for a simple patch. Note that we could use the lower
bits of obj->stride for the tiling mode.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-16-git-send-email-chris@chris-wilson.co.uk
The principal motivation for this was to try and eliminate the
struct_mutex from i915_gem_suspend - but we still need to hold the mutex
current for the i915_gem_context_lost(). (The issue there is that there
may be an indirect lockdep cycle between cpu_hotplug (i.e. suspend) and
struct_mutex via the stop_machine().) For the moment, enabling last
request tracking for the engine, allows us to do busyness checking and
waiting without requiring the struct_mutex - which is useful in its own
right.
As a side-effect of having a robust means for tracking engine busyness,
we can replace our other busyness heuristic, that of comparing against
the last submitted seqno. For paranoid reasons, we have a semi-ordered
check of that seqno inside the hangchecker, which we can now improve to
an ordered check of the engine's busyness (removing a locked xchg in the
process).
v2: Pass along "bool interruptible" as being unlocked we cannot rely on
i915->mm.interruptible being stable or even under our control.
v3: Replace check Ironlake i915_gpu_busy() with the common precalculated value
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-6-git-send-email-chris@chris-wilson.co.uk
Before suspending (or unloading), we would first wait upon all rendering
to be completed and then disable the rings. This later step is a remanent
from DRI1 days when we did not use request tracking for all operations
upon the ring. Now that we are sure we are waiting upon the very last
operation by the engine, we can forgo clobbering the ring registers,
though we do keep the assert that the engine is indeed idle before
sleeping.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-5-git-send-email-chris@chris-wilson.co.uk
We are motivated to avoid using a bitfield for obj->active for a couple
of reasons. Firstly, we wish to document our lockless read of obj->active
using READ_ONCE inside i915_gem_busy_ioctl() and that requires an
integral type (i.e. not a bitfield). Secondly, gcc produces abysmal code
when presented with a bitfield and that shows up high on the profiles of
request tracking (mainly due to excess memory traffic as it converts
the bitfield to a register and back and generates frequent AGI in the
process).
v2: BIT, break up a long line in compute the other engines, new paint
for i915_gem_object_is_active (now i915_gem_object_get_active).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-23-git-send-email-chris@chris-wilson.co.uk
The individual bits inside obj->frontbuffer_bits are protected by each
plane->mutex, but the whole bitfield may be accessed by multiple KMS
operations simultaneously and so the RMW need to be under atomics.
However, for updating the single field we do not need to mandate that it
be under the struct_mutex, one more step towards its removal as the de
facto BKL.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-21-git-send-email-chris@chris-wilson.co.uk
We only need a very lightweight mechanism here as the locking is only
used for co-ordinating a bitfield.
v2: Move the cheap unlikely tests into the caller
v3: Move the kerneldoc into the header (now separated out into
intel_fronbuffer.h for better kerneldoc and readability)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtien <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-20-git-send-email-chris@chris-wilson.co.uk
Since i915_gem_obj_ggtt_pin() is an idiom breaking curry function for
i915_gem_object_ggtt_pin(), spare us the confusion and remove it.
Removing it now simplifies later patches to change the i915_vma_pin()
(and friends) interface.
v2: Add a redundant GEM_BUG_ON(!view) to
i915_gem_obj_lookup_or_create_ggtt_vma()
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-18-git-send-email-chris@chris-wilson.co.uk
During execbuffer we look up the i915_vma in order to reserve them in
the VM. However, we then do a double lookup of the vma in order to then
pin them, all because we lack the necessary interfaces to operate on
i915_vma - so introduce i915_vma_pin()!
v2: Tidy parameter lists to remove one level of redirection in the hot
path.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-15-git-send-email-chris@chris-wilson.co.uk
For consistency, internal functions should take drm_i915_private rather
than drm_device. Now that we are subclassing drm_device, there are no
more size wins, but being consistent is its own blessing.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-12-git-send-email-chris@chris-wilson.co.uk
In order to be consistent with other address space functions, we want to
pass around 64-bit sizes, even though all known global GTT are limited
to 4GiB. Similarly, we are trying to be consistent in using the _ggtt_
nomenclature when referring to the special global GTT.
v2: Update docs to consistently state "global GTT".
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-11-git-send-email-chris@chris-wilson.co.uk
Our GPUs impose certain requirements upon buffers that depend upon how
exactly they are used. Typically this is expressed as that they require
a larger surface than would be naively computed by pitch * height.
Normally such requirements are hidden away in the userspace driver, but
when we accept pointers from strangers and later impose extra conditions
on them, the original client allocator has no idea about the
monstrosities in the GPU and we require the userspace driver to inform
the kernel how many padding pages are required beyond the client
allocation.
v2: Long time, no see
v3: Try an anonymous union for uapi struct compatibility
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-7-git-send-email-chris@chris-wilson.co.uk
This is not the full fix, as we are required to percolate the u64 nature
down through the drm_mm stack, but this is required now to prevent
explosions due to mismatch between execbuf (eb_vma_misplaced) and vma
binding (i915_vma_misplaced) - and reduces the risk of spurious changes
as we adjust the vma interface in the next patches.
v2: long long casts not required for u64 printk (%llx)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-6-git-send-email-chris@chris-wilson.co.uk
This reimplements the denial-of-service protection against igt from
commit 227f782e46 ("drm/i915: Retire requests before creating a new
one") and transfers the stall from before each batch into get_pages().
The issue is that the stall is increasing latency between batches which
is detrimental in some cases (especially coupled with execlists) to
keeping the GPU well fed. Also we have made the observation that retiring
requests can of itself free objects (and requests) and therefore makes
a good first step when shrinking.
v2: Recycle objects prior to i915_gem_object_get_pages()
v3: Remove the reference to the ring from i915_gem_requests_ring() as it
operates on an intel_engine_cs.
v4: Since commit 9b5f4e5ed6 ("drm/i915: Retire oldest completed request
before allocating next") we no longer need the safeguard to retire
requests before get_pages(). We no longer see the huge latencies when
hitting the shrinker between allocations.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-4-git-send-email-chris@chris-wilson.co.uk
This reverts commit e9f24d5fb7.
The patch was only a stop-gap measure that fixed half the problem - the
leak of the fbcon when restarting X. A complete solution required
releasing the VMA when the object itself was closed rather than rely on
file/process exit. The previous patches add the VMA tracking necessary
to do close them along with the object, context or file, and so the time
has come to remove the partial fix.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-28-git-send-email-chris@chris-wilson.co.uk
When the user closes the context mark it and the dependent address space
as closed. As we use an asynchronous destruct method, this has two
purposes. First it allows us to flag the closed context and detect
internal errors if we to create any new objects for it (as it is removed
from the user's namespace, these should be internal bugs only). And
secondly, it allows us to immediately reap stale vma.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-27-git-send-email-chris@chris-wilson.co.uk
In order to prevent a leak of the vma on shared objects, we need to
hook into the object_close callback to destroy the vma on the object for
this file. However, if we destroyed that vma immediately we may cause
unexpected application stalls as we try to unbind a busy vma - hence we
defer the unbind to when we retire the vma.
v2: Keep vma allocated until closed. This is useful for a later
optimisation, but it is required now in order to handle potential
recursion of i915_vma_unbind() by retiring itself.
v3: Comments are important.
Testcase: igt/gem_ppggtt/flink-and-close-vma-leak
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-26-git-send-email-chris@chris-wilson.co.uk
This patch is broken out of the next just to remove the code motion from
that patch and make it more readable. What we do here is move the
i915_vma_move_to_active() to i915_gem_execbuffer.c and put the three
stages (read, write, fenced) together so that future modifications to
active handling are all located in the same spot. The importance of this
is so that we can more simply control the order in which the requests
are place in the retirement list (i.e. control the order at which we
retire and so control the lifetimes to avoid having to hold onto
references).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-24-git-send-email-chris@chris-wilson.co.uk
With the introduction of requests, we amplified the number of atomic
refcounted objects we use and update every execbuffer; from none to
several references, and a set of references that need to be changed. We
also introduced interesting side-effects in the order of retiring
requests and objects.
Instead of independently tracking the last request for an object, track
the active objects for each request. The object will reside in the
buffer list of its most recent active request and so we reduce the kref
interchange to a list_move. Now retirements are entirely driven by the
request, dramatically simplifying activity tracking on the object
themselves, and removing the ambiguity between retiring objects and
retiring requests.
Furthermore with the consolidation of managing the activity tracking
centrally, we can look forward to using RCU to enable lockless lookup of
the current active requests for an object. In the future, we will be
able to query the status or wait upon rendering to an object without
even touching the struct_mutex BKL.
All told, less code, simpler and faster, and more extensible.
v2: Add a typedef for the function pointer for convenience later.
v3: Make the noop retirement callback explicit. Allow passing NULL to
the init_request_active() which is expanded to a common noop function.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-16-git-send-email-chris@chris-wilson.co.uk
In the next patch, request tracking is made more generic and for that we
need a new expanded struct and to separate out the logic changes from
the mechanical churn, we split out the structure renaming into this
patch.
v2: Writer's block. Add some spiel about why we track requests.
v3: Now i915_gem_active.
v4: Now with i915_gem_active_set() for attaching to the active request.
v5: Use i915_gem_active_set() from inside the retirement handlers
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-10-git-send-email-chris@chris-wilson.co.uk
When we call i915_vma_unbind(), we will wait upon outstanding rendering.
This will also trigger a retirement phase, which may update the object
lists. If, we extend request tracking to the VMA itself (rather than
keep it at the encompassing object), then there is a potential that the
obj->vma_list be modified for other elements upon i915_vma_unbind(). As
a result, if we walk over the object list and call i915_vma_unbind(), we
need to be prepared for that list to change.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-8-git-send-email-chris@chris-wilson.co.uk
Since we may have VMA allocated for an object, but we interrupted their
binding, there is a disparity between have elements on the obj->vma_list
and being bound. i915_gem_obj_bound_any() does this check, but this is
not rigorously observed - add an explicit count to make it easier.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-7-git-send-email-chris@chris-wilson.co.uk
For the global GTT (and aliasing GTT), the address space is owned by the
device (it is a global resource) and so the per-file owner field is
NULL. For per-process GTT (where we create an address space per
context), each is owned by the opening file. We can use this ownership
information to both distinguish GGTT and ppGTT address spaces, as well
as occasionally inspect the owner.
v2: Whitespace, tells us who owns i915_address_space
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-6-git-send-email-chris@chris-wilson.co.uk
The state stored in this struct is not only the information about the
buffer object, but the ring used to communicate with the hardware. Using
buffer here is overly specific and, for me at least, conflates with the
notion of buffer objects themselves.
s/struct intel_ringbuffer/struct intel_ring/
s/enum intel_ring_hangcheck/enum intel_engine_hangcheck/
s/describe_ctx_ringbuf()/describe_ctx_ring()/
s/intel_ring_get_active_head()/intel_engine_get_active_head()/
s/intel_ring_sync_index()/intel_engine_sync_index()/
s/intel_ring_init_seqno()/intel_engine_init_seqno()/
s/ring_stuck()/engine_stuck()/
s/intel_cleanup_engine()/intel_engine_cleanup()/
s/intel_stop_engine()/intel_engine_stop()/
s/intel_pin_and_map_ringbuffer_obj()/intel_pin_and_map_ring()/
s/intel_unpin_ringbuffer()/intel_unpin_ring()/
s/intel_engine_create_ringbuffer()/intel_engine_create_ring()/
s/intel_ring_flush_all_caches()/intel_engine_flush_all_caches()/
s/intel_ring_invalidate_all_caches()/intel_engine_invalidate_all_caches()/
s/intel_ringbuffer_free()/intel_ring_free()/
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-15-git-send-email-chris@chris-wilson.co.uk
Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-4-git-send-email-chris@chris-wilson.co.uk
Now that PCU communication is reasonably fast, we do not need to defer
RC6 initialisation to a workqueue.
References: https://bugs.freedesktop.org/show_bug.cgi?id=97017
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Add this workaround to prevent hang when in place compression
is used.
References: HSD#2135774
Cc: stable@vger.kernel.org
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Since commit a6f766f397 ("drm/i915: Limit ring synchronisation (sw
sempahores) RPS boosts") and commit bcafc4e38b ("drm/i915: Limit mmio
flip RPS boosts") we have limited the waitboosting for semaphores and
flips. Ideally we do not want to boost in either of these instances as no
userspace consumer is waiting upon the results (though a userspace producer
may be stalled trying to submit an execbuf - but in this case the
producer is being throttled due to the engine being saturated with
work). With the introduction of NO_WAITBOOST in the previous patch, we
can finally disable these needless boosts.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469002875-2335-6-git-send-email-chris@chris-wilson.co.uk
Migrate the request operations out of the main body of i915_gem.c and
into their own C file for easier expansion.
v2: Move __i915_add_request() across as well
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469002875-2335-1-git-send-email-chris@chris-wilson.co.uk
Before suspend, and especially before building the hibernation image, we
need to context image to be coherent in memory. To do this we require
that we perform a context switch to a disposable context (i.e. the
dev_priv->kernel_context) - when that switch is complete, all other
context images will be complete. This leaves the kernel_context image as
incomplete, but fortunately that is disposable and we can do a quick
fixup of the logical state after resuming.
v2: Share the nearly identical code to switch to the kernel context with
eviction.
v3: Explain why we need the switch and reset.
Testcase: igt/gem_exec_suspend # bsw
References: https://bugs.freedesktop.org/show_bug.cgi?id=96526
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1468590980-6186-2-git-send-email-chris@chris-wilson.co.uk
Currently execlists is exempt from emitting a request to switch each
ring away from the current context over to the dev_priv->kernel_context
(for whatever reason, just under execlists the GGTT is unlikely to be as
fragmented, however the switch may help in some extreme cases). Extract
the switcher and enable it for execlsts as well, as we need to do so in
a later patch to force the context switch before suspend. (And since for
that switch we explicitly require the disposable kernel context, rename
the extracted function.)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1468590980-6186-1-git-send-email-chris@chris-wilson.co.uk
Unfortunately, there's two situations where we lose hpd right now:
- Runtime suspend
- When we've shut off all of the power wells on Valleyview/Cherryview
While it would be nice if this didn't cause issues, this has the
ability to get us in some awkward states where a user won't be able to
get their display to turn on. For instance; if we boot a Valleyview
system without any monitors connected, it won't need any of it's power
wells and thus shut them off. Since this causes us to lose HPD, this
means that unless the user knows how to ssh into their machine and do a
manual reprobe for monitors, none of the monitors they connect after
booting will actually work.
Eventually we should come up with a better fix then having to enable
polling for this, since this makes rpm a lot less useful, but for now
the infrastructure in i915 just isn't there yet to get hpd in these
situations.
Changes since v1:
- Add comment explaining the addition of the if
(!mode_config->poll_running) in intel_hpd_init()
- Remove unneeded if (!dev->mode_config.poll_enabled) in
i915_hpd_poll_init_work()
- Call to drm_helper_hpd_irq_event() after we disable polling
- Add cancel_work_sync() call to intel_hpd_cancel_work()
Changes since v2:
- Apparently dev->mode_config.poll_running doesn't actually reflect
whether or not a poll is currently in progress, and is actually used
for dynamic module paramter enabling/disabling. So now we instead
keep track of our own poll_running variable in dev_priv->hotplug
- Clean i915_hpd_poll_init_work() a little bit
Changes since v3:
- Remove the now-redundant connector loop in intel_hpd_init(), just
rely on intel_hpd_poll_enable() for setting connector->polled
correctly on each connector
- Get rid of poll_running
- Don't assign enabled in i915_hpd_poll_init_work before we actually
lock dev->mode_config.mutex
- Wrap enabled assignment in i915_hpd_poll_init_work() in READ_ONCE()
for doc purposes
- Do the same for dev_priv->hotplug.poll_enabled with WRITE_ONCE in
intel_hpd_poll_enable()
- Add some comments about racing not mattering in intel_hpd_poll_enable
Changes since v4:
- Rename intel_hpd_poll_enable() to intel_hpd_poll_init()
- Drop the bool argument from intel_hpd_poll_init()
- Remove redundant calls to intel_hpd_poll_init()
- Rename poll_enable_work to poll_init_work
- Add some kerneldoc for intel_hpd_poll_init()
- Cross-reference intel_hpd_poll_init() in intel_hpd_init()
- Just copy the loop from intel_hpd_init() in intel_hpd_poll_init()
Changes since v5:
- Minor kerneldoc nitpicks
Cc: stable@vger.kernel.org
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Lyude <cpaul@redhat.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
One of the things preventing us from using polling is the fact that
calling valleyview_crt_detect_hotplug() when there's a VGA cable
connected results in sending another hotplug. With polling enabled when
HPD is disabled, this results in a scenario like this:
- We enable power wells and reset the ADPA
- output_poll_exec does force probe on VGA, triggering a hpd
- HPD handler waits for poll to unlock dev->mode_config.mutex
- output_poll_exec shuts off the ADPA, unlocks dev->mode_config.mutex
- HPD handler runs, resets ADPA and brings us back to the start
This results in an endless irq storm getting sent from the ADPA
whenever a VGA connector gets detected in the middle of polling.
Somewhat based off of the "drm/i915: Disable CRT HPD around force
trigger" patch Ville Syrjälä sent a while back
Cc: stable@vger.kernel.org
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Lyude <cpaul@redhat.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Some hardware requires a valid render context before it can initiate
rc6 power gating of the GPU; the default state of the GPU is not
sufficient and may lead to undefined behaviour. The first execution of
any batch will load the "golden render state", at which point it is safe
to enable rc6. As we do not forcibly load the kernel context at resume,
we have to hook into the batch submission to be sure that the render
state is setup before enabling rc6.
However, since we don't enable powersaving until that first batch, we
queued a delayed task in order to guarantee that the batch is indeed
submitted.
v2: Rearrange intel_disable_gt_powersave() to match.
v3: Apply user specified cur_freq (or idle_freq if not set).
v4: Give in, and supply a delayed work to autoenable rc6
v5: Mika suggested a couple of better names for delayed_resume_work
v6: Rebalance rpm_put around the autoenable task
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1468397438-21226-7-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
To allow the user finer control over waitboosting, allow them to set the
frequency we request for the boost. This also them allows to effectively
disable the boosting by setting the boost request to a low frequency.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1468397438-21226-5-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
With the unified common engine setup done, and the execlist engine
initialization loop clearly split into two phases, we can eliminate
the separate legacy engine initialization code.
v2: Fix cleanup path for legacy.
v3: Rename constructors. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Chris Wilson <chris-wilson.co.uk>
Make sure we keep kbuilder happy in all of its random configs by
providing argument names for compile-time stubs.
In file included from drivers/gpu/drm/i915/intel_dp_mst.c:27:0:
drivers/gpu/drm/i915/i915_drv.h: In function 'i915_debugfs_register':
>> drivers/gpu/drm/i915/i915_drv.h:3612:48: error: parameter name omitted
static inline int i915_debugfs_register(struct drm_i915_private *) {return 0;}
^~~~~~~~~~~~~~~~
drivers/gpu/drm/i915/i915_drv.h: In function 'i915_debugfs_unregister':
drivers/gpu/drm/i915/i915_drv.h:3613:51: error: parameter name omitted
static inline void i915_debugfs_unregister(struct drm_i915_private *) {}
Reported-by: 0day
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1468324529-20461-1-git-send-email-chris@chris-wilson.co.uk
One of the numerous VT-d workarounds we require is that the display
hardware reads past the end of the buffer triggering VT-d faults. This
is acknowledged in the code as being safe "since we fill the unused
portions of the GGTT with the scratch page". Alas, that is no longer
always true and so we trigger DMAR read faults.
Skylake also requires another workaround to avoid mixing VT-d and
unpopulated PTE, and so there we also need to ensure we fill unused
entries with the scratch page.
Reported-by: Mike Lothian <mike@fireburn.co.uk>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96584
Fixes: f7770bfd9f ("drm/i915: Skip clearing the GGTT on full-ppgtt systems")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: David Weinehall <david.weinehall@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1466773634-8106-1-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: David Weinehall <david.weinehall@intel.com>
Some Kabylake SKUs are going to use Kabypoint PCH.
It is mainly for Halo and DT ones.
>From our specs it doesn't seem that KBP brings
any change on the display south engine. So let's consider
this as a continuation of SunrisePoint, i.e., SPT+.
Since it is easy to get confused by a letter change:
KBL = Kabylake - CPU/GPU codename.
KBP = Kabypoint - PCH codename.
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96826
Link: http://patchwork.freedesktop.org/patch/msgid/1467418032-15167-1-git-send-email-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
As we inspect both the tasklet (to check for an active bottom-half) and
set the irq-posted flag at the same time (both in the interrupt handler
and then in the bottom-halt), group those two together into the same
cacheline. (Not having total control over placement of the struct means
we can't guarantee the cacheline boundary, we need to align the kmalloc
and then each struct, but the grouping should help.)
v2: Try a couple of different names for the state touched by the user
interrupt handler.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467805142-22219-3-git-send-email-chris@chris-wilson.co.uk
Following on from the scenario Tvrtko envisioned to explain a hard-to-hit
race with multiple first waiters, we could also then race in the
__i915_request_irq_complete() and the bottom-half may miss the vital
irq-seqno barrier and so go to sleep not noticing their seqno is
complete.
v2: unlock, not double lock the rcu_read_lock.
Fixes: 3d5564e910 ("drm/i915: Only apply one barrier after...")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467805142-22219-2-git-send-email-chris@chris-wilson.co.uk
Since drm_i915_private is now a subclass of drm_device we do not need to
chase the drm_i915_private->dev backpointer and can instead simply
access drm_i915_private->drm directly.
text data bss dec hex filename
1068757 4565 416 1073738 10624a drivers/gpu/drm/i915/i915.ko
1066949 4565 416 1071930 105b3a drivers/gpu/drm/i915/i915.ko
Created by the coccinelle script:
@@
struct drm_i915_private *d;
identifier i;
@@
(
- d->dev->i
+ d->drm.i
|
- d->dev
+ &d->drm
)
and for good measure the dev_priv->dev backpointer was removed entirely.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467711623-2905-4-git-send-email-chris@chris-wilson.co.uk
Some IS_ and HAS_ macros can return any non-zero value for true.
One potential problem with that is that someone could assign
them to integers and be surprised with the result. Therefore it
is probably safer to do the conversion to 0/1 in the macros
themselves.
Luckily this does not seem to have an effect on code size.
Only one call site was getting bit by this and a patch for
that has been sent as "drm/i915/guc: Protect against HAS_GUC_*
returning true values other than one".
v2: Added some extra braces as suggested by checkpatch.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1467643823-9798-1-git-send-email-tvrtko.ursulin@linux.intel.com
igt likes to inject GPU hangs into its command streams. However, as we
expect these hangs, we don't actually want them recorded in the dmesg
output or stored in the i915_error_state (usually). To accommodate this
allow userspace to set a flag on the context that any hang emanating
from that context will not be recorded. We still do the error capture
(otherwise how do we find the guilty context and know its intent?) as
part of the reason for random GPU hang injection is to exercise the race
conditions between the error capture and normal execution.
v2: Split out the request->ringbuf error capture changes.
v3: Move the flag defines next to the intel_context->flags definition
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467616119-4093-9-git-send-email-chris@chris-wilson.co.uk
Now that we have (near) universal GPU recovery code, we can inject a
real hang from userspace and not need any fakery. Not only does this
mean that the testing is far more realistic, but we can simplify the
kernel in the process.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467616119-4093-7-git-send-email-chris@chris-wilson.co.uk
The retire worker is a low frequency task that makes sure we retire
outstanding requests if userspace is being lax. We only need to start it
once as it remains active until the GPU is idle, so do a cheap test
before the more expensive queue_work(). A consequence of this is that we
need correct locking in the worker to make the hot path of request
submission cheap. To keep the symmetry and keep hangcheck strictly bound
by the GPU's wakelock, we move the cancel_sync(hangcheck) to the idle
worker before dropping the wakelock.
v2: Guard against RCU fouling the breadcrumbs bottom-half whilst we kick
the waiter.
v3: Remove the wakeref assertion squelching (now we hold a wakeref for
the hangcheck, any rpm error there is genuine).
v4: To prevent excess work when retiring requests, we split the busy
flag into two, a boolean to denote whether we hold the wakeref and a
bitmask of active engines.
v5: Reorder cancelling hangcheck upon idling to avoid a race where we
might cancel a hangcheck after being preempted by a new task
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
References: https://bugs.freedesktop.org/show_bug.cgi?id=88437
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467616119-4093-1-git-send-email-chris@chris-wilson.co.uk
Under the assumption that enabling signaling will be a frequent
operation, lets preallocate our attachments for signaling inside the
(rather large) request struct (and so benefiting from the slab cache).
v2: Convert from void * to more meaningful names and types.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-17-git-send-email-chris@chris-wilson.co.uk
If we convert the tracing over from direct use of ring->irq_get() and
over to the breadcrumb infrastructure, we only have a single user of the
ring->irq_get and so we will be able to simplify the driver routines
(eliminating the redundant validation and irq refcounting).
Process context is preferred over softirq (or even hardirq) for a couple
of reasons:
- we already utilize process context to have fast wakeup of a single
client (i.e. the client waiting for the GPU inspects the seqno for
itself following an interrupt to avoid the overhead of a context
switch before it returns to userspace)
- engine->irq_seqno() is not suitable for use from an softirq/hardirq
context as we may require long waits (100-250us) to ensure the seqno
write is posted before we read it from the CPU
A signaling framework is a requirement for enabling dma-fences.
v2: Move to a signaling framework based upon the waiter.
v3: Track the first-signal to avoid having to walk the rbtree everytime.
v4: Mark the signaler thread as RT priority to reduce latency in the
indirect wakeups.
v5: Make failure to allocate the thread fatal.
v6: Rename kthreads to i915/signal:%u
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-16-git-send-email-chris@chris-wilson.co.uk
If we flag the seqno as potentially stale upon receiving an interrupt,
we can use that information to reduce the frequency that we apply the
heavyweight coherent seqno read (i.e. if we wake up a chain of waiters).
v2: Use cmpxchg to replace READ_ONCE/WRITE_ONCE for more explicit
control of the ordering wrt to interrupt generation and interrupt
checking in the bottom-half.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-14-git-send-email-chris@chris-wilson.co.uk
If we have multiple waiters, we may find that many complete on the same
wake up. If we first inspect the seqno from the CPU cache, we may reduce
the number of heavyweight coherent seqno reads we require.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-13-git-send-email-chris@chris-wilson.co.uk
By using the same address for storing the HWS on every platform, we can
remove the platform specific vfuncs and reduce the get-seqno routine to
a single read of a cached memory location.
v2: Fix semaphore_passed() to look at the signaling engine (not the
waiter's)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-8-git-send-email-chris@chris-wilson.co.uk
When waiting for an interrupt (waiting for the engine to complete some
work), we know we are the only waiter to be woken on this engine. We also
know when the GPU has nearly completed our request (or at least started
processing it), so after being woken and we detect that the GPU is
active and working on our request, allow us the bottom-half (the first
waiter who wakes up to handle checking the seqno after the interrupt) to
spin for a very short while to reduce client latencies.
The impact is minimal, there was an improvement to the realtime-vs-many
clients case, but exporting the function proves useful later. However,
it is tempting to adjust irq_seqno_barrier to include the spin. The
problem is first ensuring that the "start-of-request" seqno is coherent
as we use that as our basis for judging when it is ok to spin. If we
could, spinning there could dramatically shorten some sleeps, and allow
us to make the barriers more conservative to handle missed seqno writes
on more platforms (all gen7+ are known to have the occasional issue, at
least).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-7-git-send-email-chris@chris-wilson.co.uk
One particularly stressful scenario consists of many independent tasks
all competing for GPU time and waiting upon the results (e.g. realtime
transcoding of many, many streams). One bottleneck in particular is that
each client waits on its own results, but every client is woken up after
every batchbuffer - hence the thunder of hooves as then every client must
do its heavyweight dance to read a coherent seqno to see if it is the
lucky one.
Ideally, we only want one client to wake up after the interrupt and
check its request for completion. Since the requests must retire in
order, we can select the first client on the oldest request to be woken.
Once that client has completed his wait, we can then wake up the
next client and so on. However, all clients then incur latency as every
process in the chain may be delayed for scheduling - this may also then
cause some priority inversion. To reduce the latency, when a client
is added or removed from the list, we scan the tree for completed
seqno and wake up all the completed waiters in parallel.
Using igt/benchmarks/gem_latency, we can demonstrate this effect. The
benchmark measures the number of GPU cycles between completion of a
batch and the client waking up from a call to wait-ioctl. With many
concurrent waiters, with each on a different request, we observe that
the wakeup latency before the patch scales nearly linearly with the
number of waiters (before external factors kick in making the scaling much
worse). After applying the patch, we can see that only the single waiter
for the request is being woken up, providing a constant wakeup latency
for every operation. However, the situation is not quite as rosy for
many waiters on the same request, though to the best of my knowledge this
is much less likely in practice. Here, we can observe that the
concurrent waiters incur extra latency from being woken up by the
solitary bottom-half, rather than directly by the interrupt. This
appears to be scheduler induced (having discounted adverse effects from
having a rbtree walk/erase in the wakeup path), each additional
wake_up_process() costs approximately 1us on big core. Another effect of
performing the secondary wakeups from the first bottom-half is the
incurred delay this imposes on high priority threads - rather than
immediately returning to userspace and leaving the interrupt handler to
wake the others.
To offset the delay incurred with additional waiters on a request, we
could use a hybrid scheme that did a quick read in the interrupt handler
and dequeued all the completed waiters (incurring the overhead in the
interrupt handler, not the best plan either as we then incur GPU
submission latency) but we would still have to wake up the bottom-half
every time to do the heavyweight slow read. Or we could only kick the
waiters on the seqno with the same priority as the current task (i.e. in
the realtime waiter scenario, only it is woken up immediately by the
interrupt and simply queues the next waiter before returning to userspace,
minimising its delay at the expense of the chain, and also reducing
contention on its scheduler runqueue). This is effective at avoid long
pauses in the interrupt handler and at avoiding the extra latency in
realtime/high-priority waiters.
v2: Convert from a kworker per engine into a dedicated kthread for the
bottom-half.
v3: Rename request members and tweak comments.
v4: Use a per-engine spinlock in the breadcrumbs bottom-half.
v5: Fix race in locklessly checking waiter status and kicking the task on
adding a new waiter.
v6: Fix deciding when to force the timer to hide missing interrupts.
v7: Move the bottom-half from the kthread to the first client process.
v8: Reword a few comments
v9: Break the busy loop when the interrupt is unmasked or has fired.
v10: Comments, unnecessary churn, better debugging from Tvrtko
v11: Wake all completed waiters on removing the current bottom-half to
reduce the latency of waking up a herd of clients all waiting on the
same request.
v12: Rearrange missed-interrupt fault injection so that it works with
igt/drv_missed_irq_hang
v13: Rename intel_breadcrumb and friends to intel_wait in preparation
for signal handling.
v14: RCU commentary, assert_spin_locked
v15: Hide BUG_ON behind the compiler; report on gem_latency findings.
v16: Sort seqno-groups by priority so that first-waiter has the highest
task priority (and so avoid priority inversion).
v17: Add waiters to post-mortem GPU hang state.
v18: Return early for a completed wait after acquiring the spinlock.
Avoids adding ourselves to the tree if the is already complete, and
skips the awkward question of why we don't do completion wakeups for
waits earlier than or equal to ourselves.
v19: Prepare for init_breadcrumbs to fail. Later patches may want to
allocate during init, so be prepared to propagate back the error code.
Testcase: igt/gem_concurrent_blit
Testcase: igt/benchmarks/gem_latency
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin@intel.com>
Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: "Goel, Akash" <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> #v18
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-6-git-send-email-chris@chris-wilson.co.uk
Currently __i915_wait_request uses a per-engine wait_queue_t for the dual
purpose of waking after the GPU advances or for waking after an error.
In the future, we may add even more wake sources and require greater
separation, but for now we can conceptually simplify wakeups by separating
the two sources. In particular, this allows us to use different wait-queues
(e.g. one on the engine advancement, a global one for errors and one on
each requests) without any hassle.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-5-git-send-email-chris@chris-wilson.co.uk
The queue only ever contains at most one item and has no special flags.
It is just a very simple wrapper around the system-wq - a complication
with no benefits.
v2: Use the system_long_wq as we may wish to capture the error state
after detecting the hang - which may take a bit of time.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-3-git-send-email-chris@chris-wilson.co.uk
Ville Syrjälä reported that in the majority of wait_for(I915_READ()) he
inspect, most completed within the first couple of reads and that the
delay between those wait_for() reads was the ratelimiting step for many
code paths. For example, __gen6_update_ring_freq() was blamed for
slowing down boot by many milliseconds, but under Ville's scrutiny the
issue was just excessive delay waiting for sandybridge_pcode_write().
We can eliminate the wait by initially using a busyspin upon the register
read and only fallback to the sleeping loop in cases where the hardware
is indeed too slow. A threshold of 2 microseconds is used as the initial
ballpark.
To avoid excessive code bloating from converting every wait_for() into a
hybrid busy/sleep loop, we extend wait_for_register_fw() and export it
for use by other callers.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467297225-21379-1-git-send-email-chris@chris-wilson.co.uk
Since commit 2ed53a94d8 ("drm/i915: On GPU reset, set the HWS
breadcrumb to the last seqno") once a hang is completed, the seqno is
advanced past all current requests. With this we know that if we wake up
from waiting for a request, if a hang has occurred and reset completed,
our request will be considered complete (i.e.
i915_gem_request_completed() returns true). Therefore we only need to
worry about the situation where a hang has occurred, but not yet reset,
where we may need to release our struct_mutex. Since we don't need to
detect the completed reset using the global gpu_error->reset_counter
anymore, we do not need to track the reset_counter epoch inside the
request.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467211874-11552-1-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Fix build errors when ACPI is not enabled by adding function stubs:
../drivers/gpu/drm/i915/i915_drv.c: In function 'i915_drm_suspend':
../drivers/gpu/drm/i915/i915_drv.c:635:2: error: implicit declaration of function 'intel_opregion_unregister' [-Werror=implicit-function-declaration]
intel_opregion_unregister(dev_priv);
../drivers/gpu/drm/i915/i915_drv.c: In function 'i915_drm_resume':
../drivers/gpu/drm/i915/i915_drv.c:798:2: error: implicit declaration of function 'intel_opregion_register' [-Werror=implicit-function-declaration]
intel_opregion_register(dev_priv);
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: intel-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Fixes: 03d92e4779 ("drm/i915/opregion: Rename init/fini functions to register/unregister")
Cc: drm-intel-fixes@lists.freedesktop.org
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
[Jani: dropped the stale init/fini declarations]
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467028399-9965-1-git-send-email-jani.nikula@intel.com
We only need to force a switch to the kernel context placeholder during
eviction. All other uses of i915_gpu_idle() just want to wait until
existing work on the GPU is idle. Rename i915_gpu_idle() to
i915_gem_wait_for_idle() to avoid any implications about "parking" the
context first.
v2: Tweak an error message if the wait fails for the ilk vtd w/a
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1466776558-21516-6-git-send-email-chris@chris-wilson.co.uk
This is so that we have symmetry with intel_lrc.c and avoid a source of
if (i915.enable_execlists) layering violation within i915_gem_context.c -
that is we move the specific handling of the dev_priv->kernel_context
for legacy submission into the legacy submission code.
This depends upon the init/fini ordering between contexts and engines
already defined by intel_lrc.c, and also exporting the context alignment
required for pinning the legacy context.
v2: Separate out pin/unpin context funcs for greater symmetry with
intel_lrc. One more step towards unifying behaviour between the two
classes of engines and towards fixing another bug in i915_switch_context
vs requests.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1466776558-21516-2-git-send-email-chris@chris-wilson.co.uk
i915_dma.c used to contain the DRI1/UMS horror show, but now all that
remains are the out-of-place driver level interfaces (such as
allocating, initialising and registering the driver). These should be in
i915_drv.c alongside similar routines for suspend/resume.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1466773227-7994-10-git-send-email-chris@chris-wilson.co.uk
drm_connector_register_all() is now automatically called by
drm_dev_register(), and so we no longer have to do so ourselves (via
intel_modeset_register() after calling drm_dev_register()). Similarly
for unregistering.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1466773227-7994-8-git-send-email-chris@chris-wilson.co.uk
Take control over allocating, loading and registering the driver from the
DRM midlayer by performing it manually from i915_pci_probe. This allows
us to carefully control the order of when we setup the hardware vs when
it becomes visible to third parties (including userspace). The current
ordering makes the driver visible to userspace first (in order to
coordinate with removed DRI1 userspace), but that ordering incurs risk.
The risk increases as we strive for more asynchronous loading.
One side effect of controlling the allocation is that we can allocate
both the drm_device + drm_i915_private in one block, the next step
towards subclassing.
Unload is still left as before, a mix of midlayer and driver.
v2: After drm_dev_init(), we should call drm_dev_unref() so that we call
drm_dev_release() and free everything from drm_dev_init().
v3: Fixup missed error code for failing to allocate dev_priv
Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1466773227-7994-6-git-send-email-chris@chris-wilson.co.uk
Currently debugfs files are created before the driver is even loads.
This gives the opportunity for userspace to open that interface and poke
around before the backing data structures are initialised - with the
possibility of oopsing or worse.
Move the creation of the debugfs files to our registration phase, where
we announce our presence to the world when we are ready, i.e the
sequence changes from
drm_dev_register()
-> drm_minor_register()
-> drm_debugfs_init()
-> i915_debugfs_init()
-> i915_driver_load()
to
drm_dev_register()
-> drm_minor_register()
-> drm_debugfs_init()
-> i915_driver_load()
-> i915_debugfs_register()
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1466773227-7994-5-git-send-email-chris@chris-wilson.co.uk
Currently the backlight is being registered in the load phase (before
the display and its objects are registered). Move the backlight
registration into the analogous phase by performing it from the
connector registration, just after its creation.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1466773227-7994-3-git-send-email-chris@chris-wilson.co.uk
Effectively removes one layer of indirection between the mask of
possible engines and the engine constructors. Instead of spelling
out in code the mapping of HAS_<engine> to constructors, makes
more use of the recently added data driven approach by putting
engine constructor vfuncs into the table as well.
Effect is fewer lines of source and smaller binary.
At the same time simplify the error handling since engine
destructors can run on unitialized engines anyway.
Similar approach could be done for legacy submission is wanted.
v2: Removed ugly BUILD_BUG_ONs in favour of newly introduced
ENGINE_MASK and HAS_ENGINE macros.
Also removed the forward declarations by shuffling functions
around.
v3: Warn when logical_rings table does not contain enough data
and disable the engines which could not be initialized.
(Chris Wilson)
v4: Chris Wilson suggested a nicer engine init loop.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1466689961-23232-1-git-send-email-tvrtko.ursulin@linux.intel.com
Backmerge drm-next for the reworked device register/unregistering.
Chris Wilson needs that to be able to land his i915 load/unload
demidlayering.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
- Infrastructure for GVT-g (paravirtualized gpu on gen8+), from Zhi Wang
- another attemp at nonblocking atomic plane updates
- bugfixes and refactoring for GuC doorbell code (Dave Gordon)
- GuC command submission enabled by default, if fw available (Dave Gordon)
- more bxt w/a (Arun Siluvery)
- bxt phy improvements (Imre Deak)
- prep work for stolen objects support (Ankitprasa Sharma & Chris Wilson)
- skl/bkl w/a update from Mika Kuoppala
- bunch of small improvements and fixes all over, as usual
* tag 'drm-intel-next-2016-06-20' of git://anongit.freedesktop.org/drm-intel: (81 commits)
drm/i915: Update DRIVER_DATE to 20160620
drm/i915: Introduce GVT context creation API
drm/i915: Support LRC context single submission
drm/i915: Introduce execlist context status change notification
drm/i915: Make addressing mode bits in context descriptor configurable
drm/i915: Make ring buffer size of a LRC context configurable
drm/i915: gvt: Introduce the basic architecture of GVT-g
drm/i915: Fold vGPU active check into inner functions
drm/i915: Use offsetof() to calculate the offset of members in PVINFO page
drm/i915: Factor out i915_pvinfo.h
drm/i915: Serialise presentation with imported dmabufs
drm/i915: Use atomic commits for legacy page_flips
drm/i915: Move fb_bits updating later in atomic_commit
drm/i915: nonblocking commit
Reapply "drm/i915: Pass atomic states to fbc update, functions."
drm/i915: Roll out the helper nonblock tracking
drm/i915: Signal drm events for atomic
drm/i915/ilk: Don't disable SSC source if it's in use
drm/i915/guc: (re)initialise doorbell h/w when enabling GuC submission
drm/i915/guc: replace assign_doorbell() with select_doorbell_register()
...
Also extract drm_auth.h for nicer grouping.
v2: Nuke the other comments since they don't really explain a lot, and
within the drm core we generally only document functions exported to
drivers: The main audience for these docs are driver writers.
v3: Limit the exposure of drm_master internals by only including
drm_auth.h where it is neede (Chris).
v4: Spelling polish (Emil).
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
No need for local struct drm_device * since dev_priv is the
correct thing to pass in to NEEDS_WaRsDisableCoarsePowerGating
anyway. Changed the macro definition for the latter to reflect
that as well.
v2: Alignment bikeshed.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1466518034-24838-1-git-send-email-tvrtko.ursulin@linux.intel.com
... instead of the previous ORIGIN_GTT. This should actually
invalidate FBC once something is written on the frontbuffer using WC
mmaps. The problem with ORIGIN_GTT is that the automatic hardware
tracking is not able to detect the WC writes as it can detect the GTT
writes.
This should help fix the SKL bug where nothing happens when you type
your username/password on lightdm.
This patch was originally pasted on an email by Chris and converted to
an actual git patch by Paulo.
v2 (from Paulo):
- Make it a full variable instead of a bit-field (Daniel)
- Use WRITE_ONCE (Chris)
v3 (from Paulo):
- Remove huge comment since now we have WRITE_ONCE (Chris)
- Remove uneeded new line (Chris)
- Add Chris' Signed-off-by, authorized via IRC
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1466185599-26401-1-git-send-email-paulo.r.zanoni@intel.com
Currently to see if an object is backed by struct pages (as opposed to
being a simple pointer to stolen memory, for example) we do a manual
check on the obj->ops->flags. This is quite shouty and before adding
more checks in future, we should make it a bit calmer.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1466431552-17860-1-git-send-email-chris@chris-wilson.co.uk
We now have a connector->func that serves the same purpose as our own
intel_connector->unregister vfunc allowing us to unwrap ourselves and
use drm_connector_register() (and friends) as the central function.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1466160034-12173-2-git-send-email-chris@chris-wilson.co.uk
GVT workload scheduler needs special host LRC contexts, the so called
"shadow LRC context" to submit guest workload to host i915. During the
guest workload submission, workload scheduler fills the shadow LRC
context with the content of guest LRC context: engine context is copied
without changes, ring context is mostly owned by host i915.
v8:
- Remove the graph temporarily. (Chris)
- Use interruptible mutex_lock. (Chris)
- Rename the function name of creating a GVT context. (Chris)
- Add the missing declaration in i915_drv.h (Chris)
v7:
- Move chart to a better place. (Joonas)
v6:
- Make GVT code as dead code when !CONFIG_DRM_I915_GVT. (Chris)
v5:
- Only compile this feature when CONFIG_DRM_I915_GVT is enabled. (Tvrtko)
- Rebase the code into new repo.
- Add a comment about the ring buffer size. (Joonas)
v2:
Mostly based on Daniel's idea. Call the refactored core logic of GEM
context creation service and LRC context creation service to create the GVT
context.
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1466078825-6662-10-git-send-email-zhi.a.wang@intel.com
This patch introduces the support of LRC context single submission.
As GVT context may come from different guests, which require different
configuration of render registers. It can't be combined into a dual ELSP
submission combo.
Only GVT-g will create this kinds of GEM context currently.
v8:
- Rename the data member in struct i915_gem_context. (Chris)
v7:
- Fix typos in commit message. (Joonas)
v6:
- Make GVT code as dead code when !CONFIG_DRM_I915_GVT. (Chris)
v5:
- Only compile this feature when CONFIG_DRM_I915_GVT=y. (Tvrtko)
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1466078825-6662-9-git-send-email-zhi.a.wang@intel.com
This patch introduces an approach to track the execlist context status
change.
GVT-g uses GVT context as the "shadow context". The content inside GVT
context will be copied back to guest after the context is idle. And GVT-g
has to know the status of the execlist context.
This function is configurable when creating a new GEM context. Currently,
Only GVT-g will create the "status-change-notification" enabled GEM
context.
v10:
- Fix the identation. (Joonas)
v8:
- Remove the boolean flag in struct i915_gem_context. (Joonas)
v7:
- Remove per-engine ctx status notifiers. Use one status notifier for all
engines. (Joonas)
- Add prefix "INTEL_" for related definitions. (Joonas)
- Refine the comments in execlists_context_status_change(). (Joonas)
v6:
- When !CONFIG_DRM_I915_GVT, make GVT code as dead code then compiler
could automatically eliminate them for us. (Chris)
- Always initialize the notifier header, so it could be switched on/off
at runtime. (Chris)
v5:
- Only compile this feature when CONFIG_DRM_I915_GVT is enabled.(Tvrtko)
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (v8)
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1466078825-6662-8-git-send-email-zhi.a.wang@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Currently the addressing mode bit in context descriptor is statically
generated from the configuration of system-wide PPGTT usage model.
GVT-g will load the PPGTT shadow page table by itself and probably one
guest is using a different addressing mode with i915 host. The addressing
mode bits of a LRC context should be configurable under this case.
v10:
- Fix the identation. (Joonas)
v9:
- Rename the data member in struct i915_gem_context. (Chris)
v8:
- Rename the data member in struct i915_gem_context. (Chris)
v7:
- Move context addressing mode bit into i915_reg.h. (Joonas/Chris)
- Add prefix "INTEL_" for related definitions. (Joonas)
v6:
- Directly save the addressing mode bits inside i915_gem_context. (Chris)
- Move the LRC context addressing mode bits into intel_lrc.h. (Chris)
v5:
- Change USES_FULL_48BIT(dev) to USES_FULL_48BIT(dev_priv) (Tvrtko)
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (v9)
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1466078825-6662-7-git-send-email-zhi.a.wang@intel.com
This patch introduces an option for configuring the ring buffer size
of a LRC context after the context creation.
v9:
- Fix an identation issue. (Chris)
v8:
- Rename the data member in i915_gem_context. (Chris)
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1466078825-6662-6-git-send-email-zhi.a.wang@intel.com
This patch introduces the very basic framework of GVT-g device model,
includes basic prototypes, definitions, initialization.
v12:
- Call intel_gvt_init() in driver early initialization stage. (Chris)
v8:
- Remove the GVT idr and mutex in intel_gvt_host. (Joonas)
v7:
- Refine the URL link in Kconfig. (Joonas)
- Refine the introduction of GVT-g host support in Kconfig. (Joonas)
- Remove the macro GVT_ALIGN(), use round_down() instead. (Joonas)
- Make "struct intel_gvt" a data member in struct drm_i915_private.(Joonas)
- Remove {alloc, free}_gvt_device()
- Rename intel_gvt_{create, destroy}_gvt_device()
- Expost intel_gvt_init_host()
- Remove the dummy "struct intel_gvt" declaration in intel_gvt.h (Joonas)
v6:
- Refine introduction in Kconfig. (Chris)
- The exposed API functions will take struct intel_gvt * instead of
void *. (Chris/Tvrtko)
- Remove most memebers of strct intel_gvt_device_info. Will add them
in the device model patches.(Chris)
- Remove gvt_info() and gvt_err() in debug.h. (Chris)
- Move GVT kernel parameter into i915_params. (Chris)
- Remove include/drm/i915_gvt.h, as GVT-g will be built within i915.
- Remove the redundant struct i915_gvt *, as the functions in i915
will directly take struct intel_gvt *.
- Add more comments for reviewer.
v5:
Take Tvrtko's comments:
- Fix the misspelled words in Kconfig
- Let functions take drm_i915_private * instead of struct drm_device *
- Remove redundant prints/local varible initialization
v3:
Take Joonas' comments:
- Change file name i915_gvt.* to intel_gvt.*
- Move GVT kernel parameter into intel_gvt.c
- Remove redundant debug macros
- Change error handling style
- Add introductions for some stub functions
- Introduce drm/i915_gvt.h.
Take Kevin's comments:
- Move GVT-g host/guest check into intel_vgt_balloon in i915_gem_gtt.c
v2:
- Introduce i915_gvt.c.
It's necessary to introduce the stubs between i915 driver and GVT-g host,
as GVT-g components is configurable in kernel config. When disabled, the
stubs here do nothing.
Take Joonas' comments:
- Replace boolean return value with int.
- Replace customized info/warn/debug macros with DRM macros.
- Document all non-static functions like i915.
- Remove empty and unused functions.
- Replace magic number with marcos.
- Set GVT-g in kernel config to "n" by default.
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1466078825-6662-5-git-send-email-zhi.a.wang@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This mode allows to assign EUs to pools which can process work collectively.
The command to enable this mode should be issued as part of context initialization.
The pooled mode is global, once enabled it has to stay the same across all
contexts until HW reset hence this is sent in auxiliary golden context batch.
Thanks to Mika for the preliminary review and comments.
v2: explain why this is enabled in golden context, use feature flag while
enabling the support (Chris)
v3: Include only kernel support as userspace support is not available yet.
User space clients need to know when the pooled EU feature is present
and enabled on the hardware so that they can adapt work submissions.
Create a new device info flag for this purpose.
Set has_pooled_eu to true in the Broxton static device info - Broxton
supports the feature in hardware and the driver will enable it by
default.
We need to add getparam ioctls to enable userspace to query availability of
this feature and to retrieve min. no of eus in a pool but we will expose
them once userspace support is available. Opensource users for this feature
are mesa, libva and beignet.
Beignet team is currently working on adding userspace support.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v2)
Cc: Winiarski, Michal <michal.winiarski@intel.com>
Cc: Zou, Nanhai <nanhai.zou@intel.com>
Cc: Yang, Rong R <rong.r.yang@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Armin Reese <armin.c.reese@intel.com>
Cc: Tim Gore <tim.gore@intel.com>
Signed-off-by: Jeff McGee <jeff.mcgee@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
This utility function is a companion to i915_gem_object_get_page() that
uses the same cached iterator for the scatterlist to perform fast
sequential lookup of the dma address associated with any page within the
object.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Apparently some CHV boards failed to hook up the port presence straps
for HDMI ports as well (earlier we assumed this problem only affected
eDP ports). So let's check the VBT in addition to the strap, and if
either one claims that the port is present go ahead and register the
relevant connector.
While at it, change port D to register DP before HDMI as we do for ports
B and C since
commit 457c52d87e ("drm/i915: Only ignore eDP ports that are connected")
Also print a debug message when we register a HDMI connector to aid
in diagnosing missing/incorrect ports. We already had such a print for
DP/eDP.
v2: Improve the comment in the code a bit, note the port D change in
the commit message
Cc: Radoslav Duda <radosd@radosd.com>
Tested-by: Radoslav Duda <radosd@radosd.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96321
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1464945463-14364-1-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Git got absolutely destroyed with all our cherry-picking from
drm-intel-next-queued to various branches. It ended up inserting
intel_crtc_page_flip 2x even in intel_display.c.
Backmerge to get back to sanity.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
drm-intel-next-2016-05-22:
- cmd-parser support for direct reg->reg loads (Ken Graunke)
- better handle DP++ smart dongles (Ville)
- bxt guc fw loading support (Nick Hoathe)
- remove a bunch of struct typedefs from dpll code (Ander)
- tons of small work all over to avoid casting between drm_device and the i915
dev struct (Tvrtko&Chris)
- untangle request retiring from other operations, also fixes reset stat corner
cases (Chris)
- skl atomic watermark support from Matt Roper, yay!
- various wm handling bugfixes from Ville
- big pile of cdclck rework for bxt/skl (Ville)
- CABC (Content Adaptive Brigthness Control) for dsi panels (Jani&Deepak M)
- nonblocking atomic commits for plane-only updates (Maarten Lankhorst)
- bunch of PSR fixes&improvements
- untangle our map/pin/sg_iter code a bit (Dave Gordon)
drm-intel-next-2016-05-08:
- refactor stolen quirks to share code between early quirks and i915 (Joonas)
- refactor gem BO/vma funcstion (Tvrtko&Dave)
- backlight over DPCD support (Yetunde Abedisi)
- more dsi panel sequence support (Jani)
- lots of refactoring around handling iomaps, vma, ring access and related
topics culmulating in removing the duplicated request tracking in the execlist
code (Chris & Tvrtko) includes a small patch for core iomapping code
- hw state readout for bxt dsi (Ramalingam C)
- cdclk cleanups (Ville)
- dedupe chv pll code a bit (Ander)
- enable semaphores on gen8+ for legacy submission, to be able to have a direct
comparison against execlist on the same platform (Chris) Not meant to be used
for anything else but performance tuning
- lvds border bit hw state checker fix (Jani)
- rpm vs. shrinker/oom-notifier fixes (Praveen Paneri)
- l3 tuning (Imre)
- revert mst dp audio, it's totally non-functional and crash-y (Lyude)
- first official dmc for kbl (Rodrigo)
- and tons of small things all over as usual
* 'drm-intel-next' of git://anongit.freedesktop.org/drm-intel: (194 commits)
drm/i915: Revert async unpin and nonblocking atomic commit
drm/i915: Update DRIVER_DATE to 20160522
drm/i915: Inline sg_next() for the optimised SGL iterator
drm/i915: Introduce & use new lightweight SGL iterators
drm/i915: optimise i915_gem_object_map() for small objects
drm/i915: refactor i915_gem_object_pin_map()
drm/i915/psr: Implement PSR2 w/a for gen9
drm/i915/psr: Use ->get_aux_send_ctl functions
drm/i915/psr: Order DP aux transactions correctly
drm/i915/psr: Make idle_frames sensible again
drm/i915/psr: Try to program link training times correctly
drm/i915/userptr: Convert to drm_i915_private
drm/i915: Allow nonblocking update of pageflips.
drm/i915: Check for unpin correctness.
Reapply "drm/i915: Avoid stalling on pending flips for legacy cursor updates"
drm/i915: Make unpin async.
drm/i915: Prepare connectors for nonblocking checks.
drm/i915: Pass atomic states to fbc update functions.
drm/i915: Remove reset_counter from intel_crtc.
drm/i915: Remove queue_flip pointer.
...
This reverts the following patches:
d55dbd06bb drm/i915: Allow nonblocking update of pageflips.
15c86bdb76 drm/i915: Check for unpin correctness.
95c2ccdc82 Reapply "drm/i915: Avoid stalling on pending flips for legacy cursor updates"
a6747b7304 drm/i915: Make unpin async.
03f476e1fc drm/i915: Prepare connectors for nonblocking checks.
2099deffef drm/i915: Pass atomic states to fbc update functions.
ee7171af72 drm/i915: Remove reset_counter from intel_crtc.
2ee004f7c5 drm/i915: Remove queue_flip pointer.
b8d2afae55 drm/i915: Remove use_mmio_flip kernel parameter.
8dd634d922 drm/i915: Remove cs based page flip support.
143f73b3bf drm/i915: Rework intel_crtc_page_flip to be almost atomic, v3.
84fc494b64 drm/i915: Add the exclusive fence to plane_state.
6885843ae1 drm/i915: Convert flip_work to a list.
aa420ddd8e drm/i915: Allow mmio updates on all platforms, v2.
afee4d8707 Revert "drm/i915: Avoid stalling on pending flips for legacy cursor updates"
"drm/i915: Allow nonblocking update of pageflips" should have been
split up, misses a proper commit message and seems to cause issues in
the legacy page_flip path as demonstrated by kms_flip.
"drm/i915: Make unpin async" doesn't handle the unthrottled cursor
updates correctly, leading to an apparent pin count leak. This is
caught by the WARN_ON in i915_gem_object_do_pin which screams if we
have more than DRM_I915_GEM_OBJECT_MAX_PIN_COUNT pins.
Unfortuantely we can't just revert these two because this patch series
came with a built-in bisect breakage in the form of temporarily
removing the unthrottled cursor update hack for legacy cursor ioctl.
Therefore there's no other option than to revert the entire pile :(
There's one tiny conflict in intel_drv.h due to other patches, nothing
serious.
Normally I'd wait a bit longer with doing a maintainer revert, but
since the minimal set of patches we need to revert (due to the bisect
breakage) is so big, time is running out fast. And very soon
(especially after a few attempts at fixing issues) it'll be really
hard to revert things cleanly.
Lessons learned:
- Not a good idea to rush the review (done by someone fairly new to
the area) and not make sure domain experts had a chance to read it.
- Patches should be properly split up. I only looked at the two
patches that should be reverted in detail, but both look like the
mix up different things in one patch.
- Patches really should have proper commit messages. Especially when
doing more than one thing, and especially when touching critical and
tricky core code.
- Building a patch series and r-b stamping it when it has a built-in
bisect breakage is not a good idea.
- I also think we need to stop building up technical debt by
postponing atomic igt testcases even longer. I think it's clear that
there's enough corner cases in this beast that we really need to
have the testcases _before_ the next step lands.
(cherry picked from commit 5a21b6650a
from drm-intel-next-queeud)
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Patrik Jakobsson <patrik.jakobsson@linux.intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Acked-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Acked-by: Jani Nikula <jani.nikula@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
On Loading, GuC sets PM interrupts routing (bit 31) and clears ARAT
expired interrupt (bit 9). Host turbo also updates this register
in RPS flows. This patch ensures bit 31 and bit 9 setup by GuC persists.
ARAT timer interrupt is needed in GuC for various features. It also
facilitates halting GuC and hence achieving RC6. PM interrupt routing
will not impact RPS interrupt reception by host as GuC will redirect
them.
This patch fixes igt test pm_rc6_residency that was failing with guc
load/submission enabled. Tested with SKL GuC v6.1 and BXT GuC v5.1 and v8.7.
v2: i915_irq/i915_pm decoupling from intel_guc. (ChrisW)
v3: restructuring the mask update and rebase w.r.t Ville's patch. (ChrisW)
v4: Updating the pm_intr_keep during direct_interrupts_to_guc. (Sagar)
Cc: Chris Harris <chris.harris@intel.com>
Cc: Zhe Wang <zhe1.wang@intel.com>
Cc: Deepak S <deepak.s@intel.com>
Cc: Satyanantha, Rama Gopal M <rama.gopal.m.satyanantha@intel.com>
Cc: Akash Goel <akash.goel@intel.com>
Testcase: igt/pm_rc6_residency
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Tested-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1464683307-19475-1-git-send-email-sagar.a.kamble@intel.com
I see the main drm pull got merged, here's the first batch of fixes for
4.7 already. Fixes all around, a large portion cc: stable stuff.
[airlied: the DP++ stuff is a regression fix].
* tag 'drm-intel-next-fixes-2016-05-25' of git://anongit.freedesktop.org/drm-intel:
drm/i915: Stop automatically retiring requests after a GPU hang
drm/i915: Unify intel_ring_begin()
drm/i915: Ignore stale wm register values on resume on ilk-bdw (v2)
drm/i915/psr: Try to program link training times correctly
drm/i915/bxt: Adjusting the error in horizontal timings retrieval
drm/i915: Don't leave old junk in ilk active watermarks on readout
drm/i915: s/DPPL/DPLL/ for SKL DPLLs
drm/i915: Fix gen8 semaphores id for legacy mode
drm/i915: Set crtc_state->lane_count for HDMI
drm/i915/BXT: Retrieving the horizontal timing for DSI
drm/i915: Protect gen7 irq_seqno_barrier with uncore lock
drm/i915: Re-enable GGTT earlier during resume on pre-gen6 platforms
drm/i915: Determine DP++ type 1 DVI adaptor presence based on VBT
drm/i915: Enable/disable TMDS output buffers in DP++ adaptor as needed
drm/i915: Respect DP++ adaptor TMDS clock limit
drm: Add helper for DP++ adaptors
This reverts the following patches:
d55dbd06bb drm/i915: Allow nonblocking update of pageflips.
15c86bdb76 drm/i915: Check for unpin correctness.
95c2ccdc82 Reapply "drm/i915: Avoid stalling on pending flips for legacy cursor updates"
a6747b7304 drm/i915: Make unpin async.
03f476e1fc drm/i915: Prepare connectors for nonblocking checks.
2099deffef drm/i915: Pass atomic states to fbc update functions.
ee7171af72 drm/i915: Remove reset_counter from intel_crtc.
2ee004f7c5 drm/i915: Remove queue_flip pointer.
b8d2afae55 drm/i915: Remove use_mmio_flip kernel parameter.
8dd634d922 drm/i915: Remove cs based page flip support.
143f73b3bf drm/i915: Rework intel_crtc_page_flip to be almost atomic, v3.
84fc494b64 drm/i915: Add the exclusive fence to plane_state.
6885843ae1 drm/i915: Convert flip_work to a list.
aa420ddd8e drm/i915: Allow mmio updates on all platforms, v2.
afee4d8707 Revert "drm/i915: Avoid stalling on pending flips for legacy cursor updates"
"drm/i915: Allow nonblocking update of pageflips" should have been
split up, misses a proper commit message and seems to cause issues in
the legacy page_flip path as demonstrated by kms_flip.
"drm/i915: Make unpin async" doesn't handle the unthrottled cursor
updates correctly, leading to an apparent pin count leak. This is
caught by the WARN_ON in i915_gem_object_do_pin which screams if we
have more than DRM_I915_GEM_OBJECT_MAX_PIN_COUNT pins.
Unfortuantely we can't just revert these two because this patch series
came with a built-in bisect breakage in the form of temporarily
removing the unthrottled cursor update hack for legacy cursor ioctl.
Therefore there's no other option than to revert the entire pile :(
There's one tiny conflict in intel_drv.h due to other patches, nothing
serious.
Normally I'd wait a bit longer with doing a maintainer revert, but
since the minimal set of patches we need to revert (due to the bisect
breakage) is so big, time is running out fast. And very soon
(especially after a few attempts at fixing issues) it'll be really
hard to revert things cleanly.
Lessons learned:
- Not a good idea to rush the review (done by someone fairly new to
the area) and not make sure domain experts had a chance to read it.
- Patches should be properly split up. I only looked at the two
patches that should be reverted in detail, but both look like the
mix up different things in one patch.
- Patches really should have proper commit messages. Especially when
doing more than one thing, and especially when touching critical and
tricky core code.
- Building a patch series and r-b stamping it when it has a built-in
bisect breakage is not a good idea.
- I also think we need to stop building up technical debt by
postponing atomic igt testcases even longer. I think it's clear that
there's enough corner cases in this beast that we really need to
have the testcases _before_ the next step lands.
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Patrik Jakobsson <patrik.jakobsson@linux.intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Acked-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Acked-by: Jani Nikula <jani.nikula@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
struct intel_context contains two substructs, one for the legacy RCS and
one for every execlists engine. Since legacy RCS is a subset of the
execlists engine support, just combine the two substructs.
v2: Only pin the default context for legacy mode (the object only exists
for legacy, but adding i915.enable_execlists provides symmetry with the
cleanup functions).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1464098023-3294-8-git-send-email-chris@chris-wilson.co.uk
Just move the kernel_context member of drm_i915_private next to the
engines it is associated with.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1464098023-3294-7-git-send-email-chris@chris-wilson.co.uk
We want to give a name to the currently anonymous per-engine struct
inside the context, so that we can assign it to a local variable and
save clumsy typing. The name we have chosen is intel_context as it
reflects the HW facing portion of the context state (the logical context
state, the registers, the ringbuffer etc).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1464098023-3294-4-git-send-email-chris@chris-wilson.co.uk
i915_gem_context_get() is a very simple wrapper around idr_find(), so
simple that it would be smaller to do the lookup inline. Also we use the
verb 'lookup' to return a pointer from a handle, freeing 'get' to imply
obtaining a reference to the context.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1464098023-3294-3-git-send-email-chris@chris-wilson.co.uk
Our goal is to rename the anonymous per-engine struct beneath the
current intel_context. However, after a lively debate resolving around
the confusion between intel_context_engine and intel_engine_context, the
realisation is that the two structs target different users. The outer
struct is API / user facing, and so carries the higher level GEM
information. The inner struct is hw facing. Thus we want to name the
inner struct intel_context and the outer one i915_gem_context. As the
first step, we need to rename the current struct:
s/struct intel_context/struct i915_gem_context/
which fits much better with its constructors already conveying the
i915_gem_context prefix!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1464098023-3294-1-git-send-email-chris@chris-wilson.co.uk
Pull drm updates from Dave Airlie:
"Here's the main drm pull request for 4.7, it's been a busy one, and
I've been a bit more distracted in real life this merge window. Lots
more ARM drivers, not sure if it'll ever end. I think I've at least
one more coming the next merge window.
But changes are all over the place, support for AMD Polaris GPUs is in
here, some missing GM108 support for nouveau (found in some Lenovos),
a bunch of MST and skylake fixes.
I've also noticed a few fixes from Arnd in my inbox, that I'll try and
get in asap, but I didn't think they should hold this up.
New drivers:
- Hisilicon kirin display driver
- Mediatek MT8173 display driver
- ARC PGU - bitstreamer on Synopsys ARC SDP boards
- Allwinner A13 initial RGB output driver
- Analogix driver for DisplayPort IP found in exynos and rockchip
DRM Core:
- UAPI headers fixes and C++ safety
- DRM connector reference counting
- DisplayID mode parsing for Dell 5K monitors
- Removal of struct_mutex from drivers
- Connector registration cleanups
- MST robustness fixes
- MAINTAINERS updates
- Lockless GEM object freeing
- Generic fbdev deferred IO support
panel:
- Support for a bunch of new panels
i915:
- VBT refactoring
- PLL computation cleanups
- DSI support for BXT
- Color manager support
- More atomic patches
- GEM improvements
- GuC fw loading fixes
- DP detection fixes
- SKL GPU hang fixes
- Lots of BXT fixes
radeon/amdgpu:
- Initial Polaris support
- GPUVM/Scheduler/Clock/Power improvements
- ASYNC pageflip support
- New mesa feature support
nouveau:
- GM108 support
- Power sensor support improvements
- GR init + ucode fixes.
- Use GPU provided topology information
vmwgfx:
- Add host messaging support
gma500:
- Some cleanups and fixes
atmel:
- Bridge support
- Async atomic commit support
fsl-dcu:
- Timing controller for LCD support
- Pixel clock polarity support
rcar-du:
- Misc fixes
exynos:
- Pipeline clock support
- Exynoss4533 SoC support
- HW trigger mode support
- export HDMI_PHY clock
- DECON5433 fixes
- Use generic prime functions
- use DMA mapping APIs
rockchip:
- Lots of little fixes
vc4:
- Render node support
- Gamma ramp support
- DPI output support
msm:
- Mostly cleanups and fixes
- Conversion to generic struct fence
etnaviv:
- Fix for prime buffer handling
- Allow hangcheck to be coalesced with other wakeups
tegra:
- Gamme table size fix"
* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (1050 commits)
drm/edid: add displayid detailed 1 timings to the modelist. (v1.1)
drm/edid: move displayid validation to it's own function.
drm/displayid: Iterate over all DisplayID blocks
drm/edid: move displayid tiled block parsing into separate function.
drm: Nuke ->vblank_disable_allowed
drm/vmwgfx: Report vmwgfx version to vmware.log
drm/vmwgfx: Add VMWare host messaging capability
drm/vmwgfx: Kill some lockdep warnings
drm/nouveau/gr/gf100-: fix race condition in fecs/gpccs ucode
drm/nouveau/core: recognise GM108 chipsets
drm/nouveau/gr/gm107-: fix touching non-existent ppcs in attrib cb setup
drm/nouveau/gr/gk104-: share implementation of ppc exception init
drm/nouveau/gr/gk104-: move rop_active_fbps init to nonctx
drm/nouveau/bios/pll: check BIT table version before trying to parse it
drm/nouveau/bios/pll: prevent oops when limits table can't be parsed
drm/nouveau/volt/gk104: round up in gk104_volt_set
drm/nouveau/fb/gm200: setup mmu debug buffer registers at init()
drm/nouveau/fb/gk20a,gm20b: setup mmu debug buffer registers at init()
drm/nouveau/fb/gf100-: allocate mmu debug buffers
drm/nouveau/fb: allow chipset-specific actions for oneinit()
...
We'll want to store the cdclk PLL (whatever PLL that is in reality) vco
frequency somewhere on other platforms too, so let's rename the
skl_vco_freq to cdclk_pll.vco, and let's store it in kHz instead of MHz
to match most of the other clocks.
v2: Drop the spurious > vs != change (Imre)
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1463172100-24715-14-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Imre Deak <imre.deak@intel.com>
Now that skl_vco_freq tracks the actual DPLL0 vco frequency, we'll need
something that keeps track of which vco frequency we want to use in case
the current vco is 0. This would be important across supend/resume since
we'll disable DPLL0 around those parts.
We'll also update our idea of max cdclk/dotclock when the preferred
vco changes. That could happen if out initial guess was wrong, and
later eDP would force us to change it. One issue here could be that
changing the max dotclock could cause our mode list to change during
next time the displays get probed. But I don't see a good way to avoid
that, except perhaps by allowing either vco frequency to be used as
needed. But the docs suggest that such usage wasn't really inteded.
Also need to make sure we don't update our max_cdclk value before we
have a preferred vco value, which means moving that to happen after
the cdclk sanitation.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1463172100-24715-9-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Imre Deak <imre.deak@intel.com>
WARNING: Using ChromeOS with an eDP panel and a 4K@60 DP monitor connected
to DDI1 the system will hard hang during a cold boot. Occurs when DDI1
is enabled when the cdclk is less then required. DP connected to DDI2
and HPD on either port works correctly.
Set cdclk based on the max required pixel clock based on VCO
selected. Track boot vco instead of boot cdclk.
The vco is now tracked at the atomic level and all CRTCs updated if
the required vco is changed. Not tested with eDP v1.4 panels that
require 8640 vco due to availability.
V1: initial version
V2: add vco tracking in intel_dp_compute_config(), rename
skl_boot_cdclk.
V3: rebase, V2 feedback not possible as encoders are not aware of
atomic.
V4: track target vco is atomic state. modeset all CRTCs if vco changes
V5: rename atomic variable, cleaner if/else logic, use existing vco if
encoder does not return a new vco value. check_patch.pl cleanup
V6: simplify logic in intel_modeset_checks.
V7: reorder an IF for readability and whitespace fix.
V8: use dev_cdclk for tracking new cdclk during atomic
V9: correctly handle vco 8640 when crtcs==0
V10: Clean up if else in crtcs==0
V11: Rebase for new intel_dpll_mgr.c
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Clint Taylor <clinton.a.taylor@intel.com>
[vsyrjala: rebased due to churn]
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1463172100-24715-3-git-send-email-ville.syrjala@linux.intel.com
Current intel_opregion_init is called during the driver registration
phase and intel_opregion_fini from the unregistration phase. Rename the
functions so that this is clear from their names. The phases tell us
what we expect the existing hw state to be, e.g. whether interrupts are
still enabled etc.
It should be noted that the opregion init/fini routines are asymmetric
and this is carried across into their new names. Indeed, their new names
make it even clearer that perhaps all is not well in the opregion
suspend/resume sequence (as well in the module unload).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1464012490-30961-2-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Jani Nikula <jani.nikula@linux.intel.com>
Prefer passing struct drm_i915_private to internal interfaces as this
saves us having to dance between drm_device and our native struct. The
savings hare are small (only 70 bytes of unrequired dancing), but
progressive!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1464012490-30961-1-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Jani Nikula <jani.nikula@linux.intel.com>
For now, anything with a GuC requires uCode loading, and then supports
command submission once loaded. But these are logically distinct from
simply "having a GuC", so we need a separate macro for the latter. Then,
various tests should use this new macro rather than HAS_GUC_UCODE() or
testing enable_guc_submission.
v4:
Added a couple more uses of the new macro.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
DP dual mode type 1 DVI adaptors aren't required to implement any
registers, so it's a bit hard to detect them. The best way would
be to check the state of the CONFIG1 pin, but we have no way to
do that. So as a last resort, check the VBT to see if the HDMI
port is in fact a dual mode capable DP port.
v2: Deal with VBT code reorganization
Deal with DRM_DP_DUAL_MODE_UNKNOWN
Reduce DEVICE_TYPE_DP_DUAL_MODE_BITS a bit
Accept both DP and HDMI dvo_port in VBT as my BSW
at least declare its DP port as HDMI :(
v3: Ignore DEVICE_TYPE_NOT_HDMI_OUTPUT (Shashank)
Cc: stable@vger.kernel.org
Cc: Tore Anderson <tore@fud.no>
Reported-by: Tore Anderson <tore@fud.no>
Fixes: 7a0baa6234 ("Revert "drm/i915: Disable 12bpc hdmi for now"")
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Shashank Sharma <shashank.sharma@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1462362322-31278-1-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Shashank Sharma <shashank.sharma@intel.com>
(cherry picked from commit d61992565b)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Avoiding the out-of-line call to sg_next() reduces the kernel execution
overhead by 10% in some workloads (for example the Unreal Engine 4 demo
Atlantis on 2GiB GTTs) which are dominated by the cost of inserting PTEs
due to texture thrashing. We can demonstrate this in a microbenchmark
that forces us to rebind the object on every execbuf, where we can
measure a 25% improvement, in the time required to execute an execbuf
requiring a texture to be rebound, for inlining the sg_next() for large
texture sizes.
Benchmark: igt/benchmarks/gem_exec_fault
Benchmark: igt/benchmarks/gem_exec_trace/Atlantis
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1463741647-15666-5-git-send-email-chris@chris-wilson.co.uk
The existing for_each_sg_page() iterator is somewhat heavyweight, and is
limiting i915 driver performance in a few benchmarks. So here we
introduce somewhat lighter weight iterators, primarily for use with GEM
objects or other case where we need only deal with whole aligned pages.
Unlike the old iterator, the new iterators use an internal state
structure which is not intended to be accessed by the caller; instead
each takes as a parameter an output variable which is set before each
iteration. This makes them particularly simple to use :)
One of the new iterators provides the caller with the DMA address of
each page in turn; the other provides the 'struct page' pointer required
by many memory management operations.
Various uses of for_each_sg_page() are then converted to the new macros.
v2: Force inlining of the sg_iter constructor and make the union
anonymous.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1463741647-15666-4-git-send-email-chris@chris-wilson.co.uk
userptr directly only uses drm_device in a single interface where it
meant to use drm_i915_private (everywhere else we have to derive it from
the drm_i915_gem_object and so require going from drm_device).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1463671036-3235-1-git-send-email-chris@chris-wilson.co.uk
When creating the hibernation image, the CPU will read the pages of all
objects and thus conflict with our domain tracking. We need to update
our domain tracking to accurately reflect the state on restoration.
v2: Perform the domain tracking inside freeze, before the image is
written, rather than upon restoration.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@intel.com>
Cc: David Weinehall <david.weinehall@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1463207195-22076-2-git-send-email-chris@chris-wilson.co.uk
We calculate the watermark config into intel_atomic_state and then save
it into dev_priv, but never actually use it from there. This is
left-over from some early ILK-style watermark programming designs that
got changed over time.
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1463061971-19638-18-git-send-email-matthew.d.roper@intel.com
Calculate the DDB blocks needed to satisfy the current atomic
transaction at atomic check time. This is a prerequisite to calculating
SKL watermarks during the 'check' phase and rejecting any configurations
that we can't find valid watermarks for.
Due to the nature of DDB allocation, it's possible for the addition of a
new CRTC to make the watermark configuration already in use on another,
unchanged CRTC become invalid. A change in which CRTC's are active
triggers a recompute of the entire DDB, which unfortunately means we
need to disallow any other atomic commits from racing with such an
update. If the active CRTC's change, we need to grab the lock on all
CRTC's and run all CRTC's through their 'check' handler to recompute and
re-check their per-CRTC DDB allocations.
Note that with this patch we only compute the DDB allocation but we
don't actually use the computed values during watermark programming yet.
For ease of review/testing/bisecting, we still recompute the DDB at
watermark programming time and just WARN() if it doesn't match the
precomputed values. A future patch will switch over to using the
precomputed values once we're sure they're being properly computed.
Another clarifying note: DDB allocation itself shouldn't ever fail with
the algorithm we use today (i.e., we have enough DDB blocks on BXT to
support the minimum needs of the worst-case scenario of every pipe/plane
enabled at full size). However the watermarks calculations based on the
DDB may fail and we'll be moving those to the atomic check as well in
future patches.
v2:
- Skip DDB calculations in the rare case where our transaction doesn't
actually touch any CRTC's at all. Assuming at least one CRTC state
is present in our transaction, then it means we can't race with any
transactions that would update dev_priv->active_crtcs (which requires
_all_ CRTC locks).
v3:
- Also calculate DDB during initial hw readout, to prevent using
incorrect bios values. (Maarten)
v4:
- Use new distrust_bios_wm flag instead of skip_initial_wm (which was
never actually set).
- Set intel_state->active_pipe_changes instead of just realloc_pipes
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Lyude Paul <cpaul@redhat.com>
Cc: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1463061971-19638-10-git-send-email-matthew.d.roper@intel.com
SKL-style platforms can't fully trust the watermark/DDB settings
programmed by the BIOS and need to do extra sanitization on their first
atomic update. Add a flag to dev_priv that is set during hardware
readout and cleared at the end of the first commit.
Note that for the somewhat common case where everything is turned off
when the driver starts up, we don't need to bother with a recompute...we
know exactly what the DDB should be (all zero's) so just setup the DDB
directly in that case.
v2:
- Move clearing of distrust_bios_wm up below the swap_state call since
it's a more natural / self-explanatory location. (Maarten)
- Use dev_priv->active_crtcs to test whether any CRTC's are turned on
during HW WM readout rather than trying to count the active CRTC's
again ourselves. (Maarten)
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1463061971-19638-9-git-send-email-matthew.d.roper@intel.com
We eventually want to calculate watermark values at atomic 'check' time
instead of atomic 'commit' time so that any requested configurations
that result in impossible watermark requirements are properly rejected.
The first step along this path is to allocate the DDB at atomic 'check'
time. As we perform this transition, allow the main allocation function
to operate successfully on either an in-flight state or an
already-commited state. Once we complete the transition in a future
patch, we'll come back and remove the unnecessary logic for the
already-committed case.
v2: Rebase/refactor; we should no longer need to grab extra plane states
while allocating the DDB since we can pull cached data rates and
minimum block counts from the CRTC state for any planes that aren't
being modified by this transaction.
v3:
- Simplify memsets to clear DDB plane entries. (Maarten)
- Drop a redundant memset of plane[pipe][PLANE_CURSOR] that was added
by an earlier Coccinelle patch. (Maarten)
- Assign *num_active at the top of skl_ddb_get_pipe_allocation_limits()
so that no code paths return without setting it. (kbuild robot)
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1463061971-19638-8-git-send-email-matthew.d.roper@intel.com
The get-reset-stats ioctl reports upon the statistics (number of hangs,
be it as a victim or the guilty party) of a particular context. It is
semantically better as being part of i915_gem_context.c user interface,
as opposed to the hardware level access of intel_uncore.c
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1463137042-9669-1-git-send-email-chris@chris-wilson.co.uk
To be used for more efficient Gen range checking.
v2: Remove spurious chunk. (Chris Wilson)
v3: Rebase.
v4: Renamed from INTEL_GEN_RANGE and added GEN_FOREVER.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v3)
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1462874228-6601-1-git-send-email-tvrtko.ursulin@linux.intel.com
It just makes more work for the compiler and generates more code.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
This way optimization from a previous patch works even better.
v2: Rebase.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
If we allow it a dedicated flag in dev_priv we enable the
compiler to nicely optimize conditions like IS_HASSWELL ||
IS_BROADWELL.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
If instead of numerical comparison me make these test a
bitmask, we enable the compiler to optimize all instances
of IS_GENx || IS_GENy.
v2: Make bit zero of gen mask mean gen 1.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Pass drm_i915_private to the uncore init/fini routines and their
subservients as it is their native type.
text data bss dec hex filename
6309978 3578778 696320 10585076 a183f4 vmlinux
6309530 3578778 696320 10584628 a18234 vmlinux
a modest 400 bytes of saving, but 60 lines of code deleted!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1462885804-26750-1-git-send-email-chris@chris-wilson.co.uk
text data bss dec hex filename
6309351 3578714 696320 10584385 a18141 vmlinux
6308391 3578714 696320 10583425 a17d81 vmlinux
Almost 1KiB of code reduction.
v2: More s/INTEL_INFO()->gen/INTEL_GEN()/ and IS_GENx() conversions
text data bss dec hex filename
6304579 3578778 696320 10579677 a16edd vmlinux
6303427 3578778 696320 10578525 a16a5d vmlinux
Now over 1KiB!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1462545621-30125-3-git-send-email-chris@chris-wilson.co.uk
I have noticed some of our interrupt handlers use both dev and
dev_priv while they could get away with only dev_priv in the
huge majority of cases.
Tidying that up had a cascading effect on changing functions
prototypes, so relatively big churn factor, but I think it is
for the better.
For example even where changes cascade out of i915_irq.c, for
functions prefixed with intel_, genX_ or <plat>_, it makes more
sense to take dev_priv directly anyway.
This allows us to eliminate local variables and intermixed usage
of dev and dev_priv where only one is good enough.
End result is shrinkage of both source and the resulting binary.
i915.ko:
- .text 000b0899
+ .text 000b0619
Or if we look at the Gen8 display irq chain:
-00000000000006ad t gen8_irq_handler
+0000000000000663 t gen8_irq_handler
-0000000000000028 T intel_opregion_asle_intr
+0000000000000024 T intel_opregion_asle_intr
-000000000000008c t ilk_hpd_irq_handler
+000000000000007f t ilk_hpd_irq_handler
-0000000000000116 T intel_check_page_flip
+0000000000000112 T intel_check_page_flip
-000000000000011a T intel_prepare_page_flip
+0000000000000119 T intel_prepare_page_flip
-0000000000000014 T intel_finish_page_flip_plane
+0000000000000013 T intel_finish_page_flip_plane
-0000000000000053 t hsw_pipe_crc_irq_handler
+000000000000004c t hsw_pipe_crc_irq_handler
-000000000000022e t cpt_irq_handler
+0000000000000213 t cpt_irq_handler
So small shrinkage but it is all fast paths so doesn't harm.
Situation is similar in other interrupt handlers as well.
v2: Tidy intel_queue_rps_boost_for_request as well. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
DP dual mode type 1 DVI adaptors aren't required to implement any
registers, so it's a bit hard to detect them. The best way would
be to check the state of the CONFIG1 pin, but we have no way to
do that. So as a last resort, check the VBT to see if the HDMI
port is in fact a dual mode capable DP port.
v2: Deal with VBT code reorganization
Deal with DRM_DP_DUAL_MODE_UNKNOWN
Reduce DEVICE_TYPE_DP_DUAL_MODE_BITS a bit
Accept both DP and HDMI dvo_port in VBT as my BSW
at least declare its DP port as HDMI :(
v3: Ignore DEVICE_TYPE_NOT_HDMI_OUTPUT (Shashank)
Cc: stable@vger.kernel.org
Cc: Tore Anderson <tore@fud.no>
Reported-by: Tore Anderson <tore@fud.no>
Fixes: 7a0baa6234 ("Revert "drm/i915: Disable 12bpc hdmi for now"")
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Shashank Sharma <shashank.sharma@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1462362322-31278-1-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Shashank Sharma <shashank.sharma@intel.com>
If the command parser is not active, then it is appropriate to report it
as operating at version 0 as no higher mode is supported. This greatly
simplifies userspace querying for the command parser as we then do not
need to second guess when it will be active (a mixture of module
parameters and generational support, which may change over time).
v2: s/comand/command/ misspelling in comment
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1462368336-21230-1-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Only has one user and is nothing more than a shim on top of
i915_vma_unbind, so let's just get rid of it.
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461842691-27575-1-git-send-email-matthew.auld@intel.com
This function had copies in 3 different files. Unify them in kernel.h.
Cc: Joe Perches <joe@perches.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Rob Clark <robdclark@gmail.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Acked-by: Daniel Vetter <daniel.vetter@intel.com> [drm/i915/]
Acked-by: Rob Clark <robdclark@gmail.com> [drm/msm/]
Acked-by: Lucas Stach <l.stach@pengutronix.de> [drm/etinav/]
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The code used by the DP and HDMI paths was very similar, so make them
share it. Note that this removes the write to signal level registers
from the HDMI pre pll enable path, but that's OK since those are set
in vlv_hdmi_pre_enable() function.
Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Reviewed-by: Jim Bride <jim.bride@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461761065-21195-9-git-send-email-ander.conselvan.de.oliveira@intel.com
The logic for setting signal levels is used for both HDMI and DP with
small variations. But it is similar enough to put behind a function
called from the encoders.
v2: Remove unrelated MST changes due to rebase fumble. (Jim Bride)
Fix typo in the commit message. (Jim Bride)
v3: Really fix the typo. (Jim)
Cc: Jim Bride <jim.bride@linux.intel.com>
Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Reviewed-by: Jim Bride <jim.bride@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461761065-21195-8-git-send-email-ander.conselvan.de.oliveira@intel.com
The exact same code was used by HDMI and DP encoders, so move it to
intel_dpio_phy.c.
v2: Fix typo in the commit message. (Jim Bride)
v3: Call the new function chv_phy_post_pll_disable() instead of
chv_phy_post_disable(), as it should be called after the pll
is disabled. (Ville)
Cc: Jim Bride <jim.bride@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Reviewed-by: Jim Bride <jim.bride@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461761065-21195-7-git-send-email-ander.conselvan.de.oliveira@intel.com
The only difference between the DP and HDMI versions was the lane count.
Since lane_count is now set appropriately for HDMI too, get rid of the
duplication and move this to intel_dpio_phy.c
v2: Don't move comments about 2nd common lane staying alive. (Ville)
Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Reviewed-by: Jim Bride <jim.bride@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461761065-21195-6-git-send-email-ander.conselvan.de.oliveira@intel.com
As the contexts are accessed by the hardware until the switch is completed
to a new context, the hardware may still be writing to the context object
after the breadcrumb is visible. We must not unpin/unbind/prune that
object whilst still active and so we keep the previous context pinned until
the following request. We can generalise the tracking we already do via
the engine->last_context and move it to the request so that it works
equally for execlists and GuC.
v2: Drop the execlists double pin as that exposes a race inside the lrc
irq handler as it tries to access the context after it may be retired.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-22-git-send-email-chris@chris-wilson.co.uk
If we move the release of the GEM request (i.e. decoupling it from the
various lists used for client and context tracking) after it is complete
(either by the GPU retiring the request, or by the caller cancelling the
request), we can remove the requirement that the final unreference of
the GEM request need to be under the struct_mutex.
The careful reader may notice that one or two impossible NULL pointer
tests are dropped for readability. These pointers cannot be NULL since
they are assigned during request construction and never unset.
v2,v3: Rebalance execlists by moving the context unpinning.
v4: Rebase onto -nightly
v5: Avoid trying to rebalance execlist/GuC context pinning, leave that
to the next step
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-21-git-send-email-chris@chris-wilson.co.uk
Refactor pinning and unpinning of contexts, such that the default
context for an engine is pinned during initialisation and unpinned
during teardown (pinning of the context handles the reference counting).
Thus we can eliminate the special case handling of the default context
that was required to mask that it was not being pinned normally.
v2: Rebalance context_queue after rebasing.
v3: Rebase to -nightly (not 40 patches in)
v4: Rebase onto request_alloc unwinding
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-19-git-send-email-chris@chris-wilson.co.uk
The hardware tracks contexts and expects all live contexts (those active
on the hardware) to have a unique identifier. This is used by the
hardware to assign pagefaults and the like to a particular context.
v2: Reorder to make sure ctx->link is not left dangling if the
assignment of a hw_id fails (Mika).
v3: We have 21bits of context space, not 20.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-17-git-send-email-chris@chris-wilson.co.uk
Now that we share intel_ring_begin(), reserving space for the tail of
the request is identical between legacy/execlists and so the tautology
can be removed. In the process, we move the reserved space tracking
from the ringbuffer on to the request. This is to enable us to reorder
the reserved space allocation in the next patch.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-13-git-send-email-chris@chris-wilson.co.uk
The code to switch_mm() is already handled by i915_switch_context(), the
only difference required to setup the aliasing ppgtt is that we need to
emit te switch_mm() on the first context, i.e. when transitioning from
engine->last_context == NULL. This allows us to defer the
initialisation of the GPU from early device initialisation to first use,
which should marginally speed up both. The caveat is that we then defer
the context initialisation until first use - i.e. we cannot assume that
the GPU engines are initialised. For example, this means that power
contexts for rc6 (Ironlake) need to explicitly loaded, as they are.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-11-git-send-email-chris@chris-wilson.co.uk
Since we do the l3-remap on context switch, we can remove the redundant
early call to set the mapping prior to performing the first context
switch.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-10-git-send-email-chris@chris-wilson.co.uk
In order to force a reload of the context image upon resume, we first
need to mark its absence on suspend. Currently we are failing to restore
the golden context state and any context w/a to the default context
after resume.
One oversight corrected, is that we had forgotten to reapply the L3
remapping when restoring the lost default context.
v2: Remove deprecated WARN.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-7-git-send-email-chris@chris-wilson.co.uk
Only caller is i915_gem_obj_ggtt_size which only cares about
GGTT so simplify it and implement under that name.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Because having both i915_gem_object_alloc() and i915_gem_alloc_object()
(with different return conventions) is just too confusing!
(i915_gem_object_alloc() is the low-level memory allocator, and remains
unchanged, whereas i915_gem_alloc_object() is a constructor that ALSO
initialises the newly-allocated object.)
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461348872-4702-1-git-send-email-david.s.gordon@intel.com
The newly-introduced function i915_gem_object_pin_map() returns an
ERR_PTR (not NULL) if the pin-and-map opertaion fails, so that's what we
must check for. And it's nicer not to assign such a pointer-or-error to
a structure being filled in until after it's been validated, so we
should keep it local and avoid exporting a bogus pointer. Also, for
clarity and symmetry, we should clear 'virtual_start' along with 'vma'
when unmapping a ringbuffer.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
For all gt3 and gt4 skylake variants, extend the usage of
WaRsDisableCoarsePowerGating for all revisions. Without this
gt3 and gt4 skylakes up to atleast rev 0xa can gpu hang or
system hang.
Cc: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Cc: Timo Aaltonen <tjaalton@ubuntu.com>
Cc: <stable@vger.kernel.org>
Reported-by: Mikael Djurfeldt <mikael@djurfeldt.com>
References: https://bugs.freedesktop.org/show_bug.cgi?id=94161
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
Tested-by: Timo Aaltonen <tjaalton@ubuntu.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1459860977-27751-1-git-send-email-mika.kuoppala@intel.com
(cherry picked from commit 185c66e57c)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
I caught a few errors in our current PHY/CDCLK programming by sanity
checking the actual programmed state, so I thought it would be also
useful for the future. In addition to verifying the state after
programming it also verify it after exiting DC5, to make sure DMC
restored/kept intact everything related.
v2:
- Inlining __phy_reg_verify_state() doesn't make sense and also
incorrect, so don't do it (PW/CI gcc)
v3:
- Rebase on latest -nightly
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: David Weinehall <david.weinehall@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1459780030-15781-1-git-send-email-imre.deak@intel.com
Conceptually, each request is a record of a hardware transaction - we
build up a list of pending commands and then either commit them to
hardware, or cancel them. However, whilst building up the list of
pending commands, we may modify state outside of the request and make
references to the pending request. If we do so and then cancel that
request, external objects then point to the deleted request leading to
both graphical and memory corruption.
The easiest example is to consider object/VMA tracking. When we mark an
object as active in a request, we store a pointer to this, the most
recent request, in the object. Then we want to free that object, we wait
for the most recent request to be idle before proceeding (otherwise the
hardware will write to pages now owned by the system, or we will attempt
to read from those pages before the hardware is finished writing). If
the request was cancelled instead, that wait completes immediately. As a
result, all requests must be committed and not cancelled if the external
state is unknown.
All that remains of i915_gem_request_cancel() users are just a couple of
extremely unlikely allocation failures, so remove the API entirely.
A consequence of committing all incomplete requests is that we generate
excess breadcrumbs and fill the ring much more often with dummy work. We
have completely undone the outstanding_last_seqno optimisation.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93907
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-16-git-send-email-chris@chris-wilson.co.uk
Reporting -EIO from i915_wait_request() has proven very troublematic
over the years, with numerous hard-to-reproduce bugs cropping up in the
corner case of where a reset occurs and the code wasn't expecting such
an error.
If the we reset the GPU or have detected a hang and wish to reset the
GPU, the request is forcibly complete and the wait broken. Currently, we
report either -EAGAIN or -EIO in order for the caller to retreat and
restart the wait (if appropriate) after dropping and then reacquiring
the struct_mutex (essential to allow the GPU reset to proceed). However,
if we take the view that the request is complete (no further work will
be done on it by the GPU because it is dead and soon to be reset), then
we can proceed with the task at hand and then drop the struct_mutex
allowing the reset to occur. This transfers the burden of checking
whether it is safe to proceed to the caller, which in all but one
instance it is safe - completely eliminating the source of all spurious
-EIO.
Of note, we only have two API entry points where we expect that
userspace can observe an EIO. First is when submitting an execbuf, if
the GPU is terminally wedged, then the operation cannot succeed and an
-EIO is reported. Secondly, existing userspace uses the throttle ioctl
to detect an already wedged GPU before starting using HW acceleration
(or to confirm that the GPU is wedged after an error condition). So if
the GPU is wedged when the user calls throttle, also report -EIO.
v2: Split more carefully the change to i915_wait_request() and assorted
ABI from the reset handling.
v3: Add a couple of WARN_ON(EIO) to the interruptible modesetting code
so that we don't start to leak EIO there in future (and break our hang
resistant modesetting).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-9-git-send-email-chris@chris-wilson.co.uk
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-1-git-send-email-chris@chris-wilson.co.uk
As the request is only valid during the same global reset epoch, we can
record the current reset_counter when constructing the request and reuse
it when waiting upon that request in future. This removes a very hairy
atomic check serialised by the struct_mutex at the time of waiting and
allows us to transfer those waits to a central dispatcher for all
waiters and all requests.
PS: With per-engine resets, we obviously cannot assume a global reset
epoch for the requests - a per-engine epoch makes the most sense. The
challenge then is how to handle checking in the waiter for when to break
the wait, as the fine-grained reset may also want to requeue the
request (i.e. the assumption that just because the epoch changes the
request is completed may be broken - or we just avoid breaking that
assumption with the fine-grained resets).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-7-git-send-email-chris@chris-wilson.co.uk
In the reset_counter, we use two bits to track a GPU hang and reset. The
low bit is a "reset-in-progress" flag that we set to signal when we need
to break waiters in order for the recovery task to grab the mutex. As
soon as the recovery task has the mutex, we can clear that flag (which
we do by incrementing the reset_counter thereby incrementing the gobal
reset epoch). By clearing that flag when the recovery task holds the
struct_mutex, we can forgo a second flag that simply tells GEM to ignore
the "reset-in-progress" flag.
The second flag we store in the reset_counter is whether the
reset failed and we consider the GPU terminally wedged. Whilst this flag
is set, all access to the GPU (at least through GEM rather than direct mmio
access) is verboten.
PS: Fun is in store, as in the future we want to move from a global
reset epoch to a per-engine reset engine with request recovery.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-6-git-send-email-chris@chris-wilson.co.uk
This is principally a little bit of syntatic sugar to hide the
atomic_read()s throughout the code to retrieve the current reset_counter.
It also provides the other utility functions to check the reset state on the
already read reset_counter, so that (in later patches) we can read it once
and do multiple tests rather than risk the value changing between tests.
v2: Be more strict on converting existing i915_reset_in_progress() over to
the more verbose i915_reset_in_progress_or_wedged().
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-4-git-send-email-chris@chris-wilson.co.uk
Currently there is a #define to enable extra BUG_ON for debugging
requests and associated activities. I want to expand its use to cover
all of GEM internals (so that we can saturate the code with asserts).
We can add a Kconfig option to make it easier to enable - with the usual
caveats of not enabling unless explicitly requested.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-3-git-send-email-chris@chris-wilson.co.uk
Separate out the layers of includes (linux, drm, intel, i915) so that it
is a little easier to order our definitions between our multiple
reentrant headers. A couple of headers needed fixes to make them more
standalone (forgotten includes, forward declarations etc).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-2-git-send-email-chris@chris-wilson.co.uk
Store the edram capabilities instead of only the size of
edram. This is preparatory patch to allow edram size calculation
based on edram capability bits for gen9+. With gen9 the
edram is behind llc and is a separate entity. With hsw/bdw
it was more of a victim cache for LLC so the name 'eLLC' might
be warranted. Regardless, rename all mentions of eLLC to EDRAM to
clear the confusion.
v2: return bytes for edram size (Chris)
s/eLLC/eDRAM in output if we are gen > 8
v3: rebase, INTEL_GEN (Chris)
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
For all gt3 and gt4 skylake variants, extend the usage of
WaRsDisableCoarsePowerGating for all revisions. Without this
gt3 and gt4 skylakes up to atleast rev 0xa can gpu hang or
system hang.
Cc: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Cc: Timo Aaltonen <tjaalton@ubuntu.com>
Cc: <stable@vger.kernel.org>
Reported-by: Mikael Djurfeldt <mikael@djurfeldt.com>
References: https://bugs.freedesktop.org/show_bug.cgi?id=94161
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
Tested-by: Timo Aaltonen <tjaalton@ubuntu.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1459860977-27751-1-git-send-email-mika.kuoppala@intel.com
Rather than blindly waking up all forcewake domains on command
submission, we can teach each engine what is (or are) the correct
one to take.
On platforms with multiple forcewake domains like VLV, CHV, SKL
and BXT, this has the potential of lowering the GPU and CPU
power use and submission latency.
To implement it we add a function named
intel_uncore_forcewake_for_reg whose purpose is to query which
forcewake domains need to be taken to read or write a specific
register with raw mmio accessors.
These enables the execlists engine setup to query which
forcewake domains are relevant per engine on the currently
running platform.
v2:
* Kerneldoc.
* Split from intel_uncore.c macro extraction, WARN_ON,
no warns on old platforms. (Chris Wilson)
v3:
* Single domain per engine, mention all registers,
bi-directional function and a new name, fix handling
of gen6 and gen7 writes. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1460468251-14069-1-git-send-email-tvrtko.ursulin@linux.intel.com
As the vast majority of users do not use the domain id variable,
we can eliminate it from the iterator and also change the latter
using the same principle as was recently done for for_each_engine.
For a couple of callers which do need the domain mask, store it
in the domain array (which already has the domain id), then both
can be retrieved thence.
Result is clearer code and smaller generated binary, especially
in the tight fw get/put loops. Also, relationship between domain
id and mask is no longer assumed in the macro.
v2: Improve grammar in the commit message and rename the
iterator to for_each_fw_domain_masked for consistency.
(Dave Gordon)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Because it is based on jiffies, current implementation releases the
forcewake at any time between straight away and between 1ms and 10ms,
depending on the kernel configuration (CONFIG_HZ).
This is probably not what has been desired, since the dynamics of keeping
parts of the GPU awake should not be correlated with this kernel
configuration parameter.
Change the auto-release mechanism to use hrtimers and set the timeout to
1ms with a 1ms of slack. This should make the GPU power consistent
across kernel configs, and timer slack should enable some timer coalescing
where multiple force-wake domains exist, or with unrelated timers.
For GlBench/T-Rex this decreases the number of forcewake releases from
~480 to ~300 per second, and for a heavy combined OGL/OCL test from
~670 to ~360 (HZ=1000 kernel).
Even though this reduction can be attributed to the average release period
extending from 0-1ms to 1-2ms, as discussed above, it will make the
forcewake timeout consistent for different CONFIG_HZ values.
Real life measurements with the above workload has shown that, with this
patch, both manage to auto-release the forcewake between 2-4 times per
10ms, even though the number of forcewake gets is dramatically different.
T-Rex requests between 5-10 explicit gets and 5-10 implict gets in each
10ms period, while the OGL/OCL test requests 250 and 380 times in the same
period.
The two data points together suggest that the nature of the forwake
accesses is bursty and that further changes and potential timeout
extensions, or moving the start of timeout from the first to the last
automatic forcewake grab, should be carefully measured for power and
performance effects.
v2:
* Commit spelling. (Dave Gordon)
* More discussion on numbers in the commit. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
We've had problems on several occasions with using the panel type
from the VBT block 40. Usually it seems to be 2, which often
doesn't give us the correct timings for the panel. After some
more digging I found a way to get a panel type via the OpRegion
SWSCI GBDA "Get Panel Details" method. Let's try to use it.
The spec has this to say about the output:
"Bits [15:8] - Panel Type
Bits contain the panel type user setting from CMOS
00h = Not Valid, use default Panel Type & Timings from VBT
01h - 0Fh = Panel Number"
Another version of the spec lists the valid range as 1-16, which makes
more sense since VBT supports 16 panels. Based on actual results
from Rob's G45, 1-16 is what we need to accept.
The other bits in the output don't look relevant for the problem at
hand.
The input is specified as:
"Bits [31:4] - Reserved
Reserved (must be zero)
Bits [3:0] - Panel Number
These bits contain the sequential index of Panel, starting at 0 and
counting upwards from the first integrated Internal Flat-Panel Display
Encoder present, and then from the first external Display Encoder
(e.g., S/DVO-B then S/DVO-C) which supports Internal Flat-Panels.
0h - 0Fh = Panel number"
For now I've just hardcoded the input panel number as 0. That would seem
like a decent choise for LVDS. Not so sure about eDP when port != A.
v2: Accept values 1-16
Filter out bogus results in opregion code (Jani)
Add debug logging for all the different branches (Jani)
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Rob Kramer <rob@solution-space.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94825
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460359431-11003-1-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Tested-by: Rob Kramer <rob@solution-space.com>
Store the extracted panel_type under dev_priv.vbt instead of keeping
around a static variable for it.
Cc: Rob Kramer <rob@solution-space.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
When the GMBUS based i2c transfer times out, we try to fall back to
bit-banging and retry the operation that way. However if the bit-banging
attempt also fails, we should probably go back to the GMBUS method for
the next attempt. Maybe there simply wasn't anyone one the bus at this
time.
There's also a bit of a mess going on with the force_bit handling.
It's supposed to be a ref count actually, and it is as far as
intel_gmbus_force_bit() is concerned. But it's treated as just a
flag by the timeout based bit-banging fallback. I suppose that's
fine since we should never end up in the timeout fallback case
if force_bit was already non-zero. However now that we want to restore
things back to where they were after the bit-banging attempt failed,
we're going to have to do things a bit differently to avoid clobbering
the force_bit count as set up by intel_gmbus_force_bit(). So let's
dedicate the high bit as a flag for the low level timeout based fallback
and treat the rest of the bits as a ref count just as before.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1457366220-29409-4-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
When called because we have run out of vmap address space, we only need
to recover objects that have vmappings and not all.
v2: Start using is_vmalloc_addr()
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460113874-17366-5-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
We now have two implementations for vmapping a whole object, one for
dma-buf and one for the ringbuffer. If we couple the mapping into the
obj->pages lifetime, then we can reuse an obj->mapping for both and at
the same time couple it into the shrinker. There is a third vmapping
routine in the cmdparser that maps only a range within the object, for
the time being that is left alone, but will eventually use these routines
in order to cache the mapping between invocations.
v2: Mark the failable kmalloc() as __GFP_NOWARN (vsyrjala)
v3: Call unpin_vmap from the right dmabuf unmapper
v4: Rename vmap to map as we don't wish to imply the type of mapping
involved, just that it contiguously maps the object into kernel space.
Add kerneldoc and lockdep annotations
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460113874-17366-4-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
In order to simplify future patches, extract the
lazy_coherency optimisation our of the engine->get_seqno() vfunc into
its own callback.
v2: Rename the barrier to engine->irq_seqno_barrier to try and better
reflect that the barrier is only required after the user interrupt before
reading the seqno (to ensure that the seqno update lands in time as we
do not have strict seqno-irq ordering on all platforms).
Reviewed-by: Dave Gordon <david.s.gordon@intel.com> [#v2]
v3: Comments for hangcheck paranoia. Mika wanted to keep the extra
barrier inside the hangcheck, just in case. I can argue that it doesn't
provide a barrier against anything, but the side-effects of applying the
barrier may prevent a false declaration of a hung GPU.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460195877-20520-2-git-send-email-chris@chris-wilson.co.uk
It's useful to look at the last seqno submitted on a particular engine
and compare it against the HWS value to check for irregularities.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460010558-10705-1-git-send-email-chris@chris-wilson.co.uk
According to Chris, use of i915_vm_to_ppgtt is visible in benchmark
unless WARN_ON is removed, so lets get rid of it.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v1)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
This patch sets the invert bit for hpd detection for each port
based on VBT configuration. Since each AOB can be designed to
depend on invert bit or not, it is expected if an AOB requires
invert bit, the user will set respective bit in VBT.
v2: Separated VBT parsing from the rest of the logic. (Jani)
v3: Moved setting invert bit logic to bxt_hpd_irq_setup()
and changed its logic to avoid looping twice. (Ville)
v4: Changed the logic to mask out the bits first and then
set them to remove need of temporary variable. (Ville)
v5: Moved defines to existing set of defines for the register
and added required breaks. (Ville)
Signed-off-by: Sivakumar Thulasimani <sivakumar.thulasimani@intel.com>
Signed-off-by: Durgadoss R <durgadoss.r@intel.com>
Signed-off-by: Shubhangi Shrivastava <shubhangi.shrivastava@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
[Jani: fixed some checkpatch noise, added kernel-doc.]
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1459420907-11383-2-git-send-email-shubhangi.shrivastava@intel.com
Extract the GPLL reference frequency from CCK and use it in the
GPU freq<->opcode conversions on VLV/CHV. This eliminates all the
assumptions we have about which divider is used for which czclk
frequency.
Note that unlike most clocks from CCK, the GPLL ref clock is a divided
down version of the CZ clock rather than the HPLL clock. CZ clock itself
is a divided down version of the HPLL clock though, so in effect it just
gets divided down twice.
While at it, throw in a few comments explaining the remaining constants
for anyone who later wants to compare this to the spreadsheets.
v2: Add slow/fast notes for CHV clocks (Imre)
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1457120584-26080-2-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Imre Deak <imre.deak@intel.com> (v1)
Due to timing issues in the HW, some of the status bits required for GuC
authentication occasionally don't get set; when that happens, the GuC
cannot be initialized and we will be left with a wedged GPU. The W/A
suggested is to perform a soft reset of the GuC and attempt to reload
the F/W again for few times before giving up.
As the failure is dependent on timing, tests performed by triggering
manual full gpu reset (i915_wedged) showed that we could sometimes hit
this after several thousand iterations, but sometimes tests ran even
longer without any issues. Reset and reload mechanism proved helpful
when we indeed hit f/w load failure, so it is better to include this
to improve driver stability.
This change implements the following WAs,
WaEnableuKernelHeaderValidFix:skl,bxt
WaEnableGuCBootHashCheckNotSet:skl,bxt
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Alex Dai <yu.dai@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
If the core runs out of vmap address space, it will call a notifier in
case any driver can reap some of its vmaps. As i915.ko is possibily
holding onto vmap address space that could be recovered, hook into the
notifier chain and try and reap objects holding onto vmaps.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Roman Pen <r.peniaev@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kahola <mika.kahola@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1459777603-23618-4-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
DPLL_MD(PIPE_C) is AWOL on CHV. Instead of fixing it someone added
chicken bits to propagate the pixel multiplier from DPLL_MD(PIPE_B)
to either pipe B or C. So do that to make pixel repeat work on pipes
B and C. Pipe A is fine without any tricks.
Fortunately the pixel repeat propagation appears to be a oneshot
operation, so once the value has been written we can clear the
chicken bits. So it is still possible to drive pipe B and C with
different pixel multipliers simultaneosly.
Looks like DPLL_VGA_MODE_DIS must also be set in DPLL(PIPE_B)
for this to work. But since we keep that bit always set in all
DPLLs there's no problem.
This of course means we can't reliably read out the pixel multiplier
for pipes B and C. That would make the state checker unhappy, so I
added shadow copies of those registers in to dev_priv. The other
option would have been to skip pixel multiplier, dpll_md an dotclock
checks entirely on CHV, but that feels like a serious loss of cross
checking, so just pretending that we have working DPLL MD registers
seemed better. Obviously with the shadow copies we can't detect if
the pixel multiplier was properly configured, nor can we take over
its state from the BIOS, but hopefully people won't have displays
that would be limitd to such crappy modes.
There is one strange flicker still remaining. It's visible on
pipe C/HDMID when HDMIB is enabled while driven by pipe B.
It doesn't occur if pipe A drives HDMIB, nor is there any glitch
on pipe B/HDMIB when port C/HDMID starts up. I don't have a board
with HDMIC so not sure if it happens there too. So I'm not sure
if it's somehow tied in with this strange linkage between pipe B
and C. Sadly I was unable to find an enable sequence that would
avoid the glitch, but at least it's not fatal ie. the output
recovers afterwards.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1458052809-23426-4-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Refer to the GGTT VM consistently as "ggtt->base" instead of just "ggtt",
"vm" or indirectly through other variables like "dev_priv->ggtt.base"
to avoid confusion with the i915_ggtt object itself and PPGTT VMs.
Refer to the GGTT as "ggtt" instead of indirectly through chaining.
As a bonus gets rid of the long-standing i915_obj_to_ggtt vs.
i915_gem_obj_to_ggtt conflict, due to removal of i915_obj_to_ggtt!
v2:
- Added some more after grepping sources with Chris
v3:
- Refer to GGTT VM through ggtt->base consistently instead of ggtt_vm
(Chris)
v4:
- Convert all dev_priv->ggtt->foo accesses to ggtt->foo.
v5:
- Make patch checker happy
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
With async modesets this is no longer protected with connection_mutex,
so ensure that each pll has its own lock. The pll configuration state
is still protected; it's only the pll updates that need locking against
concurrency.
Changes since v1:
- Rebased.
- Fix locking to protect all accesses. (Durgadoss)
Changes since v2:
- Make the dpll_lock global to protect concurrent updates to the
same register, for example DPLL_CTRL1 on skl. (Ander)
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/56F29F50.1090708@linux.intel.com
Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com>
Having provided for_each_engine_id() for cases where the third (id)
argument is useful, we can now replace all the remaining instances with
a simpler version that takes only two parameters. In many cases, this
also allows the elimination of the local variable used in the iterator
(usually 'i').
v2:
s/dev_priv/(dev_priv__)/ in body of for_each_engine_masked() [Chris Wilson]
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1458757194-17783-2-git-send-email-david.s.gordon@intel.com
Equivalent to the existing for_each_engine() macro, this will replace
the latter wherever the third argument *is* actually wanted (in most
places, it is not used). The third argument is renamed to emphasise
that it is an engine id (type enum intel_engine_id). All the callers of
the macro that actually need the third argument are updated to use this
version, and the argument (generally 'i') is also updated to be 'id'.
Other callers (where the third argument is unused) are untouched for
now; they will be updated in the next patch.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
In preparation for engine reset, the wedged argument of i915_handle_error()
is extended to reflect as a mask of engines that are hung. This is further
passed down to error state capture functions which are also updated.
Engine reset recovery mechanism uses this mask and schedules recovery work
for those particular engines.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1458331676-567-3-git-send-email-arun.siluvery@linux.intel.com
Initialize hangcheck struct during driver load. Since we do the same after
recovering from a reset, this is extracted into a helper function.
v2: remove redundant hangcheck init during load as this is done when
engines are initialized (Chris)
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1458577619-12006-1-git-send-email-arun.siluvery@linux.intel.com
Patch based on a previous series by Shashank Sharma.
v2: Do not read GAMMA_MODE register to figure what mode we're in
v3: Program PREC_PAL_GC_MAX to clamp pixel values > 1.0
Add documentation on how the Broadcast RGB property is affected by CTM
v4: Update contributors
v5: Refactor degamma/gamma LUTs load into a single function
v6: Fix missing intel_crtc variable (bisect issue)
v7: Fix & simplify limited range matrix multiplication (Matt Roper's
comment)
Signed-off-by: Shashank Sharma <shashank.sharma@intel.com>
Signed-off-by: Kumar, Kiran S <kiran.s.kumar@intel.com>
Signed-off-by: Kausal Malladi <kausalmalladi@gmail.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acknowledged-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1458125837-2576-4-git-send-email-lionel.g.landwerlin@intel.com
The moves a couple of functions programming the gamma LUT and CSC
units into their own file.
On generations prior to Haswell there is only a gamma LUT. From
haswell on there is also a new enhanced color correction unit that
isn't used yet. This is why we need to set the GAMMA_MODE register,
either we're using the legacy 8bits LUT or enhanced LUTs (of 10 or
12bits).
The CSC unit is only available from Haswell on.
We also need to make a special case for CherryView which is recognized
as a gen 8 but doesn't have the same enhanced color correction unit
from Haswell on.
v2: Fix access to GAMMA_MODE register on older generations than
Haswell (from Matt Roper's comments)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1458125837-2576-2-git-send-email-lionel.g.landwerlin@intel.com
The BXT display connections have DSI transcoders A and C that can be
muxed to any pipe, not unlike the eDP transcoder. Add the notion of DSI
transcoders.
The "normal" transcoders A, B and C are not used with BXT DSI, so care
must be taken to avoid accessing those registers with DSI transcoders in
the hardware state readout, modeset, and generally everywhere.
v2: addressing comments by Ville:
- rename the dsi get config function to hsw_get_dsi_transcoder_state
- rebase onto the higher level split of pipe/transcoder functions
- use more has_dsi_encoder as we can now because of the above,
with no need to look at the transcoder so much
- rename IS_DSI_TRANSCODER to transcoder_is_dsi
- use the above a bit more instead of comparing to < TRANSCODER_EDP
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/299740536b7941e31b2744f3ce34f7afe936a771.1458313400.git.jani.nikula@intel.com
Atm, in case failure injection forces an error the subsequent "*ERROR*
failed to init modeset" error message will make automated tests (CI)
report this event as a breakage even though the event is expected. To
fix this print the error message with debug log level in this case.
While at it print the error message for any init failure and change it
to
"""
Device initialization failed (errno)
Please file a bug at https://bugs.freedesktop.org/enter_bug.cgi?product=DRI
against DRM/Intel providing the dmesg log by booting with drm.debug=0xf
"""
and export a helper printing error messages using this same format.
A follow-up patch will convert all uses of DRM_ERROR reporting a user
facing problem to use this new helper instead.
v2:
- Include the problematic error message in the commit log, add a
request to file an fdo bug to the message (Chris)
v3:
- Include the new error message too in the commit log, make the
fdo link more precise and print part of the message with info log
level (Chris)
v4: (Chris)
- Use dev_printk instead of DRM_ERROR/INFO and use NOTICE instead of
INFO loglevel
- Export a helper for printing user facing error messages
v5:
- Keep the DRM_ERROR message prefix used by piglit-igt/CI to filter
relevant dmesg lines
- Use dev_notice(), instead of dev_printk(KERN_NOTICE,...)
v6:
- Print the fdo bug link only once (Chris)
CC: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1458290770-15480-1-git-send-email-imre.deak@intel.com
Refer to Global GTT consistently as GGTT, thus rename dev_priv->gtt
to dev_priv->ggtt and struct i915_gtt to struct i915_ggtt.
Fix a couple of whitespace problems while at it.
v2:
- Fix a typo in commit message.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Add support for forcing an error at selected places in the driver. As an
example add 4 options to fail during driver loading.
Requested by Chris.
v2:
- Add fault point for modeset initialization
- Print debug message when injecting an error
v3:
- Rename inject_fault to inject_load_failure, rename the related macros
and helper accordingly (Chris)
- Use a counter instead of a mask to identify the failure point (Daniel)
- Mark the module option as _unsafe and keep i915_params ordered (Joonas)
v4:
- Rebase on latest -nightly
v5:
- Use DRM_INFO instead of DRM_DEBUG_DRIVER, making it clearer in CI reports
that a following error message is expected (IRC r-b from Chris on v5)
CC: Chris Wilson <chris@chris-wilson.co.uk>
CC: Daniel Vetter <daniel.vetter@ffwll.ch>
CC: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
The only steps requiring device access is the fence and swizzling
initialization, so split these out keeping them in their current place
and move the rest of init steps earlier.
v2-v3:
- unchanged
v4:
- move call to i915_gem_detect_bit_6_swizzle() to
i915_gem_load_init_fences() and preserve the original order of
the detection of HW fence capailities wrt. swizzling (Chris)
CC: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1458132843-21860-1-git-send-email-imre.deak@intel.com
In full gpu reset we prime all engines and reset domains corresponding to
each engine. Per engine reset is just a special case of this process
wherein only a single engine is reset. This change is aimed to modify
relevant functions to achieve this. There are some other steps we carry out
in case of engine reset which are addressed in later patches.
Reset func now accepts a mask of all engines that need to be reset. Where
per engine resets are supported, error handler populates the mask
accordingly otherwise all engines are specified.
v2: ALL_ENGINES mask fixup, better for_each_ring_masked (Chris)
v3: Whitespace fixes (Chris)
v4: Rebase due to s/ring/engine
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1458143640-20563-1-git-send-email-mika.kuoppala@intel.com
Favor a single point of truth instead of duplicating the
information. The change also filters out unsupported DSI ports at this
stage, accepting only ports A and C, instead of waiting until the port
checks.
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1458125015-7931-6-git-send-email-jani.nikula@intel.com
Some trivial ones, first pass done with Coccinelle:
@@
@@
(
- I915_NUM_RINGS
+ I915_NUM_ENGINES
|
- intel_ring_flag
+ intel_engine_flag
|
- for_each_ring
+ for_each_engine
|
- i915_gem_request_get_ring
+ i915_gem_request_get_engine
|
- intel_ring_idle
+ intel_engine_idle
|
- i915_gem_reset_ring_status
+ i915_gem_reset_engine_status
|
- i915_gem_reset_ring_cleanup
+ i915_gem_reset_engine_cleanup
|
- init_ring_lists
+ init_engine_lists
)
But that didn't fully work so I cleaned it up with:
for f in *.[hc]; do sed -i -e s/I915_NUM_RINGS/I915_NUM_ENGINES/ $f; done
for f in *.[hc]; do sed -i -e s/i915_gem_request_get_ring/i915_gem_request_get_engine/ $f; done
for f in *.[hc]; do sed -i -e s/intel_ring_flag/intel_engine_flag/ $f; done
for f in *.[hc]; do sed -i -e s/intel_ring_idle/intel_engine_idle/ $f; done
for f in *.[hc]; do sed -i -e s/init_ring_lists/init_engine_lists/ $f; done
for f in *.[hc]; do sed -i -e s/i915_gem_reset_ring_cleanup/i915_gem_reset_engine_cleanup/ $f; done
for f in *.[hc]; do sed -i -e s/i915_gem_reset_ring_status/i915_gem_reset_engine_status/ $f; done
v2: Rebase.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
The function intel_get_shared_dpll() had a more or less generic
implementation with some platform specific checks to handle smaller
differences between platforms. However, the minimalist approach forces
bigger differences between platforms to be implemented outside of the
shared dpll code (see the *_ddi_pll_select() functions in intel_ddi.c,
for instance).
This patch changes the implementation of intel_get_share_dpll() so that
a completely platform specific version can be used, providing helpers to
reduce code duplication. This should allow the code from the ddi pll
select functions to be moved, and also make room for making more dplls
managed by the shared dpll infrastructure.
v2: WARN_ON(!dpll_mgr) in intel_get_shared_dpll(). (Maarten)
Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1457451987-17466-9-git-send-email-ander.conselvan.de.oliveira@intel.com
Move the declarations related to shared dplls from i915_drv.h to their
own header file.
The code that became the shared dpll infrastructre was first introcude
in commit ee7b9f93fd ("drm/i915: manage PCH PLLs separately from
pipes"), hence the 2012-2016 copyright years in the new header file.
Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1457451987-17466-6-git-send-email-ander.conselvan.de.oliveira@intel.com
Generalize rawclk handling by storing it in dev_priv.
Presumably our hrawclk readout works at least for CTG and ELK
since we've been using it for DP AUX on those platforms. There
are no real docs anymore after configdb vanished, so the only
reference is the public CTG GMCH spec. What bits are listed in
that doc match our code. The ELK GMCH spec have no relevant
details unfortunately.
The PNV situation is less clear. Starting from
commit aa17cdb4f8 ("drm/i915: initialize backlight max from VBT")
we assume that the CTG/ELK hrawclk readout works for PNV as well.
At least the results *seem* reasonable for one PNV machine (Lenovo
Ideapad S10-3t). Sadly the PNV GMCH spec doesn't have the goods on
the relevant register either.
So let's keep assuming it works for PNV,ELK,CTG and read it out on
those platforms. G33 also has hrawclk according to some notes
in BSpec, but we don't actually need it for anything, so let's not
even try to read it out there.
v2: Rebase due to IS_VALLYVIEW vs. IS_CHERRYVIEW split
Use KHz() all over, and kill off a few useless temp variables
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1456932138-14004-2-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Only planes that are part of the state should be used for recalculating
watermarks. For planes not part of the state the previous patch allows
us to re-use the old values since they're calculated even for levels
that are not actively used.
Changes since v1:
- Remove big if from intel_crtc_atomic_check.
- Remove extra newline.
- Remove memset in ilk_compute_pipe_wm.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1456826842-32553-2-git-send-email-maarten.lankhorst@linux.intel.com
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
commit e5756c10d8
Author: Imre Deak <imre.deak@intel.com>
Date: Fri Aug 14 18:43:30 2015 +0300
drm/i915/bxt: don't allow cached GEM mappings on A stepping
Added an exception of disallowing snooping for Broxton A
stepping hardware but userptr was still enabling it regardless.
Move the check to HAS_SNOOP now that it is used from multiple
call sites and use it.
v2: Userptr cannot be supported when it cannot be coherent and
generalize the code better. (Chris Wilson)
v3: Make has_snoop true only when !has_llc. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1456920631-34302-1-git-send-email-tvrtko.ursulin@linux.intel.com
We can simplify the conditions selecting the target DC state during
runtime by calculating the allowed DC states in advance during driver
loading. This also makes it easier to disable DC states depending on the
i915.disable_power_well module option, added in the next patch.
v2:
- Print a debug message if the requested max DC value was adjusted due
to a platform limit. Also debug print the calculated mask value. (Patrik)
CC: Patrik Jakobsson <patrik.jakobsson@linux.intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Patrik Jakobsson <patrik.jakobsson@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1456778945-5411-2-git-send-email-imre.deak@intel.com
execute during context save/restore, good to have them in error state.
v2: use wa_ctx->size and print only size values (Mika)
v3: simplify conditions when recording and freeing object (Chris)
v4: resolve checkpatch errors (Tvrtko)
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1456831476-10782-1-git-send-email-arun.siluvery@linux.intel.com
In addition to calculating final watermarks, let's also pre-calculate a
set of intermediate watermark values at atomic check time. These
intermediate watermarks are a combination of the watermarks for the old
state and the new state; they should satisfy the requirements of both
states which means they can be programmed immediately when we commit the
atomic state (without waiting for a vblank). Once the vblank does
happen, we can then re-program watermarks to the more optimal final
value.
v2: Significant rebasing/rewriting.
v3:
- Move 'need_postvbl_update' flag to CRTC state (Daniel)
- Don't forget to check intermediate watermark values for validity
(Maarten)
- Don't due async watermark optimization; just do it at the end of the
atomic transaction, after waiting for vblanks. We do want it to be
async eventually, but adding that now will cause more trouble for
Maarten's in-progress work. (Maarten)
- Don't allocate space in crtc_state for intermediate watermarks on
platforms that don't need it (gen9+).
- Move WaCxSRDisabledForSpriteScaling:ivb into intel_begin_crtc_commit
now that ilk_update_wm is gone.
v4:
- Add a wm_mutex to cover updates to intel_crtc->active and the
need_postvbl_update flag. Since we don't have async yet it isn't
terribly important yet, but might as well add it now.
- Change interface to program watermarks. Platforms will now expose
.initial_watermarks() and .optimize_watermarks() functions to do
watermark programming. These should lock wm_mutex, copy the
appropriate state values into intel_crtc->active, and then call
the internal program watermarks function.
v5:
- Skip intermediate watermark calculation/check during initial hardware
readout since we don't trust the existing HW values (and don't have
valid values of our own yet).
- Don't try to call .optimize_watermarks() on platforms that don't have
atomic watermarks yet. (Maarten)
v6:
- Rebase
v7:
- Further rebase
v8:
- A few minor indentation and line length fixes
v9:
- Yet another rebase since Maarten's patches reworked a bunch of the
code (wm_pre, wm_post, etc.) that this was previously based on.
v10:
- Move wm_mutex to dev_priv to protect against racing commits against
disjoint CRTC sets. (Maarten)
- Drop unnecessary clearing of cstate->wm.need_postvbl_update (Maarten)
v11:
- Now that we've moved to atomic watermark updates, make sure we call
the proper function to program watermarks in
{ironlake,haswell}_crtc_enable(); the failure to do so on the
previous patch iteration led to us not actually programming the
watermarks before turning on the CRTC, which was the cause of the
underruns that the CI system was seeing.
- Fix inverted logic for determining when to optimize watermarks. We
were needlessly optimizing when the intermediate/optimal values were
the same (harmless), but not actually optimizing when they differed
(also harmless, but wasteful from a power/bandwidth perspective).
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1456276813-5689-1-git-send-email-matthew.d.roper@intel.com
The multiple levels of indirect do nothing but hinder the compiler and
the pointer chasing turns to be quite painful but painless to fix.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1456484600-11477-1-git-send-email-tvrtko.ursulin@linux.intel.com
The DMC can incorrectly run off and allow DC states on it's own. We
don't know the root-cause for this yet but this patch makes it more
visible.
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Patrik Jakobsson <patrik.jakobsson@linux.intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1455808874-22089-2-git-send-email-mika.kuoppala@intel.com
Instead of duplicating the functionality now that we no longer need
to preserve dpll state we can move to using the upstream suspend helper.
Changes since v1:
- Call hw readout with all mutexes held.
- Rework intel_display_suspend to only assign modeset_restore_state
on success.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/56C2E686.5060803@linux.intel.com
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Swap the order of context & engine cleanup, so that contexts are cleaned
up first, and *then* engines. This is a more sensible order anyway, but
in particular has become necessary since the 'intel_ring_initialized()
must be simple and inline' patch, which now uses ring->dev as an
'initialised' flag, so it can now be NULL after engine teardown. This
in turn can cause a problem in the context code, which (used to) check
the ring->dev->struct_mutex -- causing a fault if ring->dev was NULL.
Also rename the cleanup function to reflect what it actually does
(cleanup engines, not a ringbuffer), and fix an annoying whitespace issue.
v2: Also make the fix in i915_load_modeset_init, not just in
i915_driver_unload (Chris Wilson)
v3: Had extra stuff in it.
v4: Reverted extra stuff (so we're back to v2).
Rebased and updated commentary above (Dave Gordon).
Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
Signed-off-by: David Gordon <david.s.gordon@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1453504211-7982-2-git-send-email-david.s.gordon@intel.com
Make the gpio read/write functions more generic iosf sideband read/write
functions, taking the iosf port as argument.
v2: rebase
v3: rebase
v4 by Jani: address Ville's review
v5 by Jani: drop the PCI_DEVFN change (Ville)
Signed-off-by: Deepak M <m.deepak@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1454604915-17142-1-git-send-email-jani.nikula@intel.com
The recent introduction of a new caller of dev_priv->fbc.deactivate()
is a good example of why we need unexport those functions. Anything
outside intel_fbc.c should only call the functions exported by
intel_fbc.c, so in order to enforce that, kill the function pointers
stored inside dev_priv->fbc and replace them with functions that can't
be called from outside intel_fbc.c.
This should make it much harder for new code to call these functions
from outside intel_fbc.c.
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1454101060-23198-2-git-send-email-paulo.r.zanoni@intel.com
commit 033908aed5
Author: Dave Gordon <david.s.gordon@intel.com>
Date: Thu Dec 10 18:51:23 2015 +0000
drm/i915: mark GEM object pages dirty when mapped & written by the CPU
introduced a check into i915_gem_object_get_dirty_pages() that returned
a NULL pointer when called with a bad object, one that was not backed by
shmemfs. This WARN was too strict as we can work on all struct page
backed objects, and resulted in a WARN + GPF for existing userspace. In
order to differentiate the various types of objects, add a new flags field
to the i915_gem_object_ops struct to describe their capabilities, with
the first flag being whether the object has struct pages.
v2: Drop silly const before an integer in the structure declaration.
Testcase: igt/gem_userptr_blits/relocations
Reported-and-tested-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: Kristian Høgsberg Kristensen <krh@bitplanet.net>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
Tested-by: Michal Winiarski <michal.winiarski@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1453487551-16799-1-git-send-email-chris@chris-wilson.co.uk
Link standby support has been deprecated with 'commit 89251b177
("drm/i915: PSR: deprecate link_standby support for core platforms.")'
The reason for that is that main link in full off offers more power
savings and on HSW and BDW implementations on source side had known
bugs with link standby.
However that same HSD report only mentions BDW and HSW and tells that
a fix was going to new platforms. Since on Skylake link standby
didn't cause the bad blank flickering screens seen on HSW and BDW
let's respect VBT again for this and future platforms.
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
We already make sure we run intel_fbc_update_update during modesets
and page flips, and this function takes care of deactivating FBC, so
it shouldn't be possible for us to reach the condition we check at
intel_fbc_work_fn. So instead of grabbing framebuffer references and
adding a lot of code to track when we need to free them, just don't
track anything at all since we shouldn't need to.
v2: Rebase.
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1453210558-7875-25-git-send-email-paulo.r.zanoni@intel.com
We don't actually use fb_id anywhere. We already compare all
parameters that matter to the hardware: pixel format, stride,
fence_reg and ggtt_offset. The ID shouldn't make a difference.
Besides, we already update the FBC data at every modeset/flip, so this
can't change behind our backs.
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1453210558-7875-23-git-send-email-paulo.r.zanoni@intel.com
Older FBC platforms have this restriction where FBC can't be enabled
if multiple pipes are enabled. In the current code, we disable FBC
before the second pipe becomes visible.
One of the problems with this code is that the current
multiple_pipes_ok() implementation just iterates through all CRTCs
looking at their states, but it doesn't make sure that the state
locks are grabbed. It also can't just grab the locks for every CRTC
since this would kill one of the biggest advantages of atomic
modesetting.
After the recent FBC changes, we now have the appropriate locks for
the given CRTC, so we can just try to maintain the state of each CRTC
and update it once intel_fbc_pre_update is called.
As a last note, I don't have gen 2/3 machines to test this code. My
current plan is to enable FBC on just the newer platforms, so this
patch is just an attempt to get the gen 2/3 code at least looking
sane, so if one day someone decide to fix FBC on these platforms, they
may have less work to do.
Not-tested-by: Paulo Zanoni <paulo.r.zanoni@intel.com> (only on HSW+)
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1453210558-7875-16-git-send-email-paulo.r.zanoni@intel.com
Per the new atomic locking rules, we need to cache the CRTC, plane and
FB state structures we use so we can access them later without needing
more locks. So do this.
Notice that there are some pieces of the FBC code that look at things
that are only computed during the modeset, so we can't just can't
precompute whether FBC can be activated during the update_state_cache
stage. We may be able to do this later.
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1453210558-7875-10-git-send-email-paulo.r.zanoni@intel.com
We say "dev_priv->fbc.something" way too many times in our code while
we could be saying just "fbc->something" with a previous declaration
of fbc. This has been bothering me for a while but I didn't want to
patch it since I wanted to fix the real problems first. But as I add
more code I keep thinking about it, especially since it makes the code
easier to read and it can make us fit 80 columns easier, so let's just
do the change now.
While at it, also rename from i915_fbc to intel_fbc because the whole
FBC code uses intel_fbc.
v2: Rebase after the work_fn changes.
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1453406763-10400-1-git-send-email-paulo.r.zanoni@intel.com
The early return inside __intel_fbc_update does not completely check
all the parameters that affect the FBC register values. For example,
we currently lack looking at crtc->adjusted_y (for the fence Y offset)
and all the parameters that affect the CFB size (for i8xx).
Instead of just adding the missing parameters to the check and hoping
that any changes to the fbc_activate functions also come with a
matching change to the __intel_fbc_update check, introduce a new
structure where we store these parameters and use the structure at the
fbc_activate function. Of course, it's still possible to access
everything from dev_priv in those functions, but IMHO the new code
will be harder to break.
v2: Rebase.
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1453210558-7875-5-git-send-email-paulo.r.zanoni@intel.com
Instead of waiting for 50ms, just wait until the next vblank, since
it's the minimum requirement. The whole infrastructure of FBC is based
on vblanks, so waiting for X vblanks instead of X milliseconds sounds
like the correct way to go. Besides, 50ms may be less than a vblank on
super slow modes that may or may not exist.
There are some small improvements in PC state residency (due to the
fact that we're now using 16ms for the common modes instead of 50ms),
but the biggest advantage is still the correctness of being
vblank-based instead of time-based.
v2:
- Rebase after changing the patch order.
- Update the commit message.
v3:
- Fix bogus vblank_get() instead of vblank_count() (Ville).
- Don't forget to call drm_crtc_vblank_{get,put} (Chris, Ville)
- Adjust the performance details on the commit message.
v4:
- Don't grab the FBC mutex just to grab the vblank (Maarten)
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1453406585-10233-1-git-send-email-paulo.r.zanoni@intel.com
Factor out common clean-up code for the GEM load time init function.
Also rename i915_gem_load() to i915_gem_load_init() to have a better
match with its new clean-up function.
No functional change.
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: David Weinehall <david.weinehall@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1453209992-25995-5-git-send-email-imre.deak@intel.com
Factor out the common GEM shrinker clean-up code and call the shrinker
init function from the same function from where the corresponding
shrinker clean-up function is called. Also add sanity checking to the
shrinker and OOM registration calls.
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: David Weinehall <david.weinehall@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1453209992-25995-4-git-send-email-imre.deak@intel.com
Swap the order of context & engine cleanup, so that contexts are cleaned
up first, and *then* engines. This is a more sensible order anyway, but
in particular has become necessary since the 'intel_ring_initialized()
must be simple and inline' patch, which now uses ring->dev as an
'initialised' flag, so it can now be NULL after engine teardown. This
in turn can cause a problem in the context code, which (used to) check
the ring->dev->struct_mutex -- causing a fault if ring->dev was NULL.
Also rename the cleanup function to reflect what it actually does
(cleanup engines, not a ringbuffer), and fix an annoying whitespace issue.
v2: Also make the fix in i915_load_modeset_init, not just in
i915_driver_unload (Chris Wilson)
v3: Had extra stuff in it.
v4: Reverted extra stuff (so we're back to v2).
Rebased and updated commentary above (Dave Gordon).
Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v2)
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1453405067-32890-3-git-send-email-david.s.gordon@intel.com
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Some of the HW registers are privileged and cannot be written to from
non-privileged batch buffers coming from userspace unless they are added to
the HW whitelist. This whitelist is maintained by HW and it is different from
SW whitelist. Userspace need write access to them to implement preemption
related WA.
The reason for using this approach is, the register bits that control
preemption granularity at the HW level are not context save/restored; so even
if we set these bits always in kernel they are going to change once the
context is switched out. We can consider making them non-privileged by
default but these registers also contain other chicken bits which should not
be allowed to be modified.
In the later revisions controlling bits are save/restored at context level but
in the existing revisions these are exported via other debug registers and
should be on the whitelist. This patch adds changes to provide HW with a list
of registers to be whitelisted. HW checks this list during execution and
provides access accordingly.
HW imposes a limit on the number of registers on whitelist and it is
per-engine. At this point we are only enabling whitelist for RCS and we don't
foresee any requirement for other engines.
The registers to be whitelisted are added using generic workaround list
mechanism, even these are only enablers for userspace workarounds. But by
sharing this mechanism we get some test assets without additional cost (Mika).
v2: rebase
v3: parameterize RING_FORCE_TO_NONPRIV() as _MMIO() should be limited to
i915_reg.h (Ville), drop inline for wa_ring_whitelist_reg (Mika).
v4: improvements suggested by Chris Wilson.
Clarify that this is HW whitelist and different from the one maintained in
driver. This list is engine specific but it gets initialized along with other
WA which is RCS specific thing, so make it clear that we are not doing any
cross engine setup during initialization.
Make HW whitelist count of each engine available in debugfs.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1453412634-29238-2-git-send-email-arun.siluvery@linux.intel.com
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
At the moment execbuf ring selection is fully coupled to
internal ring ids which is not a good thing on its own.
This dependency is also spread between two source files and
not spelled out at either side which makes it hidden and
fragile.
This patch decouples this dependency by introducing an explicit
translation table of execbuf uAPI to ring id close to the only
call site (i915_gem_do_execbuffer).
This way we are free to change driver internal implementation
details without breaking userspace. All state relating to the
uAPI is now contained in, or next to, i915_gem_do_execbuffer.
As a side benefit, this patch decreases the compiled size
of i915_gem_do_execbuffer.
v2: Extract ring selection into eb_select_ring. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1452870770-13981-1-git-send-email-tvrtko.ursulin@linux.intel.com
Now that we've eliminated a lot of uses of ring->default_context,
we can eliminate the pointer itself.
All the engines share the same default intel_context, so we can just
keep a single reference to it in the dev_priv structure rather than one
in each of the engine[] elements. This make refcounting more sensible
too, as we now have a refcount of one for the one pointer, rather than
a refcount of one but multiple pointers.
From an idea by Chris Wilson.
v2: transform an extra instance of ring->default_context introduced by
42f1cae8c drm/i915: Restore inhibiting the load of the default context
That patch's commentary includes:
v2: Mark the global default context as uninitialized on GPU reset so
that the context-local workarounds are reloaded upon re-enabling
The code implementing that now also benefits from the replacement of
the multiple (per-ring) pointers to the default context with a single
pointer to the unique kernel context.
v4: Rebased, remove underused local (Nick Hoath)
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Nick Hoath <nicholas.hoath@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1453230175-19330-3-git-send-email-david.s.gordon@intel.com
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
There are a number of places where the driver needs a request, but isn't
working on behalf of any specific user or in a specific context. At
present, we associate them with the per-engine default context. A future
patch will abolish those per-engine context pointers; but we can already
eliminate a lot of the references to them, just by making the allocator
allow NULL as a shorthand for "an appropriate context for this ring",
which will mean that the callers don't need to know anything about how
the "appropriate context" is found (e.g. per-ring vs per-device, etc).
So this patch renames the existing i915_gem_request_alloc(), and makes
it local (static inline), and replaces it with a wrapper that provides
a default if the context is NULL, and also has a nicer calling
convention (doesn't require a pointer to an output parameter). Then we
change all callers to use the new convention:
OLD:
err = i915_gem_request_alloc(ring, user_ctx, &req);
if (err) ...
NEW:
req = i915_gem_request_alloc(ring, user_ctx);
if (IS_ERR(req)) ...
OLD:
err = i915_gem_request_alloc(ring, ring->default_context, &req);
if (err) ...
NEW:
req = i915_gem_request_alloc(ring, NULL);
if (IS_ERR(req)) ...
v4: Rebased
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Nick Hoath <nicholas.hoath@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1453230175-19330-2-git-send-email-david.s.gordon@intel.com
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This reverts commit 396e33ae20.
This commit was triggering some FIFO underrun warnings on ILK-IVB
platforms (but surprisingly not on HSW/BDW that share more or less the
same codepaths). These underruns were caught by the continuous
integration (CI) system and could be reproduced consistently when
running the basic acceptance tests (BAT) on the affected platforms.
Note that this revert will cause a visible regression for some
end-users; the "flicker when mouse moves between monitors in X" issue
that was reported before this patch was merged will now return. However
regressions that are visible to CI have higher priority since they
prevent proper testing of future patches on those platforms. Hopefully
we'll be able to figure out the cause of the underruns quickly and
remerge an improved version of this patch to fix the regression.
Cc: Daniel Vetter <daniel@ffwll.ch>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93640
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Daniel Vetter <daniel@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1453232584-8543-1-git-send-email-matthew.d.roper@intel.com
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
LRC lifetime is well defined so we can cache the page pointing
to the object backing store in the context in order to avoid
walking over the object SG page list from the interrupt context
without the big lock held.
v2: Also cache the mapping. (Chris Wilson)
v3: Unmap on the error path.
v4: No need to cache the page. (Chris Wilson)
v5: No need to dirty the page on unpin. (Chris Wilson)
v6: kmap() cannot fail and use kmap_to_page to simplify unpin.
(Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1452877965-32042-1-git-send-email-tvrtko.ursulin@linux.intel.com
LRC code was calling GEM API like i915_gem_obj_ggtt_offset from
places where the struct_mutex cannot be grabbed (irq handlers).
To avoid that this patch caches some interesting bits and values
in the engine and context structures.
Some usages are also removed where they are not needed like a
few asserts which are either impossible or have been checked
already during engine initialization.
Side benefit is also that interrupt handlers and command
submission stop evaluating invariant conditionals, like what
Gen we are running on, on every interrupt and every command
submitted.
This patch deals with logical ring context id and descriptors
while subsequent patches will deal with the remaining issues.
v2:
* Cache the VMA instead of the address. (Chris Wilson)
* Incorporate Dave Gordon's good comments and function name.
v3:
* Extract ctx descriptor template to a function and group
functions dealing with ctx descriptor & co together near
top of the file. (Dave Gordon)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Gordon <david.s.gordon@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1452870629-13830-1-git-send-email-tvrtko.ursulin@linux.intel.com
Add a common function to return "on" or "off" string based on the
argument, and drop the local versions of it.
This is the onoff version of
commit 42a8ca4cb4
Author: Jani Nikula <jani.nikula@intel.com>
Date: Thu Aug 27 16:23:30 2015 +0300
drm/i915: add yesno utility function
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1452768814-29787-1-git-send-email-jani.nikula@intel.com
If we go into suspend with unclaimed access detected,
it would be nice to catch that access on a next suspend path.
So instead of just notifying about it, arm the unclaimed
mmio checks on suspend side.
We want to keep the asymmetry on resume, as if it was
on resume path, it was not driver that is responsible so
no point in arming mmio debugs.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1452261080-6979-2-git-send-email-mika.kuoppala@intel.com
We have done unclaimed register access check in normal
(mmio_debug=0) mode once per write. This adds probability
of finding the exact sequence where we did the bad access, but
also adds burden to each write.
As we have mmio_debug available for more fine grained analysis,
give up accuracy of detecting correct spot at the first occurrence
by doing the one shot detection and arming of mmio_debug in hangcheck
and in modeset. This removes the write path performance burden.
v2: Remove gratuitous DRM_DEBUG and return value, comments (Chris)
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Paulo Zanoni <przanoni@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1450250808-14864-1-git-send-email-mika.kuoppala@intel.com
Currently interrupt code is the only place checking
for the unclaimed register access prior to actual register
macros using the same functionality. Rename the function
and make it return bool so that the possible error message
context is clear in the caller side. The motivation is to allow
usage of unclaimed detection on arbitrary places.
v2: rebase, s/access/mmio, s/dev/dev_priv
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1450189512-30360-2-git-send-email-mika.kuoppala@intel.com
In addition to calculating final watermarks, let's also pre-calculate a
set of intermediate watermark values at atomic check time. These
intermediate watermarks are a combination of the watermarks for the old
state and the new state; they should satisfy the requirements of both
states which means they can be programmed immediately when we commit the
atomic state (without waiting for a vblank). Once the vblank does
happen, we can then re-program watermarks to the more optimal final
value.
v2: Significant rebasing/rewriting.
v3:
- Move 'need_postvbl_update' flag to CRTC state (Daniel)
- Don't forget to check intermediate watermark values for validity
(Maarten)
- Don't due async watermark optimization; just do it at the end of the
atomic transaction, after waiting for vblanks. We do want it to be
async eventually, but adding that now will cause more trouble for
Maarten's in-progress work. (Maarten)
- Don't allocate space in crtc_state for intermediate watermarks on
platforms that don't need it (gen9+).
- Move WaCxSRDisabledForSpriteScaling:ivb into intel_begin_crtc_commit
now that ilk_update_wm is gone.
v4:
- Add a wm_mutex to cover updates to intel_crtc->active and the
need_postvbl_update flag. Since we don't have async yet it isn't
terribly important yet, but might as well add it now.
- Change interface to program watermarks. Platforms will now expose
.initial_watermarks() and .optimize_watermarks() functions to do
watermark programming. These should lock wm_mutex, copy the
appropriate state values into intel_crtc->active, and then call
the internal program watermarks function.
v5:
- Skip intermediate watermark calculation/check during initial hardware
readout since we don't trust the existing HW values (and don't have
valid values of our own yet).
- Don't try to call .optimize_watermarks() on platforms that don't have
atomic watermarks yet. (Maarten)
v6:
- Rebase
v7:
- Further rebase
v8:
- A few minor indentation and line length fixes
v9:
- Yet another rebase since Maarten's patches reworked a bunch of the
code (wm_pre, wm_post, etc.) that this was previously based on.
v10:
- Move wm_mutex to dev_priv to protect against racing commits against
disjoint CRTC sets. (Maarten)
- Drop unnecessary clearing of cstate->wm.need_postvbl_update (Maarten)
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1452108870-24204-1-git-send-email-matthew.d.roper@intel.com
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Although we can do a good job of reading out hardware state, the
graphics firmware may have programmed the watermarks in a creative way
that doesn't match how i915 would have chosen to program them. We
shouldn't trust the firmware's watermark programming, but should rather
re-calculate how we think WM's should be programmed and then shove those
values into the hardware.
We can do this pretty easily by creating a dummy top-level state,
running it through the check process to calculate all the values, and
then just programming the watermarks for each CRTC.
v2: Move watermark sanitization after our BIOS fb reconstruction; the
watermark calculations that we do here need to look at pstate->fb,
which isn't setup yet in intel_modeset_setup_hw_state(), even
though we have an enabled & visible plane.
v3:
- Don't move 'active = optimal' watermark assignment; we just undo
that change in the next patch anyway. (Ville)
- Move atomic helper locking fix to separate patch. (Maarten)
v4:
- Grab connection_mutex before calling atomic helper to duplicate
state. The connector loop inside the helper will throw a WARN
if we don't hold something to protect the connector list (and the
helper itself doesn't try to lock the list).
- Make failure to calculate watermarks for inherited state a WARN()
since it probably indicates a serious problem in either our state
readout code or our watermark code for this platform.
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
On skylake when calculating plane visibility with the crtc in
dpms off mode the real cdclk may be different from what it would be
if the crtc was active. This may result in a WARN_ON(cdclk < crtc_clock)
from skl_max_scale. The fix is to keep a atomic_cdclk that would be true
if all crtc's were active.
This is required to get the same calculations done correctly regardless
of dpms mode.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Mika Kahola <mika.kahola@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1447945645-32005-12-git-send-email-maarten.lankhorst@linux.intel.com
Parallel modesets are still not allowed, but this will allow updating
a different crtc during a modeset if the clock is not changed.
Additionally when all pipes are DPMS off the cdclk will be lowered
to the minimum allowed.
Changes since v1:
- Add dev_priv->active_crtcs for tracking which crtcs are active.
- Rename min_cdclk to min_pixclk and move to dev_priv.
- Add a active_crtcs mask which is updated atomically.
- Add intel_atomic_state->modeset which is set on modesets.
- Commit new pixclk/active_crtcs right after state swap.
Changes since v2:
- Make the changes related to max_pixel_rate calculations more readable.
Changes since v3:
- Add cherryview and missing WARN_ON to readout.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Mika Kahola <mika.kahola@intel.com>
Using __stringify(x) instead of #x adds support for macros as
a parameter and compile-time concatenation reduces the runtime
overhead.
Slightly increases the .text size but should not matter.
v2:
- Define I915_STATE_WARN_ON though I915_STATE_WARN
(Bikeshed inspiration by Chris)
v3:
- More specific commit message
v4:
- Do not directly pass arbitary string as format, instead
guard with "%s" (Dave)
Cc: Rob Clark <robdclark@gmail.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v3)
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1450441647-23924-3-git-send-email-joonas.lahtinen@linux.intel.com
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Otherwise usage in the i915 debug macros yields problems due to
i915_drv.h <-> i915_trace.h <-> intel_drv.h include loops.
v2:
- Document not-so-obvious need for linux/cache.h (Chris)
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1450436898-20408-2-git-send-email-joonas.lahtinen@linux.intel.com
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
commit 344df9809f ("drm/i915/skl: Disable coarse power gating up until F0")
failed to take into account that the same workaround is used in guc
when forcewake is sampled.
Wrap the condition check inside a macro and use it in both places
to fix the guc side scope.
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Reviewed-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1450286318-6854-1-git-send-email-mika.kuoppala@intel.com
Limit busywaiting only to the request currently being processed by the
GPU. If the request is not currently being processed by the GPU, there
is a very low likelihood of it being completed within the 2 microsecond
spin timeout and so we will just be wasting CPU cycles.
v2: Check for logical inversion when rebasing - we were incorrectly
checking for this request being active, and instead busywaiting for
when the GPU was not yet processing the request of interest.
v3: Try another colour for the seqno names.
v4: Another colour for the function names.
v5: Remove the forced coherency when checking for the active request. On
reflection and plenty of recent experimentation, the issue is not a
cache coherency problem - but an irq/seqno ordering problem (timing issue).
Here, we do not need the w/a to force ordering of the read with an
interrupt.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
Cc: "Rantala, Valtteri" <valtteri.rantala@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1449833608-22125-4-git-send-email-chris@chris-wilson.co.uk
As we mark the preallocated objects as bound, we should also flag them
correctly as being map-and-fenceable (if appropriate!) so that later
users do not get confused and try and rebind the pinned vma in order to
get a map-and-fenceable binding.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: "Goel, Akash" <akash.goel@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: drm-intel-fixes@lists.freedesktop.org
Link: http://patchwork.freedesktop.org/patch/msgid/1448029000-10616-1-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
In some cases we want to check whether we hold an RPM wakelock reference
for the whole duration of a sequence. To achieve this add a new RPM
atomic sequence counter that we increment any time the wakelock refcount
drops to zero. Check whether the sequence number stays the same during
the atomic section and that we hold the wakelock at the beginning of the
section.
Motivated by Chris.
v2-v3:
- unchanged
v4:
- swap the order of atomic_read() and assert_rpm_wakelock_held() in
assert_rpm_atomic_begin() to avoid race
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v3)
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1450203038-5150-10-git-send-email-imre.deak@intel.com
Atm, we assert that the device is not suspended until the point when the
device is truly put to a suspended state. This is fine, but we can catch
more problems if we check that RPM refcount is non-zero. After that one
drops to zero we shouldn't access the device any more, even if the actual
device suspend may be delayed. Change assert_rpm_wakelock_held()
accordingly to check for a non-zero RPM refcount in addition to the
current device-not-suspended check.
For the new asserts to work we need to annotate every place explicitly in
the code where we expect that the device is powered. The places where we
only assume this, but may not hold an RPM reference:
- driver load
We assume the device to be powered until we enable RPM. Make this
explicit by taking an RPM reference around the load function.
- system and runtime sudpend/resume handlers
These handlers are called when the RPM reference becomes 0 and know the
exact point after which the device can get powered off. Disable the
RPM-reference-held check for their duration.
- the IRQ, hangcheck and RPS work handlers
These handlers are flushed in the system/runtime suspend handler
before the device is powered off, so it's guaranteed that they won't
run while the device is powered off even though they don't hold any
RPM reference. Disable the RPM-reference-held check for their duration.
In all these cases we still check that the device is not suspended.
These explicit annotations also have the positive side effect of
documenting our assumptions better.
This caught additional WARNs from the atomic modeset path, those should
be fixed separately.
v2:
- remove the redundant HAS_RUNTIME_PM check (moved to patch 1) (Ville)
v3:
- use a new dedicated RPM wakelock refcount to also catch cases where
our own RPM get/put functions were not called (Chris)
- assert also that the new RPM wakelock refcount is 0 in the RPM
suspend handler (Chris)
- change the assert error message to be more meaningful (Chris)
- prevent false assert errors and check that the RPM wakelock is 0 in
the RPM resume handler too
- prevent false assert errors in the hangcheck work too
- add a device not suspended assert check to the hangcheck work
v4:
- rename disable/enable_rpm_asserts to disable/enable_rpm_wakeref_asserts
and wakelock_count to wakeref_count
- disable the wakeref asserts in the IRQ handlers and RPS work too
- update/clarify commit message
v5:
- mark places we plan to change to use proper RPM refcounting with
separate DISABLE/ENABLE_RPM_WAKEREF_ASSERTS aliases (Chris)
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1450227139-13471-1-git-send-email-imre.deak@intel.com
The RVDA and RVDS (raw VBT data address and size) fields of the ASLE
mailbox may specify an alternate location for VBT instead of mailbox #4.
Use the alternate location if available and valid, falling back to
mailbox #4 otherwise.
v2: Update debug logging (Ville)
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Tested-by: Mika Kahola <mika.kahola@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1450178280-28020-1-git-send-email-jani.nikula@intel.com
In the future the VBT might not be in mailbox #4 of the ACPI OpRegion,
thus unavailable in i915_opregion, so add a separate file for the VBT.
v2: Drop the locking as unneeded (Chris)
v3: Rebase
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1450178232-27780-1-git-send-email-jani.nikula@intel.com
Make the validation function a boolean operating on a buffer of given
size, removing the extra pointer dances.
Move the OpRegion based VBT validation to intel_opregion_setup(), only
initializing opregion->vbt if it's valid.
v2: move logging about valid VBT to opregion setup too (Ville)
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1450178175-27420-1-git-send-email-jani.nikula@intel.com
Here are the patchset to add get_eld op to audio component for
communicating more directly between i915 and HD-audio.
Currently, the HDMI/DP audio status and ELD are notified and obtained
via the hardware-level communication over HD-audio unsolicited event
and verbs although the graphics driver holds the exactly same
information. As we already have a notification via audio component,
this is another step forward; namely, the audio driver may fetch
directly the audio status and ELD via the new component op.
The commits are based on Dave's latest drm-next branch.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJWaXTVAAoJEGwxgFQ9KSmkdZIQAKg2NQN1CGOIb+ce80UmJ9ap
myZH42aZXytPrMAhHQCw2mvf8aL18EKyQcGv+anLdFM+AqlzZvH/exfOblkr88lg
ZMGr0bXVNa2Lfyrgt9blUwH58uETQZC4P3BrI0cPqIJBdPagDxnxoaV1e/21g/9p
0a2bhiz0gn4OFb83vpi5pL4hGv+BQwwlmkOujcVg7yxR7ylnYI419NM9Z+Lbmfq7
p5jId6Q3EwBw6vpWryOI2TElM3VDThoOGCOtkfmZx6o4fDZ0bdl8CYVCgRFwZZCe
kk01Caa+5+CW88MlJ1VX6gLy0WRlPY0AFreCWKgy5HCUNqew9ruhUeMj4+C1oHpj
ui/79ULLRN2hmu2rvU8lZb0ClihXkDCBN8p89j6o2I+y1aIcUtxvY9Srg5w2tVBe
Ue+OSB3lA4rdnuSjxZiaPf+V4rozIyNJHRjo6xNdY0zuScB4lw9Bh7IYXmj8B8OW
k3LklToj4ZGeyCgfcTQwztAh7fFEXUb1wN+lLqCt3b9688zvMYTQlJ8ZdtK+t188
3DNz9QjjPd4DcxLypl1VpM2Xv3AhuFfugq0oEuQq9bXs7qtj+iLmSWWdmhUNaVWb
Qot21vJEHDii6jtoLdbVMTEZTWyr2nXUfUNFJpUgitif2UhqqgecnR16Fi05pjTv
+Th/GvjddrQ0oe9DwVGY
=NShN
-----END PGP SIGNATURE-----
Merge tag 'drm-i915-get-eld' of tiwai/sound into drm-intel-next-queued
Add get_eld audio component for i915/HD-audio
Currently, the HDMI/DP audio status and ELD are notified and obtained
via the hardware-level communication over HD-audio unsolicited event
and verbs although the graphics driver holds the exactly same
information. As we already have a notification via audio component,
this is another step forward; namely, the audio driver may fetch
directly the audio status and ELD via the new component op.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
In various places, a single page of a (regular) GEM object is mapped into
CPU address space and updated. In each such case, either the page or the
the object should be marked dirty, to ensure that the modifications are
not discarded if the object is evicted under memory pressure.
The typical sequence is:
va = kmap_atomic(i915_gem_object_get_page(obj, pageno));
*(va+offset) = ...
kunmap_atomic(va);
Here we introduce i915_gem_object_get_dirty_page(), which performs the
same operation as i915_gem_object_get_page() but with the side-effect
of marking the returned page dirty in the pagecache. This will ensure
that if the object is subsequently evicted (due to memory pressure),
the changes are written to backing store rather than discarded.
Note that it works only for regular (shmfs-backed) GEM objects, but (at
least for now) those are the only ones that are updated in this way --
the objects in question are contexts and batchbuffers, which are always
shmfs-backed.
Separate patches deal with the cases where whole objects are (or may
be) dirtied.
v3: Mark two more pages dirty in the page-boundary-crossing
cases of the execbuffer relocation code [Chris Wilson]
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1449773486-30822-2-git-send-email-david.s.gordon@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This patch adds a reverse mapping from a digital port number to
intel_encoder object containing the corresponding intel_digital_port.
It simplifies the query of the encoder a lot.
Note that, even if it's a valid digital port, the dig_port_map[] might
point still to NULL -- usually it implies a DP MST port. Due to this
fact, the NULL check in each place has no WARN_ON() and just skips the
port. Once when the situation changes in future, we might introduce
WARN_ON() for a more strict check.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The cherryview device shares many characteristics with the valleyview
device. When support was added to the driver for cherryview, the
corresponding device info structure included .is_valleyview = 1.
This is not correct and leads to some confusion.
This patch changes .is_valleyview to .is_cherryview in the cherryview
device info structure and simplifies the IS_CHERRYVIEW macro.
Then where appropriate, instances of IS_VALLEYVIEW are replaced with
IS_VALLEYVIEW || IS_CHERRYVIEW or equivalent.
v2: Use IS_VALLEYVIEW || IS_CHERRYVIEW instead of defining a new macro.
Also add followup patches to fix issues discovered during the first
review. (Ville)
v3: Fix some style issues and one gen check. Remove CRT related changes
as CRT is not supported on CHV. (Imre, Ville)
v4: Make a few more optimizations. (Ville)
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Wayne Boyer <wayne.boyer@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1449692975-14803-1-git-send-email-wayne.boyer@intel.com
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Userspace can pass in an offset that it presumes the object is located
at. The kernel will then do its utmost to fit the object into that
location. The assumption is that userspace is handling its own object
locations (for example along with full-ppgtt) and that the kernel will
rarely have to make space for the user's requests.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
v2: Fixed incorrect eviction found by Michal Winiarski - fix suggested by Chris
Wilson. Fixed incorrect error paths causing crash found by Michal Winiarski.
(Not published externally)
v3: Rebased because of trivial conflict in object_bind_to_vm. Fixed eviction
to allow eviction of soft-pinned objects when another soft-pinned object used
by a subsequent execbuffer overlaps reported by Michal Winiarski.
(Not published externally)
v4: Moved soft-pinned objects to the front of ordered_vmas so that they are
pinned first after an address conflict happens to avoid repeated conflicts in
rare cases (Suggested by Chris Wilson). Expanded comment on
drm_i915_gem_exec_object2.offset to cover this new API.
v5: Added I915_PARAM_HAS_EXEC_SOFTPIN parameter for detecting this capability
(Kristian). Added check for multiple pinnings on eviction (Akash). Made sure
buffers are not considered misplaced without the user specifying
EXEC_OBJECT_SUPPORTS_48B_ADDRESS. User must assume responsibility for any
addressing workarounds. Updated object2.offset field comment again to clarify
NO_RELOC case (Chris). checkpatch cleanup.
v6: Trivial rebase on latest drm-intel-nightly
v7: Catch attempts to pin above the max virtual address size and return
EINVAL (Tvrtko). Decouple EXEC_OBJECT_SUPPORTS_48B_ADDRESS and
EXEC_OBJECT_PINNED flags, user must pass both flags in any attempt to pin
something at an offset above 4GB (Chris, Daniel Vetter).
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Akash Goel <akash.goel@intel.com>
Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Cc: Zou Nanhai <nanhai.zou@intel.com>
Cc: Kristian Høgsberg <hoegsberg@gmail.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Acked-by: PDT
Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1449575707-20933-1-git-send-email-thomas.daniel@intel.com
Let's introduce ULT and ULX Kabylake definitions and start
using it for a propper DDI buffer translation.
v2: Remove extra white space. (Paulo)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
This reverts commit 6d65ba943a.
Mika Kuoppala traced down a use-after-free crash in module unload to
this commit, because ring->last_context is leaked beyond when the
context gets destroyed. Mika submitted a quick fix to patch that up in
the context destruction code, but that's too much of a hack.
The right fix is instead for the ring to hold a full reference onto
it's last context, like we do for legacy contexts.
Since this is causing a regression in BAT it gets reverted before we
can close this.
Cc: Nick Hoath <nicholas.hoath@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Gordon <david.s.gordon@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Alex Dai <yu.dai@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93248
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Use the first retired request on a new context to unpin
the old context. This ensures that the hw context remains
bound until it has been written back to by the GPU.
Now that the context is pinned until later in the request/context
lifecycle, it no longer needs to be pinned from context_queue to
retire_requests.
This fixes an issue with GuC submission where the GPU might not
have finished writing back the context before it is unpinned. This
results in a GPU hang.
v2: Moved the new pin to cover GuC submission (Alex Dai)
Moved the new unpin to request_retire to fix coverage leak
v3: Added switch to default context if freeing a still pinned
context just in case the hw was actually still using it
v4: Unwrapped context unpin to allow calling without a request
v5: Only create a switch to idle context if the ring doesn't
already have a request pending on it (Alex Dai)
Rename unsaved to dirty to avoid double negatives (Dave Gordon)
Changed _no_req postfix to __ prefix for consistency (Dave Gordon)
Split out per engine cleanup from context_free as it
was getting unwieldy
Corrected locking (Dave Gordon)
v6: Removed some bikeshedding (Mika Kuoppala)
Added explanation of the GuC hang that this fixes (Daniel Vetter)
v7: Removed extra per request pinning from ring reset code (Alex Dai)
Added forced ring unpin/clean in error case in context free (Alex Dai)
Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
Issue: VIZ-4277
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Gordon <david.s.gordon@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Alex Dai <yu.dai@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Alex Dai <yu.dai@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Directly call intel_fbc_calculate_cfb_size() in the only place that
actually needs it, and use the proper check before removing the stolen
node. IMHO, this change makes our code easier to understand.
v2: Use drm_mm_node_allocated() (Chris).
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/
This was already on my TODO list, and was requested both by Chris and
Ville, for different reasons. The advantages are avoiding a frequent
malloc/free pair, and the locality of having the work structure
embedded in dev_priv. The maximum used memory is also smaller since
previously we could have multiple allocated intel_fbc_work structs at
the same time, and now we'll always have a single one - the one
embedded on dev_priv. Of course, we're now using a little more memory
on the cases where there's nothing scheduled.
The biggest challenge here is to keep everything synchronized the way
it was before.
Currently, when we try to activate FBC, we allocate a new
intel_fbc_work structure. Then later when we conclude we must delay
the FBC activation a little more, we allocate a new intel_fbc_work
struct, and then adjust dev_priv->fbc.fbc_work to point to the new
struct. So when the old work runs - at intel_fbc_work_fn() - it will
check that dev_priv->fbc.fbc_work points to something else, so it does
nothing. Everything is also protected by fbc.lock.
Just cancelling the old delayed work doesn't work because we might
just cancel it after the work function already started to run, but
while it is still waiting to grab fbc.lock. That's why we use the
"dev_priv->fbc.fbc_work == work" check described in the paragraph
above.
So now that we have a single work struct we have to introduce a new
way to synchronize everything. So we're making the work function a
normal work instead of a delayed work, and it will be responsible for
sleeping the appropriate amount of time itself. This way, after it
wakes up it can grab the lock, ask "were we delayed or cancelled?" and
then go back to sleep, enable FBC or give up.
v2:
- Spelling fixes.
- Rebase after changing the patch order.
- Fix ms/jiffies confusion.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v1)
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/
The goal is to call FBC enable/disable only once per modeset, while
activate/deactivate/update will be called multiple times.
The enable() function will be responsible for deciding if a CRTC will
have FBC on it and then it will "lock" FBC on this CRTC: it won't be
possible to change FBC's CRTC until disable(). With this, all checks
and resource acquisition that only need to be done once per modeset
can be moved from update() to enable(). And then the update(),
activate() and deactivate() code will also get simpler since they
won't need to worry about the CRTC being changed.
The disable() function will do the reverse operation of enable(). One
of its features is that it should only be called while the pipe is
already off. This guarantees that FBC is stopped and nothing is
using the CFB.
With this, the activate() and deactivate() functions just start and
temporarily stop FBC. They are the ones touching the hardware enable
bit, so HW state reflects dev_priv->crtc.active.
The last function remaining is update(). A lot of times I thought
about renaming update() to activate() or try_to_activate() since it's
called when we want to activate FBC. The thing is that update() may
not only decide to activate FBC, but also deactivate or keep it on the
same state, so I'll leave this name for now.
Moving code to enable() and disable() will also help in case we decide
to move FBC to pipe_config or something else later.
The current patch only puts the very basic code on enable() and
disable(). The next commits will take care of moving more stuff from
update() to the new functions.
v2:
- Rebase.
- Improve commit message (Chris).
v3: Rebase after changing the patch order.
v4: Rebase again after upstream changes.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/
The long term goal is to have enable/disable as the higher level
functions and activate/deactivate as the lower level functions, just
like we do for PSR and for the CRTC. This way, we'll run enable and
disable once per modeset, while update, activate and deactivate will
be run many times. With this, we can move the checks and code that
need to run only once per modeset to enable(), making the code simpler
and possibly a little faster.
This patch is just the first step on the conversion: it starts by
converting the current low level functions from enable/disable to
activate/deactivate. This patch by itself has no benefits other than
making review and rebase easier. Please see the next patches for more
details on the conversion.
v2:
- Rebase.
- Improve commit message (Chris).
v3: Rebase after changing the patch order.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/
This thing where we need to get the crtc either from the work
structure or the fbc structure itself is confusing and unnecessary.
Set fbc.crtc right when scheduling the enable work so we can always
use it.
The problem is not what gets passed and how to retrieve it. The
problem is that when we're in the other parts of the code we always
have to keep in mind that if FBC is already enabled we have to get the
CRTC from place A, if FBC is scheduled we have to get the CRTC from
place B, and if it's disabled there's no CRTC. Having a single place
to retrieve the CRTC from allows us to treat the "is enabled" and "is
scheduled" cases as the same case, reducing the mistake surface. I
guess I should add this to the commit message.
Besides the immediate advantages, this is also going to make one of
the next commits much simpler. And even later, when we introduce
enable/disable + activate/deactivate, this will be even simpler as
we'll set the CRTC at enable time. So all the
activate/deactivate/update code can just look at the single CRTC
variable regardless of the current state.
v2: Improve commit message (Chris).
v3: Rebase after changing the patch order.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/
drm-intel-next-2015-11-20-rebased:
4 weeks because of my vacation, so a bit more:
- final bits of the typesafe register mmio functions (Ville)
- power domain fix for hdmi detection (Imre)
- tons of fixes and improvements to the psr code (Rodrigo)
- refactoring of the dp detection code (Ander)
- complete rework of the dmc loader and dc5/dc6 handling (Imre, Patrik and
others)
- dp compliance improvements from Shubhangi Shrivastava
- stop_machine hack from Chris to fix corruptions when updating GTT ptes on bsw
- lots of fifo underrun fixes from Ville
- big pile of fbc fixes and improvements from Paulo
- fix fbdev failures paths (Tvrtko and Lukas Wunner)
- dp link training refactoring (Ander)
- interruptible prepare_plane for atomic (Maarten)
- basic kabylake support (Deepak&Rodrigo)
- don't leak ringspace on resets (Chris)
drm-intel-next-2015-10-23:
- 2nd attempt at atomic watermarks from Matt, but just prep for now
- fixes all over
* tag 'drm-intel-next-2015-11-20-merged' of git://anongit.freedesktop.org/drm-intel: (209 commits)
drm/i915: Update DRIVER_DATE to 20151120
drm/i915: take a power domain reference while checking the HDMI live status
drm/i915: take a power domain ref only when needed during HDMI detect
drm/i915: Tear down fbdev if initialization fails
async: export current_is_async()
Revert "drm/i915: Initialize HWS page address after GPU reset"
drm/i915: Fix oops caused by fbdev initialization failure
drm/i915: Fix i915_ggtt_view_equal to handle rotation correctly
drm/i915: Stuff rotation params into view union
drm/i915: Drop return value from intel_fill_fb_ggtt_view
drm/i915 : Fix to remove unnecsessary checks in postclose function.
drm/i915: add MISSING_CASE to a few port/aux power domain helpers
drm/i915/ddi: fix intel_display_port_aux_power_domain() after HDMI detect
drm/i915: Remove platform specific *_dp_detect() functions
drm/i915: Don't do edp panel detection in g4x_dp_detect()
drm/i915: Send TP1 TP2/3 even when panel claims no NO_TRAIN_ON_EXIT.
drm/i915: PSR: Don't Skip aux handshake on DP_PSR_NO_TRAIN_ON_EXIT.
drm/i915: Reduce PSR re-activation time for VLV/CHV.
drm/i915: Delay first PSR activation.
drm/i915: Type safe register read/write
...
ironlake_{enable,disable}_display_irq() each just call
ilk_update_display_irq() so let's make them static inlines.
While at it s/ironlake/ilk/ to make things shorter, and a bit more
consistent with the ibx functions.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1448294777-13722-3-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Commit "30c964a drm/i915: Detect virtual south bridge" detects and
handles the southbridge emulated by vmware esx. Add the ich9 south
bridge emulated by 'qemu -M q35'.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We have serious dangling else bugs waiting to happen in our for_each_
style macros with ifs. Consider, for example,
#define for_each_power_domain(domain, mask) \
for ((domain) = 0; (domain) < POWER_DOMAIN_NUM; (domain)++) \
if ((1 << (domain)) & (mask))
If this is used in context:
if (condition)
for_each_power_domain(domain, mask);
else
foo();
foo() will be called for each domain *not* in mask, if condition holds,
and not at all if condition doesn't hold.
Fix this by reversing the conditions in the macros, and adding an else
branch for the "for each" block, so that other if/else blocks can't
interfere. Provide a "for_each_if" helper macro to make it easier to get
this right.
v2: move for_each_if to drmP.h in a separate patch.
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1448392916-2281-2-git-send-email-jani.nikula@intel.com
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
During suspend-to-idle we need to keep the DMC firmware active and DC6
enabled, since otherwise we won't reach deep system power states like
PC9/10. The lead for this came from Nivedita who noticed that the
kernel's turbostat tool didn't report any PC9/10 residency change
across an 'echo freeze > /sys/power/state'.
Reported-by: Nivedita Swaminathan <nivedita.swaminathan@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Patrik Jakobsson <patrik.jakobsson@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1447860750-18110-1-git-send-email-imre.deak@intel.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJWUmHZAAoJEHm+PkMAQRiGHtcH/RVRsn8re0WdRWYaTr9+Hknm
CGlRJN4LKecttgYQ/2bS1QsDbt8usDPBiiYVopqGXQxPBmjyDAqPjsa+8VzCaVc6
WA+9LDB+PcW28lD6BO+qSZCOAm7hHSZq7dtw9x658IqO+mI2mVeCybsAyunw2iWi
Kf5q90wq6tIBXuT8YH9MXGrSCQw00NclbYeYwB9CmCt9hT/koEFBdl7uFUFitB+Q
GSPTz5fXhgc5Lms85n7flZlrVKoQKmtDQe4/DvKZm+SjsATHU9ru89OxDBdS5gSG
YcEIM4zc9tMjhs3GC9t6WXf6iFOdctum8HOhUoIN/+LVfeOMRRwAhRVqtGJ//Xw=
=DCUg
-----END PGP SIGNATURE-----
Merge tag 'v4.4-rc2' into drm-intel-next-queued
Linux 4.4-rc2
Backmerge to get at
commit 1b0e3a049e
Author: Imre Deak <imre.deak@intel.com>
Date: Thu Nov 5 23:04:11 2015 +0200
drm/i915/skl: disable display side power well support for now
so that we can proplery re-eanble skl power wells in -next.
Conflicts are just adjacent lines changed, except for intel_fbdev.c
where we need to interleave the changs. Nothing nefarious.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
This reverts
commit 6764e9f872
Author: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Date: Thu Aug 27 15:44:06 2015 +0200
drm/i915: skip modeset if compatible for everyone.
Bring back the i915.fastboot module parameter, disabled by default, due
to backlight regression on Chromebook Pixel 2015.
Apparently the firmware of the Chromebook in question enables the panel
but disables backlight to avoid a brief garbage scanout upon loading the
kernel/module. With fastboot, we leave the backlight untouched, in this
case disabled. The user would have to do a modeset (i.e. not just crank
up the brightness) to enable the backlight.
There is no clean fix readily available, so get back to the drawing
board by reverting.
[N.B. The reference below is for when the thread was included on public
lists, and some of the context had already been dropped by then.]
Reported-and-tested-by: Olof Johansson <olof@lixom.net>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
References: http://marc.info/?i=CAKMK7uES7xk05ki92oeX6gmvZWAh9f2vL7yz=6T+fGK9J3X7cQ@mail.gmail.com
Fixes: 6764e9f872 ("drm/i915: skip modeset if compatible for everyone.")
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1447921590-3785-1-git-send-email-jani.nikula@intel.com
Make I915_READ and I915_WRITE more type safe by wrapping the register
offset in a struct. This should eliminate most of the fumbles we've had
with misplaced parens.
This only takes care of normal mmio registers. We could extend the idea
to other register types and define each with its own struct. That way
you wouldn't be able to accidentally pass the wrong thing to a specific
register access function.
The gpio_reg setup is probably the ugliest thing left. But I figure I'd
just leave it for now, and wait for some divine inspiration to strike
before making it nice.
As for the generated code, it's actually a bit better sometimes. Eg.
looking at i915_irq_handler(), we can see the following change:
lea 0x70024(%rdx,%rax,1),%r9d
mov $0x1,%edx
- movslq %r9d,%r9
- mov %r9,%rsi
- mov %r9,-0x58(%rbp)
- callq *0xd8(%rbx)
+ mov %r9d,%esi
+ mov %r9d,-0x48(%rbp)
callq *0xd8(%rbx)
So previously gcc thought the register offset might be signed and
decided to sign extend it, just in case. The rest appears to be
mostly just minor shuffling of instructions.
v2: i915_mmio_reg_{offset,equal,valid}() helpers added
s/_REG/_MMIO/ in the register defines
mo more switch statements left to worry about
ring_emit stuff got sorted in a prep patch
cmd parser, lrc context and w/a batch buildup also in prep patch
vgpu stuff cleaned up and moved to a prep patch
all other unrelated changes split out
v3: Rebased due to BXT DSI/BLC, MOCS, etc.
v4: Rebased due to churn, s/i915_mmio_reg_t/i915_reg_t/
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1447853606-2751-1-git-send-email-ville.syrjala@linux.intel.com
When diagnosing a unrelated bug for someone on irc, it would seem the hardware can
be brought up by the BIOS with the embedded displayport using the SPLL for spread spectrum.
Right now this is not handled well in i915, and it calculates the crtc needs to
be reprogrammed on the first modeset without SSC, but the SPLL itself was kept
active. Fix this by exposing SPLL as a shared pll that will not be returned
by intel_get_shared_dpll; you have to know it exists to use it.
Changes since v1:
- Create a separate dpll_hw_state.spll for spll, and use
separate pll functions for spll.
Tested-by: Emil Renner Berthing <kernel@esmil.dk>
Tested-by: Gabriel Feceoru <gabriel.feceoru@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1447681332-6318-1-git-send-email-maarten.lankhorst@linux.intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Currently the gmbus code uses intel_aux_display_runtime_get/put in an
effort to make sure the hardware is powered up sufficiently for gmbus.
That function only takes the runtime PM reference which on VLV/CHV/BXT
is not enough. We need the disp2d/pipe-a well on VLV/CHV and power well
2 on BXT. So add a new power domnain for gmbus and kill off the now
unused intel_aux_display_runtime_get/put. And change
intel_hdmi_set_edid() to use the gmbus power domain too since that's all
we need there.
Also toss in a BUILD_BUG_ON() to catch problems if we run out of
bits for power domains. We're already really close to the limit...
[Patrik: Add gmbus string to debugfs output]
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Patrik Jakobsson <patrik.jakobsson@linux.intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1447084107-8521-5-git-send-email-patrik.jakobsson@linux.intel.com
Drop the EDP_PSR_BASE() thing, and just stick the PSR register offset
under dev_priv, like we for DSI and GPIO for example.
TODO: could probably move a bunch of this kind of stuff into the device
info instead...
v2: Drop the spurious whitespace change (Jani)
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1447266856-30249-7-git-send-email-ville.syrjala@linux.intel.com
Two benefits:
- We can use FW_LOADER_USERSPACE_FALLBACK.
- We can use flush_work to synchronize with the oustanding worker,
which is a notch more obvious what it does than having a special
completion.
The next patch will properly synchronize against the async loader in
the resume and unload code.
Cc: Damien Lespiau <damien.lespiau@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Sunil Kamath <sunil.kamath@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Signed-off-by: Animesh Manna <animesh.manna@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Tested-by: Daniel Stone <daniels@collabora.com> # SKL
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1446069547-24760-11-git-send-email-imre.deak@intel.com
If we really want to we can be more verbose here, but we really don't
need an entire function for this.
Cc: Damien Lespiau <damien.lespiau@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Sunil Kamath <sunil.kamath@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Signed-off-by: Animesh Manna <animesh.manna@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Tested-by: Daniel Stone <daniels@collabora.com> # SKL
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1446069547-24760-7-git-send-email-imre.deak@intel.com
This removes two anti-patterns:
- Locking shouldn't be used to synchronize with async work (of any
form, whether callbacks, workers or other threads). This is what the
mutex_lock/unlock seems to have been for in intel_csr_load_program.
Instead ordering should be ensured with the generic
wait_for_completion()/complete(). Or more specific functions
provided by the core kernel like e.g.
flush_work()/cancel_work_sync() in the case of synchronizing with a
work item.
- Don't invent own completion like the following code did with the
(already removed) wait_for(csr_load_status_get()) pattern - it's
really hard to get these right when you want them to be _really_
correct (and be fast) in all cases. Furthermore it's easier to read
code using the well-known primitives than new ones using
non-standard names.
Before enabling/disabling DC6 check if the firmware is loaded
successfully. This is guaranteed during runtime s/r, since otherwise we
don't enable RPM, but not during system s/r.
Note that it's still unclear whether we need to enable/disable DC6
during system s/r, until that's clarified, keep the current behavior and
enable/disable DC6.
Also after this patch there is a race during system s/r where the
firmware may not be loaded yet, that's addressed in an upcoming patch.
v2-v3:
- unchanged
v4:
- rebased on latest drm-intel-nightly
Cc: Damien Lespiau <damien.lespiau@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Sunil Kamath <sunil.kamath@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Signed-off-by: Animesh Manna <animesh.manna@intel.com>
[imre: added code and note about checking if the firmware loaded ok,
before enabling/disabling it]
Reviewed-by: Animesh Manna <animesh.manna@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Tested-by: Daniel Stone <daniels@collabora.com> # SKL
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1447341037-2623-1-git-send-email-imre.deak@intel.com
That can be handy later on to tell which DMC firmware version the user
has, by just looking at the dmesg.
v2: use DRM_DEBUG_DRIVER (Chris)
v3: use DRM_INFO (Marc Herbert)
Cc: Marc Herbert <marc.herbert@intel.com>
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com> (v1)
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1445950025-5793-1-git-send-email-mika.kuoppala@intel.com
Reviewed-by: Imre Deak <imre.deak@intel.com>
Tested-by: Daniel Stone <daniels@collabora.com> # SKL
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
I wanted to add yet another check to intel_fbc_update() and realized
I would need to create yet another enum no_fbc_reason case. So I
remembered this patch series that Damien wrote a long time ago and
nobody ever reviewed, so I decided to reimplement it since the code
changed a lot since then.
Credits-to: Damien Lespiau <damien.lespiau@intel.com>
Cc: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1445964628-30226-2-git-send-email-paulo.r.zanoni@intel.com
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Make pinning and waiting a separate step, and wait for object idle
without struct_mutex held.
Changes since v1:
- Do not wait when a reset is in progress.
- Remove call to i915_gem_object_wait_rendering for
intel_overlay_do_put_image (Chris Wilson)
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Bunch of -fixes for 4.4. Well not just, I've left the mmio/register work
from Ville in here since it's low-risk but lots of churn all over.
* tag 'drm-intel-next-fixes-2015-10-22' of git://anongit.freedesktop.org/drm-intel: (23 commits)
drm/i915: Use round to closest when computing the CEA 1.001 pixel clocks
drm/i915: Kill the leftover RMW from ivb_sprite_disable()
drm/i915: restore ggtt double-bind avoidance
drm/i915/skl: Enable pipe gamma for sprite planes.
drm/i915/skl+: Enable pipe CSC on cursor planes. (v2)
MAINTAINERS: add link to the Intel Graphics for Linux web site
drm/i915: Move skl/bxt gt specific workarounds to ring init
drm/i915: Drop i915_gem_obj_is_pinned() from set-cache-level
drm/i915: revert a few more watermark commits
drm/i915: Remove dev_priv argument from NEEDS_FORCE_WAKE
drm/i915: Clean up LVDS register handling
drm/i915: Throw out some useless variables
drm/i915: Parametrize and fix SWF registers
drm/i915: s/PIPE_FRMCOUNT_GM45/PIPE_FRMCOUNT_G4X/ etc.
drm/i915: Turn GEN5_ASSERT_IIR_IS_ZERO() into a function
drm/i915: Fix a few bad hex numbers in register defines
drm/i915: Protect register macro arguments
drm/i915: Include gpio_mmio_base in GMBUS reg defines
drm/i915: Parametrize HSW video DIP data registers
drm/i915: Eliminate weird parameter inversion from BXT PPS registers
...
Kabylake is a Intel® Processor containing Intel® HD Graphics
following Skylake.
It is Gen9p5, so it inherits everything from Skylake.
Let's start by adding the platform separated from Skylake
but reusing most of all features, functions etc. Later we
rebase the PCI-ID patch without is_skylake=1
so we don't replace what original Author did there.
Few IS_SKYLAKEs if statements are not being covered by this patch
on purpose:
- Workarounds: Kabylake is derivated from Skylake H0 so no
W/As apply here.
- GuC: A following patch removes Kabylake support with an
explanation: No firmware available yet.
- DMC/CSR: Done in a separated patch since we need to be carefull
and load the version for revision 7 since
Kabylake is Skylake H0.
v2: relative cleaner commit message and added the missed
IS_KABYLAKE to intel_i2c.c as pointed out by Jani.
Cc: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
There's no need for __raw_i915_read8() & co. to be macros, so make them
inline functions. To avoid typo mistakes generate the inline functions
using preprocessor templates.
We have a few users of the raw register acces functions outside
intel_uncore.c, so let's also move the functions into intel_drv.h.
While doing that switch I915_READ_FW() & co. to use the
__raw_i915_read() functions, and use the _FW macros everywhere
outside intel_uncore.c where we want to read registers without
grabbing forcewake and whatnot. The only exception is
i915_check_vgpu() which itself gets called from intel_uncore.c,
so using the __raw_i915_read stuff there seems appropriate.
v2: Squash in the intel_uncore.c->i915_drv.h move
Convert I915_READ_FW() to use __raw_i915_read(), and use
I915_READ_FW() outside of intel_uncore.c (Chris)
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1445517300-28173-2-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
v2: Don't forget to actually check the cstate->active value when
tallying up the number of active CRTC's. (Ander)
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Smoke-tested-by: Paulo Zanoni <przanoni@gmail.com>
Link: http://patchwork.freedesktop.org/patch/59561/
Calculate pipe watermarks during atomic calculation phase, based on the
contents of the atomic transaction's state structure. We still program
the watermarks at the same time we did before, but the computation now
happens much earlier.
While this patch isn't too exciting by itself, it paves the way for
future patches. The eventual goal (which will be realized in future
patches in this series) is to calculate multiple sets up watermark
values up front, and then program them at different times (pre- vs
post-vblank) on the platforms that need a two-step watermark update.
While we're at it, s/intel_compute_pipe_wm/ilk_compute_pipe_wm/ since
this function only applies to ILK-style watermarks and we have a
completely different function for SKL-style watermarks.
Note that the original code had a memcmp() in ilk_update_wm() to avoid
calling ilk_program_watermarks() if the watermarks hadn't changed. This
memcmp vanishes here, which means we may do some unnecessary result
generation and merging in cases where watermarks didn't change, but the
lower-level function ilk_write_wm_values already makes sure that we
don't actually try to program the watermark registers again.
v2: Squash a few commits from the original series together; no longer
leave pre-calculated wm's in a separate temporary structure since
it's easier to follow the logic if we just cut over to using the
pre-calculated values directly.
v3:
- Pass intel_crtc instead of drm_crtc to .compute_pipe_wm() entrypoint
and use intel_atomic_get_crtc_state() to avoid need for extra
casting. (Ander)
- Drop unused intel_check_crtc() function prototype. (Ander)
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Smoke-tested-by: Paulo Zanoni <przanoni@gmail.com>
Link: http://patchwork.freedesktop.org/patch/60363/
The only platform that still has an update_sprite_wm entrypoint is SKL;
on SKL, intel_update_sprite_watermarks just updates intel_plane->wm and
then performs a regular watermark update. However intel_plane->wm is
only used to update a couple fields in intel_wm_config, and those fields
are never used by the SKL code, so on SKL an update_sprite_wm is
effectively identical to an update_wm call. Since we're already
ensuring that the regular intel_update_wm is called any time we'd try to
call intel_update_sprite_watermarks, the whole call is redundant and can
be dropped.
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Smoke-tested-by: Paulo Zanoni <przanoni@gmail.com>
Link: http://patchwork.freedesktop.org/patch/60372/
Revision checks are almost always accompanied by a platform check. (The
exceptions are platform specific code.) Add helpers to check for a
platform and a revision range: IS_SKL_REVID() and IS_BXT_REVID(). In
most places this simplifies and clarifies the code. It will be obvious
that revid macros are used for the correct platform.
This should make it easier to find all the revision checks for
workarounds for each platform, and make it easier to remove them once we
drop support for early hardware revisions.
This should also make it easier to differentiate between Skylake and
Kabylake revision checks when Kabylake support is added.
v2: rebase
Acked-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1445343722-3312-3-git-send-email-jani.nikula@intel.com
- dmc fixes from Animesh (not yet all) for deeper sleep states
- piles of prep patches from Ville to make mmio functions type-safe
- more fbc work from Paulo all over
- w/a shuffling from Arun Siluvery
- first part of atomic watermark updates from Matt and Ville (later parts had to
be dropped again unfortunately)
- lots of patches to prepare bxt dsi support ( Shashank Sharma)
- userptr fixes from Chris
- audio rate interface between i915/snd_hda plus kerneldoc (Libin Yang)
- shrinker improvements and fixes (Chris Wilson)
- lots and lots of small patches all over
* tag 'drm-intel-next-2015-10-10' of git://anongit.freedesktop.org/drm-intel: (134 commits)
drm/i915: Update DRIVER_DATE to 20151010
drm/i915: Partial revert of atomic watermark series
drm/i915: Early exit from semaphore_waits_for for execlist mode.
drm/i915: Remove wrong warning from i915_gem_context_clean
drm/i915: Determine the stolen memory base address on gen2
drm/i915: fix FBC buffer size checks
drm/i915: fix CFB size calculation
drm/i915: remove pre-atomic check from SKL update_primary_plane
drm/i915: don't allocate fbcon from stolen memory if it's too big
Revert "drm/i915: Call encoder hotplug for init and resume cases"
Revert "drm/i915: Add hot_plug hook for hdmi encoder"
drm/i915: use error path
drm/i915/irq: Fix misspelled word register in kernel-doc
drm/i915/irq: Fix kernel-doc warnings
drm/i915: Hook up ring workaround writes at context creation time on Gen6-7.
drm/i915: Don't warn if the workaround list is empty.
drm/i915: Resurrect golden context on gen6/7
drm/i915/chv: remove pre-production hardware workarounds
drm/i915/snb: remove pre-production hardware workaround
drm/i915/bxt: Set time interval unit to 0.833us
...
Another round of drm-misc. Unfortunately the DRM_UNLOCKED removal for
DRIVER_MODESET isn't complete yet for lack of review on 1-2 patches.
Otherwise just various stuff all over.
* tag 'topic/drm-misc-2015-10-08' of git://anongit.freedesktop.org/drm-intel:
drm: Stop using drm_vblank_count() as the hw frame counter
drm/irq: Use unsigned int pipe in public API
drm: Use DRM_ROTATE_MASK and DRM_REFLECT_MASK
drm: Add DRM_ROTATE_MASK and DRM_REFLECT_MASK
vga_switcheroo: Add missing locking
vgaarb: use kzalloc in vga_arbiter_add_pci_device()
drm: Don't zero vblank timestamps from the irq handler
drm: Hack around CONFIG_AGP=m build failures
drm/i915: Remove setparam ioctl
drm: Remove dummy agp ioctl wrappers
drm/vmwgfx: Stop checking for DRM_UNLOCKED
drm/drm_ioctl.c: kerneldoc
drm: Define a drm_invalid_op ioctl implementation
drm: Remove __OS_HAS_AGP
drm/doc: Update docs about device instance setup
This is a squash of the following commits:
Revert "drm/i915: Drop intel_update_sprite_watermarks"
This reverts commit 47c99438b5.
Revert "drm/i915/ivb: Move WaCxSRDisabledForSpriteScaling w/a to atomic check"
This reverts commit 7809e5ae35.
Revert "drm/i915/skl: Eliminate usage of pipe_wm_parameters from SKL-style WM (v3)"
This reverts commit 3a05f5e2e7.
With these reverts, SKL finally stops failing every single FBC test
with FIFO underrun error messages. After some brief testing, it also
seems that this commit prevents the machine from completely freezing
when we run igt/kms_fbc_crc (see fd.o #92355).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92355
Cc: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Parametrize the SWF registers. This also fixes the register offsets,
which were mostly garbage in the old defines.
Also save/restore only as many SWF registers that each platform has.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
i915 expects the OpRegion to be cached (i.e. not __iomem), so explicitly
map it with memremap rather than the implied cache setting of
acpi_os_ioremap().
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: intel-gfx@lists.freedesktop.org
Cc: David Airlie <airlied@linux.ie>
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
It's been reported that the atomic watermark series triggers some
regressions on SKL, which we haven't been able to track down yet. Let's
temporarily revert these patches while we track down the root cause.
This commit squashes the reverts of:
76305b1 drm/i915: Calculate watermark configuration during atomic check (v2)
a4611e4 drm/i915: Don't set plane visible during HW readout if CRTC is off
a28170f drm/i915: Calculate ILK-style watermarks during atomic check (v3)
de4a9f8 drm/i915: Calculate pipe watermarks into CRTC state (v3)
de165e0 drm/i915: Refactor ilk_update_wm (v3)
Reference: http://lists.freedesktop.org/archives/intel-gfx/2015-October/077190.html
Cc: "Zanoni, Paulo R" <paulo.r.zanoni@intel.com>
Cc: "Vetter, Daniel" <daniel.vetter@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Exclude active GPU pages from the purview of the background shrinker
(kswapd), as these cause uncontrollable GPU stalls. Given that the
shrinker is rerun until the freelists are satisfied, we should have
opportunity in subsequent passes to recover the pages once idle. If the
machine does run out of memory entirely, we have the forced idling in the
oom-notifier as a means of releasing all the pages we can before an oom
is prematurely executed.
Note that this relies upon an up-front retire_requests to keep the
inactive list in shape, which was added in a previous patch, mostly as
execlist ctx pinning band-aids.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Add note about retire_requests.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
With UMS gone, we no longer use it during suspend. And with the last
user removed from the shrinker, we can remove the dead code.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Pull in the i915/hda changes for N/CTS setting so I can apply the
follow-up documentation work for drm/i915.
Some conflicts because ofc we had to rework i915 while that N/CTS work
was going on. But not more than adjacent changes really.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
As the shrinker_control now passes us unsigned long targets, update our
shrinker functions to match.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Prevent leaking VMAs and PPGTT VMs when objects are imported
via flink.
Scenario is that any VMAs created by the importer will be left
dangling after the importer exits, or destroys the PPGTT context
with which they are associated.
This is caused by object destruction not running when the
importer closes the buffer object handle due the reference held
by the exporter. This also leaks the VM since the VMA has a
reference on it.
In practice these leaks can be observed by stopping and starting
the X server on a kernel with fbcon compiled in. Every time
X server exits another VMA will be leaked against the fbcon's
frame buffer object.
Also on systems where flink buffer sharing is used extensively,
like Android, this leak has even more serious consequences.
This version is takes a general approach from the earlier work
by Rafael Barbalho (drm/i915: Clean-up PPGTT on context
destruction) and tries to incorporate the subsequent discussion
between Chris Wilson and Daniel Vetter.
v2:
Removed immediate cleanup on object retire - it was causing a
recursive VMA unbind via i915_gem_object_wait_rendering. And
it is in fact not even needed since by definition context
cleanup worker runs only after the last context reference has
been dropped, hence all VMAs against the VM belonging to the
context are already on the inactive list.
v3:
Previous version could deadlock since VMA unbind waits on any
rendering on an object to complete. Objects can be busy in a
different VM which would mean that the cleanup loop would do
the wait with the struct mutex held.
This is an even simpler approach where we just unbind VMAs
without waiting since we know all VMAs belonging to this VM
are idle, and there is nothing in flight, at the point
context destructor runs.
v4:
Double underscore prefix for __915_vma_unbind_no_wait and a
commit message typo fix. (Michel Thierry)
Note that this is just a partial/interim fix since we have a bit a
fundamental issue with cleaning up, e.g.
https://bugs.freedesktop.org/show_bug.cgi?id=87729
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Testcase: igt/gem_ppgtt.c/flink-and-exit-vma-leak
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Rafael Barbalho <rafael.barbalho@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
[danvet: Add a note that this isn't everything.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The BIOS can leave the CHV display PHY in some odd state where
some of the LDOs/lanes won't power down fully when unused. This
will trigger a host of asserts that were added in:
30142273a3 drm/i915: Add CHV PHY LDO power sanity checks
6669e39f95 drm/i915: Add some CHV DPIO lane power state asserts
To avoid that, skip the asserts until the PHY power well has been
disabled at least once. That will fully reset the PHY, and once
brought back up, the dynamic power down features will work correctly.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Deepak S<deepak.s@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
There are some allocations that must be only referenced by 32-bit
offsets. To limit the chances of having the first 4GB already full,
objects not requiring this workaround use DRM_MM_SEARCH_BELOW/
DRM_MM_CREATE_TOP flags
In specific, any resource used with flat/heapless (0x00000000-0xfffff000)
General State Heap (GSH) or Instruction State Heap (ISH) must be in a
32-bit range, because the General State Offset and Instruction State
Offset are limited to 32-bits.
Objects must have EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag to indicate if
they can be allocated above the 32-bit address range. To limit the
chances of having the first 4GB already full, objects will use
DRM_MM_SEARCH_BELOW + DRM_MM_CREATE_TOP flags when possible.
The libdrm user of the EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag is here:
http://lists.freedesktop.org/archives/intel-gfx/2015-September/075836.html
v2: Changed flag logic from neeeds_32b, to supports_48b.
v3: Moved 48-bit support flag back to exec_object. (Chris, Daniel)
v4: Split pin flags into PIN_ZONE_4G and PIN_HIGH; update PIN_OFFSET_MASK
to use last PIN_ defined instead of hard-coded value; use correct limit
check in eb_vma_misplaced. (Chris)
v5: Don't touch PIN_OFFSET_MASK and update workaround comment (Chris)
v6: Apply pin-high for ggtt too (Chris)
v7: Handle simultaneous pin-high and pin-mappable end correctly (Akash)
Fix check for entries currently using +4GB addresses, use min_t and
other polish in object_bind_to_vm (Chris)
v8: Commit message updated to point to libdrm patch.
v9: vmas are allocated in the correct ozone, so only check flag when the
vma has not been allocated. (Chris)
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v4)
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
v2: Don't forget to actually check the cstate->active value when
tallying up the number of active CRTC's. (Ander)
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Calculate pipe watermarks during atomic calculation phase, based on the
contents of the atomic transaction's state structure. We still program
the watermarks at the same time we did before, but the computation now
happens much earlier.
While this patch isn't too exciting by itself, it paves the way for
future patches. The eventual goal (which will be realized in future
patches in this series) is to calculate multiple sets up watermark
values up front, and then program them at different times (pre- vs
post-vblank) on the platforms that need a two-step watermark update.
While we're at it, s/intel_compute_pipe_wm/ilk_compute_pipe_wm/ since
this function only applies to ILK-style watermarks and we have a
completely different function for SKL-style watermarks.
Note that the original code had a memcmp() in ilk_update_wm() to avoid
calling ilk_program_watermarks() if the watermarks hadn't changed. This
memcmp vanishes here, which means we may do some unnecessary result
generation and merging in cases where watermarks didn't change, but the
lower-level function ilk_write_wm_values already makes sure that we
don't actually try to program the watermark registers again.
v2: Squash a few commits from the original series together; no longer
leave pre-calculated wm's in a separate temporary structure since
it's easier to follow the logic if we just cut over to using the
pre-calculated values directly.
v3:
- Pass intel_crtc instead of drm_crtc to .compute_pipe_wm() entrypoint
and use intel_atomic_get_crtc_state() to avoid need for extra
casting. (Ander)
- Drop unused intel_check_crtc() function prototype. (Ander)
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The only platform that still has an update_sprite_wm entrypoint is SKL;
on SKL, intel_update_sprite_watermarks just updates intel_plane->wm and
then performs a regular watermark update. However intel_plane->wm is
only used to update a couple fields in intel_wm_config, and those fields
are never used by the SKL code, so on SKL an update_sprite_wm is
effectively identical to an update_wm call. Since we're already
ensuring that the regular intel_update_wm is called any time we'd try to
call intel_update_sprite_watermarks, the whole call is redundant and can
be dropped.
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
A bunch of SKL watermark-related structures have the cursor plane as a
separate entry from the rest of the planes. Since a previous patch
updated I915_MAX_PLANES such that those plane arrays now have a slot for
the cursor, update the code to use the new slot in the existing plane
arrays and kill off the cursor-specific structures.
There shouldn't be any functional change here; this is just shuffling
around how the data is stored in some of the data structures. The whole
patch is generated with Coccinelle via the following semantic patch:
@@ struct skl_pipe_wm_parameters WMP; @@
- WMP.cursor
+ WMP.plane[PLANE_CURSOR]
@@ struct skl_pipe_wm_parameters *WMP; @@
- WMP->cursor
+ WMP->plane[PLANE_CURSOR]
@@ @@
struct skl_pipe_wm_parameters {
...
- struct intel_plane_wm_parameters cursor;
...
};
@@
struct skl_ddb_allocation DDB;
expression E;
@@
- DDB.cursor[E]
+ DDB.plane[E][PLANE_CURSOR]
@@
struct skl_ddb_allocation *DDB;
expression E;
@@
- DDB->cursor[E]
+ DDB->plane[E][PLANE_CURSOR]
@@ @@
struct skl_ddb_allocation {
...
- struct skl_ddb_entry cursor[I915_MAX_PIPES];
...
};
@@
struct skl_wm_values WMV;
expression E1, E2;
@@
(
- WMV.cursor[E1][E2]
+ WMV.plane[E1][PLANE_CURSOR][E2]
|
- WMV.cursor_trans[E1]
+ WMV.plane_trans[E1][PLANE_CURSOR]
)
@@
struct skl_wm_values *WMV;
expression E1, E2;
@@
(
- WMV->cursor[E1][E2]
+ WMV->plane[E1][PLANE_CURSOR][E2]
|
- WMV->cursor_trans[E1]
+ WMV->plane_trans[E1][PLANE_CURSOR]
)
@@ @@
struct skl_wm_values {
...
- uint32_t cursor[I915_MAX_PIPES][8];
...
- uint32_t cursor_trans[I915_MAX_PIPES];
...
};
@@ struct skl_wm_level WML; @@
(
- WML.cursor_en
+ WML.plane_en[PLANE_CURSOR]
|
- WML.cursor_res_b
+ WML.plane_res_b[PLANE_CURSOR]
|
- WML.cursor_res_l
+ WML.plane_res_l[PLANE_CURSOR]
)
@@ struct skl_wm_level *WML; @@
(
- WML->cursor_en
+ WML->plane_en[PLANE_CURSOR]
|
- WML->cursor_res_b
+ WML->plane_res_b[PLANE_CURSOR]
|
- WML->cursor_res_l
+ WML->plane_res_l[PLANE_CURSOR]
)
@@ @@
struct skl_wm_level {
...
- bool cursor_en;
...
- uint16_t cursor_res_b;
- uint8_t cursor_res_l;
...
};
v2: Use a PLANE_CURSOR enum entry rather than making the code reference
I915_MAX_PLANES or I915_MAX_PLANES+1, which was confusing. (Ander)
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Let the compiler figure out what I915_MAX_PLANES is from 'enum plane' so
that we don't need a separate #define.
While we're at it, add the cursor plane to the enum. This will cause
I915_MAX_PLANES to now include the cursor plane in its count (it didn't
previously). This change is safe since we currently only use this
value in array declarations (never in the actual code logic); we just
wind up allocating slightly more memory than we need to. A followup
patch will cause various parts of the code to start using the extra
array element where appropriate.
(This patch probably should have been squashed with the followup patch,
but I couldn't figure out how to get Coccinelle to modify enum
declarations...)
Suggested-by: Ander Conselvan De Oliveira <conselvan2@gmail.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This was only used for the ums+gem combo, so ripe for removal now that
we only have kms code left.
v2: Drop fence_reg_start since it's now unused, noticed by Ville.
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Previously we've relied on having basically one backlight and one
backlight type per platform. This is already a bit quirky with PMIC PWM
support on VLV/CHV platforms with MIPI DSI. In the foreseeable future
we'll have at least DPCD based backlight control on eDP and DCS command
based backlight control on MIPI DSI. Backlight is becoming more and more
connector specific, so reflect this fact by making the backlight control
hooks connector specific.
This enables further work to reuse generic backlight code in
intel_panel.c while adding more specific backlight code accessed via the
hooks.
Cc: Deepak M <m.deepak@intel.com>
Cc: Yetunde Adebisi <yetundex.adebisi@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Deepak M <m.deepak@intel.com>
Reviewed-by: Yetunde Adebisi <yetundex.adebisi@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As with the cdclk, read out czclk from CCK as well. This gives us the
real current value and avoids having to decode fuses and whatnot.
Also store it in kHz under dev_priv like we do for cdlck since it's not
just an rps related clock, and having it in kHz is more
standard/convenient for some things.
Imre also pointed out that we currently fail to read czclk on VLV, which
means the PFI credit programming isn't working as expected.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Rename the function argument to 'adjusted_mode' whenever the function
only ever gets passed the adjusted_mode.
v2: Update due to intel_dsi.c changes
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Mika Kahola <mika.kahola@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Virtualized systems often use a virtual P2X4 south bridge.
Detect this in intel_detect_pch and make a best guess as to which PCH
we should be using.
This was seen on vmware esxi hypervisor. When passing the graphics device
through to a guest, it can not pass through the PCH. Instead it simulates
a P2X4 southbridge.
Signed-off-by: Robert Beckett <robert.beckett@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Backmerge to catch up with 4.3. slightly more involved conflict in the
irq code, but nothing beyond adjacent changes.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
HDMI audio may not work at some frequencies
with the HW provided N/CTS.
This patch sets the proper N value for the
given audio sample rate at the impacted frequencies.
At other frequencies, it will use the N/CTS value
which HW provides.
Signed-off-by: Libin Yang <libin.yang@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Cc: Nick Hoath <nicholas.hoath@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
An HPD interrupt may fire while we are in a function that changes
the PORT_HOTPLUG_EN register - especially when an HPD interrupt
storm occurs.
Since the interrupt handler changes the enabled HPD lines when it
detects such a storm the read-modify-write cycles may interfere.
To avoid this, shiled the rmw cycles with IRQ save spinlocks.
Changes since v1:
- Implement a function which takes care of accessing PORT_HOTPLUG_EN.
Signed-off-by: Egbert Eich <eich@suse.de>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
It would be initialized just moments later by i915_init_vm.
Rearrange the code such that i915_init_vm() is next to its callers
inside i915_gem_gtt (and so we can make it static). After removing the
dance around the files, it is clear that we are repeating some work
inside the initializers (such as calling drm_mm_init() multiple times),
so take advantage of the refactor to also remove some redundant code and
clean up the interface.
v2: Commit msg update,
s/i915_init_vm/i915_address_space_init, move to i915_gem_gtt.c,
init address_space during i915_gem_setup_global_gtt for ggtt.
v3: Do not init global_link - we are adding it to vm_list moments later,
make i915_address_space_init static, use OOP style parameter order.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This commit is essentially a rewrite of "drm/i915: Check pixel format
for fbc" from Ville Syrjälä. The idea is the same, but the code is
different due to all the changes that happened since his original
patch. So any bugs are due to my bad rewrite.
v2:
- Drop the alpha formats (Ville).
v3:
- Drop the stale comment (Ville).
Testcases: igt/kms_frontbuffer_tracking/*fbc*-${format_name}-draw-*
Credits-to: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
BSpec says we shouldn't enable FBC on HSW/BDW when the pipe pixel rate
exceeds 95% of the core display clock.
v2:
- HSW also needs the WA (Ville).
- Add the WA name (Ville).
- Use the current cdclk (Ville).
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The FBC hardware for these platforms doesn't have access to the
bios_reserved range, so it always assumes the maximum (8mb) is used.
So avoid this range while allocating.
This solves a bunch of FIFO underruns that happen if you end up
putting the CFB in that memory range. On my machine, with 32mb of
stolen, I need a 2560x1440 mode for that.
Testcase: igt/kms_frontbuffer_tracking/fbc-* (given the right setup)
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Don't allow FBC for cases where the spec says we can't FBC.
v2:
- Just WARN_ON() the strides that should have been caught earlier
(Daniel)
- Make it a new function since I expect this to grow more.
v3:
- Document which IGT test is exercised by this.
v4:
- Implement the restrictions for gens 2-6 too (Ville).
- Fix off-by-one mistake (Ville).
Testcase: igt/kms_frontbuffer_tracking/fbc-badstride
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
It will be usefull to specify w/a that affects only SKL GT3 and GT4.
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Reviewed-by: Alex Dai <yu.dai@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Modified HAS_CSR macro defination which earlier only supported
for skl, now added support for BXT.
v1: Initial version.
v2: Instaed of skylake/broxton check added gen9 check alone based
on review comment from Sunil.
Cc: Vetter, Daniel <daniel.vetter@intel.com>
Cc: Damien Lespiau <damien.lespiau@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Sunil Kamath <sunil.kamath@intel.com>
Signed-off-by: Animesh Manna <animesh.manna@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Shared frontbuffer bits are causing warnings when same FB is displayed
in another plane without clearing the bits from previous plane.
v2: Removing coversion of fb bits to 64 bit as it is not needed for now. (Daniel)
Change-Id: Ic2df80747f314b82afd22f8326297c57d1e652c6
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Signed-off-by: Kumar, Mahesh <mahesh1.kumar@intel.com>
[danvet: Drop INTEL_FRONTBUFFER_SPRITE_MASK since unused.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Extend init/init_hw split to context init.
- Move context initialisation in to i915_gem_init_hw
- Move one off initialisation for render ring to
i915_gem_validate_context
- Move default context initialisation to logical_ring_init
Rename intel_lr_context_deferred_create to
intel_lr_context_deferred_alloc, to reflect reduced functionality &
alloc/init split.
This patch is intended to split out the allocation of resources &
initialisation to allow easier reuse of code for resume/gpu reset.
v2: Removed function ptr wrapping of do_switch_context (Daniel Vetter)
Left ->init_context int intel_lr_context_deferred_alloc
(Daniel Vetter)
Remove unnecessary init flag & ring type test. (Daniel Vetter)
Improve commit message (Daniel Vetter)
v3: On init/reinit, set the hw next sequence number to the sw next
sequence number. This is set to 1 at driver load time. This prevents
the seqno being reset on reinit (Chris Wilson)
v4: Set seqno back to ~0 - 0x1000 at start-of-day, and increment by 0x100
on reset.
This makes it obvious which bbs are which after a reset. (David Gordon
& John Harrison)
Rebase.
v5: Rebase. Fixed rebase breakage. Put context pinning in separate
function. Removed code churn. (Thomas Daniel)
v6: Cleanup up issues introduced in v2 & v5 (Thomas Daniel)
Issue: VIZ-4798
Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: John Harrison <john.c.harrison@intel.com>
Cc: David Gordon <david.s.gordon@intel.com>
Cc: Thomas Daniel <thomas.daniel@intel.com>
Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This is done as a separate commit, to make it easier to revert
when things break.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Pull drm fixes from Dave Airlie:
"Just a bunch of fixes to squeeze in before -rc1:
- three nouveau regression fixes
- one qxl regression fix
- a bunch of i915 fixes
... and some core displayport/atomic fixes"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/nouveau/device: enable c800 quirk for tecra w50
drm/nouveau/clk/gt215: Unbreak engine pausing for GT21x/MCP7x
drm/nouveau/gr/nv04: fix big endian setting on gr context
drm/qxl: validate monitors config modes
drm/i915: Allow DSI dual link to be configured on any pipe
drm/i915: Don't try to use DDR DVFS on CHV when disabled in the BIOS
drm/i915: Fix CSR MMIO address check
drm/i915: Limit the number of loops for reading a split 64bit register
drm/i915: Fix broken mst get_hw_state.
drm/i915: Pass hpd_status_i915[] to intel_get_hpd_pins() in pre-g4x
uapi/drm/i915_drm.h: fix userspace compilation.
drm/i915: Always mark the object as dirty when used by the GPU
drm/dp: Add dp_aux_i2c_speed_khz module param to set the assume i2c bus speed
drm/dp: Adjust i2c-over-aux retry count based on message size and i2c bus speed
drm/dp: Define AUX_RETRY_INTERVAL as 500 us
drm/atomic: Fix bookkeeping with TEST_ONLY, v3.
If one disables DDR DVFS in the BIOS, Punit will apparently ignores
all DDR DVFS request. Currently we assume that DDR DVFS is always
operational, which leads to errors in dmesg when the DDR DVFS requests
time out.
Fix the problem by gently prodding Punit during driver load to find out
whether it will respond to DDR DVFS requests. If the request times out,
we assume that DDR DVFS has been permanenly disabled in the BIOS and
no longer perster the Punit about it.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91629
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Clint Taylor <Clinton.A.Taylor@intel.com>
Tested-by: Clint Taylor <Clinton.A.Taylor@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
In I915_READ64_2x32 we attempt to read a 64bit register using 2 32bit
reads. Due to the nature of the registers we try to read in this manner,
they may increment between the two instruction (e.g. a timestamp
counter). To keep the result accurate, we repeat the read if we detect
an overflow (i.e. the upper value varies). However, some hardware is just
plain flaky and may endless loop as the the upper 32bits are not stable.
Just give up after a couple of tries and report whatever we read last.
v2: Use the most recent values when erring out on an unstable register.
Reported-by: russianneuromancer@ya.ru
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91906
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Normally we determine the backlight PWM modulation frequency (which we
also use as backlight max value) from the backlight registers at module
load time, expecting the registers have been initialized by the BIOS. If
this is not the case, we fail.
The VBT contains the backlight modulation frequency in Hz. Add platform
specific functions to convert the frequency in Hz to backlight PWM
modulation frequency, and use them to initialize the backlight when the
registers are not initialized by the BIOS.
v2: Fix SPT and VLV. Thanks to Clint for the VLV code.
Cc: Clint Taylor <clinton.a.taylor@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Clint Taylor <Clinton.A.Taylor@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Pull drm updates from Dave Airlie:
"This is the main pull request for the drm for 4.3. Nouveau is
probably the biggest amount of changes in here, since it missed 4.2.
Highlights below, along with the usual bunch of fixes.
All stuff outside drm should have applicable acks.
Highlights:
- new drivers:
freescale dcu kms driver
- core:
more atomic fixes
disable some dri1 interfaces on kms drivers
drop fb panic handling, this was just getting more broken, as more locking was required.
new core fbdev Kconfig support - instead of each driver enable/disabling it
struct_mutex cleanups
- panel:
more new panels
cleanup Kconfig
- i915:
Skylake support enabled by default
legacy modesetting using atomic infrastructure
Skylake fixes
GEN9 workarounds
- amdgpu:
Fiji support
CGS support for amdgpu
Initial GPU scheduler - off by default
Lots of bug fixes and optimisations.
- radeon:
DP fixes
misc fixes
- amdkfd:
Add Carrizo support for amdkfd using amdgpu.
- nouveau:
long pending cleanup to complete driver,
fully bisectable which makes it larger,
perfmon work
more reclocking improvements
maxwell displayport fixes
- vmwgfx:
new DX device support, supports OpenGL 3.3
screen targets support
- mgag200:
G200eW support
G200e new revision support
- msm:
dragonboard 410c support, msm8x94 support, msm8x74v1 support
yuv format support
dma plane support
mdp5 rotation
initial hdcp
- sti:
atomic support
- exynos:
lots of cleanups
atomic modesetting/pageflipping support
render node support
- tegra:
tegra210 support (dc, dsi, dp/hdmi)
dpms with atomic modesetting support
- atmel:
support for 3 more atmel SoCs
new input formats, PRIME support.
- dwhdmi:
preparing to add audio support
- rockchip:
yuv plane support"
* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (1369 commits)
drm/amdgpu: rename gmc_v8_0_init_compute_vmid
drm/amdgpu: fix vce3 instance handling
drm/amdgpu: remove ib test for the second VCE Ring
drm/amdgpu: properly enable VM fault interrupts
drm/amdgpu: fix warning in scheduler
drm/amdgpu: fix buffer placement under memory pressure
drm/amdgpu/cz: fix cz_dpm_update_low_memory_pstate logic
drm/amdgpu: fix typo in dce11 watermark setup
drm/amdgpu: fix typo in dce10 watermark setup
drm/amdgpu: use top down allocation for non-CPU accessible vram
drm/amdgpu: be explicit about cpu vram access for driver BOs (v2)
drm/amdgpu: set MEC doorbell range for Fiji
drm/amdgpu: implement burst NOP for SDMA
drm/amdgpu: add insert_nop ring func and default implementation
drm/amdgpu: add amdgpu_get_sdma_instance helper function
drm/amdgpu: add AMDGPU_MAX_SDMA_INSTANCES
drm/amdgpu: add burst_nop flag for sdma
drm/amdgpu: add count field for the SDMA NOP packet v2
drm/amdgpu: use PT for VM sync on unmap
drm/amdgpu: make wait_event uninterruptible in push_job
...
Make LPT:LP checks look neater by wrapping the details in a
new HAS_PCH_LPT_LP() macro.
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Backmerge -fixes since there's more DDI-E related cleanups on top of
the pile of -fixes for skl that just landed for 4.3.
Conflicts:
drivers/gpu/drm/i915/intel_display.c
drivers/gpu/drm/i914/intel_dp.c
drivers/gpu/drm/i915/intel_lrc.c
Conflicts are all fairly harmless adjacent line stuff.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
When the audio codec is enabled or disabled, notify the audio driver.
This will enable the audio driver to get the notification at all times
(even when audio is in different powersave states).
Signed-off-by: David Henningsson <david.henningsson@canonical.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Add a common function to return "yes" or "no" string based on the
argument, and drop the local versions of it.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The driver doesn't support UMS any more, so set DRIVER_MODESET by default,
remove the legacy s/r callbacks, and rename the s/r functions to make it more clear
they're only in use by switcheroo now.
Also remove an obsolete comment about atomic. Normal updates are supported only
async updates aren't yet.
v2: Don't unconditionally set DRIVER_ATOMIC, we're not yet there.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From B spec, DDI_E port belong to PowerWell 2, but
DDI_E share the powerwell_req/staus register bit with
DDI_A which belong to DDI_A_E_POWER_WELL.
In order to communicate with the connector on DDI-E, both
DDI_A_E_POWER_WELL and POWER_WELL_2 must be enabled.
Currently intel_dp_power_get(DDI_E) only enable
DDI_A_E_POWER_WELL, this patch will not only enable
DDI_a_E_POWER_WELL but also enable POWER_WELL_2.
This patch also fix the DDI-E hotplug function.
Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
DDI-E doesn't have the correspondent GMBUS pin.
We rely on VBT to tell us which one it being used instead.
The DVI/HDMI on shared port couldn't exist.
This patch isn't tested without hardware wchich has HDMI
on DDI-E.
v2: fix trailing whitespace
v3: MISSING_CASE take place of BUG()
Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
This partially reverts commit 74c090b1bd.
The DRIVER_ATOMIC cap cannot yet be exported because i915 lacks async
support.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Store max dotclock into dev_priv structure so we are able
to filter out the modes that are not supported by our
platforms.
V2:
- limit the max dot clock frequency to max CD clock frequency
for the gen9 and above
- limit the max dot clock frequency to 90% of the max CD clock
frequency for the older gens
- for Cherryview the max dot clock frequency is limited to 95%
of the max CD clock frequency
- for gen2 and gen3 the max dot clock limit is set to 90% of the
2X max CD clock frequency
V3:
- max_dotclk variable renamed as max_dotclk_freq in i915_drv.h
- in intel_compute_max_dotclk() the rounding method changed from
round up to round down when computing max dotclock
V4:
- Haswell and Broadwell supports now dot clocks up to max CD clock
frequency
Signed-off-by: Mika Kahola <mika.kahola@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The current versions of these two macros don't work correctly if the
argument expression happens to contain a modulo operator (%) -- when
stringified, it gets interpreted as a printf formatting character!
With a specifically crafted parameter, this could probably cause a
kernel OOPS; consider WARN_ON(p%s) or WARN_ON(f %*pEp).
Instead, we should use an explicit "%s" format, with the stringified
expression as the coresponding literal-string argument.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
v2: fix one error found by checkpath.pl
v3: Add one ignored break for switch-case. DDI-E hotplug
function doesn't work after updating drm-intel tree,
I checked the code and found this missing which isn't
the root cause for broke DDI-E hp. The broken
DDI-E hp function is fixed by "Adding DDI_E power
well domain".
Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Tested-by: Timo Aaltonen <timo.aaltonen@canonical.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Bunch more fixes for 4.3, most of it skl fallout. It's not quite all yet,
there's still a few more patches pending to enable DDI-E correctly on skl.
Also included the dpms atomic work from Maarten since atomic is just a
pain and not including would cause piles of conflicts right from the
start.
* tag 'drm-intel-next-fixes-2015-08-16' of git://anongit.freedesktop.org/drm-intel: (67 commits)
drm/i915: Per-DDI I_boost override
drm/i915/skl: WaIgnoreDDIAStrap is forever, always init DDI A
drm/i915: fix checksum write for automated test reply
drm/i915: Contain the WA_REG macro
drm/i915: Remove the failed context from the fpriv->context_idr
drm/i915: Report IOMMU enabled status for GPU hangs
drm/i915: Check idle to active before processing CSQ
drm/i915: Set alternate aux for DDI-E
drm/i915: Set power domain for DDI-E
drm/i915: fix stolen bios_reserved checks
drm/i915: Use masked write for Context Status Buffer Pointer
drm/i915/skl WaDisableSbeCacheDispatchPortSharing
drm/i915: Spam less on dp aux send/receive problems
drm/i915: Handle return value in intel_pin_and_fence_fb_obj, v2.
drm/i915: Only update mode related state if a modeset happened.
drm/i915: Remove connectors_active.
drm/i915: Remove connectors_active from intel_dp.c, v2.
drm/i915: Remove connectors_active from sanitization, v2.
drm/i915: Get rid of dpms handling.
drm/i915: Make crtc checking use the atomic state, v2.
...
This fetches the required firmware image from the filesystem,
then loads it into the GuC's memory via a dedicated DMA engine.
This patch is derived from GuC loading work originally done by
Vinit Azad and Ben Widawsky.
v2:
Various improvements per review comments by Chris Wilson
v3:
Removed 'wait' parameter to intel_guc_ucode_load() as firmware
prefetch is no longer supported in the common firmware loader,
per Daniel Vetter's request.
Firmware checker callback fn now returns errno rather than bool.
v4:
Squash uC-independent code into GuC-specifc loader [Daniel Vetter]
Don't keep the driver working (by falling back to execlist mode)
if GuC firmware loading fails [Daniel Vetter]
v5:
Clarify WOPCM-related #defines [Tom O'Rourke]
Delete obsolete code no longer required with current h/w & f/w
[Tom O'Rourke]
Move the call to intel_guc_ucode_init() later, so that it can
allocate GEM objects, and have it fetch the firmware; then
intel_guc_ucode_load() doesn't need to fetch it later.
[Daniel Vetter].
v6:
Update comment describing intel_guc_ucode_load() [Tom O'Rourke]
Issue: VIZ-4884
Signed-off-by: Alex Dai <yu.dai@intel.com>
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Tom O'Rourke <Tom.O'Rourke@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Similar to commit c44ef60e43 ("drm/i915/gtt:
Allow >= 4GB sizes for vm"), i915_gem_obj_offset and i915_gem_obj_ggtt_offset
return an unsigned long, which in only 4-bytes long in 32-bit kernels.
Change return type (and other related offset variables) to u64.
Since Global GTT is always limited to 4GB, this change would not be required
in i915_gem_obj_ggtt_offset, but this is done for consistency.
v2: Remove unnecessary offset variable in do_pin, as we already have
vma->node.start (Chris).
Update GGTT offset too (Tvrtko).
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Otherwise it can overflow in 48-bit mode, and cause an incorrect
exec_start.
Before commit 5f19e2bffa ("drm/i915: Merged
the many do_execbuf() parameters into a structure"), it was already an u64.
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
v2: For semaphore errors, object is mapped to GGTT and offset will not
be > 4GB, print only lower 32-bits (Akash)
v3: Print gtt_offset in groups of 32-bit (Chris)
Cc: Akash Goel <akash.goel@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Introduces the Page Map Level 4 (PML4), ie. the new top level structure
of the page tables.
To facilitate testing, 48b mode will be available on Broadwell and
GEN9+, when i915.enable_ppgtt = 3.
v2: Remove unnecessary CONFIG_X86_64 checks, ppgtt code is already
32/64-bit safe (Chris).
v3: Add goto free_scratch in temp 48-bit mode init code (Akash).
v4: kfree the pdp until the 4lvl alloc/free patch (Akash).
v5: Postpone 48-bit code in sanitize_enable_ppgtt (Akash).
v6: Keep _insert_pte_entries changes outside this patch (Akash).
Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
An OEM may request increased I_boost beyond the recommended values
by specifying an I_boost value to be applied to all swing entries for
a port. These override values are specified in VBT.
v2: rebase and remove unused iboost_bit variable
Issue: VIZ-5676
Signed-off-by: Antti Koskipaa <antti.koskipaa@linux.intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Backmerge drm-intel-fixes because a bunch of atomic patch backporting
we had to do lead to horrible conflicts.
Conflicts:
drivers/gpu/drm/drm_crtc.c
Just a bit of context conflict between -next and -fixes.
drivers/gpu/drm/i915/intel_atomic.c
drivers/gpu/drm/i915/intel_display.c
Atomic conflicts, always pick the code from -next.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
The IOMMU for Intel graphics has historically had many issues resulting
in random GPU hangs. Lets include its status when capturing the GPU hang
error state for post-mortem analysis.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
There is no correspondent Aux channel for DDI-E.
So we need to rely on VBT to let us know witch one
is being used instead.
v2: Removing some trailing spaces and giving proper
credit to Xiong that added a nice way to avoid port
conflicts by setting supports_dp = 0 when using
equivalent aux for DDI-E.
Credits-to: Xiong Zhang <xiong.y.zhang@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Xiong Zhang <xiong.y.zhang@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Instead of our own duplicated one. This fixes a bug in the driver
unload code if DRM_FBDEV_EMULATION=n but DRM_I915_FBDEV=y because we
try to unregister the nonexistent fbdev drm_framebuffer.
Cc: Archit Taneja <architt@codeaurora.org>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reported-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This patch contains the changes to remove the byte
swapping logic introduced with old dmc firmware.
While debugging PC10 entry issue for skylake found
with latest dmc firmware version 1.18 without byte
swapping dmc is working fine and able to enter PC10.
Note that apparently this was changed with dmc version 1.0 and earlier
ones indeed are byteswapped like this ...
v1: Initial version.
v2: Corrected firmware size during memcpy(). (Suggested by Sunil)
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Damien Lespiau <damien.lespiau@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Sunil Kamath <sunil.kamath@intel.com>
Signed-off-by: Animesh Manna <animesh.manna@intel.com>
Signed-off-by: Vathsala Nagaraju <vathsala.nagaraju@intel.com>
Reviewed-by: A.Sunil Kamath <sunil.kamath@intel.com>
[danvet: Add note that this only holds for released dmc firmware.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
First, an introduction. We currently have two types of GTT mmaps: the
"normal" old mmap, and the WC mmap. For frontbuffer-related features
that have automatic hardware tracking, only the non-WC mmap writes are
detected by the hardware. Since inside the Kernel both are treated as
ORIGIN_GTT, any features ignoring ORIGIN_GTT because of the hardware
tracking are destined to fail.
One of the special rules defined for the WC mmaps is that the user
should call the dirtyfb IOCTL after he is done using the pointers, so
that results in an intel_fb_obj_flush() call. The problem is that the
dirtyfb is passing ORIGIN_GTT, so it is being ignored by FBC - even
though the hardware tracking is not detecing the WC mmap operations.
So in order to fix that without having to give up the automatic
hardware tracking for GTT mmaps we transform the flush operation from
dirtyfb into a special operation: ORIGIN_DIRTYFB.
This commit fixes all the kms_frontbuffer_tracking subtests that
contain "fbc" and "mmap-wc" in their names and are currently failing
(for a total of 16 subtests).
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Since we may conceivably encounter situations where the upper part of the
64bit register changes between reads, for example when a timestamp
counter overflows, change the WARN into a retry loop.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Remove the leftovers, yay!
AGP for i915 kms died long ago with
commit 3bb6ce6686
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Wed Nov 13 22:14:16 2013 +0100
drm/i915: Kill legeacy AGP for gen3 kms
and with ums now gone to there's really no users any more.
Note that device_is_agp is only called when DRIVER_USE_AGP is set and
since we've unconditionally cleared that since a while there are
really no users left for i915_driver_device_is_agp.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
It fits more with the low-level fence code, and this move leaves only
the userspace tiling ioctl handling in i915_gem_tiling.c.
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
No code changes, just moving all the fence related code into a
separate file (and avoiding a bunch of forward declarations while at
it).
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Currently HPD_PORT_A is used as an alias for HPD_NONE to mean that the
given port doesn't support long/short HPD pulse detection. SDVO and CRT
ports are like this and for these ports we only want to know whether an
hot plug event was detected on the corresponding pin. Since at least on
BXT we need long/short pulse detection on PORT A as well (added by the
next patch) remove this aliasing of HPD_PORT_A/HPD_NONE and let the
return value of intel_hpd_pin_to_port() show whether long/short pulse
detection is supported on the passed in pin.
No functional change.
v2:
- rebase on top of -nightly (Daniel)
- make the check for intel_hpd_pin_to_port() return value more readable
(Sivakumar)
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Sonika Jindal <sonika.jindal@intel.com>
Reviewed-by: Sivakumar Thulasimani <sivakumar.thulasimani@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Two new module parameters: "enable_guc_submission" which will turn
on submission of batchbuffers via the GuC (when implemented), and
"guc_log_level" which controls the level of debugging logged by the
GuC and captured by the host.
Signed-off-by: Alex Dai <yu.dai@intel.com>
v4:
Mark "enable_guc_submission" unsafe [Daniel Vetter]
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Tom O'Rourke <Tom.O'Rourke@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
i915_gem_object_create_from_data() is a generic function to save data
from a plain linear buffer in a new pageable gem object that can later
be accessed by the CPU and/or GPU.
We will need this for the microcontroller firmware loading support code.
Derived from i915_gem_object_write(), originally by Alex Dai
v2:
Change of function: now allocates & fills a new object, rather than
writing to an existing object
New name courtesy of Chris Wilson
Explicit domain-setting and other improvements per review comments
by Chris Wilson & Daniel Vetter
v4:
Rebased
Issue: VIZ-4884
Signed-off-by: Alex Dai <yu.dai@intel.com>
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Tom O'Rourke <Tom.O'Rourke@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Backmerge fixes since it's getting out of hand again with the massive
split due to atomic between -next and 4.2-rc. All the bugfixes in
4.2-rc are addressed already (by converting more towards atomic
instead of minimal duct-tape) so just always pick the version in next
for the conflicts in modeset code.
All the other conflicts are just adjacent lines changed.
Conflicts:
drivers/gpu/drm/i915/i915_drv.h
drivers/gpu/drm/i915/i915_gem_gtt.c
drivers/gpu/drm/i915/intel_display.c
drivers/gpu/drm/i915/intel_drv.h
drivers/gpu/drm/i915/intel_ringbuffer.h
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Huzzah! \o/
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Updated the HAS_CORE_RING_FREQ macro to add the broxton check,
so as to disallow the programming & read of ring frequency
table for it.
Issue: VIZ-5144
Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Added a new HAS_CORE_RING_FREQ macro, currently used in
gen6_update_ring_freq & i915_ring_freq_table debugfs function.
The programming & read of ring frequency table is needed for newer
GEN(>=6) platforms, except VLV/CHV.
Issue: VIZ-5144
Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Instead of all the ad-hoc updating, duplicate the old state first
before reading out the hw state, then restore it.
intel_display_resume is a new function that duplicates the sw state,
then reads out the hw state, and commits the old state.
intel_display_setup_hw_state now only reads out the atomic state.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90396
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
After the previous patch this flag will check always clear, as it's
never set for shmem backed and userptr objects, so we can remove it.
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Yeah this isn't really fixes but it's a nice cleanup to
clarify the code but not really worth the hassle of backmerging. So
just add to -fixes, we're still early in -rc.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This broken code was introduced in
commit aa7471d228
Author: Jani Nikula <jani.nikula@intel.com>
Date: Wed Apr 1 11:15:21 2015 +0300
drm/i915: add i915 specific connector debugfs file for DPCD
v2: Drop hunk that accidentally crept in.
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Bob Paauwe <bob.j.paauwe@intel.com>
Cc: François Valenduc <francoisvalenduc@gmail.com>
Reported-by: François Valenduc <francoisvalenduc@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
The poor in_dbg_master() check was the only one without a reason
string. Give it a reason string so it won't feel excluded.
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This is all internal i915.ko work, let's start using intel_crtc for
everything.
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Because the cool kids use dev_priv and FBC wants to be cool too.
We've been historically using struct drm_device on the FBC function
arguments, but we only really need it for intel_vgpu_active(): we can
use dev_priv everywhere else. So let's fully switch to dev_priv since
I'm getting tired of adding "struct drm_device *dev = dev_priv->dev"
everywhere.
If I get a NACK here I'll propose the opposite: convert all the
functions that currently take dev_priv to take dev.
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Because it makes more sense there, IMHO.
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
After the register save/restore code is gone there's just one user
left and it just obfuscates that one. Remove it.
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Since that's really what we want to test for. Note remove the gen5
case doesn't change anything: In intel_setup_outputs ilk is handled
already in the HAS_PCH_SPLIT case, and the register save/restore code
touches registers which simply doesn't exist anymore at all.
v2: Drop UMS parts.
v3: Update commit message to reflect that the reg save/restore code is
gone (Ville).
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Make sure we're not going to have weird races in really weird cases
where a lot of different CRTCs are doing rendering and modesets at the
same time.
With this change and the stolen_lock from the previous patch, we can
start removing the struct_mutex locking we have around FBC in the next
patches.
v2:
- Rebase (6 months later)
- Also lock debugfs and stolen.
v3:
- Don't lock a single value read (Chris).
- Replace lockdep assertions with WARNs (Daniel).
- Improve commit message.
- Don't forget intel_pre_plane_update() locking.
v4:
- Don't remove struct_mutex at intel_pre_plane_update() (Chris).
- Add comment regarding locking dependencies (Chris).
- Rebase after the stolen code rework.
- Rebase again after drm-intel-nightly changes.
v5:
- Rebase after the new stolen_lock patch.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v4)
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Which should protect dev_priv->mm.stolen usage. This will allow us to
simplify the relationship between stolen memory, FBC and struct_mutex.
v2:
- Rebase after the stolen_remove_node() dev_priv patch move.
- I realized that after we fixed a few things related to the FBC CFB
size checks, we're not reallocating the CFB anymore with FBC
enabled, so we can just move all the locking to i915_gem_stolen.c
and stop worrying about freezing all the stolen alocations while
freeing/rellocating the CFB. This allows us to fix the "Too
coarse" observation from Chris.
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
With the abstractions created by the last patch, we can move this code
and the only thing inside intel_fbc.c that knows about dev_priv->mm is
the code that reads stolen_base.
We also had to move a call to i915_gem_stolen_cleanup_compression()
- now called intel_fbc_cleanup_cfb() - outside i915_gem_stolen.c.
v2:
- Rebase after the remove_node() changes on the previous patch.
Requested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We want to move the FBC code out of i915_gem_stolen.c, but that code
directly adds/removes stolen memory nodes. Let's create this
abstraction, so i915_gme_stolen.c is still in control of all the
stolen memory handling. The abstraction will also allow us to add
locking assertions later.
v2:
- Add dev_priv as remove_node() argument since we'll need it later
(Chris).
Requested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Ensures that the batch buffer is executed by the resource streamer.
And will let userspace know whether Resource Streamer is supported in
the kernel.
v2: Don't skip 1<<15 for the exec flags (Jani Nikula)
v3: Use HAS_RESOURCE_STREAMER macro for execbuf validation (Chris Wilson)
(from getparam patch)
v2: Update I915_PARAM_HAS_RESOURCE_STREAMER so it's after
I915_PARAM_HAS_GPU_RESET.
v3: Only advertise RS support for hardware that supports it.
v4: Add HAS_RESOURCE_STREAMER() macro (Chris)
Testcase: igt/gem_exec_params
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
[danvet: squash in getparam patch since it'd break bisect, suggested
by Chris.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Although we have a fixed setting for the PLL9 and EBB4 registers, it
still makes sense to check them together with the rest of PLL registers.
While at it also remove a redundant comment about 10 bit clock enabling.
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Sonika Jindal <sonika.jindal@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This patch adds support for 0.85V VccIO on Skylake Y,
separate buffer translation tables for Skylake U,
and support for I_boost for the entries that needs this.
Changes in v2:
* Refactored the code a bit to move all DDI signal level setup to
intel_ddi.c
Issue: VIZ-5677
Signed-off-by: David Weinehall <david.weinehall@linux.intel.com>
Reviewed-by: Antti Koskipää <antti.koskipaa@linux.intel.com>
[danvet: Apply style polish checkpatch suggested.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Turns out the VLV/CHV system agent doesn't understand memory
latencies, so trying to rely on the PND deadline mechanism is not
going to fly especially when DDR DVFS is enabled. Currently we try to
avoid the problems by lying to the system agent about the deadlines
and setting the FIFO watermarks to 8 cachelines. This however leads to
bad memory self refresh residency.
So in order to satosfy everyone we'll just give up on the deadline
scheme and program the watermarks old school based on the worst case
memory latency.
I've modelled this a bit on the ILK+ approach where we compute multiple
sets of watermarks for each pipe (PM2,PM5,DDR DVFS) and when merge thet
appropriate one later with the watermarks from other pipes. There isn't
too much to merge actually since each pipe has a totally independent
FIFO (well apart from the mess with the partially shared DSPARB
registers), but still decopuling the pipes from each other seems like a
good idea.
Eventually we'll want to perform the watermark update in two phases
around the plane update to avoid underruns due to the single buffered
watermark registers. But that's still in limbo for ILK+ too, so I've not
gone that far yet for VLV/CHV either.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Clint Taylor <Clinton.A.Taylor@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Read out the current watermark settings from the hardware at driver init
time. This will allow us to compare the newly calculated values against
the currrent ones and potentially avoid needless WM updates.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Clint Taylor <Clinton.A.Taylor@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
With the new DRRS code it kinda sticks out, and we never managed to
get this to work well enough without causing issues. Time to wave
goodbye.
I've decided to keep the logic for programming the reduced clocks
intact, but everything else is gone. If anyone ever wants to resurrect
this we need to redo it all anyway on top of the frontbuffer tracking.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As there is no OLR to check, the check_olr() function is now a no-op and can be
removed.
For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
In _i915_add_request(), the request is associated with a userland client.
Specifically it is linked to the 'file' structure and the current user process
is recorded. One problem here is that the current user process is not
necessarily the same as when the request was submitted to the driver. This is
especially true when the GPU scheduler arrives and decouples driver submission
from hardware submission. Note also that it is only in the case where the add
request comes from an execbuff call that there is a client to associate. Any
other add request call is kernel only so does not need to do it.
This patch moves the client association into a separate function. This is then
called from the execbuffer code path itself at a sensible time. It also removes
the now redundant 'file' pointer from the add request parameter list.
An extra cleanup of the client association is also added to the request clean up
code for the eventuality where the request is killed after association but
before being submitted (e.g. due to out of memory error somewhere). Once the
submission has happened, the request is on the request list and the regular
request list removal will clear the association. Note that this still needs to
happen at this point in time because the request might be kept floating around
much longer (due to someone holding a reference count) and the client should not
be worrying about this request after it has been retired.
For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Converted i915_gem_l3_remap() to take a request structure instead of a ring.
For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Now that everything above has been converted to use request structures, it is
possible to update the lower level move_to_active() functions to be request
based as well.
For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Now that all callers of i915_add_request() have a request pointer to hand, it is
possible to update the add request function to take a request pointer rather
than pulling it out of the OLR.
For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Updated the display page flip code to do explicit request creation and
submission rather than relying on the OLR and just hoping that the request
actually gets submitted at some random point.
The sequence is now to create a request, queue the work to the ring, assign the
known request to the flip queue work item then actually submit the work and post
the request.
Note that every single flip function used to finish with
'__intel_ring_advance(ring);'. However, immediately after they return there is
now an add request call which will do the advance anyway. Thus the many
duplicate advance calls have been removed.
v2: Updated commit message with comment about advance removal.
v3: The request can now be allocated by the _sync() code earlier on. Thus the
page flip path does not necessarily need to allocate a new request, it may be
able to re-use one.
For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The plan is to pass requests around as the basic submission tracking structure
rather than rings and contexts. This patch updates the i915_gem_object_sync()
code path.
v2: Much more complex patch to share a single request between the sync and the
page flip. The _sync() function now supports lazy allocation of the request
structure. That is, if one is passed in then that will be used. If one is not,
then a request will be allocated and passed back out. Note that the _sync() code
does not necessarily require a request. Thus one will only be created until
certain situations. The reason the lazy allocation must be done within the
_sync() code itself is because the decision to need one or not is not really
something that code above can second guess (except in the case where one is
definitely not required because no ring is passed in).
The call chains above _sync() now support passing a request through which most
callers passing in NULL and assuming that no request will be required (because
they also pass in NULL for the ring and therefore can't be generating any ring
code).
The exeception is intel_crtc_page_flip() which now supports having a request
returned from _sync(). If one is, then that request is shared by the page flip
(if the page flip is of a type to need a request). If _sync() does not generate
a request but the page flip does need one, then the page flip path will create
its own request.
v3: Updated comment description to be clearer about 'to_req' parameter (Tomas
Elf review request). Rebased onto newer tree that significantly changed the
synchronisation code.
v4: Updated comments from review feedback (Tomas Elf)
For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Now that the request is guaranteed to specify the context, it is possible to
update the context switch code to use requests rather than ring and context
pairs. This patch updates i915_switch_context() accordingly.
Also removed the warning that the request's context must match the last context
switch's context. As the context switch now gets the context object from the
request structure, there is no longer any scope for the two to become out of
step.
For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The final step in removing the OLR from i915_gem_init_hw() is to pass the newly
allocated request structure in to each step rather than passing a ring
structure. This patch updates both i915_ppgtt_init_ring() and
i915_gem_context_enable() to take request pointers.
For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Now that a single per ring loop is being done for all the different
intialisation steps in i915_gem_init_hw(), it is possible to add proper request
management as well. The last remaining issue is that the context enable call
eventually ends up within *_render_state_init() and this does its own private
_i915_add_request() call.
This patch adds explicit request creation and submission to the top level loop
and removes the add_request() from deep within the sub-functions.
v2: Updated for removal of batch_obj from add_request call in previous patch.
For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The start of day context initialisation code in i915_gem_context_enable() loops
over each ring and calls the legacy switch context or the execlist init context
code as appropriate.
This patch moves the ring looping out of that function in to the top level
caller i915_gem_init_hw(). This means the a single pass can be made over all
rings doing the PPGTT, L3 remap and context initialisation of each ring
altogether.
For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
In order to explcitly track all GPU work (and completely remove the outstanding
lazy request), it is necessary to add extra i915_add_request() calls to various
places. Some of these do not need the implicit cache flush done as part of the
standard batch buffer submission process.
This patch adds a flag to _add_request() to specify whether the flush is
required or not.
For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The plan is to pass requests around as the basic submission tracking structure
rather than rings and contexts. This patch updates the
execbuffer_move_to_active() code path.
For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Rather than just having a local request variable in the execbuff code, the
request pointer is now stored in the execbuff params structure. Also added
explicit cleanup of the request (plus wiping the OLR to match) in the error
case. This means that the execbuff code is no longer dependent upon the OLR
keeping track of the request so as to not leak it when things do go wrong. Note
that in the success case, the i915_add_request() at the end of the submission
function will tidy up the request and clear the OLR.
For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The alloc_request() function does not actually return the newly allocated
request. Instead, it must be pulled from ring->outstanding_lazy_request. This
patch fixes this so that code can create a request and start using it knowing
exactly which request it actually owns.
v2: Updated for new i915_gem_request_alloc() scheme.
For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Shrunk the parameter list of i915_gem_execbuffer_retire_commands() to a single
structure as everything it requires is available in the execbuff_params object.
For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The do_execbuf() function takes quite a few parameters. The actual set of
parameters is going to change with the conversion to passing requests around.
Further, it is due to grow massively with the arrival of the GPU scheduler.
This patch simplifies the prototype by passing a parameter structure instead.
Changing the parameter set in the future is then simply a matter of
adding/removing items to the structure.
Note that the structure does not contain absolutely everything that is passed
in. This is because the intention is to use this structure more extensively
later in this patch series and more especially in the GPU scheduler that is
coming soon. The latter requires hanging on to the structure as the final
hardware submission can be delayed until long after the execbuf IOCTL has
returned to user land. Thus it is unsafe to put anything in the structure that
is local to the IOCTL call itself - such as the 'args' parameter. All entries
must be copies of data or pointers to structures that are reference counted in
some way and guaranteed to exist for the duration of the batch buffer's life.
v2: Rebased to newer tree and updated for changes to the command parser.
Specifically, a code shuffle has required saving the batch start address in the
params structure.
For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The i915_add_request() function is called to keep track of work that has been
written to the ring buffer. It adds epilogue commands to track progress (seqno
updates and such), moves the request structure onto the right list and other
such house keeping tasks. However, the work itself has already been written to
the ring and will get executed whether or not the add request call succeeds. So
no matter what goes wrong, there isn't a whole lot of point in failing the call.
At the moment, this is fine(ish). If the add request does bail early on and not
do the housekeeping, the request will still float around in the
ring->outstanding_lazy_request field and be picked up next time. It means
multiple pieces of work will be tagged as the same request and driver can't
actually wait for the first piece of work until something else has been
submitted. But it all sort of hangs together.
This patch series is all about removing the OLR and guaranteeing that each piece
of work gets its own personal request. That means that there is no more
'hoovering up of forgotten requests'. If the request does not get tracked then
it will be leaked. Thus the add request call _must_ not fail. The previous patch
should have already ensured that it _will_ not fail by removing the potential
for running out of ring space. This patch enforces the rule by actually removing
the early exit paths and the return code.
Note that if something does manage to fail and the epilogue commands don't get
written to the ring, the driver will still hang together. The request will be
added to the tracking lists. And as in the old case, any subsequent work will
generate a new seqno which will suffice for marking the old one as complete.
v2: Improved WARNings (Tomas Elf review request).
For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
It is a bad idea for i915_add_request() to fail. The work will already have been
send to the ring and will be processed, but there will not be any tracking or
management of that work.
The only way the add request call can fail is if it can't write its epilogue
commands to the ring (cache flushing, seqno updates, interrupt signalling). The
reasons for that are mostly down to running out of ring buffer space and the
problems associated with trying to get some more. This patch prevents that
situation from happening in the first place.
When a request is created, it marks sufficient space as reserved for the
epilogue commands. Thus guaranteeing that by the time the epilogue is written,
there will be plenty of space for it. Note that a ring_begin() call is required
to actually reserve the space (and do any potential waiting). However, that is
not currently done at request creation time. This is because the ring_begin()
code can allocate a request. Hence calling begin() from the request allocation
code would lead to infinite recursion! Later patches in this series remove the
need for begin() to do the allocate. At that point, it becomes safe for the
allocate to call begin() and really reserve the space.
Until then, there is a potential for insufficient space to be available at the
point of calling i915_add_request(). However, that would only be in the case
where the request was created and immediately submitted without ever calling
ring_begin() and adding any work to that request. Which should never happen. And
even if it does, and if that request happens to fall down the tiny window of
opportunity for failing due to being out of ring space then does it really
matter because the request wasn't doing anything in the first place?
v2: Updated the 'reserved space too small' warning to include the offending
sizes. Added a 'cancel' operation to clean up when a request is abandoned. Added
re-initialisation of tracking state after a buffer wrap to keep the sanity
checks accurate.
v3: Incremented the reserved size to accommodate Ironlake (after finally
managing to run on an ILK system). Also fixed missing wrap code in LRC mode.
v4: Added extra comment and removed duplicate WARN (feedback from Tomas).
For: VIZ-5115
CC: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We have enough generic hotplug functions sprinkled all over i915_irq.c
to warrant moving them to a file of their own. This should further
underline the distinction between generic code in the new file and
platform specific hotplug and irq code that remains in i915_irq.c.
Add new intel_hpd_init_work to keep work functions static, and rename
get_port_from_pin to intel_hpd_pin_to_port while increasing its
visibility, but keep everything else the same.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The skylake scalers depend on the cdclk freq, but that frequency can
change during a modeset. So when a modeset happens calculate the new
cdclk in the atomic state. With the transitional helpers gone the
cached value can be used in the scaler, and committed after all
crtc's are disabled.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90874
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Tested-by(IVB): Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Because we're currently using FBC_UNSUPPORTED_MODE for two different
cases.
This commit will also allow us to write the next one without hiding
information from the user.
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The i915 atomic conversion is a real beast and it's not getting easier
wrangling in a separate branch. I'm might be regretting this, but
right after vacation nothing can burst my little bubble here!
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
In igt, we want to test handling of GPU hangs, both for recovery
purposes and for reporting. However, we don't want to inject a genuine
GPU hang onto a machine that cannot recover and so be permenantly
wedged. Rather than embed heuristics into igt, have the kernel report
exactly when it expects the GPU reset to work.
This can also be usefully extended in future to indicate different
levels of fine-grained resets.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tim Gore <tim.gore@intel.com>
Cc: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Until now the software command checker assumed that commands could
read or write at most a single register per packet. This is not
necessarily the case, MI_LOAD_REGISTER_IMM expects a variable-length
list of offset/value pairs and writes them in sequence. The previous
code would only check whether the first entry was valid, effectively
allowing userspace to write unrestricted registers of the MMIO space
by sending a multi-register write with a legal first register, with
potential security implications on Gen6 and 7 hardware.
Fix it by extending the drm_i915_cmd_descriptor table to represent
multi-register access and making validate_cmd() iterate for all
register offsets present in the command packet.
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Until now the software command checker assumed that commands could
read or write at most a single register per packet. This is not
necessarily the case, MI_LOAD_REGISTER_IMM expects a variable-length
list of offset/value pairs and writes them in sequence. The previous
code would only check whether the first entry was valid, effectively
allowing userspace to write unrestricted registers of the MMIO space
by sending a multi-register write with a legal first register, with
potential security implications on Gen6 and 7 hardware.
Fix it by extending the drm_i915_cmd_descriptor table to represent
multi-register access and making validate_cmd() iterate for all
register offsets present in the command packet.
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Now that we can subclass drm_atomic_state we can also use it to keep
track of all the pll settings. atomic_state is a better place to hold
all shared state than keeping pll->new_config everywhere.
Changes since v1:
- Assert connection_mutex is held.
Changes since v2:
- Fix swapped arguments to kzalloc for intel_atomic_state_alloc. (Jani Nikula)
Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Now that the dpll updates are (mostly) atomic, the .off() code is a noop,
and intel_crtc_disable does mostly the same as intel_modeset_update_state.
Move all logic for connectors_active and setting dpms to that function.
Changes since v1:
- Move drm_atomic_helper_swap_state up.
Changes since v2:
- Split out intel_put_shared_dpll removal.
Changes since v3:
- Rebase on top of latest drm-intel.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
We need to tell BDW ULT and ULX apart.
v2: Rebased to the latest
v3: Rebased to the latest
v4: Fix for patch style problems
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Mika Kahola <mika.kahola@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Keep the cdclk maximum supported frequency around in dev_priv so that we
can verify certain things against it before actually changing the cdclk
frequency.
For now only VLV/CHV have support changing cdclk frequency, so other
plarforms get to assume cdclk is fixed.
v2: Rebased to the latest
v3: Rebased to the latest
v4: Fix for patch style problems
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Mika Kahola <mika.kahola@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
No functional changes.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
There are plenty of hotplug related fields in struct drm_i915_private
scattered all around. Group them under one hotplug struct. Clean up
naming while at it. No functional changes.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Export a new context parameter that can be set/queried through the
context_{get,set}param ioctls. This parameter is passed as a context
flag and decides whether or not a GPU address mapping is allowed to
be made at address zero. The default is to allow such mappings.
Signed-off-by: David Weinehall <david.weinehall@intel.com>
Acked-by: "Zou, Nanhai" <nanhai.zou@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Rename dpio_lock to sb_lock to inform the reader that its primary
purpose is to protect the sideband mailbox rather than some DPIO
state.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
In commit 1854d5ca0d
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Apr 7 16:20:32 2015 +0100
drm/i915: Deminish contribution of wait-boosting from clients
we removed an atomic timer based check for allowing waitboosting and
moved it below the mutex taken during RPS. However, that mutex can be
held for long periods of time on Vallyview/Cherryview as communication
with the PCU is slow. As clients may frequently wait for results (e.g.
such as tranform feedback) we introduced contention between the client
and the RPS worker. We can take advantage of the RPS worker, by
switching the wait boost decision to use spin locks and defer the
actual reclocking to the worker.
Fixes a regression of up to 45% on Baytrail and Baswell!
v2 (Daniel):
- Use max_freq_softlimit instead of the not-yet-merged boost
frequency.
- Don't inject a fake irq into the boost work, instead treat
client_boost as just another legit waker.
v3: Drop the now unused mask (Chris).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90112
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v1)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As Daniel commented on
commit b7ffe1362c5f468b853223acc9268804aa92afc8
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Mon Apr 27 13:41:24 2015 +0100
drm/i915: Free RPS boosts for all laggards
it is better to be explicit when sharing hardcoded values such as
throttle/boost timeouts. Make it so!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We need to re-init the display hardware when going out of suspend. This
includes:
- Hooking the PCH to the reset logic
- Restoring CDCDLK
- Enabling the DDB power
Among those, only the CDCDLK one is a bit tricky. There's some
complexity in that:
- DPLL0 (which is the source for CDCLK) has two VCOs, each with a set
of supported frequencies. As eDP also uses DPLL0 for its link rate,
once DPLL0 is on, we restrict the possible eDP link rates the chosen
VCO.
- CDCLK also limits the bandwidth available to push pixels.
So, as a first step, this commit restore what the BIOS set, until I can
do more testing.
In case that's of interest for the reviewer, I've unit tested the
function that derives the decimal frequency field:
#include <stdio.h>
#include <stdint.h>
#include <assert.h>
#define ARRAY_SIZE(x) (sizeof(x) / sizeof(*(x)))
static const struct dpll_freq {
unsigned int freq;
unsigned int decimal;
} freqs[] = {
{ .freq = 308570, .decimal = 0b01001100111},
{ .freq = 337500, .decimal = 0b01010100001},
{ .freq = 432000, .decimal = 0b01101011110},
{ .freq = 450000, .decimal = 0b01110000010},
{ .freq = 540000, .decimal = 0b10000110110},
{ .freq = 617140, .decimal = 0b10011010000},
{ .freq = 675000, .decimal = 0b10101000100},
};
static void intbits(unsigned int v)
{
int i;
for(i = 10; i >= 0; i--)
putchar('0' + ((v >> i) & 1));
}
static unsigned int freq_decimal(unsigned int freq /* in kHz */)
{
return (freq - 1000) / 500;
}
static void test_freq(const struct dpll_freq *entry)
{
unsigned int decimal = freq_decimal(entry->freq);
printf("freq: %d, expected: ", entry->freq);
intbits(entry->decimal);
printf(", got: ");
intbits(decimal);
putchar('\n');
assert(decimal == entry->decimal);
}
int main(int argc, char **argv)
{
int i;
for (i = 0; i < ARRAY_SIZE(freqs); i++)
test_freq(&freqs[i]);
return 0;
}
v2:
- Rebase on top of -nightly
- Use (freq - 1000) / 500 for the decimal frequency (Ville)
- Fix setting the enable bit of HSW_NDE_RSTWRN_OPT (Ville)
- Rename skl_display_{resume,suspend} to skl_{init,uninit}_cdclk to
be consistent with the BXT code (Ville)
- Store boot CDCLK in ddi_pll_init (Ville)
- Merge dev_priv's skl_boot_cdclk into cdclk_freq
- Use LCPLL_PLL_LOCK instead of (1 << 30) (Ville)
- Replace various '0' by SKL_DPLL0 to be a bit more explicit that
we're programming DPLL0
- Busy poll the PCU before doing the frequency change. It takes about
3/4 cycles, each separated by 10us, to get the ACK from the CPU
(Ville)
v3:
- Restore dev_priv->skl_boot_cdclk, leaving unification with
dev_priv->cdclk_freq for a later patch (Daniel, Ville)
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Now that we have internal clients, rather than faking a whole
drm_i915_file_private just for tracking RPS boosts, create a new struct
intel_rps_client and pass it along when waiting.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: s/rq/req/]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Since we will often pageflip to an active surface, we will often have to
wait for the surface to be written before issuing the flip. Also we are
likely to wait on that surface in plenty of time before the vblank.
Since we have a mechanism for boosting when a flip misses the expected
vblank, curtain the number of times we RPS boost when simply waiting for
mmioflip.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: s/rq/req/]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Ring switches can occur many times per frame, and are often out of
control, causing frequent RPS boosting for no practical benefit. Treat
the sw semaphore synchronisation as a separate client and only allow it
to boost once per busy/idle cycle.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: s/rq/req/]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Currently, we only track the last request globally across all engines.
This prevents us from issuing concurrent read requests on e.g. the RCS
and BCS engines (or more likely the render and media engines). Without
semaphores, we incur costly stalls as we synchronise between rings -
greatly impacting the current performance of Broadwell versus Haswell in
certain workloads (like video decode). With the introduction of
reference counted requests, it is much easier to track the last request
per ring, as well as the last global write request so that we can
optimise inter-engine read read requests (as well as better optimise
certain CPU waits).
v2: Fix inverted readonly condition for nonblocking waits.
v3: Handle non-continguous engine array after waits
v4: Rebase, tidy, rewrite ring list debugging
v5: Use obj->active as a bitfield, it looks cool
v6: Micro-optimise, mostly involving moving code around
v7: Fix retire-requests-upto for execlists (and multiple rq->ringbuf)
v8: Rebase
v9: Refactor i915_gem_object_sync() to allow the compiler to better
optimise it.
Benchmark: igt/gem_read_read_speed
hsw:gt3e (with semaphores):
Before: Time to read-read 1024k: 275.794µs
After: Time to read-read 1024k: 123.260µs
hsw:gt3e (w/o semaphores):
Before: Time to read-read 1024k: 230.433µs
After: Time to read-read 1024k: 124.593µs
bdw-u (w/o semaphores): Before After
Time to read-read 1x1: 26.274µs 10.350µs
Time to read-read 128x128: 40.097µs 21.366µs
Time to read-read 256x256: 77.087µs 42.608µs
Time to read-read 512x512: 281.999µs 181.155µs
Time to read-read 1024x1024: 1196.141µs 1118.223µs
Time to read-read 2048x2048: 5639.072µs 5225.837µs
Time to read-read 4096x4096: 22401.662µs 21137.067µs
Time to read-read 8192x8192: 89617.735µs 85637.681µs
Testcase: igt/gem_concurrent_blit (read-read and friends)
Cc: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> [v8]
[danvet: s/\<rq\>/req/g]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
BUN 1: prop_coeff, int_coeff, tdctargetcnt programming updated and tied to
VCO frequencies. Program i_lockthresh in PORT_PLL_9.
VCO calculated based on the formula:
Desired Output = Port bit rate in MHz (DisplayPort HBR2 is 5400 MHz)
Fast Clock = Desired Output / 2
VCO = Fast Clock * P1 * P2
Prop_coeff, int_coeff, and tdctargetcnt modified according to above
calculation.
BUN 2: Port PLLs require additional programming at certain frequencies -
DCO amplitude in PORT_PLL_10
Review comments from Siva which were addressed in the initial version of the
patch.
- Change PORT_PLL_LOCK_THRESHOLD to PORT_PLL_LOCK_THRESHOLD_MASK
- Calculate for HDMI
- Correct values for vco = 5.4
- return in case of invalid vco range
v2: Imre's review comments addressed
- change dcoampovr_en to dcoampovr_en_h
- change PORT_PLL_DCO_AMP_OVR_EN to PORT_PLL_DCO_AMP_OVR_EN_H
- Correct lane stagger value for 324MHz
- Make coef common for HDMI and DP
- remove superfluous comments
v3: Imre's comments addressed
- Remove Prop_coeff, int_coeff, tdctargetcnt, dcoampovr_en, gain_ctl,
dcoampovr_en_h from bxt_clk_div and make them local variables.
Signed-off-by: Vandana Kannan <vandana.kannan@intel.com>
Reviewed-by: Sivakumar Thulasimani <sivakumar.thulasimani@intel.com> [v1]
Cc: Sivakumar Thulasimani <sivakumar.thulasimani@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Be in line with other features that we have.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As we perform the mmio-flip without any locking and then try to acquire
the struct_mutex prior to dereferencing the request, it is possible for
userspace to queue a new pageflip before the worker can finish clearing
the old state - and then it will clear the new flip request. The result
is that the new flip could be completed before the GPU has finished
rendering.
The bugs stems from removing the seqno checking in
commit 536f5b5e86
Author: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Date: Thu Nov 6 11:03:40 2014 +0200
drm/i915: Make mmio flip wait for seqno in the work function
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We no longer interpolate domains in the same manner, and even if we did,
we should trust setting either of the other write domains would trigger
an invalidation rather than force it. Remove the tweaking of the
read_domains since it serves no purpose and use
i915_gem_object_wait_rendering() directly.
Note that this goes back to
commit a8198eea15
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Wed Apr 13 22:04:09 2011 +0100
drm/i915: Introduce i915_gem_object_finish_gpu()
and gpu domain tracking died in
commit cc889e0f6c
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Wed Jun 13 20:45:19 2012 +0200
drm/i915: disable flushing_list/gpu_write_list
which is more than 1 year older.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Add notes with information dug out of git history.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Skylake nv12 format requires dbuf (aka. ddb) calculations
and programming for each of y and uv sub-planes. Made minor
changes to reuse current dbuf calculations and programming
for uv plane. i.e., with this change, existing computation
is used for either packed format or uv portion of nv12
depending on incoming format. Added new code for dbuf
computation and programming for y plane.
This patch is a pre-requisite for adding NV12 format support.
Actual nv12 support is coming in later patches.
Signed-off-by: Chandra Konduru <chandra.konduru@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Sometimes (exactly when is a bit unclear) DISPLAY_PHY_CONTROL appears to
get corrupted. The values I've managed to read from it seem to have some
pattern but vary quite a lot. The corruption doesn't seem to just happen
when the register is accessed, but can also happen spontaneosly during
modeset. When this happens during a modeset things go south and the
display doesn't light up.
I've managed to hit the problemn when toggling HDMI on port D on and
off. When things get corrupted the display doesn't light up, but as soon
as I manually write the correct value to the register the display comes
up.
First I was suspicious that we ourselves accidentally overwrite it with
garbage, but didn't catch anything with the reg_rw tracepoint. Also I
sprinkled check all over the modeset path to see exactly when the
corruption happens, and eg. the read back value was fine just before
intel_dp_set_m(), and corrupted immediately after it. I also made my
check function repair the register value whenever it was wrong, and with
this approach the corruption repeated several times during the modeset
operation, always seeming to trigger in the same exact calls to the
check function, while other calls to the function never caught anything.
So far I've not seen this problem occurring when carefully avoiding all
read accesses to DISPLAY_PHY_CONTROL. Not sure if that's just pure luck
or an actual workaround, but we can hope it works. So let's avoid reading
the register and instead track the desired value of the register in dev_priv.
v2: Read out the power well state to determine initial register value
v3: Use DPIO_CHx names instead of raw numbers
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Deepak S <deepak.s@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This allows disabling all planes affecting a crtc without caring what type it is.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This provides an option to override the value set by VBT
for selecting edp Vswing Pre-emph setting table.
v2: Adding comment about this being a temporary workaround and
making the parameter read-only (Jani)
v3: Changing mode to 0400 instead of 0 (Jani)
https://bugs.freedesktop.org/show_bug.cgi?id=89554
Signed-off-by: Sonika Jindal <sonika.jindal@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Enable runtime PM for Skylake platform
v2: After adding dmc ver 1.0 support rebased on top of nightly. (Animesh)
Issue: VIZ-2819
Signed-off-by: A.Sunil Kamath <sunil.kamath@intel.com>
Signed-off-by: Suketu Shah <suketu.j.shah@intel.com>
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Animesh Manna <animesh.manna@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Add triggers as per expectations mentioned in gen9_enable_dc5
and gen9_disable_dc5 patch.
Also call POSTING_READ for every write to a register to ensure that
its written immediately.
v1: Remove POSTING_READ calls as they've already been added in previous patches.
v2: Rebase to move all runtime pm specific changes to intel_runtime_pm.c file.
Modified as per review comments from Imre:
1] Change variable name 'dc5_allowed' to 'dc5_enabled' to correspond to relevant
functions.
2] Move the check dc5_enabled in skl_set_power_well() to disable DC5 into
gen9_disable_DC5 which is a more appropriate place.
3] Convert checks for 'pm.dc5_enabled' and 'pm.suspended' in skl_set_power_well()
to warnings. However, removing them for now as they'll be included in a future patch
asserting DC-state entry/exit criteria.
4] Enable DC5, only when CSR firmware is verified to be loaded. Create new structure
to track 'enabled' and 'deferred' status of DC5.
5] Ensure runtime PM reference is obtained, if CSR is not loaded, to avoid entering
runtime-suspend and release it when it's loaded.
6] Protect necessary CSR-related code with locks.
7] Move CSR-loading call to runtime PM initialization, as power domains needed to be
accessed during deferred DC5-enabling, are not initialized earlier.
v3: Rebase to latest.
Modified as per review comments from Imre:
1] Use blocking wait for CSR-loading to finish to enable DC5 for simplicity, instead of
deferring enabling DC5 until CSR is loaded.
2] Obtain runtime PM reference during CSR-loading initialization itself as deferred DC5-
enabling is removed and release it at the end of CSR-loading functionality.
3] Revert calling CSR-loading functionality to the beginning of i915 driver-load
functionality to avoid any delay in loading.
4] Define another variable to track whether CSR-loading failed and use it to avoid enabling
DC5 if it's true.
5] Define CSR-load-status accessor functions for use later.
v4:
1] Disable DC5 before enabling PG2 instead of after it.
2] DC5 was being mistaken enabled even when CSR-loading timed-out. Fix that.
3] Enable DC5-related functionality using a macro.
4] Remove dc5_enabled tracking variable and its use as it's not needed now.
v5:
1] Mark CSR failed to load where necessary in finish_csr_load function.
2] Use mutex-protected accessor function to check if CSR loaded instead of directly
accessing the variable.
3] Prefix csr_load_status_get/set function names with intel_.
v6: rebase to latest.
v7: Rebase on top of nightly (Damien)
v8: Squashed the patch from Imre - added csr helper pointers to simplify the code. (Imre)
v9: After adding dmc ver 1.0 support rebased on top of nightly. (Animesh)
v10: Added a enum for different csr states, suggested by Imre. (Animesh)
v11: Based on review comments from Imre, Damien and Daniel following changes done
- enum name chnaged to csr_state (singular form).
- FW_UNINITIALIZED used as zeroth element in enum csr_state.
- Prototype changed for helper function(set/get csr status), using enum csr_state instead of bool.
v12: Based on review comment from Imre, introduced bool fw_loaded local to finish_csr_load() which helps
calling once to set the csr status. The same flag used to fail RPM if find any issue during
firmware loading.
Issue: VIZ-2819
Signed-off-by: A.Sunil Kamath <sunil.kamath@intel.com>
Signed-off-by: Suketu Shah <suketu.j.shah@intel.com>
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Animesh Manna <animesh.manna@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Display Context Save and Restore support is needed for
various SKL Display C states like DC5, DC6.
This implementation is added based on first version of DMC CSR program
that we received from h/w team.
Here we are using request_firmware based design.
Finally this firmware should end up in linux-firmware tree.
For SKL platform its mandatory to ensure that we load this
csr program before enabling DC states like DC5/DC6.
As CSR program gets reset on various conditions, we should ensure
to load it during boot and in future change to be added to load
this system resume sequence too.
v1: Initial relese as RFC patch
v2: Design change as per Daniel, Damien and Shobit's review comments
request firmware method followed.
v3: Some optimization and functional changes.
Pulled register defines into drivers/gpu/drm/i915/i915_reg.h
Used kmemdup to allocate and duplicate firmware content.
Ensured to free allocated buffer.
v4: Modified as per review comments from Satheesh and Daniel
Removed temporary buffer.
Optimized number of writes by replacing I915_WRITE with I915_WRITE64.
v5:
Modified as per review comemnts from Damien.
- Changed name for functions and firmware.
- Introduced HAS_CSR.
- Reverted back previous change and used csr_buf with u8 size.
- Using cpu_to_be64 for endianness change.
Modified as per review comments from Imre.
- Modified registers and macro names to be a bit closer to bspec terminology
and the existing register naming in the driver.
- Early return for non SKL platforms in intel_load_csr_program function.
- Added locking around CSR program load function as it may be called
concurrently during system/runtime resume.
- Releasing the fw before loading the program for consistency
- Handled error path during f/w load.
v6: Modified as per review comments from Imre.
- Corrected out_freecsr sequence.
v7: Modified as per review comments from Imre.
Fail loading fw if fw->size%8!=0.
v8: Rebase to latest.
v9: Rebase on top of -nightly (Damien)
v10: Enabled support for dmc firmware ver 1.0.
According to ver 1.0 in a single binary package all the firmware's that are
required for different stepping's of the product will be stored. The package
contains the css header, followed by the package header and the actual dmc
firmwares. Package header contains the firmware/stepping mapping table and
the corresponding firmware offsets to the individual binaries, within the
package. Each individual program binary contains the header and the payload
sections whose size is specified in the header section. This changes are done
to extract the specific firmaware from the package. (Animesh)
v11: Modified as per review comemnts from Imre.
- Added code comment from bpec for header structure elements.
- Added __packed to avoid structure padding.
- Added helper functions for stepping and substepping info.
- Added code comment for CSR_MAX_FW_SIZE.
- Disabled BXT firmware loading, will be enabled with dmc 1.0 support.
- Changed skl_stepping_info based on bspec, earlier used from config DB.
- Removed duplicate call of cpu_to_be* from intel_csr_load_program function.
- Used cpu_to_be32 instead of cpu_to_be64 as firmware binary in dword aligned.
- Added sanity check for header length.
- Added sanity check for mmio address got from firmware binary.
- kmalloc done separately for dmc header and dmc firmware. (Animesh)
v12: Modified as per review comemnts from Imre.
- Corrected the typo error in skl stepping info structure.
- Added out-of-bound access for skl_stepping_info.
- Sanity check for mmio address modified.
- Sanity check added for stepping and substeppig.
- Modified the intel_dmc_info structure, cache only the required header info. (Animesh)
v13: clarify firmware load error message.
The reason for a firmware loading failure can be obscure if the driver
is built-in. Provide an explanation to the user about the likely reason for
the failure and how to resolve it. (Imre)
v14: Suggested by Jani.
- fix s/I915/CONFIG_DRM_I915/ typo
- add fw_path to the firmware object instead of using a static ptr (Jani)
v15:
1) Changed the firmware name as dmc_gen9.bin, everytime for a new firmware version a symbolic link
with same name will help not to build kernel again.
2) Changes done as per review comments from Imre.
- Error check removed for intel_csr_ucode_init.
- Moved csr-specific data structure to intel_csr.h and optimization done on structure definition.
- fw->data used directly for parsing the header info & memory allocation
only done separately for payload. (Animesh)
v16:
- No need for out_regs label in i915_driver_load(), so removed it.
- Changed the firmware name as skl_dmc_ver1.bin, followed naming convention <platform>_dmc_<api-version>.bin (Animesh)
Issue: VIZ-2569
Signed-off-by: A.Sunil Kamath <sunil.kamath@intel.com>
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Animesh Manna <animesh.manna@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
drm-intel-next-2015-04-23:
- dither support for ns2501 dvo (Thomas Richter)
- some polish for the gtt code and fixes to finally enable the cmd parser on hsw
- first pile of bxt stage 1 enabling (too many different people to list ...)
- more psr fixes from Rodrigo
- skl rotation support from Chandra
- more atomic work from Ander and Matt
- pile of cleanups and micro-ops for execlist from Chris
drm-intel-next-2015-04-10:
- cdclk handling cleanup and fixes from Ville
- more prep patches for olr removal from John Harrison
- gmbus pin naming rework from Jani (prep for bxt)
- remove ->new_config from Ander (more atomic conversion work)
- rps (boost) tuning and unification with byt/bsw from Chris
- cmd parser batch bool tuning from Chris
- gen8 dynamic pte allocation (Michel Thierry, based on work from Ben Widawsky)
- execlist tuning (not yet all of it) from Chris
- add drm_plane_from_index (Chandra)
- various small things all over
* tag 'drm-intel-next-2015-04-23-fixed' of git://anongit.freedesktop.org/drm-intel: (204 commits)
drm/i915/gtt: Allocate va range only if vma is not bound
drm/i915: Enable cmd parser to do secure batch promotion for aliasing ppgtt
drm/i915: fix intel_prepare_ddi
drm/i915: factor out ddi_get_encoder_port
drm/i915/hdmi: check port in ibx_infoframe_enabled
drm/i915/hdmi: fix vlv infoframe port check
drm/i915: Silence compiler warning in dvo
drm/i915: Update DRIVER_DATE to 20150423
drm/i915: Enable dithering on NatSemi DVO2501 for Fujitsu S6010
rm/i915: Move i915_get_ggtt_vma_pages into ggtt_bind_vma
drm/i915: Don't try to outsmart gcc in i915_gem_gtt.c
drm/i915: Unduplicate i915_ggtt_unbind/bind_vma
drm/i915: Move ppgtt_bind/unbind around
drm/i915: move i915_gem_restore_gtt_mappings around
drm/i915: Fix up the vma aliasing ppgtt binding
drm/i915: Remove misleading comment around bind_to_vm
drm/i915: Don't use atomics for pg_dirty_rings
drm/i915: Don't look at pg_dirty_rings for aliasing ppgtt
drm/i915/skl: Support Y tiling in MMIO flips
drm/i915: Fixup kerneldoc for struct intel_context
...
Conflicts:
drivers/gpu/drm/i915/i915_drv.c
At the moment intel_prepare_ddi buffer will iterate through both MST and
CRT encoders, which is incorrect. Neither of these encoder types have an
embedding intel_digital_port object, so for these encoder types we will
use random data when dereferencing the corresponding
intel_digital_port->port field.
Introduced in
commit b403745c84
Author: Damien Lespiau <damien.lespiau@intel.com>
Date: Mon Aug 4 22:01:33 2014 +0100
drm/i915: Iterate through the initialized DDIs to prepare their buffers
v2:
- fix getting at the port for MST encoders too
- make sure that intel_prepare_ddi_buffers() gets called for port E too
(Paulo)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90067
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Currently we have the problem that the decision whether ptes need to
be (re)written is splattered all over the codebase. Move all that into
i915_vma_bind. This needs a few changes:
- Just reuse the PIN_* flags for i915_vma_bind and do the conversion
to vma->bound in there to avoid duplicating the conversion code all
over.
- We need to make binding for EXECBUF (i.e. pick aliasing ppgtt if
around) explicit, add PIN_USER for that.
- Two callers want to update ptes, give them a PIN_UPDATE for that.
Of course we still want to avoid double-binding, but that should be
taken care of:
- A ppgtt vma will only ever see PIN_USER, so no issue with
double-binding.
- A ggtt vma with aliasing ppgtt needs both types of binding, and we
track that properly now.
- A ggtt vma without aliasing ppgtt could be bound twice. In the
lower-level ->bind_vma functions hence unconditionally set
GLOBAL_BIND when writing the ggtt ptes.
There's still a bit room for cleanup, but that's for follow-up
patches.
v2: Fixup fumbles.
v3: s/PIN_EXECBUF/PIN_USER/ for clearer meaning, suggested by Chris.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
commit ae6c480692
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Wed Aug 6 15:04:53 2014 +0200
drm/i915: Only track real ppgtt for a context
Changed the code but didn't update kerneldoc.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: "Thierry, Michel" <michel.thierry@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Purpose of this tracking is to know when to flush the cache between
the CPU and the non-coherent display engine. Prior to:
commit 121920faf2
Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Date: Mon Mar 23 11:10:37 2015 +0000
drm/i915/skl: Query display address through a wrapper
This worked by a mix of direct flag manipulation and checking for
existence of a pinned GGTT VMA.
With the introduction of rotated display mappings this approach is
no longer correct.
New simpler approach is to just keep this count over calls which pin
and unpin objects to and from display, at the slight cost of extra
space in every bo.
(Inspired and extracted code from a larger rework by Chris Wilson.)
v2: Remove the limit since it is not well defined. (Chris Wilson, Ville Syrjälä)
v3: Commit message corrections. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The merge is clean, but the arm build fails afterwards,
due to API changes in the regulator tree.
I've included the patch into the merge to fix the build.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Not every DDIs is necessarily connected can be strapped off and, in the
future, we'll have platforms with a different number of default DDI
ports. So, let's only call intel_prepare_ddi_buffers() on DDI ports that
are actually detected.
We also use the opportunity to give a struct intel_digital_port to
intel_prepare_ddi_buffers() as we'll need it in a following patch to
query if the port supports HMDI or not.
On my HSW machine this removes the initialization of a couple of
(unused) DDIs.
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Sivakumar Thulasimani <sivakumar.thulasimani@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Plug bxt PLL code into existing shared DPLL framework.
v2: (imre)
- squash in Satheeshakrishna's "Define BXT clock registers" and
"Add state variables for bxt clock registers" patches
- squash in Vandanas's "Change grp access to lane access for PLL"
- fix group vs. lane access in bxt_ddi_pll_get_hw_state
- add code comment why we read from lane registers while writing to
group registers
- clean up register macros
- use BXT_PORT_PLL_* macros instead of open-coding the same
- check if BXT_PORT_PCS_DW12_LN01 matches BXT_PORT_PCS_DW12_LN23
during hardware state readout
- add missing LANESTAGGER_STRAP_OVRD masking
- add note about missing step according to the latest BUN for
PORT_PLL_9/lockthresh
Signed-off-by: Satheeshakrishna M <satheeshakrishna.m@intel.com> (v1)
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Rename vlv_cdclk_freq to cdclk_freq so that it can be used for all
platforms as required. Needed by the next patch.
Signed-off-by: Vandana Kannan <vandana.kannan@intel.com>
Signed-off-by: A.Sunil Kamath <sunil.kamath@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
On Haswell and Broadwell with link in standby when exit event happens
between vblank and VSC packet, PSR exit on panel but DPA transmitter
still sends black pixel. When this condition hits, panel will intermittently
display black frame.
The known W/A for this case involve the of single_frame update
that isn't supported on Haswell and to be supported on Broadwell
3 other workarounds would be required. So it is better and safe to
just deprecate link_standby for now.
Also, link fully off saves more power than link_standby and afwk
no OEM is requesting link standby on VBT. There is no reason for that.
For Skylake let's just consider it behaves like Broadwell until
we prove otherwise.
v2: Fix commit message (Durga).
v3: Fix conflict with PSR2.
Reference: HSD: bdwgfx/1912559
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Durgadoss R <durgadoss.r@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Separate topic branch for bxt didn't work out since we needed to
refactor the gmbus code a bit to make it look decent. So backmerge.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
The obj->pin_mappable flag only exists for debug purposes and is a
hindrance that is mistreated with rotated GGTT views. For debug
purposes, it suffices to mark objects with pin_display as being of note.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We already assign a unique identifier to every request: seqno. That
someone felt like adding a second one without even mentioning why and
tweaking ABI smells very fishy.
Fixes regression from
commit b3a38998f0
Author: Nick Hoath <nicholas.hoath@intel.com>
Date: Thu Feb 19 16:30:47 2015 +0000
drm/i915: Fix a use after free, and unbalanced refcounting
v2: Rebase
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Nick Hoath <nicholas.hoath@intel.com>
Cc: Thomas Daniel <thomas.daniel@intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Jani Nikula <jani.nikula@intel.com>
[danvet: Fixup because different merge order.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This eliminates six needless spin lock/unlock pairs when writing out
ELSP.
v2: Respin with my preferred colour.
v3: Mostly back to the original colour
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> [v1]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
vma are more frequently allocated than objects and so should equally
benefit from having a dedicated slab.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
requests are even more frequently allocated than objects and equally
benefit from having a dedicated slab.
v2: Rebase
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This is mostly useful for execlists where the rings switch between
contexts (and so checking that the ring's start register matches the
context is important).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Now with the trimmed memcpy before the command parser, we try to
allocate many different sizes of batches, predominantly one or two
pages. We can therefore speed up searching for a good sized batch by
keeping the objects of buckets of roughly the same size.
v2: Add a comment about bucket sizes
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
I woke up one morning and found 50k objects sitting in the batch pool
and every search seemed to iterate the entire list... Painting the
screen in oils would provide a more fluid display.
One issue with the current design is that we only check for retirements
on the current ring when preparing to submit a new batch. This means
that we can have thousands of "active" batches on another ring that we
have to walk over. The simplest way to avoid that is to split the pools
per ring and then our LRU execution ordering will also ensure that the
inactive buffers remain at the front.
v2: execlists still requires duplicate code.
v3: execlists requires more duplicate code
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
In the next patch, I want to use the structure elsewhere and so require
it defined earlier. Rather than move the definition to an earlier location
where it feels very odd, place it in its own header file.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
With boosting for missed pageflips, we have a much stronger indication
of when we need to (temporarily) boost GPU frequency to ensure smooth
delivery of frames. So now only allow each client to perform one RPS boost
in each period of GPU activity due to stalling on results.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Deepak S <deepak.s@linux.intel.com>
Reviewed-by: Deepak S <deepak.s@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reuse the same reclocking strategy for Baytail as on its bigger brethren,
Sandybridge and Ivybridge. In particular, this makes the device quicker
to reclock (both up and down) though the tendency now is to downclock
more aggressively to compensate for the RPS boosts.
v2: Rebase
v3: Exclude Cherrytrail as Deepak was concerned that the increased
number of register writes would wake the common powerwell too often.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Deepak S <deepak.s@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Deepak S <deepak.s@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The biggest user of i915_gem_object_get_page() is the relocation
processing during execbuffer. Typically userspace passes in a set of
relocations in sorted order. Sadly, we alternate between relocations
increasing from the start of the buffers, and relocations decreasing
from the end. However the majority of consecutive lookups will still be
in the same page. We could cache the start of the last sg chain, however
for most callers, the entire sgl is inside a single chain and so we see
no improve from the extra layer of caching.
v2: Avoid the double increment inside unlikely()
References: https://bugs.freedesktop.org/show_bug.cgi?id=88308
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Pipe A and b have 4 planes.
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Antti Koskipää <antti.koskipaa@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Some BIOSes (e.g. the one on the Minnowboard) don't save/restore this
reg. If it's unlocked, we can just restore the previous value, and if
it's locked (in case the BIOS re-programmed it for us) the write will be
ignored and we'll still have "did it move" sanity check in the PM code to
warn us if something is still amiss.
References: https://bugs.freedesktop.org/show_bug.cgi?id=89611
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Tested-by: Darren Hart <dvhart@linux.intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Deepak S <deepak.s@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Occasionally it would be interesting to read some of the DPCD registers
for debug purposes, without having to resort to logging. Add an i915
specific i915_dpcd debugfs file for DP and eDP connectors to dump parts
of the DPCD. Currently the DPCD addresses to be dumped are statically
configured, and more can be added trivially.
The implementation also makes it relatively easy to add other i915 and
connector specific debugfs files in the future, as necessary.
This is currently i915 specific just because there's no generic way to
do AUX transactions given just a drm_connector. However it's all pretty
straightforward to port to other drivers.
v2: Add more DPCD registers to dump.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Bob Paauwe <bob.j.paauwe@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We make use of HW tracking for Selective update region and enable frame sync on
sink. We use hardware's hardcoded data values for frame sync and GTC.
v2: Add 3200x2000 resolution restriction with PSR2, move psr2_support to i915_psr
struct, add aux_frame_sync to independently control aux frame sync, rename the
TP2 TIME macro for 2500us (Rodrigo, Siva)
v3: Moving the resolution restriction to intel_psr_enable so that we check it
only once(Durga)
Cc: Durgadoss R <durgadoss.r@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Sonika Jindal <sonika.jindal@intel.com>
Reviewed-by: Durgadoss R <durgadoss.r@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This will be helpful for adding future platforms. It is better to keep
the information in the single point of truth (the table) instead of
duplicating it into the validity function.
While at it, add dev_priv parameter to the function, also to prepare for
adding future platform support.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Index the gmbus tables directly using the pin instead of having a
confusing "port = i + 1" mapping. This finishes off removing the "gmbus
port" as a notion, and leaves us with just the "gmbus pin".
As pin 0 is invalid by definition and the gmbus tables will have a gap
at that index, add pin validity check to all the loops. This will be
benefitial for supporting platforms that have different numbers of pins,
or gaps.
v2: s/GMBUS_PIN_MAX/GMBUS_NUM_PINS/ (Ville, Daniel)
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Rename intel_gmbus_is_port_valid to intel_gmbus_is_valid_pin, and rename
port parameters to pin as well. This matches usage all around, as
usually a pin is passed to the validity check function. No functional
changes.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The specs refer to pin pairs. Start moving towards using pin rather than
port all around to avoid confusion. No functional changes.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The request allocation code is largely duplicated between legacy mode and
execlist mode. The actual difference between the two versions of the code is
pretty minimal.
This patch moves the common code out into a separate function. This is then
called by the execution specific version prior to setting up the one different
value.
For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The submission portion of the execbuffer code path was abstracted into a
function pointer indirection as part of the legacy vs execlist work. The two
implementation functions are called 'i915_gem_ringbuffer_submission' and
'intel_execlists_submission' but the pointer was called 'do_execbuf'. There is
already a 'i915_gem_do_execbuffer' function (which is what calls the pointer
indirection). The name of the pointer is therefore considered to be backwards
and should be changed.
This patch renames it to 'execbuf_submit' which is hopefully a bit clearer.
For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We were missing a convenience stub to aquire the right mutex whilst
dropping the request, so add it.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
To allow for views where the view type is not defined by the view type only,
like it is in stereo or rotated 90 degree view, change the semantic to require
the whole view structure for comparison when we match a GGTT view.
This allows including parameters like offset to be included in the view which
is useful for eg. partial views.
v3:
- Rely on ggtt_view type being 0 for non-GGTT vma's, which equals to
I915_GGTT_VIEW_NORMAL. (Daniel Vetter)
- Do not use potentially slower comparison when we only want to know if
something is or is not a normal view.
- Rebase on top of rotated view patches. Add rotated view singleton.
- If one view is missing in comparison they're equal only if both are missing.
v4:
- Use comparison helper in obj_to_ggtt_view too. (Tvrtko Ursulin)
- Do WARN_ON if one view is NULL. (Tvrtko Ursulin)
Cc: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This is useful for writing igts to make sure we don't break this,
without being forced to own a one of these dinosaurs.
Suggested-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Pass a crtc_state to it and find whether the pipe has an encoder of a
given type by looking at the drm_atomic_state the crtc_state points to.
Until recently i9xx_get_refclk() used to be called indirectly from
vlv_force_pll_on() with a dummy crtc_state. That dummy crtc state is not
converted to be part of a full drm atomic state, so add a WARN in case
someone decides to call that again with a such dummy state. This was
removed in
commit 9cbe40c15a
Author: Vijay Purushothaman <vijay.a.purushothaman@linux.intel.com>
Date: Thu Mar 5 19:33:08 2015 +0530
drm/i915: Update prop, int co-eff and gain threshold for CHV
v2: Warn if there is no connectors for a given crtc. (Daniel)
Replace comment i9xx_get_refclk() with a WARN_ON(). (Ander)
Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
[danvet: Add commit reference for when i9xx_get_refclk was removed.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Follow up patches will convert some functions called from there to use
the atomic state, instead of directly accessing the new or current
config. This patch just changes the parameters, but shouldn't have any
functional changes.
Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This flag was being mostly used as a meta flag in some
cases and not covering other cases.
One of the risks is that it was masking some frontbuffer
trackings without disabling PSR.
So, better to kill this at once and avoid umbrella parameters.
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Drop unused out: label to appease gcc.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The faulting virtual address is >32bits and has been moved
to different registers. Add to error state and output upper
register first, in the same line for easy reconstruction of
the fault address.
v2: correct gen masking (Michel)
v3: s/TBL/TLB (Ville)
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
To support frame buffer rotation we need to be able to pass on the information
on what kind of GGTT view is required for display.
This patch just adds the parameter and makes all the callers default to the
normal view.
v2: Rebased for ggtt view changes.
v3: Don't limit PIN_MAPPABLE to normal views just yet. (Joonas Lahtinen)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (v3)
[danvet: s/BUG/WARN/ in the patch hunk because. At least where the
BUG_ON isn't fatal right away.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Two code changes:
- Extract i915_gem_shrinker_init.
- Inline i915_gem_object_is_purgeable since we open-code it everywhere
else too.
This already has the benefit of pulling all the shrinker code
together, next patch adds a bit of kerneldoc.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Use both up/down manual ei calcuations for symmetry and greater
flexibility for reclocking, instead of faking the down interrupt based
on a fixed integer number of up interrupts.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Deepak S<deepak.s@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
When we idle, we set the GPU frequency to the hardware minimum (not user
minimum). We introduce a new variable to distinguish between the
different roles, and to allow easy tuning of the idle frequency without
impacting over aspects of RPS. Setting the minimum frequency should be a
safety blanket as the pcu on the GPU should be power gating itself
anyway. However, in order for us to do set the absolute minimum
frequency, we need to relax a few of our assertions that we do not
exceed the user limits.
v2: Add idle_freq
v3: Init idle_freq for vlv and add a bunch of WARNs
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Deepak S <deepak.s@linux.intel.com>
Reviewed-by: Deepak S<deepak.s@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
GGTT views are only applicable when dealing with GGTT. Change the code to
reject ggtt_view where it should not be used and require it when it should
be.
v2:
- Dropped _ppgtt_ infixes, allow both types to be passed
- Disregard other but normal views when no view is specified
- More checks that valid parameters are passed
- More readable error checking
v3:
- Prefer WARN_ONCE over BUG_ON when there is code path for failure
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
[danvet: Drop unecessary forward decl from earlier patch iterations.]
[danvet: Remove unused variable spotted by Tvrtko.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Same as
commit c883ef1b1c
Author: Mika Kuoppala <miku@iki.fi>
Date: Tue Oct 28 17:32:30 2014 +0200
drm/i915: Redefine WARN_ON to include the condition
but for WARN_ON_ONCE. Since the kernel WARN_ON_ONCE actually picks up
*our* version of WARN_ON, we end up with messages like
[ 838.285319] WARN_ON(!__warned)
which are not that helpful.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
For SKL, register definition for RPNSWREQ (A008), RPSTAT1(A01C)
have changed slightly. Also on SKL, frequency is specified in
units of 16.66 MHZ, compared to 50 MHZ for most of the earlier
platforms and the time values are expressed in units of 1.33 us,
compared to 1.28 us for earlier platforms.
Added new macros for the aforementioned changes.
v2: Renamed the GT_FREQ_FROM_PERIOD macro to GT_INTERVAL_FROM_US (Damien)
v3: Removed the implicit use of dev_priv in GT_INTERVAL_FROM_US macro (Chris)
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Assuming the PND deadline mechanism works reasonably we should do
memory requests as early as possible so that PND has schedule the
requests more intelligently. Currently we're still calculating
the watermarks as if VLV/CHV are identical to g4x, which isn't
the case.
The current code also seems to calculate insufficient watermarks
and hence we're seeing some underruns, especially on high resolution
displays.
To fix it just rip out the current code and replace is with something
that tries to utilize PND as efficiently as possible.
We now calculate the WM watermark to trigger when the FIFO still has
256us worth of data. 256us is the maximum deadline value supoorted by
PND, so issuing memory requests earlier would mean we probably couldn't
utilize the full FIFO as PND would attempt to return the data at
least in at least 256us. We also clamp the watermark to at least 8
cachelines as that's the magic watermark that enabling trickle feed
would also impose. I'm assuming it matches some burst size.
In theory we could just enable trickle feed and ignore the WM values,
except trickle feed doesn't work with max fifo mode anyway, so we'd
still need to calculate the SR watermarks. It seems cleaner to just
disable trickle feed and calculate all watermarks the same way. Also
trickle feed wouldn't account for the 256us max deadline value, thoguh
that may be a moot point in non-max fifo mode sicne the FIFOs are fairly
small.
On VLV max fifo mode can be used with either primary or sprite planes.
So the code now also checks all the planes (apart from the cursor)
when calculating the SR plane watermark.
We don't have to worry about the WM1 watermarks since we're using the
PND deadline scheme which means the hardware ignores WM1 values.
v2: Use plane->state->fb instead of plane->fb
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Introduce struct vlv_wm_values to house VLV watermark/drain latency
values. We start by using it when computing the drain latency values.
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
If we have a single unclaimed register, we will have lots. A WARN for
each one makes the machine unusable and does not aid debugging. Convert
the i915.mmio_debug option to a counter for how many WARNs to fire
before shutting up. Even when i915.mmio_debug was disabled it would
continue to shout an *ERROR* for every interrupt, without any
information at all for debugging.
The massive verbiage was added in
commit 5978118c39
Author: Paulo Zanoni <paulo.r.zanoni@intel.com>
Date: Wed Jul 16 17:49:29 2014 -0300
drm/i915: reorganize the unclaimed register detection code
v2: Automatically enable invalid mmio reporting for the *next* invalid
access if mmio_debug is disabled by default. This should give us clearer
debug information without polluting the logs too much.
v3: Compile fixes, rebase.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
[danvet: Update modparam text per the thread.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Apparently, this has never worked reliably and is currently disabled. Also, the
gains are not particularly impressive. Thus rather than try to keep unused code
from decaying and having to update it for other driver changes, it was decided
to simply remove it.
For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Kill the blt/render tracking we currently have and use the frontbuffer
tracking infrastructure.
Don't enable things by default yet.
v2: (Rodrigo) Fix small conflict on rebase and typo at subject.
v3: (Paulo) Rebase on RENDER_CS change.
v4: (Paulo) Rebase.
v5: (Paulo) Simplify: flushes don't have origin (Daniel).
Also rebase due to patch order changes.
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We have similar macros for crtcs and encoders, and the pattern happens
often enough to justify the macro.
Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We want to port FBC to the frontbuffer tracking infrastructure, but
for that we need to know what caused the object invalidation so
we can react accordingly: CPU mmaps need manual, GTT mmaps and
flips don't need handling and ring rendering needs nukes.
v2: - s/ORIGIN_RENDER/ORIGIN_CS/ (Daniel, Rodrigo)
- Fix copy/pasted wrong documentation
- Rebase
v3: - Rebase
v4: - Don't pass the operation to flushes (Daniel).
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Implicit usage of local variables in macros isn't exactly the greatest
thing in the world, especially when that variable is the drm device and
we want to move towards a broader use of the i915 device structure.
Let's make for_each_sprite() take dev_priv as its first argument then.
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Chris Wilson <chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Implicit usage of local variables in macros isn't exactly the greatest
thing in the world, especially when that variable is the drm device and
we want to move towards a broader use of the i915 device structure.
Let's make for_each_plane() take dev_priv as its first argument then.
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Chris Wilson <chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJU/NacAAoJEHm+PkMAQRiGdUcIAJU5dHclwd9HRc7LX5iOwYN6
mN0aCsYjMD8Pjx2VcPCgJvkIoESQO5pkwYpFFWCwILup1bVEidqXfr8EPOdThzdh
kcaT0FwUvd19K+0jcKVNCX1RjKBtlUfUKONk6sS2x4RrYZpv0Ur8Gh+yXV8iMWtf
fAusNEYlxQJvEz5+NSKw86EZTr4VVcykKLNvj+/t/JrXEuue7IG8EyoAO/nLmNd2
V/TUKKttqpE6aUVBiBDmcMQl2SUVAfp5e+KJAHmizdDpSE80nU59UC1uyV8VCYdM
qwHXgttLhhKr8jBPOkvUxl4aSXW7S0QWO8TrMpNdEOeB3ZB8AKsiIuhe1JrK0ro=
=Xkue
-----END PGP SIGNATURE-----
Merge tag 'v4.0-rc3' into drm-next
Linux 4.0-rc3 backmerge to fix two i915 conflicts, and get
some mainline bug fixes needed for my testing box
Conflicts:
drivers/gpu/drm/i915/i915_drv.h
drivers/gpu/drm/i915/intel_display.c
- Y tiling support for scanout from Tvrtko&Damien
- Remove more UMS support
- some small prep patches for OLR removal from John Harrison
- first few patches for dynamic pagetable allocation from Ben Widawsky, rebased
by tons of other people
- DRRS support patches (Sonika&Vandana)
- fbc patches from Paulo
- make sure our vblank callbacks aren't called when the pipes are off
- various patches all over
* tag 'drm-intel-next-2015-02-27' of git://anongit.freedesktop.org/drm-intel: (61 commits)
drm/i915: Update DRIVER_DATE to 20150227
drm/i915: Clarify obj->map_and_fenceable
drm/i915/skl: Allow Y (and Yf) frame buffer creation
drm/i915/skl: Update watermarks for Y tiling
drm/i915/skl: Updated watermark programming
drm/i915/skl: Adjust get_plane_config() to support Yb/Yf tiling
drm/i915/skl: Teach pin_and_fence_fb_obj() about Y tiling constraints
drm/i915/skl: Adjust intel_fb_align_height() for Yb/Yf tiling
drm/i915/skl: Allow scanning out Y and Yf fbs
drm/i915/skl: Add new displayable tiling formats
drm/i915: Remove DRIVER_MODESET checks from modeset code
drm/i915: Remove regfile code&data for UMS suspend/resume
drm/i915: Remove DRIVER_MODESET checks from gem code
drm/i915: Remove DRIVER_MODESET checks in the gpu reset code
drm/i915: Remove DRIVER_MODESET checks from suspend/resume code
drm/i915: Remove DRIVER_MODESET checks in load/unload/close code
drm/i915: fix a printk format
drm/i915: Add media rc6 residency file to sysfs
drm/i915: Add missing description to parameter in alloc_pt_range
drm/i915: Removed the read of RP_STATE_CAP from sysfs/debugfs functions
...
- use the atomic helpers for plane_upate/disable hooks (Matt Roper)
- refactor the initial plane config code (Damien)
- ppgtt prep patches for dynamic pagetable alloc (Ben Widawsky, reworked and
rebased by a lot of other people)
- framebuffer modifier support from Tvrtko Ursulin, drm core code from Rob Clark
- piles of workaround patches for skl from Damien and Nick Hoath
- vGPU support for xengt on the client side (Yu Zhang)
- and the usual smaller things all over
* tag 'drm-intel-next-2015-02-14' of git://anongit.freedesktop.org/drm-intel: (88 commits)
drm/i915: Update DRIVER_DATE to 20150214
drm/i915: Remove references to previously removed UMS config option
drm/i915/skl: Use a LRI for WaDisableDgMirrorFixInHalfSliceChicken5
drm/i915/skl: Fix always true comparison in a revision id check
drm/i915/skl: Implement WaEnableLbsSlaRetryTimerDecrement
drm/i915/skl: Implement WaSetDisablePixMaskCammingAndRhwoInCommonSliceChicken
drm/i915: Add process identifier to requests
drm/i915/skl: Implement WaBarrierPerformanceFixDisable
drm/i915/skl: Implement WaCcsTlbPrefetchDisable:skl
drm/i915/skl: Implement WaDisableChickenBitTSGBarrierAckForFFSliceCS
drm/i915/skl: Implement WaDisableHDCInvalidation
drm/i915/skl: Implement WaDisableLSQCROPERFforOCL
drm/i915/skl: Implement WaDisablePartialResolveInVc
drm/i915/skl: Introduce a SKL specific init_workarounds()
drm/i915/skl: Document that we implement WaRsClearFWBitsAtReset
drm/i915/skl: Implement WaSetGAPSunitClckGateDisable
drm/i915/skl: Make the init clock gating function skylake specific
drm/i915/skl: Provide a gen9 specific init_render_ring()
drm/i915/skl: Document the WM read latency W/A with its name
drm/i915/skl: Also detect eDRAM on SKL
...
Lots of lines to remove!
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[danvet: Fixup makefile.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
In execlist mode, the ringbuf is a function of the ring and context whereas in
legacy mode, it is derived from the ring alone. Thus the calculation required to
determine the ringbuf pointer from the ring (and context) also needs to test
execlist mode or not. This is messy.
Further, the request structure holds a pointer to both the ring and the context
for which it was created. Thus, given a request, it is possible to derive the
ringbuf in either legacy or execlist mode. Hence it is necessary to pass just
the request in to all the low level functions rather than some combination of
request, ring, context and ringbuf. However, rather than recalculating it each
time, it is much simpler to just cache the ringbuf pointer in the request
structure itself.
Caching the pointer means the calculation is done once at request creation time
and all further code and simply read it directly from the request structure.
OTC-Jira: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
[danvet: Drop contentless comment in lrc alloc request entirely. And
spelling fix in the commit message.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
v2: Adding VBT version check for low_vswing field, and correcting parsing
v3: (Damien)
- Restrain the scope of the 'vswing' variable
- Use the more idiomatic "ev_priv->vbt.edp_low_vswing = vswing == 0;"
instead of if (foo) var = true; else var = false;
- Shorten edp_vswing_premph_setting to edp_vswing_premph to fit in 80 chars
- Add the version from which the edp_vswing_premph field is valid in the
struct definition
Reviewed-by: Satheeshakrishna M <satheeshakrishna.m@intel.com> (v2)
Signed-off-by: Sonika Jindal <sonika.jindal@intel.com>
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
When converting from implicitly tracked execlist queue items to ref counted
requests, not all frees of requests were replaced with unrefs, and extraneous
refs/unrefs of contexts were added.
Correct the unbalanced refcount & replace the frees.
Remove a noisy warning when hitting the request creation path.
drm_i915_gem_request and intel_context are both kref reference counted
structures. Upon allocation, drm_i915_gem_request's ref count should be
bumped using kref_init. When a context is assigned to the request,
the context's reference count should be bumped using i915_gem_context_reference.
i915_gem_request_reference will reduce the context reference count when
the request is freed.
Problem introduced in
commit 6d3d8274bc
Author: Nick Hoath <nicholas.hoath@intel.com>
AuthorDate: Thu Jan 15 13:10:39 2015 +0000
drm/i915: Subsume intel_ctx_submit_request in to drm_i915_gem_request
v2: Added comments explaining how the ctx pointer and the request object should
be ref-counted. Removed noisy warning.
v3: Cleaned up the language used in the commit & the header
description (Thanks David Gordon)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88652
Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
Reviewed-by: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
When one EU is disabled in a particular subslice, we can tune how the
work is spread between subslices to improve EU utilization.
v2: - Use a bitfield to record which subslice(s) has(have) 7 EUs. That
will also make the machinery work if several sublices have 7 EUs.
(Jeff Mcgee)
- Only apply the different hashing algorithm if the slice is
effectively unbalanced by checking there's a single subslice with
7 EUs. (Jeff Mcgee)
v3: Fix typo in comment (Jeff Mcgee)
Issue: VIZ-3845
Cc: Jeff Mcgee <jeff.mcgee@intel.com>
Reviewed-by: Jeff Mcgee <jeff.mcgee@intel.com>
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Read fuse registers to determine the available slice total,
subslice total, subslice per slice, EU total, and EU per subslice
counts of the SKL device. The EU per subslice attribute is more
precisely defined as the maximum EU available on any one subslice,
since available EU counts may vary across subslices due to fusing.
Set flags indicating the SKL device's slice/subslice/EU (SSEU)
power gating capability. Make all values available via debugfs
entry 'i915_sseu_status'.
v2: Several small clean-ups suggested by Damien. Most notably,
used smaller types for the new device info fields to reduce
memory usage and improved the clarity/readability of the
method used to extract attribute values from the fuse
registers.
Signed-off-by: Jeff McGee <jeff.mcgee@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
When reviewing patch that fixes VGA on BDW Halo Jani noticed that
we also had other ULT IDs that weren't listed there.
So this follow-up patch add these pci-ids as halo and fix comments
on i915_pciids.h
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Pull drm updates from Dave Airlie:
"This is the main drm pull, it has a shared branch with some alsa
crossover but everything should be acked by relevant people.
New drivers:
- ATMEL HLCDC driver
- designware HDMI core support (used in multiple SoCs).
core:
- lots more atomic modesetting work, properties and atomic ioctl
(hidden under option)
- bridge rework allows support for Samsung exynos chromebooks to
work finally.
- some more panels supported
i915:
- atomic plane update support
- DSI uses shared DSI infrastructure
- Skylake basic support is all merged now
- component framework used for i915/snd-hda interactions
- write-combine cpu memory mappings
- engine init code refactored
- full ppgtt enabled where execlists are enabled.
- cherryview rps/gpu turbo and pipe CRC support.
radeon:
- indirect draw support for evergreen/cayman
- SMC and manual fan control for SI/CI
- Displayport audio support
amdkfd:
- SDMA usermode queue support
- replace suballocator usage with more suitable one
- rework for allowing interfacing to more than radeon
nouveau:
- major renaming in prep for later splitting work
- merge arm platform driver into nouveau
- GK20A reclocking support
msm:
- conversion to atomic modesetting
- YUV support for mdp4/5
- eDP support
- hw cursor for mdp5
tegra:
- conversion to atomic modesetting
- better suspend/resume support for child devices
rcar-du:
- interlaced support
imx:
- move to using dw_hdmi shared support
- mode_fixup support
sti:
- DVO support
- HDMI infoframe support
exynos:
- refactoring and cleanup, removed lots of internal unnecessary
abstraction
- exynos7 DECON display controller support
Along with the usual bunch of fixes, cleanups etc"
* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (724 commits)
drm/radeon: fix voltage setup on hawaii
drm/radeon/dp: Set EDP_CONFIGURATION_SET for bridge chips if necessary
drm/radeon: only enable kv/kb dpm interrupts once v3
drm/radeon: workaround for CP HW bug on CIK
drm/radeon: Don't try to enable write-combining without PAT
drm/radeon: use 0-255 rather than 0-100 for pwm fan range
drm/i915: Clamp efficient frequency to valid range
drm/i915: Really ignore long HPD pulses on eDP
drm/exynos: Add DECON driver
drm/i915: Correct the base value while updating LP_OUTPUT_HOLD in MIPI_PORT_CTRL
drm/i915: Insert a command barrier on BLT/BSD cache flushes
drm/i915: Drop vblank wait from intel_dp_link_down
drm/exynos: fix NULL pointer reference
drm/exynos: remove exynos_plane_dpms
drm/exynos: remove mode property of exynos crtc
drm/exynos: Remove exynos_plane_dpms() call with no effect
drm/i915: Squelch overzealous uncore reset WARN_ON
drm/i915: Take runtime pm reference on hangcheck_info
drm/i915: Correct the IOSF Dev_FN field for IOSF transfers
drm/exynos: fix DMA_ATTR_NO_KERNEL_MAPPING usage
...
We use the pid of the process which opened our device when
we track which was the culprit of the gpu hang. But as that
file descriptor might get inherited, we might blame the
wrong process when we record the error state.
Track process identifiers in requests to always find
the correct offender.
v2: Track only user processes (Chris)
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
[danvet: drop NULL check before put_pid as suggested by Chris.]
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The last (only?) user of this was removed in:
commit 2208d655a9
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Fri Nov 14 09:25:29 2014 +0100
drm/i915: drop WaSetupGtModeTdRowDispatch:snb
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
There have been quite a bit of development lately, leaving behing lonely
protypes. Time to bid them farewell.
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Introduce a PV INFO structure, to facilitate the Intel GVT-g
technology, which is a GPU virtualization solution with mediated
pass-through. This page contains the shared information between
i915 driver and the host emulator. For now, this structure utilizes
an area of 4K bytes on HSW GPU's unused MMIO space. Future hardware
will have the reserved window architecturally defined, and layout
of the page will be added in future BSpec.
The i915 driver load routine detects if it is running in a VM by
reading the contents of this PV INFO page. Thereafter a flag,
vgpu.active is set, and intel_vgpu_active() is used by checking
this flag to conclude if GPU is virtualized with Intel GVT-g. By
now, intel_vgpu_active() will return true, only when the driver
is running as a guest in the Intel GVT-g enhanced environment on
HSW platform.
v2:
take Chris' comments:
- call the i915_check_vgpu() in intel_uncore_init()
- sanitize i915_check_vgpu() by adding BUILD_BUG_ON() and debug info
take Daniel's comments:
- put the definition of PV INFO into a new header - i915_vgt_if.h
other changes:
- access mmio regs by readq/readw in i915_check_vgpu()
v3:
take Daniel's comments:
- move the i915/vgt interfaces into a new i915_vgpu.c
- update makefile
- add kerneldoc to functions which are non-static
- add a DOC: section describing some of the high-level design
- update drm docbook
other changes:
- rename i915_vgt_if.h to i915_vgpu.h
v4:
take Tvrtko's comments:
- fix a typo in commit message
- add debug message when vgt version mismatches
- rename low_gmadr/high_gmadr to mappable/non-mappable in PV INFO
structure
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Signed-off-by: Jike Song <jike.song@intel.com>
Signed-off-by: Eddie Dong <eddie.dong@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>