linux

Author	SHA1	Message	Date
Oscar Mateo	821d66dd7c	drm/i915: Emphasize that ctx->id is merely a user handle This is an Execlists preparatory patch, since they make context ID become an overloaded term: - In the software, it was used to distinguish which context userspace was trying to use. - In the BSpec, the term is used to describe the 20-bits long field the hardware uses to it to discriminate the contexts that are submitted to the ELSP and inform the driver about their current status (via Context Switch Interrupts and Context Status Buffers). Initially, I tried to make the different meanings converge, but it proved impossible: - The software ctx->id is per-filp, while the hardware one needs to be globally unique. - Also, we multiplex several backing states objects per intel_context, and all of them need unique HW IDs. - I tried adding a per-filp ID and then composing the HW context ID as: ctx->id + file_priv->id + ring->id, but the fact that the hardware only uses 20-bits means we have to artificially limit the number of filps or contexts the userspace can create. The ctx->user_handle renaming bits are done with this Cocci patch (plus manual frobbing of the struct declaration): @@ struct intel_context c; @@ - (c).id + c.user_handle @@ struct intel_context *c; @@ - (c)->id + c->user_handle Also, while we are at it, s/DEFAULT_CONTEXT_ID/DEFAULT_CONTEXT_HANDLE and change the type to unsigned 32 bits. v2: s/handle/user_handle and change the type to uint32_t as suggested by Chris Wilson. Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> (v1) Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-07-08 12:30:41 +02:00
Oscar Mateo	ea0c76f8c3	drm/i915: Emphasize that ctx->obj & ctx->is_initialized refer to the legacy rcs ctx We have already advanced that Logical Ring Contexts have their own kind of backing objects, but everything will be better explained in the Execlists series. For now, suffice it to say that the current backing object is only ever used with the render ring, so we're making this fact more explicit (which is a good reason on its own). As for the is_initialized flag, we only use to signify that the render state has been initialized (a.k.a. golden context, a.k.a. null context). It doesn't mean anything for the other engines, so make that distinction obvious. Done with the following Coccinelle patch (plus manual frobbing of the struct): @@ struct intel_context c; @@ - (c).obj + c.legacy_hw_ctx.rcs_state @@ struct intel_context c; @@ - (c)->obj + c->legacy_hw_ctx.rcs_state @@ struct intel_context c; @@ - (c).is_initialized + c.legacy_hw_ctx.initialized @@ struct intel_context c; @@ - (c)->is_initialized + c->legacy_hw_ctx.initialized This Execlists prep-work patch has been suggested by Chris Wilson and Daniel Vetter separately. Initially, it was two separate patches: drm/i915: Rename ctx->obj to ctx->rcs_state drm/i915: Make it obvious that ctx->id is merely a user handle Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> [danvet: s/id/is_initialized/ to fix the subject and resolve a conflict in i915_gem_context_reset. Also introduce a new lctx local variable to avoid overtly long lines.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-07-08 12:30:35 +02:00
Ben Widawsky	0ca36d7839	drm/i915/bdw: collect semaphore error state Since the semaphore information is in an object, just dump it, and let the user parse it later. NOTE: The page being used for the semaphores are incoherent with the CPU. No matter what I do, I cannot figure out a way to read anything but 0s. Note that the semaphore waits are indeed working. v2: Don't print signal, and wait (they should be the same). Instead, print sync_seqno (Chris) v3: Free the semaphore error object (Chris) v4: Fix semaphore offset calculation during error state collection (Ville) v5: VCS2 rebase Make semaphore object error capture coding style consistent (Ville) Do the proper math for the signal offset (Ville) v6: Fix small conflicts on rebase and s/ring_buffer/engine_cs (Rodrigo) Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-07-07 23:16:54 +02:00
Ben Widawsky	3e78998a58	drm/i915/bdw: implement semaphore signal Semaphore signalling works similarly to previous GENs with the exception that the per ring mailboxes no longer exist. Instead you must define your own space, somewhere in the GTT. The comments in the code define the layout I've opted for, which should be fairly future proof. Ie. I tried to define offsets in abstract terms (NUM_RINGS, seqno size, etc). NOTE: If one wanted to move this to the HWSP they could. I've decided one 4k object would be easier to deal with, and provide potential wins with cache locality, but that's all speculative. v2: Update the macro to not need the other ring's ring->id (Chris) Update the comment to use the correct formula (Chris) v3: Move the macros the ringbuffer.h to prevent churn in next patch (Ville) v4: Fixed compilation rebase conflict commit `1ec9e26dda` Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Fri Feb 14 14:01:11 2014 +0100 drm/i915: Consolidate binding parameters into flags v5: VCS2 rebase Replace hweight_long with hweight32 v6 (Rodrigo): * Add missed VC2 gen8 ring signal init * fixing conflicst on rebase * minor fixes on address table * remove WARN_ON Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> [danvet: s/BUG_ON/WARN_ON/] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-07-07 22:16:23 +02:00
John Harrison	2885f6ac07	drm/i915: Corrected 'file_priv' to 'file' in 'i915_driver_preclose()' The 'i915_driver_preclose()' function has a parameter called 'file_priv'. However, this is misleading as the structure it points to is a 'drm_file' not a 'drm_i915_file_private'. It should be named just 'file' to avoid confusion. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-07-07 20:00:44 +02:00
Dave Airlie	13cf550448	drm/i915: rework digital port IRQ handling (v2) The digital ports from Ironlake and up have the ability to distinguish between long and short HPD pulses. Displayport 1.1 only uses the short form to request link retraining usually, so we haven't really needed support for it until now. However with DP 1.2 MST we need to handle the short irqs on their own outside the modesetting locking the long hpd's involve. This patch adds the framework to distinguish between short/long to the current code base, to lay the basis for future DP 1.2 MST work. This should mean we get better bisectability in case of regression due to the new irq handling. v2: add GM45 support (untested, due to lack of hw) Signed-off-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Todd Previte <tprevite@gmail.com> [danvet: Fix conflicts in i915_irq.c with Oscar Mateo's irq handling race fixes and a trivial one in intel_drv.h with the psr code.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-07-07 15:08:51 +02:00
Imre Deak	5209b1f4c4	drm/i915: gmch: factor out intel_set_memory_cxsr This functionality will be also needed by an upcoming patch, so factor it out. As a bonus this also makes things a bit more uniform across platforms. Note that this also changes the register read-modify-write to a simple write during disabling. This is what we do during enabling anyway and according to the spec all the relevant bits are reserved-MBZ or reserved with a 0 default value. v2: - unchanged v3: - fix missing cxsr disabling on pineview (Deepak) Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Deepak S <deepak.s@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-07-07 11:33:36 +02:00
Daniel Vetter	f1615bbe9b	Linux 3.16-rc4 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJTuaWZAAoJEHm+PkMAQRiGfkIH/2Hhwrg51GWazUYIXVxz5zLU kPMlaws3vankbhka9HCg02eS3tkzr6shO3F/qlBba+5GUkUDKCcCisIsvk4hgZZg 7YqepTvcaupNxIp4TmTGm1FYVK1GpaWFdJVgg2PDdGFahw3HSlfZoTkBzirNCwga p/jfeRzathbUixpz9OAC1AEn2gP1AxNRpSt1wShL5rexBb1YRXCPuCEt9B0UsVoR mzKf5xEsuaZnpCuvWK4S60fjfVhTe8UJ/xGPPfdLyIXU0rvhaKzfeVQO6F5nIQBy Xvrar1f7oOPZaJRdlmPvAimS7iS8lq/YctuHu7ia1NdJSihtA5sRPf7cWAw2d7s= =4PrL -----END PGP SIGNATURE----- Merge tag 'v3.16-rc4' into drm-intel-next-queued Due to Dave's vacation drm-next hasn't opened yet for 3.17 so I couldn't move my drm-intel-next queue forward yet like I usually do. Just pull in the latest upstream -rc to unblock patch merging - I don't want to needlessly rebase my current patch pile really and void all the testing we've done already. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-07-07 10:17:56 +02:00
Daniel Vetter	cfb3c0ab09	Merge patches merged by Jani while I was on vacation. Jani apparently didn't rebase onto latest drm-intel-next so a merge is in order. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-07-07 10:15:44 +02:00
Ben Widawsky	5e59f7175f	drm/i915: Try harder to get FBC The GEN FBC unit provides the ability to set a low pass on frames it attempts to compress. If a frame is less than a certain amount compressibility (2:1, 4:1) it will not bother. This allows the driver to reduce the size it requests out of stolen memory. Unluckily, a few months ago, Ville actually began using this feature for framebuffers that are 16bpp (not sure why not 8bpp). In those cases, we are already using this mechanism for a different purpose, and so we can only achieve one further level of compression (2:1 -> 4:1) FBC GEN1, ie. pre-G45 is ignored. The cleverness of the patch is Art's. The bugs are mine. v2: Update message and including missing threshold case 3 (Spotted by Arthur). Cc: Art Runyan <arthur.j.runyan@intel.com> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2014-07-03 11:27:57 +03:00
Ben Widawsky	c4213885cd	drm/i915: Move compressed_fb to static allocation We are already using the size to determine whether or not to free the object, so there is no functional change there. Almost everything else has changed to static allocations of the drm_mm_node too. Aside from bringing this inline with much of our other code, this makes error paths slightly simpler, which benefits the look of an upcoming patch. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2014-07-03 11:27:08 +03:00
Imre Deak	bfafe93a1c	drm/i915: cache hw power well enabled state Jesse noticed that the punit communication needed to query the VLV power well status can cause substantial delays. Since we can query the state frequently, for example during I2C transfers, maintain a cached version of the HW state to get rid of this delay. This fixes at least one reported regression where boot time increased by ~4 seconds due to frequent power well state queries on VLV during eDP EDID read. This regression has been introduced in commit `bb4932c4f1` Author: Imre Deak <imre.deak@intel.com> Date: Mon Apr 14 20:24:33 2014 +0300 drm/i915: vlv: check port power domain instead of only D0 for eDP VDD on Reported-by: Jesse Barnes <jesse.barnes@intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-23 10:02:03 +03:00
Daniel Vetter	34882298b9	drm/i915: Update DRIVER_DATE to 20140620 Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-20 10:36:06 +02:00
Daniel Vetter	f99d70690e	drm/i915: Track frontbuffer invalidation/flushing So these are the guts of the new beast. This tracks when a frontbuffer gets invalidated (due to frontbuffer rendering) and hence should be constantly scaned out, and when it's flushed again and can be compressed/one-shot-upload. Rules for flushing are simple: The frontbuffer needs one more full upload starting from the next vblank. Which means that the flushing can _only_ be called once the frontbuffer update has been latched. But this poses a problem for pageflips: We can't just delay the flushing until the pageflip is latched, since that would pose the risk that we override frontbuffer rendering that has been scheduled in-between the pageflip ioctl and the actual latching. To handle this track asynchronous invalidations (and also pageflip) state per-ring and delay any in-between flushing until the rendering has completed. And also cancel any delayed flushing if we get a new invalidation request (whether delayed or not). Also call intel_mark_fb_busy in both cases in all cases to make sure that we keep the screen at the highest refresh rate both on flips, synchronous plane updates and for frontbuffer rendering. v2: Lots of improvements Suggestions from Chris: - Move invalidate/flush in flush__domain and set_to__domain. - Drop the flush in busy_ioctl since it's redundant. Was a leftover from an earlier concept to track flips/delayed flushes. - Don't forget about the initial modeset enable/final disable. Suggested by Chris. Track flips accurately, too. Since flips complete independently of rendering we need to track pending flips in a separate mask. Again if an invalidate happens we need to cancel the evenutal flush to avoid races. v3: Provide correct header declarations for flip functions. Currently not needed outside of intel_display.c, but part of the proper interface. v4: Add proper domain management to fbcon so that the fbcon buffer is also tracked correctly. v5: Fixup locking around the fbcon set_to_gtt_domain call. v6: More comments from Chris: - Split out fbcon changes. - Drop superflous checks for potential scanout before calling intel_fb functions - we can micro-optimize this later. - s/intel_fb_/intel_fb_obj_/ to make it clear that this deals in gem object. We already have precedence for fb_obj in the pin_and_fence functions. v7: Clarify the semantics of the flip flush handling by renaming things a bit: - Don't go through a gem object but take the relevant frontbuffer bits directly. These functions center on the plane, the actual object is irrelevant - even a flip to the same object as already active should cause a flush. - Add a new intel_frontbuffer_flip for synchronous plane updates. It currently just calls intel_frontbuffer_flush since the implemenation differs. This way we achieve a clear split between one-shot update events on one side and frontbuffer rendering with potentially a very long delay between the invalidate and flush. Chris and I also had some discussions about mark_busy and whether it is appropriate to call from flush. But mark busy is a state which should be derived from the 3 events (invalidate, flush, flip) we now have by the users, like psr does by tracking relevant information in psr.busy_frontbuffer_bits. DRRS (the only real use of mark_busy for frontbuffer) needs to have similar logic. With that the overall mark_busy in the core could be removed. v8: Only when retiring gpu buffers only flush frontbuffer bits we actually invalidated in a batch. Just for safety since before any additional usage/invalidate we should always retire current rendering. Suggested by Chris Wilson. v9: Actually use intel_frontbuffer_flip in all appropriate places. Spotted by Chris. v10: Address more comments from Chris: - Don't call _flip in set_base when the crtc is inactive, avoids redunancy in the modeset case with the initial enabling of all planes. - Add comments explaining that the initial/final plane enable/disable still has work left to do before it's fully generic. v11: Only invalidate for gtt/cpu access when writing. Spotted by Chris. v12: s/_flush/_flip/ in intel_overlay.c per Chris' comment. Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-19 18:14:47 +02:00
Daniel Vetter	cc36513ca3	drm/i915: Use new frontbuffer bits to increase pll clock The downclocking checks a few more things, so not that simple to convert. Also, this should get unified with the drrs handling and also use the locking of that. Otoh the drrs locking is about as hapzardous as no locking, at least on first sight. For easier conversion ditch the upclocking on unload - we'll turn off everything anyway. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-19 18:13:40 +02:00
Daniel Vetter	a071fa0064	drm/i915: Introduce accurate frontbuffer tracking So from just a quick look we seem to have enough information to accurately figure out whether a given gem bo is used as a frontbuffer and where exactly: We have obj->pin_count as a first check with no false negatives and only negligible false positives. And then we can just walk the modeset objects and figure out where exactly a buffer is used as scanout. Except that we can't due to locking order: If we already hold dev->struct_mutex we can't acquire any modeset locks, so could potential chase freed pointers and other evil stuff. So we need something else. For that introduce a new set of bits obj->frontbuffer_bits to track where a buffer object is used. That we can then chase without grabbing any modeset locks. Of course the consumers of this (DRRS, PSR, FBC, ...) still need to be able to do their magic both when called from modeset and from gem code. But that can be easily achieved by adding locks for these specific subsystems which always nest within either kms or gem locking. This patch just adds the relevant update code to all places. Note that if we ever support multi-planar scanout targets then we need one frontbuffer tracking bit per attachment point that we expose to userspace. v2: - Fix more oopsen. Oops. - WARN if we leak obj->frontbuffer_bits when freeing a gem buffer. Fix the bugs this brought to light. - s/update_frontbuffer_bits/update_fb_bits/. More consistent with the fb tracking functions (fb for gem object, frontbuffer for raw bits). And the function name was way too long. v3: Size obj->frontbuffer_bits correctly so that all pipes fit in. v4: Don't update fb bits in set_base on failure. Noticed by Chris. v5: s/i915_gem_update_fb_bits/i915_gem_track_fb/ Also remove a few local enum pipe variables which are now no longer needed to make the function arguments no drop over the 80 char limit. Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-19 10:04:41 +02:00
Oscar Mateo	14d8ec544f	drm/i915: Remove ctx->last_ring The original comment that introduced it said: commit `0009e46cd5` Author: Ben Widawsky <ben@bwidawsk.net> Date: Fri Dec 6 14:11:02 2013 -0800 drm/i915: Track which ring a context ran on Previously we dropped the association of a context to a ring. It is however very important to know which ring a context ran on (we could have reused the other member, but I was nitpicky). This is very important when we switch address spaces, which unlike context objects, do change per ring. As an example, if we have: RCS BCS ctx A ctx A ctx B ctx B Without tracking the last ring B ran on, we wouldn't know to switch the address space on BCS in the last row. But this is not really true, because we are already checking to != from (with "from" being = ring->last_context) and that should be enough to make sure we switch to the right address space. We would have a problem if we switched the context object for every ring (since then we would fail to do it in some situations) but we only switch it for the render ring, so we don't care. Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-18 21:42:42 +02:00
Daniel Vetter	5d0cf3d6e0	Merge branch 'topic/soix' into drm-intel-next-queued Jesse's SOix work required some patches from acpi-next, so pull it in through a topic barnch. Conflicts: drivers/gpu/drm/i915/i915_drv.c drivers/gpu/drm/i915/intel_pm.c Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-18 11:44:05 +02:00
Sourab Gupta	84c33a64b4	drm/i915: Replaced Blitter ring based flips with MMIO flips This patch enables the framework for using MMIO based flip calls, in contrast with the CS based flip calls which are being used currently. MMIO based flip calls can be enabled on architectures where Render and Blitter engines reside in different power wells. The decision to use MMIO flips can be made based on workloads to give 100% residency for Media power well. v2: The MMIO flips now use the interrupt driven mechanism for issuing the flips when target seqno is reached. (Incorporating Ville's idea) v3: Rebasing on latest code. Code restructuring after incorporating Damien's comments v4: Addressing Ville's review comments -general cleanup -updating only base addr instead of calling update_primary_plane -extending patch for gen5+ platforms v5: Addressed Ville's review comments -Making mmio flip vs cs flip selection based on module parameter -Adding check for DRIVER_MODESET feature in notify_ring before calling notify mmio flip. -Other changes mostly in function arguments v6: -Having a seperate function to check condition for using mmio flips (Ville) -propogating error code from i915_gem_check_olr (Ville) v7: -Adding __must_check with i915_gem_check_olr (Chris) -Renaming mmio_flip_data to mmio_flip (Chris) -Rebasing on latest nightly v8: -Rebasing on latest code -squash 3rd patch in series(mmio setbase vs page flip race) with this patch -Added new tiling mode update in intel_do_mmio_flip (Chris) v9: -check for obj->last_write_seqno being 0 instead of obj->ring being NULL in intel_postpone_flip, as this is a more restrictive condition (Chris) v10: -Applied Chris's suggestions for squashing patches 2,3 into this patch. These patches make the selection of CS vs MMIO flip at the page flip time, and make the module parameter for using mmio flips as tristate, the states being 'force CS flips', 'force mmio flips', 'driver discretion'. Changed the logic for driver discretion (Chris) v11: Minor code cleanup(better readability, fixing whitespace errors, using lockdep to check mutex locked status in postpone_flip, removal of __must_check in function definition) (Chris) Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Sourab Gupta <sourab.gupta@intel.com> Signed-off-by: Akash Goel <akash.goel@intel.com> Tested-by: Chris Wilson <chris@chris-wilson.co.uk> # snb, ivb [danvet: Fix up parameter alignement checkpatch spotted.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-17 16:16:20 +02:00
Akash Goel	24f3a8cf77	drm/i915: Added write-enable pte bit supportt This adds support for a write-enable bit in the entry of GTT. This is handled via a read-only flag in the GEM buffer object which is then used to see how to set the bit when writing the GTT entries. Currently by default the Batch buffer & Ring buffers are marked as read only. v2: Moved the pte override code for read-only bit to 'byt_pte_encode'. (Chris) Fixed the issue of leaving 'gt_old_ro' as unused. (Chris) v3: Removed the 'gt_old_ro' field, now setting RO bit only for Ring Buffers(Daniel). v4: Added a new 'flags' parameter to all the pte(gen6) encode & insert_entries functions, in lieu of overloading the cache_level enum (Daniel). v5: Removed the superfluous VLV check & changed the definition location of PTE_READ_ONLY flag (Imre) Reviewed-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Akash Goel <akash.goel@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-17 09:21:47 +02:00
Rodrigo Vivi	7c8f8a7007	drm/i915: Force PSR exit by inactivating it. The perfect solution for psr_exit is the hardware tracking the changes and doing the psr exit by itself. This scenario works for HSW and BDW with some environments like Gnome and Wayland. However there are many other scenarios that this isn't true. Mainly one right now is KDE users on HSW and BDW with PSR on. User would miss many screen updates. For instances any key typed could be seen only when mouse cursor is moved. So this patch introduces the ability of trigger PSR exit on kernel side on some common cases that. Most of the cases are coverred by psr_exit at set_domain. The remaining cases are coverred by triggering it at set_domain, busy_ioctl, sw_finish and mark_busy. The downside here might be reducing the residency time on the cases this already work very wall like Gnome environment. But so far let's get focused on fixinge issues sio PSR couild be used for everybody and we could even get it enabled by default. Later we can add some alternatives to choose the level of PSR efficiency over boot flag of even over crtc property. v2: remove exit from connector_dpms. Daniel pointed this is the wrong way and also this isn't needed for BDW and HSW anyway. Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Vijay Purushothaman <vijay.a.purushothaman@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-13 21:21:36 +02:00
Daniel Vetter	75a91c975b	drm/i915: Update DRIVER_DATE to 20140606 Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-13 15:17:40 +02:00
Imre Deak	10018603a4	drm/i915: preserve user forcewake over system suspend/resume Atm, the forcewake refcount will be incorrectly set to zero during system suspend if there is any reference held via the i915_forcewake_user debugfs entry. Fix this by simply not zeroing the sw counters during suspend and restoring the original state using them. Note that the only other places where we zeroed the counters were driver load and unload time, where it was redundant anyway. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78059 Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-13 15:17:40 +02:00
Chris Wilson	ef0cf27c4d	drm/i915: Use the .release hook to drop the stolen drm_mm tracking Now that we have a release hook into i915_gem_object_free, we can move the explicit call to the internal stolen function and hook it up throught the callback instead. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-13 15:17:36 +02:00
Jesse Barnes	156c7ca081	drm/i915: leave rc6 enabled at suspend time v4 This allows the system to enter the lowest power mode during system freeze. v2: delete force wake timer at suspend (Imre) v3: add GT work suspend function (Imre) v4: use uncore forcewake reset (Daniel) Reviewed-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-12 17:47:05 +02:00
Jesse Barnes	7365fb7896	drm/i915: enable PPGTT on VLV Working for real this time. i915_ppgtt_info has all sorts of good stuff in it and X is running nicely on top. Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-11 16:57:40 +02:00
Ville Syrjälä	2d401b175f	drm/i915: Don't use pipe_offset stuff for DPLL registers These are just single registers so wasting space for the pipe offsets seems a bit pointless. So just use the _PIPE3() macro instead. Also rewrite the _PIPE3() macro to be more obvious, and protect the arguments properly. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> [danvet: Frob conflict.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-11 16:57:31 +02:00
Rodrigo Vivi	6118efe596	drm/i915: move psr_setup_done to psr struct "Because our driver assumes only one panel is PSR capable, and we already have other PSR information on dev_priv instead of intel_dp. If we ever support multiple PSR panels, we'll have to move struct i915_psr to intel_dp anyway." (by Paulo) v2: Avoid more than one setup. Removing initialization and trusting allocation. (By Paulo Zanoni). v3: rebase. v4: Adding comment. Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-11 16:57:23 +02:00
Dave Airlie	ecb889e620	Merge tag 'drm-intel-fixes-2014-06-06' of git://anongit.freedesktop.org/drm-intel into drm-next > Bunch of stuff for 3.16 still: > - Mipi dsi panel support for byt. Finally! From Shobhit&others. I've > squeezed this in since it's a regression compared to vbios and we've > been ridiculed about it a bit too often ... > - connection_mutex deadlock fix in get_connector (only affects i915). > - Core patches from Matt's primary plane from Matt Roper, I've pushed the > i915 stuff to 3.17. > - vlv power well sequencing fixes from Jesse. > - Fix for cursor size changes from Chris. > - agpbusy fixes from Ville. > - A few smaller things. > * tag 'drm-intel-fixes-2014-06-06' of git://anongit.freedesktop.org/drm-intel: (32 commits) drm/i915: BDW: Adding missing cursor offsets. drm: Fix getconnector connection_mutex locking drm/i915/bdw: Only use 2g GGTT for 32b platforms drm/i915: Nuke pipe A quirk on i830M drm/i915: fix display power sw state reporting drm/i915: Always apply cursor width changes drm/i915: tell the user if both KMS and UMS are disabled drm/plane-helper: Add drm_plane_helper_check_update() (v3) drm: Check CRTC compatibility in setplane drm/i915: use VBT to determine whether to enumerate the VGA port drm/i915: Don't WARN about ring idle bit on gen2 drm/i915: Silence the WARN if the user tries to GTT mmap an incoherent object drm/i915: Move the C3 LP write bit setup to gen3_init_clock_gating() for KMS drm/i915: Enable interrupt-based AGPBUSY# enable on 85x drm/i915: Flip the sense of AGPBUSY_DIS bit drm/i915: Set AGPBUSY# bit in init_clock_gating drm/i915/vlv: add pll assertion when disabling DPIO common well drm/i915/vlv: move DPIO common reset de-assert into __vlv_set_power_well drm/i915/vlv: re-order power wells so DPIO common comes after TX drm/i915/vlv: move CRI refclk enable into __vlv_set_power_well ...	2014-06-06 19:07:09 +10:00
Dave Airlie	8d4ad9d4bb	Merge commit '9e9a928eed8796a0a1aaed7e0b676db86ba84594' into drm-next Merge drm-fixes into drm-next. Both i915 and radeon need this done for later patches. Conflicts: drivers/gpu/drm/drm_crtc_helper.c drivers/gpu/drm/i915/i915_drv.h drivers/gpu/drm/i915/i915_gem.c drivers/gpu/drm/i915/i915_gem_execbuffer.c drivers/gpu/drm/i915/i915_gem_gtt.c	2014-06-05 20:28:59 +10:00
Shobhit Kumar	3e6bd01178	drm/i915: Detect if MIPI panel based on VBT and initialize only if present It seems by default the VBT has MIPI configuration block as well. The Generic driver will assume always MIPI if MIPI configuration block is found. This is causing probelm when actually there is eDP. Fix this by looking into general definition block which will have device configurations. From here we can figure out what is the LFP type and initialize MIPI only if MIPI is found. v2: Addressed review comments by Damien - Moved PORT definitions to intel_bios.h and renamed as DVO_PORT_MIPIA - renamed is_mipi to has_mipi and moved definition as suggested - Check has_mipi inside parse_mipi and intel_dsi_init insted of outside v3: Make has_mipi as a bitfield as suggested Signed-off-by: Shobhit Kumar <shobhit.kumar@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> [danvet: fold in conditions to pack everything neatly below 80 chars.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-06-05 08:52:33 +02:00
Chris Wilson	d23db88c3a	drm/i915: Prevent negative relocation deltas from wrapping This is pure evil. Userspace, I'm looking at you SNA, repacks batch buffers on the fly after generation as they are being passed to the kernel for execution. These batches also contain self-referenced relocations as a single buffer encompasses the state commands, kernels, vertices and sampler. During generation the buffers are placed at known offsets within the full batch, and then the relocation deltas (as passed to the kernel) are tweaked as the batch is repacked into a smaller buffer. This means that userspace is passing negative relocations deltas, which subsequently wrap to large values if the batch is at a low address. The GPU hangs when it then tries to use the large value as a base for its address offsets, rather than wrapping back to the real value (as one would hope). As the GPU uses positive offsets from the base, we can treat the relocation address as the minimum address read by the GPU. For the upper bound, we trust that userspace will not read beyond the end of the buffer. So, how do we fix negative relocations from wrapping? We can either check that every relocation looks valid when we write it, and then position each object such that we prevent the offset wraparound, or we just special-case the self-referential behaviour of SNA and force all batches to be above 256k. Daniel prefers the latter approach. This fixes a GPU hang when it tries to use an address (relocation + offset) greater than the GTT size. The issue would occur quite easily with full-ppgtt as each fd gets its own VM space, so low offsets would often be handed out. However, with the rearrangement of the low GTT due to capturing the BIOS framebuffer, it is already affecting kernels 3.15 onwards. I think only IVB+ is susceptible to this bug, but the workaround should only kick in rarely, so it seems sensible to always apply it. v3: Use a bias for batch buffers to prevent small negative delta relocations from wrapping. v4 from Daniel: - s/BIAS/BATCH_OFFSET_BIAS/ - Extract eb_vma_misplaced/i915_vma_misplaced since the conditions were growing rather cumbersome. - Add a comment to eb_get_batch explaining why we do this. - Apply the batch offset bias everywhere but mention that we've only observed it on gen7 gpus. - Drop PIN_OFFSET_FIX for now, that slipped in from a feature patch. v5: Add static to eb_get_batch, spotted by 0-day tester. Testcase: igt/gem_bad_reloc Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78533 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v3) Cc: stable@vger.kernel.org Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-27 11:18:40 +03:00
Chris Wilson	00731155a7	drm/i915: Fix dynamic allocation of physical handles A single object may be referenced by multiple registers fundamentally breaking the static allotment of ids in the current design. When the object is used the second time, the physical address of the first assignment is relinquished and a second one granted. However, the hardware is still reading (and possibly writing) to the old physical address now returned to the system. Eventually hilarity will ensue, but in the short term, it just means that cursors are broken when using more than one pipe. v2: Fix up leak of pci handle when handling an error during attachment, and avoid a double kmap/kunmap. (Ville) Rebase against -fixes. v3: And fix the error handling added in v2 (Ville) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77351 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: stable@vger.kernel.org Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-27 11:18:39 +03:00
Oscar Mateo	f83d6518a1	drm/i915: Kill private_default_ctx off It's barely alive now anyway, so give it the "coup de grâce". Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-22 23:44:44 +02:00
Oscar Mateo	273497e5cd	drm/i915: s/i915_hw_context/intel_context Up until now, contexts had one (and only one) backing object that was used by the hardware to save/restore render ring contexts (via the MI_SET_CONTEXT command). Other rings did not have or need this, so our i915_hw_context struct had a 1:1 relationship with a a real HW context. With Logical Ring Contexts and Execlists, this is not possible anymore: all rings need a backing object, and it cannot be reused. To prepare for that, rename our contexts to the more generic term intel_context. No functional changes. Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-22 23:41:17 +02:00
Oscar Mateo	a4872ba6d0	drm/i915: s/intel_ring_buffer/intel_engine_cs In the upcoming patches we plan to break the correlation between engine command streamers (a.k.a. rings) and ringbuffers, so it makes sense to refactor the code and make the change obvious. No functional changes. Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-22 23:01:05 +02:00
Daniel Vetter	bdf1e7e3db	drm/i915: move bsd dispatch index somewhere better Adding stuff at the bottom is really no how this should be done, since that's the place for ums/dri dungeons. This was added in commit `a8ebba75b3` Author: Zhao Yakui <yakui.zhao@intel.com> Date: Thu Apr 17 10:37:40 2014 +0800 drm/i915: Use the coarse ping-pong mechanism based on drm fd to dispatch the BSD command on BDW GT3 Also add a note to prevent this from happening again - people really should be less lazy and take more time to look for a good home of their new driver-global state. Cc: Imre Deak <imre.deak@intel.com> Cc: Zhao Yakui <yakui.zhao@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-22 15:06:31 +02:00
Ville Syrjälä	4fa62c890c	drm/i915: Move buffer pinning and ring selection to intel_crtc_page_flip() All of the .queue_flip() callbacks duplicate the same code to pin the buffers and calculate the gtt_offset. Move that code to intel_crtc_page_flip(). In order to do that we must also move the ring selection logic there. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-21 09:55:53 +02:00
Ville Syrjälä	5efb3e2838	drm/i915/chv: Add cursor pipe offsets Unsurprisingly the cursor C regiters are also at a weird offset on CHV. Add more pipe offsets to handle them. This also gets rid of most of the differences between the i9xx vs. ivb cursor code. We can unify the remaining code as well, but I'll leave that for another patch. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Antti Koskipää <antti.koskipaa@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-20 15:32:30 +02:00
Chris Wilson	2cfcd32a92	drm/i915: Implement an oom-notifier for last resort shrinking Before the process killer is invoked, oom-notifiers are executed for one last try at recovering pages. We can hook into this callback to be sure that everything that can be is purged from our page lists, and to give a summary of how much memory is still pinned by the GPU in the case of an oom. This should be really valuable for debugging OOM issues. Note that the last-ditch effort call to shrink_all we've previously called from our normal shrinker when we could free as much as the vm demaned is moved into the oom notifier. Since the shrinker accounting races against bind/unbind operations we might have called shrink_all prematurely, which this approach with an oom notifier avoids. References: https://bugs.freedesktop.org/show_bug.cgi?id=72742 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Tested-by: lu hua <huax.lu@intel.com> [danvet: Bikeshed logical \| into \|\| and pimp commit message.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-20 10:57:13 +02:00
Chris Wilson	ceabbba524	drm/i915: Include bound and active pages in the count of shrinkable objects When the machine is under a lot of memory pressure and being stressed by multiple GPU threads, we quite often report fewer than shrinker->batch (i.e. SHRINK_BATCH) pages to be freed. This causes the shrink_control to skip calling into i915.ko to release pages, despite the GPU holding onto most of the physical pages in its active lists. References: https://bugs.freedesktop.org/show_bug.cgi?id=72742 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Robert Beckett <robert.beckett@intel.com> Reviewed-by: Rafael Barbalho <rafael.barbalho@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-20 09:46:06 +02:00
Shashank Sharma	b6fdd0f2b9	drm/i915: Add MIPI mmio reg base This patch adds a mmio base address variable for DSI display, to make the DSI code generic, so that, if required, the same code can be re-used for future platforms with different mmio base. Signed-off-by: Shashank Sharma <shashank.sharma@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> [danvet: Appease checkpatch.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-19 17:56:40 +02:00
Daniel Vetter	29b9bde6f4	drm/i915: Make ->update_primary_plane infallible Way back we've used this to reject framebuffers with unsupported pixel formats. But since the modesetting reorg with the compute config stage we reject those much earlier and just BUG() in this callback. So switch to a void return type. Reviewed-by: Akash Goel <akash.goel@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-19 15:31:04 +02:00
Chris Wilson	5cc9ed4b9a	drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl By exporting the ability to map user address and inserting PTEs representing their backing pages into the GTT, we can exploit UMA in order to utilize normal application data as a texture source or even as a render target (depending upon the capabilities of the chipset). This has a number of uses, with zero-copy downloads to the GPU and efficient readback making the intermixed streaming of CPU and GPU operations fairly efficient. This ability has many widespread implications from faster rendering of client-side software rasterisers (chromium), mitigation of stalls due to read back (firefox) and to faster pipelining of texture data (such as pixel buffer objects in GL or data blobs in CL). v2: Compile with CONFIG_MMU_NOTIFIER v3: We can sleep while performing invalidate-range, which we can utilise to drop our page references prior to the kernel manipulating the vma (for either discard or cloning) and so protect normal users. v4: Only run the invalidate notifier if the range intercepts the bo. v5: Prevent userspace from attempting to GTT mmap non-page aligned buffers v6: Recheck after reacquire mutex for lost mmu. v7: Fix implicit padding of ioctl struct by rounding to next 64bit boundary. v8: Fix rebasing error after forwarding porting the back port. v9: Limit the userptr to page aligned entries. We now expect userspace to handle all the offset-in-page adjustments itself. v10: Prevent vma from being copied across fork to avoid issues with cow. v11: Drop vma behaviour changes -- locking is nigh on impossible. Use a worker to load user pages to avoid lock inversions. v12: Use get_task_mm()/mmput() for correct refcounting of mm. v13: Use a worker to release the mmu_notifier to avoid lock inversion v14: Decouple mmu_notifier from struct_mutex using a custom mmu_notifer with its own locking and tree of objects for each mm/mmu_notifier. v15: Prevent overlapping userptr objects, and invalidate all objects within the mmu_notifier range v16: Fix a typo for iterating over multiple objects in the range and rearrange error path to destroy the mmu_notifier locklessly. Also close a race between invalidate_range and the get_pages_worker. v17: Close a race between get_pages_worker/invalidate_range and fresh allocations of the same userptr range - and notice that struct_mutex was presumed to be held when during creation it wasn't. v18: Sigh. Fix the refactor of st_set_pages() to allocate enough memory for the struct sg_table and to clear it before reporting an error. v19: Always error out on read-only userptr requests as we don't have the hardware infrastructure to support them at the moment. v20: Refuse to implement read-only support until we have the required infrastructure - but reserve the bit in flags for future use. v21: use_mm() is not required for get_user_pages(). It is only meant to be used to fix up the kernel thread's current->mm for use with copy_user(). v22: Use sg_alloc_table_from_pages for that chunky feeling v23: Export a function for sanity checking dma-buf rather than encode userptr details elsewhere, and clean up comments based on suggestions by Bradley. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com> Cc: Akash Goel <akash.goel@intel.com> Cc: "Volkin, Bradley D" <bradley.d.volkin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com> [danvet: Frob ioctl allocation to pick the next one - will cause a bit of fuss with create2 apparently, but such are the rules.] [danvet2: oops, forgot to git add after manual patch application] [danvet3: Appease sparse.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-16 19:31:29 +02:00
Mika Kuoppala	9d0a6fa6c5	drm/i915: add render state initialization HW guys say that it is not a cool idea to let device go into rc6 without proper 3d pipeline state. For each new uninitialized context, generate a valid null render state to be run on context creation. This patch introduces a skeleton with empty states. v2: - No need to vmap (Chris Wilson) - use .c files for state (Daniel Vetter) - no need to flush as i915_add_request does it - remove parameter for batch alloc size - don't wait for the init (Ben Widawsky) v3: - move to cpu/gpu (Chris Wilson) Tested-by: Kristen Carlson Accardi <kristen@linux.intel.com> (v1) Tested-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-14 19:16:13 +02:00
Damien Lespiau	d79b814d2f	drm/i915: Introduce a for_each_crtc() macro Signed-off-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-14 00:38:38 +02:00
Damien Lespiau	d063ae48c4	drm/i915: Introduce a for_each_intel_crtc() macro Fed up with having that long list_for_each_entry() invocation? Use for_each_intel_crtc()! Signed-off-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-14 00:38:18 +02:00
Daniel Vetter	d8ffa60b52	drm/i915: WARN_ON fence pin leaks The fence pin count should always be <= the bo pin count. If that's not the case then we have a funny problem and are leaking references somewhere. Which means we can catch fence pin leaks by checking for the same upper limit as we do for the bo pin count. Inspired by a discussion with Ville about a fence leak igt testcase. v2: Also check for fence->pin_count <= ggtt_vma->pin_count, since that might catch a leak even quicker. Also de-inline them, they're getting too big. v3: Don't separately check for MAX_PIN_COUNT since the > vma->pin_count check will catch that already (Chris). Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-13 17:16:12 +02:00
Chon Ming Lee	a09cadddde	drm/i915/chv: Add DPIO offset for Cherryview. v3 CHV has 2 display phys. First phy (IOSF offset 0x1A) has two channels, and second phy (IOSF offset 0x12) has single channel. The first phy is used for port B and port C, while second phy is only for port D. v2: Move the pipe to determine which phy to select for vlv_dpio_read/vlv_dpio_write to another patch. (Daniel) v3: Rebase the code based on rework on how to calculate DPIO offset. Signed-off-by: Chon Ming Lee <chon.ming.lee@intel.com> Reviewed-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-12 19:50:11 +02:00
Brad Volkin	44e895a8a2	drm/i915: Use hash tables for the command parser For clients that submit large batch buffers the command parser has a substantial impact on performance. On my HSW ULT system performance drops as much as ~20% on some tests. Most of the time is spent in the command lookup code. Converting that from the current naive search to a hash table lookup reduces the performance drop to ~10%. The choice of value for I915_CMD_HASH_ORDER allows all commands currently used in the parser tables to hash to their own bucket (except for one collision on the render ring). The tradeoff is that it wastes memory. Because the opcodes for the commands in the tables are not particularly well distributed, reducing the order still leaves many buckets empty. The increased collisions don't seem to have a huge impact on the performance gain, but for now anyhow, the parser trades memory for performance. NB: Ville noticed that the error paths through the ring init code will leak memory. I've not addressed that here. We can do a follow up pass to handle all of the leaks. v2: improved comment describing selection of hash key mask (Damien) replace a BUG_ON() with an error return (Tvrtko, Ville) commit message improvements Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-05-12 19:15:51 +02:00

1 2 3 4 5 ...

1122 Commits