linux

Author	SHA1	Message	Date
Ashutosh Dixit	6352219c39	drm/i915/perf: Do not clear pollin for small user read buffers It is wrong to block the user thread in the next poll when OA data is already available which could not fit in the user buffer provided in the previous read. In several cases the exact user buffer size is not known. Blocking user space in poll can lead to data loss when the buffer size used is smaller than the available data. This change fixes this issue and allows user space to read all OA data even when using a buffer size smaller than the available data using multiple non-blocking reads rather than staying blocked in poll till the next timer interrupt. v2: Fix ret value for blocking reads (Umesh) v3: Mistake during patch send (Ashutosh) v4: Remove -EAGAIN from comment (Umesh) v5: Improve condition for clearing pollin and return (Lionel) v6: Improve blocking read loop and other cleanups (Lionel) v7: Added Cc stable Testcase: igt/perf/polling-small-buf Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200403010120.3067-1-ashutosh.dixit@intel.com	2020-04-03 18:55:02 +01:00
Lionel Landwerlin	d16e137e7f	drm/i915/perf: don't read head/tail pointers outside critical section Reading or writing those fields should only happen under stream->oa_buffer.ptr_lock. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `d1df41eb72` ("drm/i915/perf: rework aging tail workaround") Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200330091411.37357-1-lionel.g.landwerlin@intel.com	2020-03-31 11:47:19 +03:00
Chris Wilson	d7d50f801d	drm/i915/perf: Schedule oa_config after modifying the contexts We wish that the scheduler emit the context modification commands prior to enabling the oa_config, for which we must explicitly inform it of the ordering constraints. This is especially important as we now wait for the final oa_config setup to be completed and as this wait may be on a distinct context to the state modifications, we need that command packet to be always last in the queue. We borrow the i915_active for its ability to track multiple timelines and the last dma_fence on each; a flexible dma_resv. Keeping track of each dma_fence is important for us so that we can efficiently schedule the requests and reprioritise as required. Reported-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200327112212.16046-3-chris@chris-wilson.co.uk	2020-03-30 18:20:34 +01:00
Lionel Landwerlin	4ef10fe05b	drm/i915/perf: add new open param to configure polling of OA buffer This new parameter let's the application choose how often the OA buffer should be checked on the CPU side for data availability. Longer polling period tend to reduce CPU overhead if the application does not care about somewhat real time data collection. v2: Allow disabling polling completely with 0 value (Lionel) v3: Version the new parameter (Joonas) v4: Rebase (Umesh) v5: Make poll delay value of 0 invalid (Umesh) v6: - Describe poll_oa_period (Ashutosh) - Fix comment for new poll parameter (Lionel) - Drop open_flags in read_properties_unlocked (Lionel) - Rename uapi parameter (Ashutosh) v7: Reword the comment in uapi (Ashutosh) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200324185457.14635-4-umesh.nerlige.ramappa@intel.com	2020-03-27 13:10:05 +02:00
Lionel Landwerlin	c51dbc6e8f	drm/i915/perf: move pollin setup to non hw specific code This isn't really gen specific stuff, so just move it to the common code. v2: Rebase (Umesh) v3: Remove comment, pollin is a per stream state already (Ashutosh) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200324185457.14635-3-umesh.nerlige.ramappa@intel.com	2020-03-27 13:10:04 +02:00
Lionel Landwerlin	d1df41eb72	drm/i915/perf: rework aging tail workaround We're about to introduce an options to open the perf stream, giving the user ability to configure how often it wants the kernel to poll the OA registers for available data. Right now the workaround against the OA tail pointer race condition requires at least twice the internal kernel polling timer to make any data available. This changes introduce checks on the OA data written into the circular buffer to make as much data as possible available on the first iteration of the polling timer. v2: Use OA_TAKEN macro without the gtt_offset (Lionel) v3: (Umesh) - Rebase - Change report to report32 from below review https://patchwork.freedesktop.org/patch/330704/?series=66697&rev=1 v4: (Ashutosh, Lionel) - Fix checkpatch errors - Fix aging_timestamp initialization - Check for only one valid landed report - Fix check for unlanded report v5: (Ashutosh) - Fix bug in accurately determining landed report. - Optimize the check for landed reports by going as far as the previously determined aged tail. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200324185457.14635-2-umesh.nerlige.ramappa@intel.com	2020-03-27 13:10:03 +02:00
Umesh Nerlige Ramappa	c06aa1b438	drm/i915/perf: Invalidate OA TLB on when closing perf stream On running several back to back perf capture sessions involving closing and opening the perf stream, invalid OA reports are seen in the beginning of the OA buffer in some sessions. Fix this by invalidating OA TLB when the perf stream is closed or disabled on gen12. Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `00a7f0d715` ("drm/i915/tgl: Add perf support on TGL") Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200309211057.38575-1-umesh.nerlige.ramappa@intel.com (cherry picked from commit `a639b0c150`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2020-03-20 07:04:44 -07:00
Umesh Nerlige Ramappa	a639b0c150	drm/i915/perf: Invalidate OA TLB on when closing perf stream On running several back to back perf capture sessions involving closing and opening the perf stream, invalid OA reports are seen in the beginning of the OA buffer in some sessions. Fix this by invalidating OA TLB when the perf stream is closed or disabled on gen12. Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `00a7f0d715` ("drm/i915/tgl: Add perf support on TGL") Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200309211057.38575-1-umesh.nerlige.ramappa@intel.com	2020-03-18 00:30:52 +02:00
Lionel Landwerlin	11ecbdddf2	drm/i915/perf: introduce global sseu pinning On Gen11 powergating half the execution units is a functional requirement when using the VME samplers. Not fullfilling this requirement can lead to hangs. This unfortunately plays fairly poorly with the NOA requirements. NOA requires a stable power configuration to maintain its configuration. As a result using OA (and NOA feeding into it) so far has required us to use a power configuration that can work for all contexts. The only power configuration fullfilling this is powergating half the execution units. This makes performance analysis for 3D workloads somewhat pointless. Failing to find a solution that would work for everybody, this change introduces a new i915-perf stream open parameter that punts the decision off to userspace. If this parameter is omitted, the existing Gen11 behavior remains (half EU array powergating). This change takes the initiative to move all perf related sseu configuration into i915_perf.c v2: Make parameter priviliged if different from default v3: Fix context modifying its sseu config while i915-perf is enabled v4: Always consider global sseu a privileged operation (Tvrtko) Override req_sseu point in intel_sseu_make_rpcs() (Tvrtko) Remove unrelated changes (Tvrtko) v5: Some typos (Tvrtko) Process sseu param in read_properties_unlocked() (Tvrtko) v6: Actually commit the bits from v5... Fixup some checkpath warnings v7: Only compare engine uabi field (Chris) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200317132222.2638719-3-lionel.g.landwerlin@intel.com	2020-03-17 15:27:55 +02:00
Lionel Landwerlin	371aba6e26	drm/i915/perf: remove redundant power configuration register override The caller of i915_oa_init_reg_state() already sets this. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200317132222.2638719-2-lionel.g.landwerlin@intel.com	2020-03-17 15:27:54 +02:00
Lionel Landwerlin	9aba9c188d	drm/i915/perf: remove generated code A little bit of history : Back when i915-perf was introduced (4.13), there was no way to dynamically add new OA configurations to i915. Only the generated configs baked in at build time were allowed. It quickly became obvious that we would need to allow applications to upload their own configurations, for instance to be able to test new ones, and so by the next stable version (4.14) we added uAPIs to allow uploading new configurations. When adding that capability, we took the opportunity to remove most HW configurations except the TestOa one which is a configuration IGT would rely on to verify that the HW is outputting correct values. At the time it made sense to have that confiuration in at the same time a given HW platform added to the i915-perf driver. Now that IGT has become the reference point for HW configurations (see commit 53f8f541ca ("lib: Add i915_perf library"), previously this was located in the GPUTop repository), the need for having those configurations in i915-perf is gone. On the Mesa side, we haven't relied on this test configuration for a while. The MDAPI library always required 4.14 feature level and always loaded its configuration into i915. I'm sure nobody will miss this generated stuff in i915 :) v2: Fix selftests by creating an empty config v3: Fix unlocking on allocation error (Dan Carpenter) v4: Fixup checkpatch warnings v5: Fix incorrect unlock in error path (Umesh) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200317132222.2638719-1-lionel.g.landwerlin@intel.com	2020-03-17 15:27:50 +02:00
Chris Wilson	4b4e973d5e	drm/i915/perf: Reintroduce wait on OA configuration completion We still need to wait for the initial OA configuration to happen before we enable OA report writes to the OA buffer. Reported-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `15d0ace1f8` ("drm/i915/perf: execute OA configuration from command stream") Closes: https://gitlab.freedesktop.org/drm/intel/issues/1356 Testcase: igt/perf/stream-open-close Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200302085812.4172450-7-chris@chris-wilson.co.uk	2020-03-02 20:34:18 +00:00
Chris Wilson	d236e2ac53	drm/i915/perf: Manually acquire engine-wakeref around use of kernel_context The engine->kernel_context is a special case for request emission. Since it is used as the barrier within the engine's wakeref, we must acquire the wakeref before submitting a request to the kernel_context. Reported-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200227085723.1961649-3-chris@chris-wilson.co.uk	2020-02-28 14:05:34 +00:00
Chris Wilson	a5af081d01	drm/i915/perf: Mark up the racy use of perf->exclusive_stream Inside the general i915_oa_init_reg_state() we avoid using the perf->mutex. However, we rely on perf->exclusive_stream being valid to access at that point, and for that we have to control the race with disabling perf. This relies on the disabling being a heavy barrier that inspects all active contexts, after marking the perf->exclusive_stream as not available. This should ensure that there are no more concurrent accesses to the perf->exclusive_stream as we destroy it. Mark up the races around the perf->exclusive_stream so that they stand out much more. (And hopefully we will be running kcsan to start validating that the only races we have are carefully controlled.) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200227085723.1961649-2-chris@chris-wilson.co.uk	2020-02-28 14:05:33 +00:00
Wambui Karuga	0bf857358f	drm/i915/perf: conversion to struct drm_device based logging macros. Manual conversion of instances of printk based drm logging macros to the struct drm_device based logging macros in i915/i915_perf.c. Also involves extraction of the struct drm_i915_private device from various intel types for use in the macros. Instances of the DRM_DEBUG printk macro were not converted due to the lack of an analogous struct drm_device based logging macro. v2: remove instances of DRM_DEBUG that were converted. References: https://lists.freedesktop.org/archives/dri-devel/2020-January/253381.html Signed-off-by: Wambui Karuga <wambui.karugax@gmail.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200218173936.19664-1-wambui.karugax@gmail.com	2020-02-21 11:20:42 +02:00
Wambui Karuga	00376ccfb2	drm/i915: conversion to drm_device logging macros when drm_i915_private is present. Converts various instances of the printk drm logging macros to the struct drm_device based logging macros in the drm/i915 folder using the following coccinelle script that transforms based on the existence of the struct drm_i915_private device pointer: @@ identifier fn, T; @@ fn(...) { ... struct drm_i915_private T = ...; <+... ( -DRM_INFO( +drm_info(&T->drm, ...) \| -DRM_ERROR( +drm_err(&T->drm, ...) \| -DRM_WARN( +drm_warn(&T->drm, ...) \| -DRM_DEBUG_KMS( +drm_dbg_kms(&T->drm, ...) \| -DRM_DEBUG_DRIVER( +drm_dbg(&T->drm, ...) \| -DRM_DEBUG_ATOMIC( +drm_dbg_atomic(&T->drm, ...) ) ...+> } @@ identifier fn, T; @@ fn(...,struct drm_i915_private T,...) { <+... ( -DRM_INFO( +drm_info(&T->drm, ...) \| -DRM_ERROR( +drm_err(&T->drm, ...) \| -DRM_WARN( +drm_warn(&T->drm, ...) \| -DRM_DEBUG_DRIVER( +drm_dbg(&T->drm, ...) \| -DRM_DEBUG_KMS( +drm_dbg_kms(&T->drm, ...) \| -DRM_DEBUG_ATOMIC( +drm_dbg_atomic(&T->drm, ...) ) ...+> } Checkpatch warnings were fixed manually. Instances of the DRM_DEBUG macro were not converted due to lack of a consensus of an analogous struct drm_device based macro. References: https://lists.freedesktop.org/archives/dri-devel/2020-January/253381.html Signed-off-by: Wambui Karuga <wambui.karugax@gmail.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200131093416.28431-2-wambui.karugax@gmail.com	2020-02-04 11:29:27 +02:00
Umesh Nerlige Ramappa	6f280b133d	drm/i915/perf: Fix OA context id overlap with idle context id Engine context pinned in perf OA was set to same context id as the idle context. Set the context id to an unused value. Clear the sw context id field in lrc descriptor before ORing with ce->tag (Chris) Closes: https://gitlab.freedesktop.org/drm/intel/issues/756 Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200124013701.40609-1-umesh.nerlige.ramappa@intel.com	2020-01-27 21:11:59 +00:00
Pankaj Bharadiya	a9f236d1fc	drm/i915: Make WARN* drm specific where uncore or stream ptr is available drm specific WARN* calls include device information in the backtrace, so we know what device the warnings originate from. Covert all the calls of WARN* with device specific drm_WARN* variants in functions where intel_uncore/i915_perf_stream struct pointer is readily available. The conversion was done automatically with below coccinelle semantic patch. checkpatch errors/warnings are fixed manually. @@ identifier func, T; @@ func(...) { ... struct intel_uncore T = ...; <... ( -WARN( +drm_WARN(&T->i915->drm, ...) \| -WARN_ON( +drm_WARN_ON(&T->i915->drm, ...) \| -WARN_ONCE( +drm_WARN_ONCE(&T->i915->drm, ...) \| -WARN_ON_ONCE( +drm_WARN_ON_ONCE(&T->i915->drm, ...) ) ...> } @@ identifier func, T; @@ func(struct intel_uncore T,...) { <... ( -WARN( +drm_WARN(&T->i915->drm, ...) \| -WARN_ON( +drm_WARN_ON(&T->i915->drm, ...) \| -WARN_ONCE( +drm_WARN_ONCE(&T->i915->drm, ...) \| -WARN_ON_ONCE( +drm_WARN_ON_ONCE(&T->i915->drm, ...) ) ...> } @@ identifier func, T; @@ func(struct i915_perf_stream T,...) { +struct drm_i915_private i915 = T->perf->i915; <+... ( -WARN( +drm_WARN(&i915->drm, ...) \| -WARN_ON( +drm_WARN_ON(&i915->drm, ...) \| -WARN_ONCE( +drm_WARN_ONCE(&i915->drm, ...) \| -WARN_ON_ONCE( +drm_WARN_ON_ONCE(&i915->drm, ...) ) ...+> } command: ls drivers/gpu/drm/i915/*.c \| xargs spatch --sp-file <script> \ --linux-spacing --in-place Signed-off-by: Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200115034455.17658-11-pankaj.laxminarayan.bharadiya@intel.com	2020-01-22 17:57:39 +02:00
Chris Wilson	feed5c7be2	drm/i915: Pin the context as we work on it Since we now allow the intel_context_unpin() to run unserialised, we risk our operations under the intel_context_lock_pinned() being run as the context is unpinned (and thus invalidating our state). We can atomically acquire the pin, testing to see if it is pinned in the process, thus ensuring that the state remains consistent during the course of the whole operation. Fixes: `8413502238` ("drm/i915/gt: Drop mutex serialisation between context pin/unpin") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200109085142.871563-1-chris@chris-wilson.co.uk	2020-01-09 12:50:26 +00:00
Chris Wilson	e6ba764802	drm/i915: Remove i915->kernel_context Allocate only an internal intel_context for the kernel_context, forgoing a global GEM context for internal use as we only require a separate address space (for our own protection). Now having weaned GT from requiring ce->gem_context, we can stop referencing it entirely. This also means we no longer have to create random and unnecessary GEM contexts for internal use. GEM contexts are now entirely for tracking GEM clients, and intel_context the execution environment on the GPU. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Andi Shyti <andi.shyti@intel.com> Acked-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191221160324.1073045-1-chris@chris-wilson.co.uk	2019-12-21 16:37:10 +00:00
Chris Wilson	9f3ccd40ac	drm/i915: Drop GEM context as a direct link from i915_request Keep the intel_context as being the primary state for i915_request, with the GEM context a backpointer from the low level state for the rarer cases we need client information. Our goal is to remove such references to clients from the backend, and leave the HW submission agnostic to client interfaces and self-contained. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Andi Shyti <andi.shyti@intel.com> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191220101230.256839-1-chris@chris-wilson.co.uk	2019-12-20 10:52:21 +00:00
Venkata Sandeep Dhanalakota	3dc716fd3c	drm/i915/perf: Register sysctl path globally We do not require to register the sysctl paths per instance, so making registration global. v2: make sysctl path register and unregister function driver specific (Tvrtko and Lucas). Cc: Sudeep Dutt <sudeep.dutt@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191213155152.69182-1-venkata.s.dhanalakota@intel.com	2019-12-13 20:16:23 +00:00
Umesh Nerlige Ramappa	ccdeed4970	drm/i915/perf: Configure OAR for specific context Gen12 supports saving/restoring render counters per context. Apply OAR configuration only for the context that is passed in to perf. v2: - Fix OACTXCONTROL value to only stop/resume counters. - Remove gen12_update_reg_state_unlocked as power state is already applied by the caller. v3: (Lionel) - Move register initialization into the array - Assume a valid oa_config in enable_metric_set Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Fixes: `00a7f0d715` ("drm/i915/tgl: Add perf support on TGL") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191206194339.31356-2-umesh.nerlige.ramappa@intel.com	2019-12-09 15:38:17 +02:00
Umesh Nerlige Ramappa	322d56aa31	drm/i915/perf: Allow non-privileged access when OA buffer is not sampled SAMPLE_OA_REPORT enables sampling of OA reports from the OA buffer. Since reports from OA buffer had system wide visibility, collecting samples from the OA buffer was a privileged operation on previous platforms. Prior to TGL, it was also necessary to sample the OA buffer to normalize reports from MI REPORT PERF COUNT. TGL has a dedicated OAR unit to sample perf reports for a specific render context. This removes the necessity to sample OA buffer. - If not sampling the OA buffer, allow non-privileged access. An earlier patch allows the non-privilege access: https://patchwork.freedesktop.org/patch/337716/?series=68582&rev=1 - Clear up the path for non-privileged access in this patch Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Fixes: `00a7f0d715` ("drm/i915/tgl: Add perf support on TGL") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191206194339.31356-1-umesh.nerlige.ramappa@intel.com	2019-12-09 15:38:16 +02:00
Mao Wenan	c415ef2a26	drm/i915/perf: drop pointless static qualifier in i915_perf_add_config_ioctl() There is no need to have the 'T *v' variable static since new value always be assigned before use it. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Mao Wenan <maowenan@huawei.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191204010154.152396-1-maowenan@huawei.com	2019-12-04 15:11:44 +00:00
Chris Wilson	de5825beae	drm/i915: Serialise with engine-pm around requests on the kernel_context As the engine->kernel_context is used within the engine-pm barrier, we have to be careful when emitting requests outside of the barrier, as the strict timeline locking rules do not apply. Instead, we must ensure the engine_park() cannot be entered as we build the request, which is simplest by taking an explicit engine-pm wakeref around the request construction. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191125105858.1718307-1-chris@chris-wilson.co.uk	2019-11-25 13:17:18 +00:00
Lionel Landwerlin	dd590f6800	drm/i915/perf: Add preemption check while waiting for OA While we're waiting for the OA configuration to apply, let's give a chance to other contexts that might need to run other workloads. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191114140224.21818-1-lionel.g.landwerlin@intel.com	2019-11-15 16:43:33 +00:00
Lionel Landwerlin	93937659dc	drm/i915/perf: don't forget noa wait after oa config I'm observing incoherence metric values, changing from run to run. It appears the patches introducing noa wait & reconfiguration from command stream switched places in the series multiple times during the review. This lead to the dependency of one onto the order to go missing... Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `15d0ace1f8` ("drm/i915/perf: execute OA configuration from command stream") Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191113154639.27144-1-lionel.g.landwerlin@intel.com	2019-11-14 07:20:56 +00:00
Lionel Landwerlin	0b0120d4c7	drm/i915/perf: always consider holding preemption a privileged op The ordering of the checks in the existing code can lead to holding preemption not being considered as privileged op. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `9cd20ef780` ("drm/i915/perf: allow holding preemption on filtered ctx") Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191111095308.2550-1-lionel.g.landwerlin@intel.com	2019-11-11 12:14:01 +02:00
Chris Wilson	9278bbb6e4	drm/i915/perf: Reverse a ternary to make sparse happy Avoid drivers/gpu/drm/i915/i915_perf.c:2442:85: warning: dubious: x \| !y simply by inverting the predicate and reversing the ternary. v2: Move the long lines into their own function so there is no confusion on operator precedence. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191101192116.12647-1-chris@chris-wilson.co.uk	2019-11-01 19:45:56 +00:00
Lionel Landwerlin	00a7f0d715	drm/i915/tgl: Add perf support on TGL The design of the OA unit has been split into several units. We now have a global unit (OAG) and a render specific unit (OAR). This leads to some changes on how we program things. Some details : OAR: - has its own set of counter registers, they are per-context saved/restored - counters are not written to the circular OA buffer - a snapshot of the counters can be acquired with MI_RECORD_PERF_COUNT, or a single counter can be read with MI_STORE_REGISTER_MEM. OAG: - has global counters that increment across context switches - counters are written into the circular OA buffer (if requested) v2: Fix checkpatch warnings on code style (Lucas) v3: (Umesh) - Update register from which tail, status and head are read - Update logic to sample context reports - Update whitelist mux and b counter regs v4: Fix a bug when updating context image for new contexts (Umesh) v5: Squash patch enabling save/restore of counters into context image We want this so we can preempt performance queries and keep the system responsive even when long running queries are ongoing. We avoid doing it for all contexts. - use LRI to modify context control (Chris) - use MASKED_FIELD to program just the masked bits (Chris) - disable save/restore of counters on cleanup (Chris) v6: Do not use implicit parameters (Chris) BSpec: 28727, 30021 Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Acked-by: Chris Wilson <chris.p.wilson@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191025193746.47155-2-umesh.nerlige.ramappa@intel.com	2019-10-29 12:53:54 +02:00
Umesh Nerlige Ramappa	fc21523041	drm/i915/perf: Add helper macros for comparing with whitelisted registers Add helper macros for range and equality comparisons and use them to check with whitelisted registers in oa configurations. Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191025193746.47155-1-umesh.nerlige.ramappa@intel.com	2019-10-29 12:53:54 +02:00
Michal Wajdeczko	19c17b763f	drm/i915/execlists: Use vfunc to check engine submission mode While processing CSB there is no need to look at GuC submission settings, just check if engine is configured for execlists mode. While today GuC submission is disabled it's settings are still based on modparam values that might not correctly reflect actual submission status in case of any fallback. Until that is fully fixed, use alternate method to confirm that engine really runs in execlists mode by comparing set_default_submission vfunc. v2: add other immediate use of new helper Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Reviewed-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191028164520.31772-1-michal.wajdeczko@intel.com	2019-10-28 20:40:07 +00:00
Chris Wilson	2871ea85c1	drm/i915/gt: Split intel_ring_submission Split the legacy submission backend from the common CS ring buffer handling. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191024100344.5041-1-chris@chris-wilson.co.uk	2019-10-24 12:14:21 +01:00
Chris Wilson	0587152bf9	drm/i915: Drop assertion that ce->pin_mutex guards state updates The actual conditions are that we know the GPU is not accessing the context, and we hold a pin on the context image to allow CPU access. We used a fake lock on ce->pin_mutex so that we could try and use lockdep to assert that access is serialised, but the various different hardirq/softirq contexts where we need to fake holding the pin_mutex are causing more trouble. Still it would be nice if we did have a way to reassure ourselves that the direct update to the context image is serialised with GPU execution. In the meantime, stop lockdep complaining about false irq inversions. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111923 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191022122845.25038-1-chris@chris-wilson.co.uk	2019-10-22 13:32:01 +01:00
Lionel Landwerlin	8814c6d01f	drm/i915/perf: fix oa config reconfiguration The current logic just reapplies the same configuration already stored into stream->oa_config instead of the newly selected one. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `7831e9a965` ("drm/i915/perf: Allow dynamic reconfiguration of the OA stream") Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191019214647.27866-1-lionel.g.landwerlin@intel.com	2019-10-20 10:01:11 +01:00
Lionel Landwerlin	9cd20ef780	drm/i915/perf: allow holding preemption on filtered ctx We would like to make use of perf in Vulkan. The Vulkan API is much lower level than OpenGL, with applications directly exposed to the concept of command buffers (pretty much equivalent to our batch buffers). In Vulkan, queries are always limited in scope to a command buffer. In OpenGL, the lack of command buffer concept meant that queries' duration could span multiple command buffers. With that restriction gone in Vulkan, we would like to simplify measuring performance just by measuring the deltas between the counter snapshots written by 2 MI_RECORD_PERF_COUNT commands, rather than the more complex scheme we currently have in the GL driver, using 2 MI_RECORD_PERF_COUNT commands and doing some post processing on the stream of OA reports, coming from the global OA buffer, to remove any unrelated deltas in between the 2 MI_RECORD_PERF_COUNT. Disabling preemption only apply to a single context with which want to query performance counters for and is considered a privileged operation, by default protected by CAP_SYS_ADMIN. It is possible to enable it for a normal user by disabling the paranoid stream setting. v2: Store preemption setting in intel_context (Chris) v3: Use priorities to avoid preemption rather than the HW mechanism v4: Just modify the port priority reporting function v5: Add nopreempt flag on gem context and always flag requests appropriately, regarless of OA reconfiguration. Link: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/932 Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191014201404.22468-4-chris@chris-wilson.co.uk	2019-10-14 21:30:28 +01:00
Chris Wilson	7831e9a965	drm/i915/perf: Allow dynamic reconfiguration of the OA stream Introduce a new perf_ioctl command to change the OA configuration of the active stream. This allows the OA stream to be reconfigured between batch buffers, giving greater flexibility in sampling. We inject a request into the OA context to reconfigure the stream asynchronously on the GPU in between and ordered with execbuffer calls. Original patch for dynamic reconfiguration by Lionel Landwerlin. Link: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/932 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191014201404.22468-3-chris@chris-wilson.co.uk	2019-10-14 21:30:27 +01:00
Lionel Landwerlin	4f6ccc74a8	drm/i915: add support for perf configuration queries Listing configurations at the moment is supported only through sysfs. This might cause issues for applications wanting to list configurations from a container where sysfs isn't available. This change adds a way to query the number of configurations and their content through the i915 query uAPI. v2: Fix sparse warnings (Lionel) Add support to query configuration using uuid (Lionel) v3: Fix some inconsistency in uapi header (Lionel) Fix unlocking when not locked issue (Lionel) Add debug messages (Lionel) v4: Fix missing unlock (Dan) v5: Drop lock when copying config content to userspace (Chris) v6: Drop lock when copying config list to userspace (Chris) Fix deadlock when calling i915_perf_get_oa_config() under perf.metrics_lock (Lionel) Add i915_oa_config_get() (Chris) Link: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/932 Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191014201404.22468-2-chris@chris-wilson.co.uk	2019-10-14 21:30:26 +01:00
Lionel Landwerlin	b8d49f28aa	drm/i915/perf: introduce a versioning of the i915-perf uapi Reporting this version will help application figure out what level of the support the running kernel provides. v2: Add i915_perf_ioctl_version() (Chris) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191014201404.22468-1-chris@chris-wilson.co.uk	2019-10-14 21:30:25 +01:00
Chris Wilson	c2fba936d3	drm/i915/perf: Avoid polluting the i915_oa_config with error pointers Use a local variable to track the allocation errors to avoid polluting the struct and keep the free simple. Reported-by: kbuild test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191013095211.2922-1-chris@chris-wilson.co.uk	2019-10-13 13:17:19 +01:00
Chris Wilson	5f5c382ecf	drm/i915/perf: Prefer using the pinned_ctx for emitting delays on config When we are watching a particular context, we want the OA config to be applied inline with that context such that it takes effect before the next submission. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191012091056.28686-1-chris@chris-wilson.co.uk	2019-10-12 17:04:08 +01:00
Lionel Landwerlin	15d0ace1f8	drm/i915/perf: execute OA configuration from command stream We haven't run into issues with programming the global OA/NOA registers configuration from CPU so far, but HW engineers actually recommend doing this from the command streamer. On TGL in particular one of the clock domain in which some of that programming goes might not be powered when we poke things from the CPU. Since we have a command buffer prepared for the execbuffer side of things, we can reuse that approach here too. This also allows us to significantly reduce the amount of time we hold the main lock. v2: Drop the global lock as much as possible v3: Take global lock to pin global v4: Create i915 request in emit_oa_config() to avoid deadlocks (Lionel) v5: Move locking to the stream (Lionel) v6: Move active reconfiguration request into i915_perf_stream (Lionel) v7: Pin VMA outside request creation (Chris) Lock VMA before move to active (Chris) v8: Fix double free on stream->initial_oa_config_bo (Lionel) Don't allow interruption when waiting on active config request (Lionel) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191012072308.30312-3-chris@chris-wilson.co.uk	2019-10-12 09:08:40 +01:00
Lionel Landwerlin	daed3e4439	drm/i915/perf: implement active wait for noa configurations NOA configuration take some amount of time to apply. That amount of time depends on the size of the GT. There is no documented time for this. For example, past experimentations with powergating configuration changes seem to indicate a 60~70us delay. We go with 500us as default for now which should be over the required amount of time (according to HW architects). v2: Don't forget to save/restore registers used for the wait (Chris) v3: Name used CS_GPR registers (Chris) Fix compile issue due to rebase (Lionel) v4: Fix save/restore helpers (Umesh) v5: Move noa_wait from drm_i915_private to i915_perf_stream (Lionel) v6: Add missing struct declarations in i915_perf.h Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191012072308.30312-2-chris@chris-wilson.co.uk	2019-10-12 09:08:33 +01:00
Lionel Landwerlin	6a45008ab7	drm/i915/perf: allow for CS OA configs to be created lazily Here we introduce a mechanism by which the execbuf part of the i915 driver will be able to request that a batch buffer containing the programming for a particular OA config be created. We'll execute these OA configuration buffers right before executing a set of userspace commands so that a particular user batchbuffer be executed with a given OA configuration. This mechanism essentially allows the userspace driver to go through several OA configuration without having to open/close the i915/perf stream. v2: No need for locking on object OA config object creation (Chris) Flush cpu mapping of OA config (Chris) v3: Properly deal with the perf_metric lock (Chris/Lionel) v4: Fix oa config unref/put when not found (Lionel) v5: Allocate BOs for configurations on the stream instead of globally (Lionel) v6: Fix 64bit division (Chris) v7: Store allocated config BOs into the stream (Lionel) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191012072308.30312-1-chris@chris-wilson.co.uk	2019-10-12 09:08:27 +01:00
Chris Wilson	a5efcde69b	drm/i915/perf: Replace global wakeref tracking with engine-pm As we now have a specific engine to use OA on, exchange the top-level runtime-pm wakeref with the engine-pm. This still results in the same top-level runtime-pm, but with more nuances to keep the engine and its gt awake. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191011190325.10979-1-chris@chris-wilson.co.uk	2019-10-12 07:53:20 +01:00
Chris Wilson	52111c4628	drm/i915/perf: Store shortcut to intel_uncore Now that we have the engine stored in i915_perf, we have a means of accessing intel_gt should we require it. However, we are currently only using the intel_gt to find the right intel_uncore, so replace our i915_perf.gt pointer with the more useful i915_perf.uncore. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191010150520.26488-2-chris@chris-wilson.co.uk	2019-10-10 18:44:24 +01:00
Lionel Landwerlin	9a61363a63	drm/i915/perf: store the associated engine of a stream We'll use this information later to verify that a client trying to reconfigure the stream does so on the right engine. For now, we want to pull the knowledge of which engine we use into a central property. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191010150520.26488-1-chris@chris-wilson.co.uk	2019-10-10 18:44:13 +01:00
Lionel Landwerlin	23b9e41a3d	drm/i915/perf: drop list of streams At some point in time there was the idea that we could have multiple stream from the same piece of HW but that never materialized and given the hard time we already have making everything work with the submission side, there is no real point having this list of 1 element around. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191008140111.5437-1-chris@chris-wilson.co.uk	2019-10-08 16:22:19 +01:00
Chris Wilson	a4c969d107	drm/i915/perf: Set the exclusive stream under perf->lock The BKL struct_mutex is no more, the only serialisation we required for setting the exclusive stream is already managed by ce->pin_mutex in gen8_configure_all_contexts(). As such, we can manipulate i915_perf.exclusive_stream underneath our own (already held) perf->lock. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191007140812.10963-2-chris@chris-wilson.co.uk Link: https://patchwork.freedesktop.org/patch/msgid/20191007210942.18145-2-chris@chris-wilson.co.uk	2019-10-08 07:52:36 +01:00

1 2 3 4 5

235 Commits