linux

Author	SHA1	Message	Date
Dave Airlie	c18c889111	Merge tag 'drm-misc-next-2021-11-18' of git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next for 5.17: UAPI Changes: * Remove restrictions on DMA_BUF_SET_NAME ioctl * connector: State of privacy screen * sysfs: Send hotplug uevent Cross-subsystem Changes: * clk/bmc-2835: Fixes * dma-buf: Add dma_resv selftest; Error-handling fixes; Add debugfs helpers; Remove dma_resv_get_excl_unlocked(); Documentation fixes * pwm: Introduce of_pwm_single_xlate() Core Changes: * Support for privacy screens * Make drm_irq.c legacy * Fix __stack_depot_* name conflict * Documentation fixes * Fixes and cleanups * dp-helper: Reuse 8b/10b link-training delay helpers * format-helper: Update interfaces * fb-helper: Allocate shadow buffer of correct size * gem: Link GEM SHMEM and CMA helpers into separate modules; Use dma_resv iterator; Import DMA_BUF namespace into GEM-helper modules * gem/shmem-helper: Interface cleanups * scheduler: Grab fence in drm_sched_job_add_implicit_dependencies(); Lockdep fixes * kms-helpers: Link several files from core into the KMS-helper module Driver Changes: * Use dma_resv_iter in several places * Fixes and cleanups * amdgpu: Use drm_kms_helper_connector_hotplug_event(); Get all fences at once * bridge: Switch to managed MIPI DSI helpers in several places; Register and attach during probe in several places; Convert to YAML in several places * bridge/anx7625: Support MIPI DPI input; Support HDMI audio; Fixes * bridge/dw-hdmi: Allow interlace on bridge * bridge/ps8640: Enable PM; Support aux-bus * bridge/tc358768: Enabled reference clock; Support pulse mode; Modesetting fixes * bridge/ti-sn65dsi86: Use regmap_bulk_write(); Implement PWM * etnaviv: Get all fences at once * gma500: GEM object cleanups; Remove generic drivers in probe function * i915: Support VESA panel backlights * ingenic: Fixes and cleanups * kirin: Adjust probe order * kmb: Enable framebuffer console * lima: Kconfig fixes * meson: Refactoring to supperot DRM_BRIDGE_ATTACH_NO_ENCODER * msm: Fixes and cleanups * msm/dsi: Adjust probe order * omap: Fixes and cleanups * nouveau: CRC fixes; Validate LUTs in atomic check; Set HDMI AVI RGB quantization to FULL; Fixes and cleanups * panel: Support Innolux G070Y2-T02, Vivax TPC-9150, JDI R63452, Newhaven 1.8-128160EF, Wanchanglong W552964ABA, Novatek NT35950, BOE BF060Y8M, Sony Tulip Truly NT35521; Use dev_err_probe() throughout drivers; Fixes and cleanups * panel/ili9881c: Orientation fixes * radeon: Use dma_resv_wait_timeout() * rockchip: Add timeout for DSP hold; Suspend/resume fixes; PLL clock fixes; Implement mmap in GEM object functions * simpledrm: Support FB_DAMAGE_CLIPS and virtual screen sizes * sun4i: Use CMA helpers without vmap support * tidss: Fixes and cleanups * v3d: Cleanups * vc4: Fix HDMI-CEC hang when display is off; Power on HDMI controller while disabling; Support 4k@60 Hz modes; Fixes and cleanups * video: Convert to sysfs_emit() in several places * video/omapfb: Fix fall-through * virtio: Overflow fixes * xen: Implement mmap as GEM object functions Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/YZYZSypIrr+qcih3@linux-uq9g.fritz.box	2021-11-23 09:38:55 +10:00
Rob Clark	963d0b3569	drm/scheduler: fix drm_sched_job_add_implicit_dependencies harder drm_sched_job_add_dependency() could drop the last ref, so we need to do the dma_fence_get() first. Cc: Christian König <christian.koenig@amd.com> Fixes: `9c2ba26535` ("drm/scheduler: use new iterator in drm_sched_job_add_implicit_dependencies v2") Signed-off-by: Rob Clark <robdclark@chromium.org> Link: https://patchwork.freedesktop.org/patch/msgid/20211116155545.473311-1-robdclark@gmail.com Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Christian König <christian.koenig@amd.com> Tested-by: Amit Pundir <amit.pundir@linaro.org> Signed-off-by: Christian König <christian.koenig@amd.com>	2021-11-17 08:21:03 +01:00
Christian König	4eaf02d607	drm/scheduler: fix drm_sched_job_add_implicit_dependencies Trivial fix since we now need to grab a reference to the fence we have added. Previously the dma_resv function where doing that for us. Signed-off-by: Christian König <christian.koenig@amd.com> Fixes: `9c2ba26535` ("drm/scheduler: use new iterator in drm_sched_job_add_implicit_dependencies v2") Link: https://patchwork.freedesktop.org/patch/msgid/20211019112706.27769-1-christian.koenig@amd.com Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reported-by: Nicolas Frattaroli <frattaroli.nicolas@gmail.com> References: https://lore.kernel.org/dri-devel/2023306.UmlnhvANQh@archbook/ Tested-by: Nicolas Frattaroli <frattaroli.nicolas@gmail.com> Tested-by: Yassine Oudjana <y.oudjana@protonmail.com>	2021-11-16 10:00:58 +01:00
Andrey Grodzovsky	542cff7893	drm/sched: Avoid lockdep spalt on killing a processes Probelm: Singlaning one sched fence from within another's sched fence singal callback generates lockdep splat because the both have same lockdep class of their fence->lock Fix: Fix bellow stack by rescheduling to irq work of signaling and killing of jobs that left when entity is killed. [11176.741181] dump_stack+0x10/0x12 [11176.741186] __lock_acquire.cold+0x208/0x2df [11176.741197] lock_acquire+0xc6/0x2d0 [11176.741204] ? dma_fence_signal+0x28/0x80 [11176.741212] _raw_spin_lock_irqsave+0x4d/0x70 [11176.741219] ? dma_fence_signal+0x28/0x80 [11176.741225] dma_fence_signal+0x28/0x80 [11176.741230] drm_sched_fence_finished+0x12/0x20 [gpu_sched] [11176.741240] drm_sched_entity_kill_jobs_cb+0x1c/0x50 [gpu_sched] [11176.741248] dma_fence_signal_timestamp_locked+0xac/0x1a0 [11176.741254] dma_fence_signal+0x3b/0x80 [11176.741260] drm_sched_fence_finished+0x12/0x20 [gpu_sched] [11176.741268] drm_sched_job_done.isra.0+0x7f/0x1a0 [gpu_sched] [11176.741277] drm_sched_job_done_cb+0x12/0x20 [gpu_sched] [11176.741284] dma_fence_signal_timestamp_locked+0xac/0x1a0 [11176.741290] dma_fence_signal+0x3b/0x80 [11176.741296] amdgpu_fence_process+0xd1/0x140 [amdgpu] [11176.741504] sdma_v4_0_process_trap_irq+0x8c/0xb0 [amdgpu] [11176.741731] amdgpu_irq_dispatch+0xce/0x250 [amdgpu] [11176.741954] amdgpu_ih_process+0x81/0x100 [amdgpu] [11176.742174] amdgpu_irq_handler+0x26/0xa0 [amdgpu] [11176.742393] __handle_irq_event_percpu+0x4f/0x2c0 [11176.742402] handle_irq_event_percpu+0x33/0x80 [11176.742408] handle_irq_event+0x39/0x60 [11176.742414] handle_edge_irq+0x93/0x1d0 [11176.742419] __common_interrupt+0x50/0xe0 [11176.742426] common_interrupt+0x80/0x90 Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch> Suggested-by: Christian König <christian.koenig@amd.com> Tested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://www.spinics.net/lists/dri-devel/msg321250.html	2021-11-01 11:08:21 -04:00
Christian König	13e9e30caf	drm/scheduler: fix drm_sched_job_add_implicit_dependencies Trivial fix since we now need to grab a reference to the fence we have added. Previously the dma_resv function where doing that for us. Signed-off-by: Christian König <christian.koenig@amd.com> Fixes: `9c2ba26535` ("drm/scheduler: use new iterator in drm_sched_job_add_implicit_dependencies v2") Link: https://patchwork.freedesktop.org/patch/msgid/20211019112706.27769-1-christian.koenig@amd.com Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reported-by: Nicolas Frattaroli <frattaroli.nicolas@gmail.com> References: https://lore.kernel.org/dri-devel/2023306.UmlnhvANQh@archbook/ Tested-by: Nicolas Frattaroli <frattaroli.nicolas@gmail.com>	2021-10-19 15:11:26 +02:00
Christian König	9c2ba26535	drm/scheduler: use new iterator in drm_sched_job_add_implicit_dependencies v2 Simplifying the code a bit. v2: use dma_resv_for_each_fence Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20211005113742.1101-17-christian.koenig@amd.com	2021-10-07 14:49:11 +02:00
Monk Liu	bcf26654a3	drm/sched: fix the bug of time out calculation(v4) issue: in cleanup_job the cancle_delayed_work will cancel a TO timer even the its corresponding job is still running. fix: do not cancel the timer in cleanup_job, instead do the cancelling only when the heading job is signaled, and if there is a "next" job we start_timeout again. v2: further cleanup the logic, and do the TDR timer cancelling if the signaled job is the last one in its scheduler. v3: change the issue description remove the cancel_delayed_work in the begining of the cleanup_job recover the implement of drm_sched_job_begin. v4: remove the kthread_should_park() checking in cleanup_job routine, we should cleanup the signaled job asap TODO: 1)introduce pause/resume scheduler in job_timeout to serial the handling of scheduler and job_timeout. 2)drop the bad job's del and insert in scheduler due to above serialization (no race issue anymore with the serialization) Tested-by: jingwen <jingwen.chen@@amd.com> Signed-off-by: Monk Liu <Monk.Liu@amd.com> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/1630457207-13107-1-git-send-email-Monk.Liu@amd.com	2021-09-15 10:21:30 -04:00
Boris Brezillon	d4c16733e7	drm/sched: Fix drm_sched_fence_free() so it can be passed an uninitialized fence drm_sched_job_cleanup() will pass an uninitialized fence to drm_sched_fence_free(), which will cause to_drm_sched_fence() to return a NULL fence object, causing a NULL pointer deref when this NULL object is passed to kmem_cache_free(). Let's create a new drm_sched_fence_free() function that takes a drm_sched_fence pointer and suffix the old function with _rcu. While at it, complain if drm_sched_fence_free() is passed an initialized fence or if drm_sched_fence_free_rcu() is passed an uninitialized fence. Fixes: `dbe48d030b` ("drm/sched: Split drm_sched_job_init") Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210903120554.444101-1-boris.brezillon@collabora.com	2021-09-07 09:58:26 +02:00
Christian König	d72277b6c3	dma-buf: nuke DMA_FENCE_TRACE macros v2 Only the DRM GPU scheduler, radeon and amdgpu where using them and they depend on a non existing config option to actually emit some code. v2: keep the signal path as is for now Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210818105443.1578-1-christian.koenig@amd.com	2021-09-02 12:40:52 +02:00
Daniel Vetter	981b04d968	drm/sched: improve docs around drm_sched_entity I found a few too many things that are tricky and not documented, so I started typing. I found a few more things that looked broken while typing, see the varios FIXME in drm_sched_entity. Also some of the usual logics: - actually include sched_entity.c declarations, that was lost in the move here: `620e762f9a` ("drm/scheduler: move entity handling into separate file") - Ditch the kerneldoc for internal functions, keep the comments where they're describing more than what the function name already implies. - Switch drm_sched_entity to inline docs. Acked-by: Melissa Wen <mwen@igalia.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> (v1) Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Lucas Stach <l.stach@pengutronix.de> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: "Christian König" <christian.koenig@amd.com> Cc: Boris Brezillon <boris.brezillon@collabora.com> Cc: Steven Price <steven.price@arm.com> Cc: Emma Anholt <emma@anholt.net> Cc: Lee Jones <lee.jones@linaro.org> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210805104705.862416-7-daniel.vetter@ffwll.ch	2021-08-30 10:57:18 +02:00
Daniel Vetter	0e10e9a1db	drm/sched: drop entity parameter from drm_sched_push_job Originally a job was only bound to the queue when we pushed this, but now that's done in drm_sched_job_init, making that parameter entirely redundant. Remove it. The same applies to the context parameter in lima_sched_context_queue_task, simplify that too. v2: Rebase on top of msm adopting drm/sched Reviewed-by: Christian König <christian.koenig@amd.com> Acked-by: Emma Anholt <emma@anholt.net> Acked-by: Melissa Wen <mwen@igalia.com> Reviewed-by: Steven Price <steven.price@arm.com> (v1) Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> (v1) Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Lucas Stach <l.stach@pengutronix.de> Cc: Russell King <linux+etnaviv@armlinux.org.uk> Cc: Christian Gmeiner <christian.gmeiner@gmail.com> Cc: Qiang Yu <yuq825@gmail.com> Cc: Rob Herring <robh@kernel.org> Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com> Cc: Steven Price <steven.price@arm.com> Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Cc: Emma Anholt <emma@anholt.net> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: "Christian König" <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Nirmoy Das <nirmoy.das@amd.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Chen Li <chenli@uniontech.com> Cc: Lee Jones <lee.jones@linaro.org> Cc: Deepak R Varma <mh12gx2825@gmail.com> Cc: Kevin Wang <kevin1.wang@amd.com> Cc: Luben Tuikov <luben.tuikov@amd.com> Cc: "Marek Olšák" <marek.olsak@amd.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Cc: Dennis Li <Dennis.Li@amd.com> Cc: Boris Brezillon <boris.brezillon@collabora.com> Cc: etnaviv@lists.freedesktop.org Cc: lima@lists.freedesktop.org Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Cc: Rob Clark <robdclark@gmail.com> Cc: Sean Paul <sean@poorly.run> Cc: Melissa Wen <mwen@igalia.com> Cc: linux-arm-msm@vger.kernel.org Cc: freedreno@lists.freedesktop.org Link: https://patchwork.freedesktop.org/patch/msgid/20210805104705.862416-6-daniel.vetter@ffwll.ch	2021-08-30 10:54:45 +02:00
Daniel Vetter	ebd5f74255	drm/sched: Add dependency tracking Instead of just a callback we can just glue in the gem helpers that panfrost, v3d and lima currently use. There's really not that many ways to skin this cat. v2/3: Rebased. v4: Repaint this shed. The functions are now called _add_dependency() and _add_implicit_dependency() Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> (v3) Reviewed-by: Steven Price <steven.price@arm.com> (v1) Acked-by: Melissa Wen <mwen@igalia.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: "Christian König" <christian.koenig@amd.com> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Cc: Lee Jones <lee.jones@linaro.org> Cc: Nirmoy Das <nirmoy.aiemd@gmail.com> Cc: Boris Brezillon <boris.brezillon@collabora.com> Cc: Luben Tuikov <luben.tuikov@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Link: https://patchwork.freedesktop.org/patch/msgid/20210805104705.862416-5-daniel.vetter@ffwll.ch	2021-08-30 10:54:12 +02:00
Daniel Vetter	b0a5303d4e	drm/sched: Barriers are needed for entity->last_scheduled It might be good enough on x86 with just READ_ONCE, but the write side should then at least be WRITE_ONCE because x86 has total store order. It's definitely not enough on arm. Fix this proplery, which means - explain the need for the barrier in both places - point at the other side in each comment Also pull out the !sched_list case as the first check, so that the code flow is clearer. While at it sprinkle some comments around because it was very non-obvious to me what's actually going on here and why. Note that we really need full barriers here, at first I thought store-release and load-acquire on ->last_scheduled would be enough, but we actually requiring ordering between that and the queue state. v2: Put smp_rmp() in the right place and fix up comment (Andrey) Reviewed-by: Christian König <christian.koenig@amd.com> Acked-by: Melissa Wen <mwen@igalia.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: Steven Price <steven.price@arm.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Cc: Lee Jones <lee.jones@linaro.org> Cc: Boris Brezillon <boris.brezillon@collabora.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210805104705.862416-4-daniel.vetter@ffwll.ch	2021-08-30 10:53:48 +02:00
Daniel Vetter	dbe48d030b	drm/sched: Split drm_sched_job_init This is a very confusingly named function, because not just does it init an object, it arms it and provides a point of no return for pushing a job into the scheduler. It would be nice if that's a bit clearer in the interface. But the real reason is that I want to push the dependency tracking helpers into the scheduler code, and that means drm_sched_job_init must be called a lot earlier, without arming the job. v2: - don't change .gitignore (Steven) - don't forget v3d (Emma) v3: Emma noticed that I leak the memory allocated in drm_sched_job_init if we bail out before the point of no return in subsequent driver patches. To be able to fix this change drm_sched_job_cleanup() so it can handle being called both before and after drm_sched_job_arm(). Also improve the kerneldoc for this. v4: - Fix the drm_sched_job_cleanup logic, I inverted the booleans, as usual (Melissa) - Christian pointed out that drm_sched_entity_select_rq() also needs to be moved into drm_sched_job_arm, which made me realize that the job->id definitely needs to be moved too. Shuffle things to fit between job_init and job_arm. v5: Reshuffle the split between init/arm once more, amdgpu abuses drm_sched.ready to signal gpu reset failures. Also document this somewhat. (Christian) v6: Rebase on top of the msm drm/sched support. Note that the drm_sched_job_init() call is completely misplaced, and hence also the split-out drm_sched_entity_push_job(). I've put in a FIXME which the next patch will address. v7: Drop the FIXME in msm, after discussions with Rob I agree it shouldn't be a problem where it is now. Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Melissa Wen <mwen@igalia.com> Cc: Melissa Wen <melissa.srw@gmail.com> Acked-by: Emma Anholt <emma@anholt.net> Acked-by: Steven Price <steven.price@arm.com> (v2) Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> (v5) Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Lucas Stach <l.stach@pengutronix.de> Cc: Russell King <linux+etnaviv@armlinux.org.uk> Cc: Christian Gmeiner <christian.gmeiner@gmail.com> Cc: Qiang Yu <yuq825@gmail.com> Cc: Rob Herring <robh@kernel.org> Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com> Cc: Steven Price <steven.price@arm.com> Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: "Christian König" <christian.koenig@amd.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Adam Borowski <kilobyte@angband.pl> Cc: Nick Terrell <terrelln@fb.com> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Cc: Paul Menzel <pmenzel@molgen.mpg.de> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Nirmoy Das <nirmoy.das@amd.com> Cc: Deepak R Varma <mh12gx2825@gmail.com> Cc: Lee Jones <lee.jones@linaro.org> Cc: Kevin Wang <kevin1.wang@amd.com> Cc: Chen Li <chenli@uniontech.com> Cc: Luben Tuikov <luben.tuikov@amd.com> Cc: "Marek Olšák" <marek.olsak@amd.com> Cc: Dennis Li <Dennis.Li@amd.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Cc: Sonny Jiang <sonny.jiang@amd.com> Cc: Boris Brezillon <boris.brezillon@collabora.com> Cc: Tian Tao <tiantao6@hisilicon.com> Cc: etnaviv@lists.freedesktop.org Cc: lima@lists.freedesktop.org Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Cc: Emma Anholt <emma@anholt.net> Cc: Rob Clark <robdclark@gmail.com> Cc: Sean Paul <sean@poorly.run> Cc: linux-arm-msm@vger.kernel.org Cc: freedreno@lists.freedesktop.org Link: https://patchwork.freedesktop.org/patch/msgid/20210817084917.3555822-1-daniel.vetter@ffwll.ch	2021-08-30 10:50:44 +02:00
Boris Brezillon	78efe21b6f	drm/sched: Allow using a dedicated workqueue for the timeout/fault tdr Mali Midgard/Bifrost GPUs have 3 hardware queues but only a global GPU reset. This leads to extra complexity when we need to synchronize timeout works with the reset work. One solution to address that is to have an ordered workqueue at the driver level that will be used by the different schedulers to queue their timeout work. Thanks to the serialization provided by the ordered workqueue we are guaranteed that timeout handlers are executed sequentially, and can thus easily reset the GPU from the timeout handler without extra synchronization. v5: * Add a new paragraph to the timedout_job() method v3: * New patch v4: * Actually use the timeout_wq to queue the timeout work Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Steven Price <steven.price@arm.com> Reviewed-by: Lucas Stach <l.stach@pengutronix.de> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Christian König <christian.koenig@amd.com> Cc: Qiang Yu <yuq825@gmail.com> Cc: Emma Anholt <emma@anholt.net> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210630062751.2832545-3-boris.brezillon@collabora.com	2021-07-01 08:53:25 +02:00
Boris Brezillon	3b5ac97ad4	drm/sched: Declare entity idle only after HW submission The panfrost driver tries to kill in-flight jobs on FD close after destroying the FD scheduler entities. For this to work properly, we need to make sure the jobs popped from the scheduler entities have been queued at the HW level before declaring the entity idle, otherwise we might iterate over a list that doesn't contain those jobs. Suggested-by: Lucas Stach <l.stach@pengutronix.de> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Cc: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Steven Price <steven.price@arm.com> Reviewed-by: Lucas Stach <l.stach@pengutronix.de> Link: https://patchwork.freedesktop.org/patch/msgid/20210624140850.2229697-1-boris.brezillon@collabora.com	2021-06-28 13:16:49 +02:00
Andrey Grodzovsky	0b10ab8069	drm/sched: Avoid data corruptions Wait for all dependencies of a job to complete before killing it to avoid data corruptions. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210519141407.88444-1-andrey.grodzovsky@amd.com	2021-05-19 23:50:28 -04:00
Andrey Grodzovsky	c61cdbdbff	drm/scheduler: Fix hang when sched_entity released Problem: If scheduler is already stopped by the time sched_entity is released and entity's job_queue not empty I encountred a hang in drm_sched_entity_flush. This is because drm_sched_entity_is_idle never becomes false. Fix: In drm_sched_fini detach all sched_entities from the scheduler's run queues. This will satisfy drm_sched_entity_is_idle. Also wakeup all those processes stuck in sched_entity flushing as the scheduler main thread which wakes them up is stopped by now. v2: Reverse order of drm_sched_rq_remove_entity and marking s_entity as stopped to prevent reinserion back to rq due to race. v3: Drop drm_sched_rq_remove_entity, only modify entity->stopped and check for it in drm_sched_entity_is_idle Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210512142648.666476-14-andrey.grodzovsky@amd.com	2021-05-19 23:50:28 -04:00
Andrey Grodzovsky	75973e5802	drm/sched: Make timeout timer rearm conditional. We don't want to rearm the timer if driver hook reports that the device is gone. v5: Update drm_gpu_sched_stat values in code. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210512142648.666476-11-andrey.grodzovsky@amd.com	2021-05-19 23:50:28 -04:00
Roy Sun	1774baa64f	drm/scheduler: Change scheduled fence track v2 Update the timestamp of scheduled fence on HW completion of the previous fences This allow more accurate tracking of the fence execution in HW v2 (chk): drop the flag check and improve the comment Signed-off-by: David M Nieto <david.nieto@amd.com> Signed-off-by: Roy Sun <Roy.Sun@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210426062701.39732-1-Roy.Sun@amd.com	2021-05-05 09:26:36 +02:00
Maxime Ripard	355b602961	Merge drm/drm-next into drm-misc-next Christian needs some patches from drm/next Signed-off-by: Maxime Ripard <maxime@cerno.tech>	2021-04-26 14:03:09 +02:00
Lee Jones	04be0c5b40	drm/scheduler/sched_entity: Fix some function name disparity Fixes the following W=1 kernel build warning(s): drivers/gpu/drm/scheduler/sched_entity.c:204: warning: expecting prototype for drm_sched_entity_kill_jobs(). Prototype was for drm_sched_entity_kill_jobs_cb() instead drivers/gpu/drm/scheduler/sched_entity.c:262: warning: expecting prototype for drm_sched_entity_cleanup(). Prototype was for drm_sched_entity_fini() instead drivers/gpu/drm/scheduler/sched_entity.c:305: warning: expecting prototype for drm_sched_entity_fini(). Prototype was for drm_sched_entity_destroy() instead Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: "Christian König" <christian.koenig@amd.com> Cc: dri-devel@lists.freedesktop.org Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Signed-off-by: Lee Jones <lee.jones@linaro.org> Link: https://patchwork.freedesktop.org/patch/msgid/20210416143725.2769053-25-lee.jones@linaro.org Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com>	2021-04-22 14:05:17 +02:00
Jack Zhang	e6c6338f39	drm/amd/amdgpu implement tdr advanced mode [Why] Previous tdr design treats the first job in job_timeout as the bad job. But sometimes a later bad compute job can block a good gfx job and cause an unexpected gfx job timeout because gfx and compute ring share internal GC HW mutually. [How] This patch implements an advanced tdr mode.It involves an additinal synchronous pre-resubmit step(Step0 Resubmit) before normal resubmit step in order to find the real bad job. 1. At Step0 Resubmit stage, it synchronously submits and pends for the first job being signaled. If it gets timeout, we identify it as guilty and do hw reset. After that, we would do the normal resubmit step to resubmit left jobs. 2. For whole gpu reset(vram lost), do resubmit as the old way. v2: squash in build fix (Alex) Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2021-04-09 16:45:45 -04:00
Christian König	ac4eb83ab2	drm/sched: select new rq even if there is only one v3 This is necessary when changing priorities of an entity. v2: test the sched_list instead of num_sched. v3: set the sched_list to NULL when there is only one entry Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Sonny Jiang <sonny.jiang@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210305125155.2312-1-christian.koenig@amd.com	2021-03-08 14:06:17 +01:00
Christian König	f2f12eb9c3	drm/scheduler: provide scheduler score externally Allow multiple schedulers to share the load balancing score. This is useful when one engine has different hw rings. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-and-Tested-by: Leo Liu <leo.liu@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210204144405.2737-1-christian.koenig@amd.com	2021-02-05 10:47:11 +01:00
Luben Tuikov	a6a1f036c7	drm/scheduler: Job timeout handler returns status (v3) This patch does not change current behaviour. The driver's job timeout handler now returns status indicating back to the DRM layer whether the device (GPU) is no longer available, such as after it's been unplugged, or whether all is normal, i.e. current behaviour. All drivers which make use of the drm_sched_backend_ops' .timedout_job() callback have been accordingly renamed and return the would've-been default value of DRM_GPU_SCHED_STAT_NOMINAL to restart the task's timeout timer--this is the old behaviour, and is preserved by this patch. v2: Use enum as the status of a driver's job timeout callback method. v3: Return scheduler/device information, rather than task information. Cc: Alexander Deucher <Alexander.Deucher@amd.com> Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Lucas Stach <l.stach@pengutronix.de> Cc: Russell King <linux+etnaviv@armlinux.org.uk> Cc: Christian Gmeiner <christian.gmeiner@gmail.com> Cc: Qiang Yu <yuq825@gmail.com> Cc: Rob Herring <robh@kernel.org> Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com> Cc: Steven Price <steven.price@arm.com> Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Cc: Eric Anholt <eric@anholt.net> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Steven Price <steven.price@arm.com> Signed-off-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/415095/	2021-01-29 11:30:22 +01:00
Andrey Grodzovsky	e582951baa	drm/sched: Cancel and flush all outstanding jobs before finish. To avoid any possible use after free. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/414814/ CC: stable@vger.kernel.org Signed-off-by: Christian König <christian.koenig@amd.com>	2021-01-19 10:22:35 +01:00
Maarten Lankhorst	ae75a0431f	Merge drm/drm-next into drm-misc-next Required backmerge since we will be based on top of v5.11, and there has been a request to backmerge already to upstream some features. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>	2020-12-15 11:05:43 +01:00
Dave Airlie	b10733527b	Merge tag 'amd-drm-next-5.11-2020-12-09' of git://people.freedesktop.org/~agd5f/linux into drm-next amd-drm-next-5.11-2020-12-09: amdgpu: - SR-IOV fixes - Navy Flounder updates - Sienna Cichlid updates - Dimgrey Cavefish updates - Vangogh updates - Misc SMU fixes - Misc display fixes - Last big hunk of W=1 warning fixes - Cursor validation fixes - CI BACO updates From: Alex Deucher <alexdeucher@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201210045344.21566-1-alexander.deucher@amd.com Signed-off-by: Dave Airlie <airlied@redhat.com>	2020-12-10 16:55:53 +10:00
Luben Tuikov	71173e787c	drm/scheduler: Essentialize the job done callback The job done callback is called from various places, in two ways: in job done role, and as a fence callback role. Essentialize the callback to an atom function to just complete the job, and into a second function as a prototype of fence callback which calls to complete the job. This is used in latter patches by the completion code. Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/405574/ Cc: Alexander Deucher <Alexander.Deucher@amd.com> Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Christian König <christian.koenig@amd.com>	2020-12-08 14:38:09 +01:00
Luben Tuikov	6efa4b465c	gpu/drm: ring_mirror_list --> pending_list Rename "ring_mirror_list" to "pending_list", to describe what something is, not what it does, how it's used, or how the hardware implements it. This also abstracts the actual hardware implementation, i.e. how the low-level driver communicates with the device it drives, ring, CAM, etc., shouldn't be exposed to DRM. The pending_list keeps jobs submitted, which are out of our control. Usually this means they are pending execution status in hardware, but the latter definition is a more general (inclusive) definition. Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/405573/ Cc: Alexander Deucher <Alexander.Deucher@amd.com> Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Christian König <christian.koenig@amd.com>	2020-12-08 14:38:03 +01:00
Luben Tuikov	8935ff00e3	drm/scheduler: "node" --> "list" Rename "node" to "list" in struct drm_sched_job, in order to make it consistent with what we see being used throughout gpu_scheduler.h, for instance in struct drm_sched_entity, as well as the rest of DRM and the kernel. Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/403515/ Cc: Alexander Deucher <Alexander.Deucher@amd.com> Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Christian König <christian.koenig@amd.com>	2020-12-08 14:37:55 +01:00
Mauro Carvalho Chehab	e9d2871f69	drm: fix some kernel-doc markups Some identifiers have different names between their prototypes and the kernel-doc markup. Others need to be fixed, as kernel-doc markups should use this format: identifier - description Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Acked-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/12d4ca26f6843618200529ce5445063734d38c04.1605521731.git.mchehab+huawei@kernel.org	2020-11-16 20:48:20 +01:00
Lee Jones	00d44b966d	gpu: drm: scheduler: sched_entity: Demote non-conformant kernel-doc headers Fixes the following W=1 kernel build warning(s): drivers/gpu/drm/scheduler/sched_entity.c:316: warning: Function parameter or member 'f' not described in 'drm_sched_entity_clear_dep' drivers/gpu/drm/scheduler/sched_entity.c:316: warning: Function parameter or member 'cb' not described in 'drm_sched_entity_clear_dep' drivers/gpu/drm/scheduler/sched_entity.c:330: warning: Function parameter or member 'f' not described in 'drm_sched_entity_wakeup' drivers/gpu/drm/scheduler/sched_entity.c:330: warning: Function parameter or member 'cb' not described in 'drm_sched_entity_wakeup' Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: "Christian König" <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Nirmoy Das <nirmoy.aiemd@gmail.com> Cc: dri-devel@lists.freedesktop.org Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-11-13 00:03:31 -05:00
Lee Jones	26b5cf49cd	gpu: drm: scheduler: sched_main: Provide missing description for 'sched' paramter Fixes the following W=1 kernel build warning(s): drivers/gpu/drm/scheduler/sched_main.c:74: warning: Function parameter or member 'sched' not described in 'drm_sched_rq_init' Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: "Christian König" <christian.koenig@amd.com> Cc: dri-devel@lists.freedesktop.org Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-11-13 00:03:24 -05:00
Dave Airlie	1cd260a790	Merge tag 'drm-misc-next-2020-10-27' of git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next for 5.11: UAPI Changes: - doc: rules for EBUSY on non-blocking commits; requirements for fourcc modifiers; on parsing EDID - fbdev/sbuslib: Remove unused FBIOSCURSOR32 - fourcc: deprecate DRM_FORMAT_MOD_NONE - virtio: Support blob resources for memory allocations; Expose host-visible and cross-device features Cross-subsystem Changes: - devicetree: Add vendor Prefix for Yes Optoelectronics, Shanghai Top Display Optoelectronics - dma-buf: Add struct dma_buf_map that stores DMA pointer and I/O-memory flag; dma_buf_vmap()/vunmap() return address in dma_buf_map; Use struct_size() macro Core Changes: - atomic: pass full state to CRTC atomic enable/disable; warn for EBUSY during non-blocking commits - dp: Prepare for DP 2.0 DPCD - dp_mst: Receive extended DPCD caps - dma-buf: Documentation - doc: Format modifiers; dma-buf-map; Cleanups - fbdev: Don't use compat_alloc_user_space(); mark as orphaned - fb-helper: Take lock in drm_fb_helper_restore_work_fb() - gem: Convert implementation and drivers to GEM object functions, remove GEM callbacks from struct drm_driver (expect gem_prime_mmap) - panel: Cleanups - pci: Add legacy infix to drm_irq_by_busid() - sched: Avoid infinite waits in drm_sched_entity_destroy() - switcheroo: Cleanups - ttm: Remove AGP support; Don't modify caching during swapout; Major refactoring of the implementation and API that affects all depending drivers; Add ttm_bo_wait_ctx(); Add ttm_bo_pin()/unpin() in favor of TTM_PL_FLAG_NO_EVICT; Remove ttm_bo_create(); Remove fault_reserve_notify() callback; Push move() implementation into drivers; Remove TTM_PAGE_FLAG_WRITE; Replace caching flags with init-time cache setting; Push ttm_tt_bind() into drivers; Replace move_notify() with delete_mem_notify(); No overlapping memcpy(); no more ttm_set_populated() - vram-helper: Fix BO top-down placement; TTM-related changes; Init GEM object functions with defaults; Default placement in system memory; Cleanups Driver Changes: - amdgpu: Use GEM object functions - armada: Use GEM object functions - aspeed: Configure output via sysfs; Init struct drm_driver with - ast: Reload LUT after FB format changes - bridge: Add driver and DT bindings for anx7625; Cleanups - bridge/dw-hdmi: Constify ops - bridge/ti-sn65dsi86: Add retries for link training - bridge/lvds-codec: Add support for regulator - bridge/tc358768: Restore connector support DRM_GEM_CMA_DRIVEROPS; Cleanups - display/ti,j721e-dss: Add DT properies assigned-clocks, assigned-clocks-parent and dma-coherent - display/ti,am65s-dss: Add DT properies assigned-clocks, assigned-clocks-parent and dma-coherent - etnaviv: Use GEM object functions - exynos: Use GEM object functions - fbdev: Cleanups and compiler fixes throughout framebuffer drivers - fbdev/cirrusfb: Avoid division by 0 - gma500: Use GEM object functions; Fix double-free of connector; Cleanups - hisilicon/hibmc: I2C-based DDC support; Use to_hibmc_drm_device(); Cleanups - i915: Use GEM object functions - imx/dcss: Init driver with DRM_GEM_CMA_DRIVER_OPS; Cleanups - ingenic: Reset pixel clock when parent clock changes; support reserved memory; Alloc F0 and F1 DMA channels at once; Support different pixel formats; Revert support for cached mmap buffers on F0/F1; support 30-bit/24-bit/8-bit-palette modes - komeda: Use DEFINE_SHOW_ATTRIBUTE - mcde: Detect platform_get_irq() errors - mediatek: Use GEM object functions - msm: Use GEM object functions - nouveau: Cleanups; TTM-related changes; Use GEM object functions - omapdrm: Use GEM object functions - panel: Add driver and DT bindings for Novatak nt36672a; Add driver and DT bindings for YTC700TLAG-05-201C; Add driver and DT bindings for TDO TL070WSH30; Cleanups - panel/mantix: Fix reset; Fix deref of NULL pointer in mantix_get_modes() - panel/otm8009a: Allow non-continuous dsi clock; Cleanups - panel/rm68200: Allow non-continuous dsi clock; Fix mode to 50 FPS - panfrost: Fix job timeout handling; Cleanups - pl111: Use GEM object functions - qxl: Cleanups; TTM-related changes; Pin new BOs with ttm_bo_init_reserved() - radeon: Cleanups; TTM-related changes; Use GEM object functions - rockchip: Use GEM object functions - shmobile: Cleanups - tegra: Use GEM object functions - tidss: Set drm_plane_helper_funcs.prepare_fb - tilcdc: Don't keep vblank interrupt enabled all the time - tve200: Detect platform_get_irq() errors - vc4: Use GEM object functions; Only register components once DSI is attached; Add Maxime as maintainer - vgem: Use GEM object functions - via: Simplify critical section in via_mem_alloc() - virtgpu: Use GEM object functions - virtio: Implement blob resources, host-visible and cross-device features; Support mapping of host-allocated resources; Use UUID APi; Cleanups - vkms: Use GEM object functions; Switch to SHMEM - vmwgfx: TTM-related changes; Inline ttm_bo_swapout_all() - xen: Use GEM object functions - xlnx: Use GEM object functions Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20201027100936.GA4858@linux-uq9g	2020-11-04 11:49:10 +10:00
Maxime Ripard	c489573b5b	Merge drm/drm-next into drm-misc-next Daniel needs -rc2 in drm-misc-next to merge some patches Signed-off-by: Maxime Ripard <maxime@cerno.tech>	2020-11-02 11:17:54 +01:00
Boris Brezillon	170fb58ee3	drm/sched: Avoid infinite waits in the drm_sched_entity_destroy() path If we don't initialize the entity to idle and the entity is never scheduled before being destroyed we end up with an infinite wait in the destroy path. v2: - Add Steven's R-b Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Steven Price <steven.price@arm.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/393486/	2020-10-05 15:06:33 +02:00
Tian Tao	05f59762bc	drm/scheduler: fix sched_fence.c kernel-doc warnings Fix kernel-doc warnings. drivers/gpu/drm/scheduler/sched_fence.c:110: warning: Function parameter or member 'f' not described in 'drm_sched_fence_release_scheduled' drivers/gpu/drm/scheduler/sched_fence.c:110: warning: Excess function parameter 'fence' description in 'drm_sched_fence_release_scheduled' Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Tian Tao <tiantao6@hisilicon.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-09-15 17:52:42 -04:00
Dave Airlie	0c8d22fcae	Merge tag 'amd-drm-next-5.10-2020-09-03' of git://people.freedesktop.org/~agd5f/linux into drm-next amd-drm-next-5.10-2020-09-03: amdgpu: - RAS fixes - Sienna Cichlid updates - Navy Flounder updates - DCE6 (SI) support in DC - Enable plane rotation - Rework pre-OS vram reservation handling during driver init - Add standard interface to dump GPU metrics table from SMU - Rework tiling and tmz state handling in atomic commits - Pstate fixes - Add voltage and power hwmon interfaces for renoir - SW CTF fixes - S/G display fix for Raven - Print client strings for vmfaults for vega and newer - Manual fan control fixes - Display updates - Reorg power management directory structure - Misc bug fixes - Misc code cleanups amdkfd: - Topology fixes - Add SMI events for thermal throttling and GPU resets radeon: - switch from pci_* to dma_* for dma allocations - PLL fix Scheduler: - Clean up priority levels UAPI: - amdgpu INFO IOCTL query update for TMZ state https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6049 - amdkfd SMI event interface updates https://github.com/RadeonOpenCompute/rocm_smi_lib/tree/therm_thrott From: Alex Deucher <alexdeucher@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200903222921.4152-1-alexander.deucher@amd.com	2020-09-08 16:40:13 +10:00
Luben Tuikov	e2d732fdb7	drm/scheduler: Scheduler priority fixes (v2) Remove DRM_SCHED_PRIORITY_LOW, as it was used in only one place. Rename and separate by a line DRM_SCHED_PRIORITY_MAX to DRM_SCHED_PRIORITY_COUNT as it represents a (total) count of said priorities and it is used as such in loops throughout the code. (0-based indexing is the the count number.) Remove redundant word HIGH in priority names, and rename KERNEL to HIGH, as it really means that, high. v2: Add back KERNEL and remove SW and HW, in lieu of a single HIGH between NORMAL and KERNEL. Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-08-18 18:20:17 -04:00
Linus Torvalds	6d2b84a4e5	Merge tag 'sched-fifo-2020-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull sched/fifo updates from Ingo Molnar: "This adds the sched_set_fifo() encapsulation APIs to remove static priority level knowledge from non-scheduler code. The three APIs for non-scheduler code to set SCHED_FIFO are: - sched_set_fifo() - sched_set_fifo_low() - sched_set_normal() These are two FIFO priority levels: default (high), and a 'low' priority level, plus sched_set_normal() to set the policy back to non-SCHED_FIFO. Since the changes affect a lot of non-scheduler code, we kept this in a separate tree" tag 'sched-fifo-2020-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) sched,tracing: Convert to sched_set_fifo() sched: Remove sched_set_() return value sched: Remove sched_setscheduler() EXPORTs sched,psi: Convert to sched_set_fifo_low() sched,rcutorture: Convert to sched_set_fifo_low() sched,rcuperf: Convert to sched_set_fifo_low() sched,locktorture: Convert to sched_set_fifo() sched,irq: Convert to sched_set_fifo() sched,watchdog: Convert to sched_set_fifo() sched,serial: Convert to sched_set_fifo() sched,powerclamp: Convert to sched_set_fifo() sched,ion: Convert to sched_set_normal() sched,powercap: Convert to sched_set_fifo() sched,spi: Convert to sched_set_fifo() sched,mmc: Convert to sched_set_fifo() sched,ivtv: Convert to sched_set_fifo() sched,drm/scheduler: Convert to sched_set_fifo() sched,msm: Convert to sched_set_fifo() sched,psci: Convert to sched_set_fifo() sched,drbd: Convert to sched_set_fifo() ...	2020-08-06 11:55:43 -07:00
Maarten Lankhorst	60e9eabf41	Backmerge remote-tracking branch 'drm/drm-next' into drm-misc-next Some conflicts with ttm_bo->offset removal, but drm-misc-next needs updating to v5.8. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>	2020-06-29 12:16:26 +02:00
Nirmoy Das	d41a39dda1	drm/scheduler: improve job distribution with multiple queues This patch uses score to select a new drm scheduler for better loadbalance between multiple drm schedulers instead of num_jobs. Below are test results after running amdgpu_test for ~10 times. Before this patch: sched_name num of many times it got schedule ========= ================================== sdma0 1463 sdma1 198 comp_1.0.1 280 After this patch: sched_name num of many times it got schedule ========= ================================== sdma0 925 sdma1 928 comp_1.0.1 177 comp_1.1.1 44 comp_1.2.1 43 comp_1.3.1 44 Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/373000/ Signed-off-by: Christian König <christian.koenig@amd.com>	2020-06-26 14:16:29 +02:00
Peter Zijlstra	7b31e940b1	sched,drm/scheduler: Convert to sched_set_fifo*() Because SCHED_FIFO is a broken scheduler model (see previous patches) take away the priority field, the kernel can't possibly make an informed decision. In this case, use fifo_low, because it only cares about being above SCHED_NORMAL. Effectively no change in behaviour. Cc: alexander.deucher@amd.com Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ingo Molnar <mingo@kernel.org>	2020-06-15 14:10:21 +02:00
Christian König	8623b5255a	drm/scheduler: fix drm_sched_get_cleanup_job We are racing to initialize sched->thread here, just always check the current thread. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Kent Russell <kent.russell@amd.com> Link: https://patchwork.freedesktop.org/patch/361303/	2020-04-15 11:09:13 +02:00
Yintian Tao	77bb2f204f	drm/scheduler: fix rare NULL ptr race There is one one corner case at dma_fence_signal_locked which will raise the NULL pointer problem just like below. ->dma_fence_signal ->dma_fence_signal_locked ->test_and_set_bit here trigger dma_fence_release happen due to the zero of fence refcount. ->dma_fence_put ->dma_fence_release ->drm_sched_fence_release_scheduled ->call_rcu here make the union fled “cb_list” at finished fence to NULL because struct rcu_head contains two pointer which is same as struct list_head cb_list Therefore, to hold the reference of finished fence at drm_sched_process_job to prevent the null pointer during finished fence dma_fence_signal [ 732.912867] BUG: kernel NULL pointer dereference, address: 0000000000000008 [ 732.914815] #PF: supervisor write access in kernel mode [ 732.915731] #PF: error_code(0x0002) - not-present page [ 732.916621] PGD 0 P4D 0 [ 732.917072] Oops: 0002 [#1] SMP PTI [ 732.917682] CPU: 7 PID: 0 Comm: swapper/7 Tainted: G OE 5.4.0-rc7 #1 [ 732.918980] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014 [ 732.920906] RIP: 0010:dma_fence_signal_locked+0x3e/0x100 [ 732.938569] Call Trace: [ 732.939003] <IRQ> [ 732.939364] dma_fence_signal+0x29/0x50 [ 732.940036] drm_sched_fence_finished+0x12/0x20 [gpu_sched] [ 732.940996] drm_sched_process_job+0x34/0xa0 [gpu_sched] [ 732.941910] dma_fence_signal_locked+0x85/0x100 [ 732.942692] dma_fence_signal+0x29/0x50 [ 732.943457] amdgpu_fence_process+0x99/0x120 [amdgpu] [ 732.944393] sdma_v4_0_process_trap_irq+0x81/0xa0 [amdgpu] v2: hold the finished fence at drm_sched_process_job instead of amdgpu_fence_process v3: resume the blank line Signed-off-by: Yintian Tao <yttao@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-03-25 17:00:11 -04:00
Nirmoy Das	ec2edcc279	drm/sched: implement and export drm_sched_pick_best Remove drm_sched_entity_get_free_sched() and use the logic of picking the least loaded drm scheduler from a drm scheduler list to implement drm_sched_pick_best(). This patch also exports drm_sched_pick_best() so that it can be utilized by other drm drivers. Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-03-16 16:21:32 -04:00
changzhu	d164bebb95	Revert "drm/scheduler: improve job distribution with multiple queues" It needs to revert this patch to avoid amdgpu_test compute hang problem on picasso. This reverts commit `56822db194`. Signed-off-by: changzhu <Changfeng.Zhu@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-03-16 16:21:32 -04:00
Lucas Stach	a7fbb630c5	drm/scheduler: fix inconsistent locking of job_list_lock `1db8c142b6` (drm/scheduler: Add drm_sched_suspend/resume_timeout()) made the job_list_lock IRQ safe in as the suspend/resume calls were expected to be called from IRQ context. This usage never materialized in upstream. Instead amdgpu started locking the job_list_lock in an IRQ unsafe way in amdgpu_ib_preempt_mark_partial_job() and amdgpu_ib_preempt_job_recovery(), which leads to potential deadlock if one would actually start to call the drm_sched_suspend/resume_timeout functions from IRQ context. As no current user needs the locking to be IRQ safe, the local IRQ disable/enable is pure overhead. Fix the inconsistent locking by changing all uses of job_list_lock to use the IRQ unsafe locking primitives. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-03-13 11:52:36 -04:00

1 2 3

140 Commits