linux/drivers/gpu/drm/i915/gem/i915_gemfs.c


/*
 * SPDX-License-Identifier: MIT
 *
 * Copyright © 2017 Intel Corporation
 */

#include <linux/fs.h>
#include <linux/mount.h>

#include "i915_drv.h"
#include "i915_gemfs.h"
#include "i915_utils.h"

void i915_gemfs_init(struct drm_i915_private *i915)
{
	char huge_opt[] = "huge=within_size"; /* r/w */
	struct file_system_type *type;
	struct vfsmount *gemfs;

	/*
	 * By creating our own shmemfs mountpoint, we can pass in
	 * mount flags that better match our usecase.
	 *
	 * One example, although it is probably better with a per-file
	 * control, is selecting huge page allocations ("huge=within_size").
	 * However, we only do so on platforms which benefit from it, or to
	 * offset the overhead of IOMMU lookups, where the latter is a net
	 * win even on platforms which would otherwise see some performance
	 * regressions, such as the slow reads issue on Broadwell and Skylake.
	 */

	if (GRAPHICS_VER(i915) < 11 && !i915_vtd_active(i915))
		return;

	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
		goto err;

	type = get_fs_type("tmpfs");
	if (!type)
		goto err;

	gemfs = vfs_kern_mount(type, SB_KERNMOUNT, type->name, huge_opt);
	if (IS_ERR(gemfs))
		goto err;

	i915->mm.gemfs = gemfs;
	drm_info(&i915->drm, "Using Transparent Hugepages\n");
	return;

err:
	drm_notice(&i915->drm,
		   "Transparent Hugepage support is recommended for optimal performance%s\n",
		   GRAPHICS_VER(i915) >= 11 ? " on this platform!" :
					      " when IOMMU is enabled!");
}

void i915_gemfs_fini(struct drm_i915_private *i915)
{
	kern_unmount(i915->mm.gemfs);
}