mirror of
https://github.com/torvalds/linux.git
synced 2024-11-10 06:01:57 +00:00
sched/numa-balancing: Move some document to make it consistent with the code
After commit 8a99b6833c ("sched: Move SCHED_DEBUG sysctl to debugfs"),
some NUMA balancing sysctls enclosed with SCHED_DEBUG have been moved
to debugfs. This patch moves the documentation for these sysctls from
Documentation/admin-guide/sysctl/kernel.rst to
Documentation/scheduler/sched-debug.rst to make the documentation
consistent with the code.
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Link: https://lkml.kernel.org/r/20220210052514.3038279-1-ying.huang@intel.com
parent e496132ebe
commit 3624ba7b5e
@@ -609,51 +609,7 @@ be migrated to a local memory node.
 The unmapping of pages and trapping faults incur additional overhead that
 ideally is offset by improved memory locality but there is no universal
 guarantee. If the target workload is already bound to NUMA nodes then this
-feature should be disabled. Otherwise, if the system overhead from the
-feature is too high then the rate the kernel samples for NUMA hinting
-faults may be controlled by the `numa_balancing_scan_period_min_ms,
-numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms,
-numa_balancing_scan_size_mb`_, and numa_balancing_settle_count sysctls.
-
-
-numa_balancing_scan_period_min_ms, numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms, numa_balancing_scan_size_mb
-===============================================================================================================================
-
-
-Automatic NUMA balancing scans tasks address space and unmaps pages to
-detect if pages are properly placed or if the data should be migrated to a
-memory node local to where the task is running. Every "scan delay" the task
-scans the next "scan size" number of pages in its address space. When the
-end of the address space is reached the scanner restarts from the beginning.
-
-In combination, the "scan delay" and "scan size" determine the scan rate.
-When "scan delay" decreases, the scan rate increases. The scan delay and
-hence the scan rate of every task is adaptive and depends on historical
-behaviour. If pages are properly placed then the scan delay increases,
-otherwise the scan delay decreases. The "scan size" is not adaptive but
-the higher the "scan size", the higher the scan rate.
-
-Higher scan rates incur higher system overhead as page faults must be
-trapped and potentially data must be migrated. However, the higher the scan
-rate, the more quickly a tasks memory is migrated to a local node if the
-workload pattern changes and minimises performance impact due to remote
-memory accesses. These sysctls control the thresholds for scan delays and
-the number of pages scanned.
-
-``numa_balancing_scan_period_min_ms`` is the minimum time in milliseconds to
-scan a tasks virtual memory. It effectively controls the maximum scanning
-rate for each task.
-
-``numa_balancing_scan_delay_ms`` is the starting "scan delay" used for a task
-when it initially forks.
-
-``numa_balancing_scan_period_max_ms`` is the maximum time in milliseconds to
-scan a tasks virtual memory. It effectively controls the minimum scanning
-rate for each task.
-
-``numa_balancing_scan_size_mb`` is how many megabytes worth of pages are
-scanned for a given scan.
-
+feature should be disabled.
 
 oops_all_cpu_backtrace
 ======================
@@ -17,6 +17,7 @@ Linux Scheduler
 sched-nice-design
 sched-rt-group
 sched-stats
+sched-debug
 
 text_files
 
54
Documentation/scheduler/sched-debug.rst
Normal file
@@ -0,0 +1,54 @@
+=================
+Scheduler debugfs
+=================
+
+Booting a kernel with CONFIG_SCHED_DEBUG=y will give access to
+scheduler specific debug files under /sys/kernel/debug/sched. Some of
+those files are described below.
+
+numa_balancing
+==============
+
+`numa_balancing` directory is used to hold files to control NUMA
+balancing feature. If the system overhead from the feature is too
+high then the rate the kernel samples for NUMA hinting faults may be
+controlled by the `scan_period_min_ms, scan_delay_ms,
+scan_period_max_ms, scan_size_mb` files.
+
+
+scan_period_min_ms, scan_delay_ms, scan_period_max_ms, scan_size_mb
+-------------------------------------------------------------------
+
+Automatic NUMA balancing scans tasks address space and unmaps pages to
+detect if pages are properly placed or if the data should be migrated to a
+memory node local to where the task is running. Every "scan delay" the task
+scans the next "scan size" number of pages in its address space. When the
+end of the address space is reached the scanner restarts from the beginning.
+
+In combination, the "scan delay" and "scan size" determine the scan rate.
+When "scan delay" decreases, the scan rate increases. The scan delay and
+hence the scan rate of every task is adaptive and depends on historical
+behaviour. If pages are properly placed then the scan delay increases,
+otherwise the scan delay decreases. The "scan size" is not adaptive but
+the higher the "scan size", the higher the scan rate.
+
+Higher scan rates incur higher system overhead as page faults must be
+trapped and potentially data must be migrated. However, the higher the scan
+rate, the more quickly a tasks memory is migrated to a local node if the
+workload pattern changes and minimises performance impact due to remote
+memory accesses. These files control the thresholds for scan delays and
+the number of pages scanned.
+
+``scan_period_min_ms`` is the minimum time in milliseconds to scan a
+tasks virtual memory. It effectively controls the maximum scanning
+rate for each task.
+
+``scan_delay_ms`` is the starting "scan delay" used for a task when it
+initially forks.
+
+``scan_period_max_ms`` is the maximum time in milliseconds to scan a
+tasks virtual memory. It effectively controls the minimum scanning
+rate for each task.
+
+``scan_size_mb`` is how many megabytes worth of pages are scanned for
+a given scan.
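As a reader's aid (not part of the patch), the relationship between the four files described in the new sched-debug.rst can be sketched in Python. The directory /sys/kernel/debug/sched/numa_balancing and the file names come from the documentation text; the helper names `read_scan_settings` and `scan_rate_bounds_mb_per_s` are hypothetical. The rate model takes the text literally (one "scan size" chunk per scan period) and is only a rough approximation, not the kernel's actual accounting. Reading the files typically needs root, a mounted debugfs, and CONFIG_SCHED_DEBUG=y.

```python
from pathlib import Path

# Directory named in the documentation; assumed to be readable only with
# CONFIG_SCHED_DEBUG=y, debugfs mounted, and sufficient privileges.
NUMA_DEBUGFS_DIR = Path("/sys/kernel/debug/sched/numa_balancing")

# File names as listed in sched-debug.rst.
SCAN_FILES = (
    "scan_period_min_ms",
    "scan_delay_ms",
    "scan_period_max_ms",
    "scan_size_mb",
)


def read_scan_settings(base_dir=NUMA_DEBUGFS_DIR):
    """Return {file name: int value} for each scan file that can be read."""
    settings = {}
    for name in SCAN_FILES:
        try:
            settings[name] = int((Path(base_dir) / name).read_text().strip())
        except (OSError, ValueError):
            # Missing file, no permission, or non-numeric content: skip it.
            continue
    return settings


def scan_rate_bounds_mb_per_s(settings):
    """Rough (slowest, fastest) scan rate in MB/s per task.

    Hypothetical model for illustration: one "scan size" chunk is scanned
    per period, and the adaptive period stays between scan_period_min_ms
    and scan_period_max_ms.
    """
    size_mb = settings["scan_size_mb"]
    slowest = size_mb / (settings["scan_period_max_ms"] / 1000.0)
    fastest = size_mb / (settings["scan_period_min_ms"] / 1000.0)
    return slowest, fastest


if __name__ == "__main__":
    current = read_scan_settings()
    missing = sorted(set(SCAN_FILES) - set(current))
    if missing:
        print("Could not read:", ", ".join(missing))
    else:
        low, high = scan_rate_bounds_mb_per_s(current)
        print(f"approximate scan rate: {low:.2f} to {high:.2f} MB/s per task")
```

Under this simplified model, lowering scan_period_min_ms raises the upper bound on the scan rate, and raising scan_size_mb scales both bounds, which matches the qualitative description in the documentation.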