docs: mm: add enable_soft_offline sysctl

Add the documentation for soft offline behaviors / costs, and what the new
enable_soft_offline sysctl is for.

[jiaqiyan@google.com: fix kerneldoc warnings]
  Link: https://lkml.kernel.org/r/CACw3F52=GxTCDw-PqFh3-GDM-fo3GbhGdu0hedxYXOTT4TQSTg@mail.gmail.com
[jiaqiyan@google.com: there are more blank lines needed]
  Link: https://lkml.kernel.org/r/CACw3F52_obAB742XeDRNun4BHBYtrxtbvp5NkUincXdaob0j1g@mail.gmail.com
Link: https://lkml.kernel.org/r/20240626050818.2277273-5-jiaqiyan@google.com
Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
Acked-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Frank van der Linden <fvdl@google.com>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This commit is contained in:
Jiaqi Yan 2024-06-26 05:08:18 +00:00 committed by Andrew Morton
parent 72ead83dad
commit 44195d1eba

View File

@ -36,6 +36,7 @@ Currently, these files are in /proc/sys/vm:
- dirtytime_expire_seconds
- dirty_writeback_centisecs
- drop_caches
- enable_soft_offline
- extfrag_threshold
- highmem_is_dirtyable
- hugetlb_shm_group
@ -267,6 +268,43 @@ used::
These are informational only. They do not mean that anything is wrong
with your system. To disable them, echo 4 (bit 2) into drop_caches.
enable_soft_offline
===================
Correctable memory errors are very common on servers. Soft-offline is kernel's
solution for memory pages having (excessive) corrected memory errors.
For different types of page, soft-offline has different behaviors / costs.
- For a raw error page, soft-offline migrates the in-use page's content to
a new raw page.
- For a page that is part of a transparent hugepage, soft-offline splits the
transparent hugepage into raw pages, then migrates only the raw error page.
As a result, user is transparently backed by 1 less hugepage, impacting
memory access performance.
- For a page that is part of a HugeTLB hugepage, soft-offline first migrates
the entire HugeTLB hugepage, during which a free hugepage will be consumed
as migration target. Then the original hugepage is dissolved into raw
pages without compensation, reducing the capacity of the HugeTLB pool by 1.
It is user's call to choose between reliability (staying away from fragile
physical memory) vs performance / capacity implications in transparent and
HugeTLB cases.
For all architectures, enable_soft_offline controls whether to soft offline
memory pages. When set to 1, kernel attempts to soft offline the pages
whenever it thinks needed. When set to 0, kernel returns EOPNOTSUPP to
the request to soft offline the pages. Its default value is 1.
It is worth mentioning that after setting enable_soft_offline to 0, the
following requests to soft offline pages will not be performed:
- Request to soft offline pages from RAS Correctable Errors Collector.
- On ARM, the request to soft offline pages from GHES driver.
- On PARISC, the request to soft offline pages from Page Deallocation Table.
extfrag_threshold
=================