linux/mm
Nishanth Aravamudan d1c3fb1f8f hugetlb: introduce nr_overcommit_hugepages sysctl

While examining the code to support /proc/sys/vm/hugetlb_dynamic_pool, I
became convinced that having a boolean sysctl was insufficient:

1) To support per-node control of hugepages, I have previously submitted
patches to add a sysfs attribute related to nr_hugepages. However, with
a boolean global value and per-mount quota enforcement constraining the
dynamic pool, adding corresponding control of the dynamic pool on a
per-node basis seems inconsistent to me.

2) Administration of the hugetlb dynamic pool with multiple hugetlbfs
mount points is, arguably, more arduous than it needs to be. Each quota
would need to be set separately, and the sum would need to be monitored.

To ease the administration, and to help pave the way for per-node
control of the static & dynamic hugepage pool, I added a separate
sysctl, nr_overcommit_hugepages. This value serves as a high watermark
for the overall hugepage pool, while nr_hugepages serves as a low
watermark. The boolean sysctl can then be removed, as the condition

	nr_overcommit_hugepages > 0

indicates the same administrative setting as

	hugetlb_dynamic_pool == 1

Quotas still serve as local enforcement of the size of the pool on a
per-mount basis.
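
The relationship between the two sysctls can be sketched in user space as
below. This is only an illustrative model, not the kernel code: the struct
and function names are hypothetical, and it assumes the overcommit value
bounds the number of surplus pages allocated on top of the static pool.

```c
#include <stdbool.h>

/* Hypothetical user-space model; these are not kernel symbols. */
struct hugepage_pool {
	unsigned long nr_hugepages;            /* static pool size (low watermark) */
	unsigned long nr_overcommit_hugepages; /* assumed cap on surplus pages */
	unsigned long surplus;                 /* pages currently beyond the static pool */
};

/* Replaces the old boolean: a non-zero limit means the dynamic pool is on. */
static bool dynamic_pool_enabled(const struct hugepage_pool *p)
{
	return p->nr_overcommit_hugepages > 0;
}

/* Try to grow the pool by one surplus page; returns true on success. */
static bool try_alloc_surplus(struct hugepage_pool *p)
{
	if (p->surplus >= p->nr_overcommit_hugepages)
		return false;	/* limit reached, refuse to grow */
	p->surplus++;
	return true;
}
```

With nr_overcommit_hugepages set to 0, try_alloc_surplus() always fails in
this model, which corresponds to hugetlb_dynamic_pool == 0.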

A few caveats:

1) There is a race whereby the global surplus huge page counter is
incremented before a hugepage has been allocated. Another process could then
try to grow the pool, and fail to convert a surplus huge page to a normal
huge page and instead allocate a fresh huge page. I believe this is
benign, as no memory is leaked (the actual pages are still tracked
correctly) and the counters won't go out of sync.

2) Shrinking the static pool while a surplus is in effect will allow the
number of surplus huge pages to exceed the overcommit value. As long as
this condition holds, however, no more surplus huge pages will be
allowed on the system until one of the two sysctls is increased
sufficiently, or the surplus huge pages go out of use and are freed.
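
The second caveat can be illustrated with a small user-space model. This is
a sketch under simplifying assumptions, not the kernel's accounting: the
names are hypothetical, and surplus is derived here as the number of in-use
pages beyond the static pool rather than tracked as a separate counter.

```c
#include <stdbool.h>

/* Hypothetical model; field names are illustrative only. */
struct pool_model {
	unsigned long nr_hugepages;            /* static pool size */
	unsigned long nr_overcommit_hugepages; /* limit on surplus pages */
	unsigned long in_use;                  /* huge pages currently in use */
};

/* Surplus (simplified): pages in use beyond the static pool. */
static unsigned long surplus_pages(const struct pool_model *p)
{
	return p->in_use > p->nr_hugepages ? p->in_use - p->nr_hugepages : 0;
}

/* A new surplus page is allowed only while surplus is below the limit. */
static bool may_add_surplus(const struct pool_model *p)
{
	return surplus_pages(p) < p->nr_overcommit_hugepages;
}
```

For example, with nr_hugepages = 4, nr_overcommit_hugepages = 2 and six
pages in use, lowering nr_hugepages to 2 leaves a surplus of 4, above the
limit of 2. In this model may_add_surplus() then stays false until either
sysctl is raised far enough or enough pages are freed.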

Successfully tested on x86_64 with the current libhugetlbfs snapshot,
modified to use the new sysctl.

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Acked-by: Adam Litke <agl@us.ibm.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-12-17 19:28:17 -08:00
allocpercpu.c Slab allocators: Replace explicit zeroing with __GFP_ZERO 2007-07-17 10:23:02 -07:00
backing-dev.c mm/backing-dev.c: fix percpu_counter_destroy call bug in bdi_init 2007-12-05 09:21:18 -08:00
bootmem.c [PATCH] remove EXPORT_UNUSED_SYMBOL'ed symbols 2006-12-07 08:39:44 -08:00
bounce.c block: Initial support for data-less (or empty) barrier support 2007-10-16 11:03:56 +02:00
fadvise.c [PATCH] mm: change uses of f_{dentry,vfsmnt} to use f_path 2006-12-08 08:28:43 -08:00
filemap_xip.c mm: fix XIP file writes 2007-12-05 09:21:20 -08:00
filemap.c Remove broken ptrace() special-case code from file mapping 2007-10-31 09:19:46 -07:00
fremap.c remap_file_pages: kernel-doc corrections 2007-10-17 08:43:07 -07:00
highmem.c Create the ZONE_MOVABLE zone 2007-07-17 10:22:59 -07:00
hugetlb.c hugetlb: introduce nr_overcommit_hugepages sysctl 2007-12-17 19:28:17 -08:00
internal.h Breakout page_order() to internal.h to avoid special knowledge of the buddy allocator 2007-10-16 09:43:01 -07:00
Kconfig sparsemem: make SPARSEMEM_VMEMMAP selectable 2007-12-17 19:28:16 -08:00
madvise.c speed up madvise_need_mmap_write() usage 2007-07-16 09:05:36 -07:00
Makefile memory unplug: page isolation 2007-10-16 09:43:02 -07:00
memory_hotplug.c Add IORESOUCE_BUSY flag for System RAM 2007-11-14 18:45:39 -08:00
memory.c Swap delay accounting, include lock_page() delays 2007-11-14 18:45:44 -08:00
mempolicy.c Migration: find correct vma in new_vma_page() 2007-11-14 18:45:38 -08:00
mempool.c spelling fixes: mm/ 2007-10-20 01:27:18 +02:00
migrate.c Typo fixes retrun -> return 2007-10-20 02:13:26 +02:00
mincore.c [PATCH] mincore: vma crossing fix 2007-02-15 09:57:03 -08:00
mlock.c do not limit locked memory when RLIMIT_MEMLOCK is RLIM_INFINITY 2007-07-16 09:05:37 -07:00
mmap.c VM/Security: add security hook to do_brk 2007-12-05 09:21:21 -08:00
mmzone.c [PATCH] remove EXPORT_UNUSED_SYMBOL'ed symbols 2006-12-07 08:39:44 -08:00
mprotect.c fix mprotect vma_wants_writenotify prot 2007-10-23 08:32:06 -07:00
mremap.c sparse pointer use of zero as null 2007-10-18 14:37:31 -07:00
msync.c Detach sched.h from mm.h 2007-05-21 09:18:19 -07:00
nommu.c Security: round mmap hint address above mmap_min_addr 2007-12-06 00:25:10 +11:00
oom_kill.c oom_kill bug 2007-10-20 15:04:06 -07:00
page_alloc.c mm: fix page allocation for larger I/O segments 2007-12-17 19:28:16 -08:00
page_io.c Drop 'size' argument from bio_endio and bi_end_io 2007-10-10 09:25:57 +02:00
page_isolation.c memory hotremove: unset migrate type "ISOLATE" after removal 2007-11-14 18:45:38 -08:00
page-writeback.c dirty page balancing: Get rid of broken unmapped_ratio logic 2007-11-15 16:41:52 -08:00
pdflush.c Freezer: make kernel threads nonfreezable by default 2007-07-17 10:23:02 -07:00
prio_tree.c spelling fixes: mm/ 2007-10-20 01:27:18 +02:00
quicklist.c Quicklists for page table pages 2007-05-07 12:12:54 -07:00
readahead.c mm: bdi init hooks 2007-10-17 08:42:45 -07:00
rmap.c [S390] Optimize storage key handling for anonymous pages 2007-11-20 11:13:46 +01:00
shmem_acl.c [PATCH] Fix typos in mm/shmem_acl.c 2006-10-11 11:14:23 -07:00
shmem.c tmpfs: restore missing clear_highpage 2007-11-28 11:04:28 -08:00
slab.c Add EXPORT_SYMBOL(ksize); 2007-12-05 09:21:18 -08:00
slob.c Avoid double memclear() in SLOB/SLUB 2007-12-09 10:17:52 -08:00
slub.c Avoid double memclear() in SLOB/SLUB 2007-12-09 10:17:52 -08:00
sparse-vmemmap.c memory hotplug fix: fix section mismatch in vmammap_allock_block() 2007-11-29 09:24:54 -08:00
sparse.c mm/sparse.c: improve the error handling for sparse_add_one_section() 2007-12-17 19:28:16 -08:00
swap_state.c mm: clarify __add_to_swap_cache locking 2007-10-16 09:42:53 -07:00
swap.c spelling fixes: mm/ 2007-10-20 01:27:18 +02:00
swapfile.c Replace CONFIG_SOFTWARE_SUSPEND with CONFIG_HIBERNATION 2007-07-29 16:45:38 -07:00
thrash.c Bug in mm/thrash.c function grab_swap_token() 2007-05-11 08:29:32 -07:00
tiny-shmem.c r/o bind mounts: filesystem helpers for custom 'struct file's 2007-10-17 08:43:04 -07:00
truncate.c Drop some headers from mm.h 2007-10-17 08:42:55 -07:00
util.c fix mm/util.c:krealloc() 2007-11-14 18:45:41 -08:00
vmalloc.c spelling fixes: mm/ 2007-10-20 01:27:18 +02:00
vmscan.c spelling fixes: mm/ 2007-10-20 01:27:18 +02:00
vmstat.c vmstat: fix section mismatch warning 2007-11-14 18:45:42 -08:00