Cpuset hardwall flag: add a mem_hardwall flag to cpusets

This flag provides the hardwalling properties of mem_exclusive, without
enforcing the exclusivity.  Either mem_hardwall or mem_exclusive is sufficient
to prevent GFP_KERNEL allocations from passing outside the cpuset's assigned
nodes.

Signed-off-by: Paul Menage <menage@google.com>
Acked-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Paul Menage 2008-04-29 01:00:26 -07:00 committed by Linus Torvalds
parent addf2c739d
commit 786083667e
2 changed files with 46 additions and 28 deletions
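
As a quick usage sketch (not part of the commit itself): the new flag is
toggled through the cpuset filesystem exactly like mem_exclusive. The
/dev/cpuset mount point, the "my_jobs" cpuset name and the CPU/node numbers
below are illustrative assumptions.

 # mount -t cpuset cpuset /dev/cpuset
 # cd /dev/cpuset
 # mkdir my_jobs
 # echo 0-3 > my_jobs/cpus
 # echo 0-1 > my_jobs/mems
 # echo 1 > my_jobs/mem_hardwall
 # cat my_jobs/mem_hardwall
 1

With mem_hardwall set (and mem_exclusive left at 0), GFP_KERNEL allocations
made on behalf of tasks in my_jobs cannot pass outside memory nodes 0-1, yet
sibling cpusets may still overlap the same CPUs and nodes.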

Documentation/cpusets.txt

@@ -171,6 +171,7 @@ files describing that cpuset:
  - memory_migrate flag: if set, move pages to cpusets nodes
  - cpu_exclusive flag: is cpu placement exclusive?
  - mem_exclusive flag: is memory placement exclusive?
+ - mem_hardwall flag: is memory allocation hardwalled
  - memory_pressure: measure of how much paging pressure in cpuset
 
 In addition, the root cpuset only has the following file:
@@ -222,17 +223,18 @@ If a cpuset is cpu or mem exclusive, no other cpuset, other than
 a direct ancestor or descendent, may share any of the same CPUs or
 Memory Nodes.
 
-A cpuset that is mem_exclusive restricts kernel allocations for
-page, buffer and other data commonly shared by the kernel across
-multiple users. All cpusets, whether mem_exclusive or not, restrict
-allocations of memory for user space. This enables configuring a
-system so that several independent jobs can share common kernel data,
-such as file system pages, while isolating each jobs user allocation in
-its own cpuset. To do this, construct a large mem_exclusive cpuset to
-hold all the jobs, and construct child, non-mem_exclusive cpusets for
-each individual job. Only a small amount of typical kernel memory,
-such as requests from interrupt handlers, is allowed to be taken
-outside even a mem_exclusive cpuset.
+A cpuset that is mem_exclusive *or* mem_hardwall is "hardwalled",
+i.e. it restricts kernel allocations for page, buffer and other data
+commonly shared by the kernel across multiple users. All cpusets,
+whether hardwalled or not, restrict allocations of memory for user
+space. This enables configuring a system so that several independent
+jobs can share common kernel data, such as file system pages, while
+isolating each job's user allocation in its own cpuset. To do this,
+construct a large mem_exclusive cpuset to hold all the jobs, and
+construct child, non-mem_exclusive cpusets for each individual job.
+Only a small amount of typical kernel memory, such as requests from
+interrupt handlers, is allowed to be taken outside even a
+mem_exclusive cpuset.
 
 1.5 What is memory_pressure ?
@@ -707,7 +709,7 @@ Now you want to do something with this cpuset.
 In this directory you can find several files:
 # ls
-cpus cpu_exclusive mems mem_exclusive tasks
+cpus cpu_exclusive mems mem_exclusive mem_hardwall tasks
 
 Reading them will give you information about the state of this cpuset:
 the CPUs and Memory Nodes it can use, the processes that are using
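
Tying this back to the documentation paragraph above about hardwalled parents
and per-job children, a rough configuration sketch might look as follows
(cpuset names, CPU numbers and memory nodes are made-up examples, and
/dev/cpuset is assumed to be the cpuset mount point):

 # cd /dev/cpuset
 # mkdir jobs
 # echo 0-7 > jobs/cpus
 # echo 0-1 > jobs/mems
 # echo 1 > jobs/mem_hardwall
 # mkdir jobs/job1
 # echo 0-3 > jobs/job1/cpus
 # echo 0 > jobs/job1/mems
 # echo $$ > jobs/job1/tasks

GFP_USER allocations for the shell now come from node 0 only, while its
GFP_KERNEL allocations may also escape to node 1, since that is still inside
the nearest hardwalled ancestor ("jobs"), but no further.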

kernel/cpuset.c

@@ -127,6 +127,7 @@ struct cpuset_hotplug_scanner {
 typedef enum {
 	CS_CPU_EXCLUSIVE,
 	CS_MEM_EXCLUSIVE,
+	CS_MEM_HARDWALL,
 	CS_MEMORY_MIGRATE,
 	CS_SCHED_LOAD_BALANCE,
 	CS_SPREAD_PAGE,
@@ -144,6 +145,11 @@ static inline int is_mem_exclusive(const struct cpuset *cs)
 	return test_bit(CS_MEM_EXCLUSIVE, &cs->flags);
 }
 
+static inline int is_mem_hardwall(const struct cpuset *cs)
+{
+	return test_bit(CS_MEM_HARDWALL, &cs->flags);
+}
+
 static inline int is_sched_load_balance(const struct cpuset *cs)
 {
 	return test_bit(CS_SCHED_LOAD_BALANCE, &cs->flags);
@@ -1042,12 +1048,9 @@ static int update_relax_domain_level(struct cpuset *cs, char *buf)
 /*
  * update_flag - read a 0 or a 1 in a file and update associated flag
- * bit:	the bit to update (CS_CPU_EXCLUSIVE, CS_MEM_EXCLUSIVE,
- *		CS_SCHED_LOAD_BALANCE,
- *		CS_NOTIFY_ON_RELEASE, CS_MEMORY_MIGRATE,
- *		CS_SPREAD_PAGE, CS_SPREAD_SLAB)
- * cs:	the cpuset to update
- * buf:	the buffer where we read the 0 or 1
+ * bit:		the bit to update (see cpuset_flagbits_t)
+ * cs:		the cpuset to update
+ * turning_on:	whether the flag is being set or cleared
  *
  * Call with cgroup_mutex held.
  */
@@ -1228,6 +1231,7 @@ typedef enum {
 	FILE_MEMLIST,
 	FILE_CPU_EXCLUSIVE,
 	FILE_MEM_EXCLUSIVE,
+	FILE_MEM_HARDWALL,
 	FILE_SCHED_LOAD_BALANCE,
 	FILE_SCHED_RELAX_DOMAIN_LEVEL,
 	FILE_MEMORY_PRESSURE_ENABLED,
@@ -1313,6 +1317,9 @@ static int cpuset_write_u64(struct cgroup *cgrp, struct cftype *cft, u64 val)
 	case FILE_MEM_EXCLUSIVE:
 		retval = update_flag(CS_MEM_EXCLUSIVE, cs, val);
 		break;
+	case FILE_MEM_HARDWALL:
+		retval = update_flag(CS_MEM_HARDWALL, cs, val);
+		break;
 	case FILE_SCHED_LOAD_BALANCE:
 		retval = update_flag(CS_SCHED_LOAD_BALANCE, cs, val);
 		break;
@@ -1423,6 +1430,8 @@ static u64 cpuset_read_u64(struct cgroup *cont, struct cftype *cft)
 		return is_cpu_exclusive(cs);
 	case FILE_MEM_EXCLUSIVE:
 		return is_mem_exclusive(cs);
+	case FILE_MEM_HARDWALL:
+		return is_mem_hardwall(cs);
 	case FILE_SCHED_LOAD_BALANCE:
 		return is_sched_load_balance(cs);
 	case FILE_MEMORY_MIGRATE:
@@ -1474,6 +1483,13 @@ static struct cftype files[] = {
 		.private = FILE_MEM_EXCLUSIVE,
 	},
 
+	{
+		.name = "mem_hardwall",
+		.read_u64 = cpuset_read_u64,
+		.write_u64 = cpuset_write_u64,
+		.private = FILE_MEM_HARDWALL,
+	},
+
 	{
 		.name = "sched_load_balance",
 		.read_u64 = cpuset_read_u64,
@@ -1963,14 +1979,14 @@ int cpuset_nodemask_valid_mems_allowed(nodemask_t *nodemask)
 }
 
 /*
- * nearest_exclusive_ancestor() - Returns the nearest mem_exclusive
- * ancestor to the specified cpuset. Call holding callback_mutex.
- * If no ancestor is mem_exclusive (an unusual configuration), then
- * returns the root cpuset.
+ * nearest_hardwall_ancestor() - Returns the nearest mem_exclusive or
+ * mem_hardwall ancestor to the specified cpuset. Call holding
+ * callback_mutex. If no ancestor is mem_exclusive or mem_hardwall
+ * (an unusual configuration), then returns the root cpuset.
  */
-static const struct cpuset *nearest_exclusive_ancestor(const struct cpuset *cs)
+static const struct cpuset *nearest_hardwall_ancestor(const struct cpuset *cs)
 {
-	while (!is_mem_exclusive(cs) && cs->parent)
+	while (!(is_mem_exclusive(cs) || is_mem_hardwall(cs)) && cs->parent)
 		cs = cs->parent;
 	return cs;
 }
@@ -1984,7 +2000,7 @@ static const struct cpuset *nearest_exclusive_ancestor(const struct cpuset *cs)
  * __GFP_THISNODE is set, yes, we can always allocate. If zone
  * z's node is in our tasks mems_allowed, yes. If it's not a
  * __GFP_HARDWALL request and this zone's nodes is in the nearest
- * mem_exclusive cpuset ancestor to this tasks cpuset, yes.
+ * hardwalled cpuset ancestor to this tasks cpuset, yes.
  * If the task has been OOM killed and has access to memory reserves
  * as specified by the TIF_MEMDIE flag, yes.
  * Otherwise, no.
@@ -2007,7 +2023,7 @@ static const struct cpuset *nearest_exclusive_ancestor(const struct cpuset *cs)
  * and do not allow allocations outside the current tasks cpuset
  * unless the task has been OOM killed as is marked TIF_MEMDIE.
  * GFP_KERNEL allocations are not so marked, so can escape to the
- * nearest enclosing mem_exclusive ancestor cpuset.
+ * nearest enclosing hardwalled ancestor cpuset.
  *
  * Scanning up parent cpusets requires callback_mutex. The
  * __alloc_pages() routine only calls here with __GFP_HARDWALL bit
@@ -2030,7 +2046,7 @@ static const struct cpuset *nearest_exclusive_ancestor(const struct cpuset *cs)
  * in_interrupt - any node ok (current task context irrelevant)
  * GFP_ATOMIC   - any node ok
  * TIF_MEMDIE   - any node ok
- * GFP_KERNEL   - any node in enclosing mem_exclusive cpuset ok
+ * GFP_KERNEL   - any node in enclosing hardwalled cpuset ok
  * GFP_USER     - only nodes in current tasks mems allowed ok.
  *
  * Rule:
@@ -2067,7 +2083,7 @@ int __cpuset_zone_allowed_softwall(struct zone *z, gfp_t gfp_mask)
 	mutex_lock(&callback_mutex);
 	task_lock(current);
-	cs = nearest_exclusive_ancestor(task_cs(current));
+	cs = nearest_hardwall_ancestor(task_cs(current));
 	task_unlock(current);
 	allowed = node_isset(node, cs->mems_allowed);