linux/kernel
Paul Mackerras fa28237cfc [POWERPC] Provide a way to protect 4k subpages when using 64k pages
Using 64k pages on 64-bit PowerPC systems makes life difficult for
emulators that are trying to emulate an ISA, such as x86, which uses a
smaller page size, since the emulator can no longer use the MMU and
the normal system calls for controlling page protections.  Of course,
the emulator can emulate the MMU by checking and possibly remapping
the address for each memory access in software, but that is pretty
slow.

This provides a facility for such programs to control the access
permissions on individual 4k sub-pages of 64k pages.  The idea is
that the emulator supplies an array of protection masks to apply to a
specified range of virtual addresses.  These masks are applied at the
level where hardware PTEs are inserted into the hardware page table
based on the Linux PTEs, so the Linux PTEs are not affected.  Note
that this new mechanism does not allow any access that would otherwise
be prohibited; it can only prohibit accesses that would otherwise be
allowed.  This new facility is only available on 64-bit PowerPC and
only when the kernel is configured for 64k pages.

The masks are supplied using a new subpage_prot system call, which
takes a starting virtual address and length, and a pointer to an array
of protection masks in memory.  The array has a 32-bit word per 64k
page to be protected; each 32-bit word consists of 16 2-bit fields,
for which 0 allows any access (that is otherwise allowed), 1 prevents
write accesses, and 2 or 3 prevent any access.
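
As an illustration only, the mask word described above could be built and
decoded as in the following sketch. The helper names (spp_set, spp_get,
spp_apply) and the PERM_* flags are hypothetical and are not part of this
patch. The text above does not say which 2-bit field belongs to which
sub-page; the sketch assumes that the first 4k sub-page occupies the most
significant two bits of the word.

```c
#include <assert.h>
#include <stdint.h>

/* Values of one 2-bit field, as described in the text above. */
enum { SPP_ALLOW = 0, SPP_NO_WRITE = 1, SPP_NO_ACCESS = 2 /* 3 acts the same */ };

/* Hypothetical permission flags used only by this sketch. */
enum { PERM_READ = 1, PERM_WRITE = 2 };

/* ASSUMPTION: sub-page 0 lives in bits 31..30, sub-page 15 in bits 1..0. */
static unsigned spp_shift(unsigned subpage)
{
    assert(subpage < 16);
    return 30 - 2 * subpage;
}

/* Return 'word' with the field for 'subpage' (0..15) set to 'prot' (0..3). */
static uint32_t spp_set(uint32_t word, unsigned subpage, unsigned prot)
{
    unsigned shift = spp_shift(subpage);

    word &= ~(UINT32_C(3) << shift);
    return word | ((uint32_t)(prot & 3) << shift);
}

/* Extract the field covering 'addr' from the word for its 64k page. */
static unsigned spp_get(uint32_t word, uint64_t addr)
{
    unsigned subpage = (unsigned)((addr >> 12) & 0xf);

    return (word >> spp_shift(subpage)) & 3;
}

/* A field can only remove access that is otherwise allowed, never add it. */
static unsigned spp_apply(unsigned allowed, unsigned field)
{
    if (field == SPP_ALLOW)
        return allowed;
    if (field == SPP_NO_WRITE)
        return allowed & ~(unsigned)PERM_WRITE;
    return 0; /* 2 or 3: no access at all */
}
```

Under these assumptions, an emulator that wanted to write-protect only the
first 4k of a 64k page would place spp_set(0, 0, SPP_NO_WRITE) in the array
slot for that page and leave the other fifteen fields at 0.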

Implicit in this is that the regions of the address space that are
protected are switched to use 4k hardware pages rather than 64k
hardware pages (on machines with hardware 64k page support).  In fact
the whole process is switched to use 4k hardware pages when the
subpage_prot system call is used, but this could be improved in future
to switch only the affected segments.

The subpage protection bits are stored in a 3-level tree akin to the
page table tree.  The top level of this tree is stored in a structure
that is appended to the top level of the page table tree, i.e., the
pgd array.  Since it will often only be 32-bit addresses (below 4GB)
that are protected, the pointers to the first four bottom level pages
are also stored in this structure (each bottom level page contains the
protection bits for 1GB of address space), so the protection bits for
addresses below 4GB can be accessed with one fewer load than those
for higher addresses.
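
The lookup path described above might look roughly like the following
sketch. It is not the code from this patch: the structure name, field
names and the fan-out of the upper levels are assumptions made for
illustration. Only two facts come from the text: a bottom-level page
covers 1GB (one 32-bit word per 64k page, so 16384 words), and the first
four bottom-level pointers are kept in the top-level structure. The
middle level is assumed to be one 64k page of 8-byte pointers, and the
top level is assumed to have two entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SPP_PAGE_SHIFT        16                      /* 64k pages */
#define SPP_BOTTOM_SHIFT      30                      /* 1GB per bottom-level page */
#define SPP_WORDS_PER_BOTTOM  (1u << (SPP_BOTTOM_SHIFT - SPP_PAGE_SHIFT))
#define SPP_MID_BITS          13                      /* ASSUMPTION: 64k / 8-byte ptrs */
#define SPP_MID_ENTRIES       (1u << SPP_MID_BITS)
#define SPP_TOP_ENTRIES       2                       /* ASSUMPTION */

/* Hypothetical top-level structure, appended to the pgd array in the text. */
struct spp_table_sketch {
    uint32_t **top[SPP_TOP_ENTRIES];  /* pointers to middle-level pages */
    uint32_t *low_prot[4];            /* bottom-level pages for addresses < 4GB */
};

/*
 * Return a pointer to the 32-bit mask word for the 64k page containing
 * 'addr', or NULL if no protection bits have been set up for it.
 */
static const uint32_t *spp_lookup(const struct spp_table_sketch *t, uint64_t addr)
{
    uint64_t gb = addr >> SPP_BOTTOM_SHIFT;
    size_t word = (size_t)((addr >> SPP_PAGE_SHIFT) & (SPP_WORDS_PER_BOTTOM - 1));
    const uint32_t *bottom;

    assert(t != NULL);

    if (gb < 4) {
        /* Below 4GB: skip the middle level, saving one load. */
        bottom = t->low_prot[gb];
    } else {
        uint64_t top_idx = gb >> SPP_MID_BITS;
        uint32_t **mid;

        if (top_idx >= SPP_TOP_ENTRIES)
            return NULL;
        mid = t->top[top_idx];
        if (mid == NULL)
            return NULL;
        bottom = mid[gb & (SPP_MID_ENTRIES - 1)];
    }
    return bottom ? &bottom[word] : NULL;
}
```

With these sizes one bottom-level page is exactly 64k (16384 words of 4
bytes), which is presumably why a bottom-level page maps 1GB when the
kernel page size is 64k.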

Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-01-24 10:06:01 +11:00
irq genirq: revert lazy irq disable for simple irqs 2007-12-18 18:05:58 +01:00
power hibernate: fix lockdep report 2007-11-14 18:45:43 -08:00
time clockevents: fix reprogramming decision in oneshot broadcast 2007-12-18 18:05:58 +01:00
.gitignore
acct.c sched: fix kernel/acct.c comment 2007-11-26 21:21:49 +01:00
audit_tree.c [PATCH] audit: watching subtrees 2007-10-21 02:37:45 -04:00
audit.c [PATCH] audit: watching subtrees 2007-10-21 02:37:45 -04:00
audit.h [PATCH] audit: watching subtrees 2007-10-21 02:37:45 -04:00
auditfilter.c [PATCH] audit: watching subtrees 2007-10-21 02:37:45 -04:00
auditsc.c auditsc: fix kernel-doc param warnings 2007-10-22 19:40:02 -07:00
capability.c Uninline find_pid etc set of functions 2007-10-19 11:53:41 -07:00
cgroup_debug.c Task Control Groups: simple task cgroup debug info subsystem 2007-10-19 11:53:36 -07:00
cgroup.c Improve cgroup printks 2007-11-14 18:45:37 -08:00
compat.c Merge ssh://master.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-hrt 2007-10-18 15:12:41 -07:00
configs.c
cpu.c CPU HOTPLUG: avoid hotadd when proper possible_map isn't specified 2007-10-19 11:53:44 -07:00
cpuset.c hotplug cpu: migrate a task within its cpuset 2007-10-19 11:53:44 -07:00
delayacct.c Add scaled time to taskstats based process accounting 2007-10-18 14:37:28 -07:00
dma.c whitespace fixes: DMA channel allocator 2007-10-18 14:37:24 -07:00
exec_domain.c whitespace fixes: execution domains 2007-10-18 14:37:26 -07:00
exit.c wait_task_stopped(): pass correct exit_code to wait_noreap_copyout() 2007-11-29 09:24:55 -08:00
extable.c
fork.c fix clone(CLONE_NEWPID) 2007-12-05 09:21:18 -08:00
futex_compat.c [FUTEX] Fix address computation in compat code. 2007-11-09 16:13:08 -08:00
futex.c futex: correctly return -EFAULT not -EINVAL 2007-12-05 15:46:09 +01:00
hrtimer.c hrtimers: avoid overflow for large relative timeouts 2007-12-07 19:16:17 +01:00
itimer.c whitespace fixes: interval timers 2007-10-18 14:37:26 -07:00
kallsyms.c FRV: fix the extern declaration of kallsyms_num_syms 2007-11-29 09:24:54 -08:00
Kconfig.hz
Kconfig.instrumentation Tiny clean-up of OPROFILE/KPROBES configuration 2007-12-06 09:41:12 -08:00
Kconfig.preempt Move PREEMPT_NOTIFIERS into an always-included Kconfig 2007-10-17 08:42:55 -07:00
kexec.c Extended crashkernel command line 2007-10-19 11:53:49 -07:00
kfifo.c
kmod.c Restore call_usermodehelper_pipe() behaviour 2007-09-11 17:21:20 -07:00
kprobes.c kprobes: support kretprobe blacklist 2007-10-16 09:43:10 -07:00
ksysfs.c add-vmcore: cleanup the coding style according to Andrew's comments 2007-10-17 08:42:54 -07:00
kthread.c kthread: silence bogus section mismatch warning 2007-07-31 15:39:42 -07:00
latency.c
lockdep_internals.h
lockdep_proc.c lockdep: Avoid /proc/lockdep & lock_stat infinite output 2007-10-11 22:11:11 +02:00
lockdep.c lockdep: make cli/sti annotation warnings clearer 2007-12-07 19:02:47 +01:00
Makefile revert "Task Control Groups: example CPU accounting subsystem" 2007-11-14 18:45:40 -08:00
marker.c Linux Kernel Markers: fix marker mutex not taken upon module load 2007-11-14 18:45:40 -08:00
module.c module: fix and elaborate comments 2007-11-19 11:20:43 +11:00
mutex-debug.c
mutex-debug.h
mutex.c lockdep: fixup mutex annotations 2007-10-11 22:11:12 +02:00
mutex.h
notifier.c Add kernel/notifier.c 2007-10-19 11:53:34 -07:00
ns_cgroup.c cgroups: implement namespace tracking subsystem 2007-10-19 11:53:37 -07:00
nsproxy.c pid namespaces: allow cloning of new namespace 2007-10-19 11:53:39 -07:00
panic.c debug: add end-of-oops marker 2007-12-20 15:01:17 +01:00
params.c fix param_sysfs_builtin name length check 2007-11-14 18:45:42 -08:00
pid.c pidns: Place under CONFIG_EXPERIMENTAL 2007-11-14 18:45:43 -08:00
posix-cpu-timers.c Isolate some explicit usage of task->tgid 2007-10-19 11:53:40 -07:00
posix-timers.c Isolate some explicit usage of task->tgid 2007-10-19 11:53:40 -07:00
printk.c serial: turn serial console suspend a boot rather than compile time option 2007-10-18 14:37:19 -07:00
profile.c sched: document profile=sleep requiring CONFIG_SCHEDSTATS 2007-10-24 18:23:50 +02:00
ptrace.c Isolate some explicit usage of task->tgid 2007-10-19 11:53:40 -07:00
rcupdate.c Clean up duplicate includes in kernel/ 2007-10-17 08:42:48 -07:00
rcutorture.c Make rcutorture RNG use temporal entropy 2007-10-17 08:42:53 -07:00
relay.c whitespace fixes: relayfs 2007-10-18 14:37:24 -07:00
resource.c Add IORESOUCE_BUSY flag for System RAM 2007-11-14 18:45:39 -08:00
rtmutex_common.h
rtmutex-debug.c Use helpers to obtain task pid in printks 2007-10-19 11:53:43 -07:00
rtmutex-debug.h
rtmutex-tester.c
rtmutex.c Use helpers to obtain task pid in printks 2007-10-19 11:53:43 -07:00
rtmutex.h
rwsem.c sched: mark rwsem functions as __sched for wchan/profiling 2007-12-18 15:21:13 +01:00
sched_debug.c sched: clean up overlong line in kernel/sched_debug.c 2007-11-28 15:52:56 +01:00
sched_fair.c sched: do not hurt SCHED_BATCH on wakeup 2007-12-18 15:21:13 +01:00
sched_idletask.c sched: isolate SMP balancing code a bit more 2007-10-24 18:23:51 +02:00
sched_rt.c sched: rt: account the cpu time during the tick 2007-12-20 15:01:17 +01:00
sched_stats.h sched: clean up kernel/sched_stat.h 2007-11-28 15:52:56 +01:00
sched.c sched: touch softlockup watchdog after idling 2007-12-18 15:21:13 +01:00
seccomp.c
signal.c sigwait eats blocked default-ignore signals 2007-11-12 16:05:23 -08:00
softirq.c [KERNEL]: Unexport raise_softirq_irqoff 2007-10-10 16:49:18 -07:00
softlockup.c Use helpers to obtain task pid in printks 2007-10-19 11:53:43 -07:00
spinlock.c
srcu.c
stacktrace.c
stop_machine.c
sys_ni.c [POWERPC] Provide a way to protect 4k subpages when using 64k pages 2008-01-24 10:06:01 +11:00
sys.c x86: ignore the sys_getcpu() tcache parameter 2007-11-17 16:27:00 +01:00
sysctl_check.c sysctl: fix ax25 checks 2007-12-17 19:28:17 -08:00
sysctl.c sched: sysctl, proc_dointvec_minmax() expects int values for 2007-12-18 15:21:13 +01:00
taskstats.c kernel/taskstats.c: fix bogus nlmsg_free() 2007-11-14 18:45:44 -08:00
time.c whitespace fixes: time syscalls 2007-10-18 14:37:24 -07:00
timer.c timer: kernel/timer.c section fixes 2007-12-18 18:05:58 +01:00
tsacct.c Add scaled time to taskstats based process accounting 2007-10-18 14:37:28 -07:00
uid16.c
user_namespace.c Fix user namespace exiting OOPs 2007-09-19 11:24:18 -07:00
user.c sched: don't forget to unlock uids_mutex on error paths 2007-11-26 21:21:49 +01:00
utsname_sysctl.c Isolate the UTS namespace's domainname and hostname back 2007-11-29 09:24:53 -08:00
utsname.c Fix UTS corruption during clone(CLONE_NEWUTS) 2007-09-19 11:24:17 -07:00
wait.c
workqueue.c Use helpers to obtain task pid in printks 2007-10-19 11:53:43 -07:00