linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-10 14:11:52 +00:00

Author	SHA1	Message	Date
Oleg Nesterov	e7b2c40692	fput: task_work_add() can fail if the caller has passed exit_task_work() fput() assumes that it can't be called after exit_task_work() but this is not true, for example free_ipc_ns()->shm_destroy() can do this. In this case fput() silently leaks the file. Change it to fall back to delayed_fput_work if task_work_add() fails. The patch looks complicated but it is not, it changes the code from if (PF_KTHREAD) { schedule_work(...); return; } task_work_add(...) to if (!PF_KTHREAD) { if (!task_work_add(...)) return; /* fallback */ } schedule_work(...); As for shm_destroy() in particular, we could make another fix but I think this change makes sense anyway. There could be another similar user, it is not safe to assume that task_work_add() can't fail. Reported-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-06-15 05:39:08 +04:00
Al Viro	dd37978c50	cache the value of file_inode() in struct file Note that this thing does not contribute to inode refcount; it's pinned down by dentry. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-03-01 19:48:30 -05:00
Anatol Pomozov	39b6525274	fs: Preserve error code in get_empty_filp(), part 2 Allocating a file structure in function get_empty_filp() might fail because of several reasons: - not enough memory for file structures - operation is not allowed - user is over its limit Currently the function returns NULL in all cases and we lose the exact reason of the error. All callers of get_empty_filp() assume that the function can fail with ENFILE only. Return error through pointer. Change all callers to preserve this error code. [AV: cleaned up a bit, carved the get_empty_filp() part out into a separate commit (things remaining here deal with alloc_file()), removed pipe(2) behaviour change] Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com> Reviewed-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-22 23:31:32 -05:00
Al Viro	1afc99beaf	propagate error from get_empty_filp() to its callers Based on parts from Anatol's patch (the rest is the next commit). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-22 23:31:32 -05:00
Al Viro	496ad9aa8e	new helper: file_inode(file) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-22 23:31:31 -05:00
Jan Kara	72651cac88	fs: Fix imbalance in freeze protection in mark_files_ro() File descriptors (even those for writing) do not hold freeze protection. Thus mark_files_ro() must call __mnt_drop_write() to only drop protection against remount read-only. Calling mnt_drop_write_file() as we do now results in: [ BUG: bad unlock balance detected! ] 3.7.0-rc6-00028-g88e75b6 #101 Not tainted ------------------------------------- kworker/1:2/79 is trying to release lock (sb_writers) at: [<ffffffff811b33b4>] mnt_drop_write+0x24/0x30 but there are no more locks to release! Reported-by: Zdenek Kabelac <zkabelac@redhat.com> CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-12-20 13:57:36 -05:00
Lai Jiangshan	4b2c551f77	lglock: add DEFINE_STATIC_LGLOCK() When the lglock doesn't need to be exported we can use DEFINE_STATIC_LGLOCK(). Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-10-10 01:15:44 -04:00
Linus Torvalds	88265322c1	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security subsystem updates from James Morris: "Highlights: - Integrity: add local fs integrity verification to detect offline attacks - Integrity: add digital signature verification - Simple stacking of Yama with other LSMs (per LSS discussions) - IBM vTPM support on ppc64 - Add new driver for Infineon I2C TIS TPM - Smack: add rule revocation for subject labels" Fixed conflicts with the user namespace support in kernel/auditsc.c and security/integrity/ima/ima_policy.c. * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (39 commits) Documentation: Update git repository URL for Smack userland tools ima: change flags container data type Smack: setprocattr memory leak fix Smack: implement revoking all rules for a subject label Smack: remove task_wait() hook. ima: audit log hashes ima: generic IMA action flag handling ima: rename ima_must_appraise_or_measure audit: export audit_log_task_info tpm: fix tpm_acpi sparse warning on different address spaces samples/seccomp: fix 31 bit build on s390 ima: digital signature verification support ima: add support for different security.ima data types ima: add ima_inode_setxattr/removexattr function and calls ima: add inode_post_setattr call ima: replace iint spinblock with rwlock/read_lock ima: allocating iint improvements ima: add appraise action keywords and default rules ima: integrity appraisal extension vfs: move ima_file_free before releasing the file ...	2012-10-02 21:38:48 -07:00
Al Viro	0ee8cdfe6a	take fget() and friends to fs/file.c Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-09-26 21:08:56 -04:00
Mimi Zohar	4199d35cbc	vfs: move ima_file_free before releasing the file ima_file_free(), called on __fput(), currently flags files that have changed, so that the file is re-measured. For appraising a file's integrity, the file's hash must be re-calculated and stored in the 'security.ima' xattr to reflect any changes. This patch moves the ima_file_free() call to before releasing the file in preparation of ima-appraisal measuring the file and updating the 'security.ima' xattr. Signed-off-by: Mimi Zohar <zohar@us.ibm.com> Acked-by: Serge Hallyn <serge.hallyn@ubuntu.com> Acked-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>	2012-09-07 14:57:27 -04:00
Jan Kara	eb04c28288	fs: Add freezing handling to mnt_want_write() / mnt_drop_write() Most of places where we want freeze protection coincides with the places where we also have remount-ro protection. So make mnt_want_write() and mnt_drop_write() (and their _file alternative) prevent freezing as well. For the few cases that are really interested only in remount-ro protection provide new function variants. BugLink: https://bugs.launchpad.net/bugs/897421 Tested-by: Kamal Mostafa <kamal@canonical.com> Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com> Tested-by: Dann Frazier <dann.frazier@canonical.com> Tested-by: Massimo Morana <massimo.morana@canonical.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-31 09:40:38 +04:00
Al Viro	5c33b183a3	uninline file_free_rcu() What inline? Its only use is passing its address to call_rcu(), for fuck sake! Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-29 21:24:17 +04:00
Al Viro	4a9d4b024a	switch fput to task_work_add ... and schedule_work() for interrupt/kernel_thread callers (and yes, now it is OK to call from interrupt). We are guaranteed that __fput() will be done before we return to userland (or exit). Note that for fput() from a kernel thread we get an async behaviour; it's almost always OK, but sometimes you might need to have __fput() completed before you do anything else. There are two mechanisms for that - a general barrier (flush_delayed_fput()) and explicit __fput_sync(). Both should be used with care (as was the case for fput() from kernel threads all along). See comments in fs/file_table.c for details. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-22 23:57:58 +04:00
Al Viro	85d7d618c1	mark_files_ro(): don't bother with mntget/mntput mnt_drop_write_file() is safe under any lock Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-14 16:35:46 +04:00
Andi Kleen	962830df36	brlocks/lglocks: API cleanups lglocks and brlocks are currently generated with some complicated macros in lglock.h. But there's no reason to not just use common utility functions and put all the data into a common data structure. In preparation, this patch changes the API to look more like normal function calls with pointers, not magic macros. The patch is rather large because I move over all users in one go to keep it bisectable. This impacts the VFS somewhat in terms of lines changed. But no actual behaviour change. [akpm@linux-foundation.org: checkpatch fixes] Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-29 23:28:41 -04:00
Andi Kleen	eea62f831b	brlocks/lglocks: turn into functions lglocks and brlocks are currently generated with some complicated macros in lglock.h. But there's no reason to not just use common utility functions and put all the data into a common data structure. Since there are at least two users it makes sense to share this code in a library. This is also easier to maintain than a macro forest. This will also make it later possible to dynamically allocate lglocks and also use them in modules (this would both still need some additional, but now straightforward, code) [akpm@linux-foundation.org: checkpatch fixes] Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-29 23:28:41 -04:00
Al Viro	b57ce9694e	vfs: drop_file_write_access() made static Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-03-20 21:29:32 -04:00
Miklos Szeredi	8e8b87964b	vfs: prevent remount read-only if pending removes If there are any inodes on the super block that have been unlinked (i_nlink == 0) but have not yet been deleted then prevent remounting the super block read-only. Reported-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-06 23:20:13 -05:00
Arun Sharma	60063497a9	atomic: use <linux/atomic.h> This allows us to move duplicated code in <asm/atomic.h> (atomic_inc_not_zero() for now) to <linux/atomic.h> Signed-off-by: Arun Sharma <asharma@fb.com> Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Miller <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-26 16:49:47 -07:00
Linus Torvalds	2e270d8422	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: fix cdev leak on O_PATH final fput()	2011-03-16 13:26:17 -07:00
Miklos Szeredi	60ed8cf78f	fix cdev leak on O_PATH final fput() __fput doesn't need a cdev_put() for O_PATH handles. Signed-off-by: mszeredi@suse.cz Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-16 16:18:39 -04:00
Linus Torvalds	0f6e0e8448	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (33 commits) AppArmor: kill unused macros in lsm.c AppArmor: cleanup generated files correctly KEYS: Add an iovec version of KEYCTL_INSTANTIATE KEYS: Add a new keyctl op to reject a key with a specified error code KEYS: Add a key type op to permit the key description to be vetted KEYS: Add an RCU payload dereference macro AppArmor: Cleanup make file to remove cruft and make it easier to read SELinux: implement the new sb_remount LSM hook LSM: Pass -o remount options to the LSM SELinux: Compute SID for the newly created socket SELinux: Socket retains creator role and MLS attribute SELinux: Auto-generate security_is_socket_class TOMOYO: Fix memory leak upon file open. Revert "selinux: simplify ioctl checking" selinux: drop unused packet flow permissions selinux: Fix packet forwarding checks on postrouting selinux: Fix wrong checks for selinux_policycap_netpeer selinux: Fix check for xfrm selinux context algorithm ima: remove unnecessary call to ima_must_measure IMA: remove IMA imbalance checking ...	2011-03-16 09:15:43 -07:00
Al Viro	326be7b484	Allow passing O_PATH descriptors via SCM_RIGHTS datagrams Just need to make sure that AF_UNIX garbage collector won't confuse O_PATHed socket on filesystem for real AF_UNIX opened socket. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-15 02:21:45 -04:00
Al Viro	1abf0c718f	New kind of open files - "location only". New flag for open(2) - O_PATH. Semantics: * pathname is resolved, but the file itself is _NOT_ opened as far as filesystem is concerned. * almost all operations on the resulting descriptors shall fail with -EBADF. Exceptions are: 1) operations on descriptors themselves (i.e. close(), dup(), dup2(), dup3(), fcntl(fd, F_DUPFD), fcntl(fd, F_DUPFD_CLOEXEC, ...), fcntl(fd, F_GETFD), fcntl(fd, F_SETFD, ...)) 2) fcntl(fd, F_GETFL), for a common non-destructive way to check if descriptor is open 3) "dfd" arguments of ...at(2) syscalls, i.e. the starting points of pathname resolution * closing such descriptor does NOT affect dnotify or posix locks. * permissions are checked as usual along the way to file; no permission checks are applied to the file itself. Of course, giving such thing to syscall will result in permission checks (at the moment it means checking that starting point of ....at() is a directory and caller has exec permissions on it). fget() and fget_light() return NULL on such descriptors; use of fget_raw() and fget_raw_light() is needed to get them. That protects existing code from dealing with those things. There are two things still missing (they come in the next commits): one is handling of symlinks (right now we refuse to open them that way; see the next commit for semantics related to those) and another is descriptor passing via SCM_RIGHTS datagrams. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-15 02:21:45 -04:00
James Morris	1cc26bada9	Merge branch 'master'; commit 'v2.6.38-rc7' into next	2011-03-08 10:55:06 +11:00
Mimi Zohar	890275b5eb	IMA: maintain i_readcount in the VFS layer ima_counts_get() updated the readcount and invalidated the PCR, as necessary. Only update the i_readcount in the VFS layer. Move the PCR invalidation checks to ima_file_check(), where it belongs. Maintaining the i_readcount in the VFS layer, will allow other subsystems to use i_readcount. Signed-off-by: Mimi Zohar <zohar@us.ibm.com> Acked-by: Eric Paris <eparis@redhat.com>	2011-02-10 07:51:44 -05:00
Tetsuo Handa	78d2978874	CRED: Fix kernel panic upon security_file_alloc() failure. In get_empty_filp() since 2.6.29, file_free(f) is called with f->f_cred == NULL when security_file_alloc() returned an error. As a result, kernel will panic() due to put_cred(NULL) call within RCU callback. Fix this bug by assigning f->f_cred before calling security_file_alloc(). Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-02-04 10:40:29 -08:00
Steven Rostedt	3bc0ba4305	fs: Remove unlikely() from fget_light() There's an unlikely() in fget_light() that assumes the file ref count will be 1. Running the annotate branch profiler on a desktop that is performing daily tasks (running firefox, evolution, xchat and is also part of a distcc farm), it shows that the ref count is not 1 that often. correct: 1035099358, incorrect: 6209599193, %: 85, function: fget_light, file: file_table.c, line: 315. Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-17 03:26:27 -05:00
Eric Dumazet	518de9b39e	fs: allow for more than 2^31 files Robin Holt tried to boot a 16TB system and found af_unix was overflowing a 32bit value : <quote> We were seeing a failure which prevented boot. The kernel was incapable of creating either a named pipe or unix domain socket. This comes down to a common kernel function called unix_create1() which does: atomic_inc(&unix_nr_socks); if (atomic_read(&unix_nr_socks) > 2 * get_max_files()) goto out; The function get_max_files() is a simple return of files_stat.max_files. files_stat.max_files is a signed integer and is computed in fs/file_table.c's files_init(). n = (mempages * (PAGE_SIZE / 1024)) / 10; files_stat.max_files = n; In our case, mempages (total_ram_pages) is approx 3,758,096,384 (0xe0000000). That leaves max_files at approximately 1,503,238,553. This causes 2 * get_max_files() to integer overflow. </quote> Fix is to let /proc/sys/fs/file-nr & /proc/sys/fs/file-max use long integers, and change af_unix to use an atomic_long_t instead of atomic_t. get_max_files() is changed to return an unsigned long. get_nr_files() is changed to return a long. unix_nr_socks is changed from atomic_t to atomic_long_t, while not strictly needed to address Robin's problem. Before patch (on a 64bit kernel) : # echo 2147483648 >/proc/sys/fs/file-max # cat /proc/sys/fs/file-max -18446744071562067968 After patch: # echo 2147483648 >/proc/sys/fs/file-max # cat /proc/sys/fs/file-max 2147483648 # cat /proc/sys/fs/file-nr 704 0 2147483648 Reported-by: Robin Holt <holt@sgi.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: David Miller <davem@davemloft.net> Reviewed-by: Robin Holt <holt@sgi.com> Tested-by: Robin Holt <holt@sgi.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-26 16:52:15 -07:00
Nick Piggin	6416ccb789	fs: scale files_lock fs: scale files_lock Improve scalability of files_lock by adding per-cpu, per-sb files lists, protected with an lglock. The lglock provides fast access to the per-cpu lists to add and remove files. It also provides a snapshot of all the per-cpu lists (although this is very slow). One difficulty with this approach is that a file can be removed from the list by another CPU. We must track which per-cpu list the file is on with a new variable in the file struct (packed into a hole on 64-bit archs). Scalability could suffer if files are frequently removed from different cpu's list. However loads with frequent removal of files imply short interval between adding and removing the files, and the scheduler attempts to avoid moving processes too far away. Also, even in the case of cross-CPU removal, the hardware has much more opportunity to parallelise cacheline transfers with N cachelines than with 1. A worst-case test of 1 CPU allocating files subsequently being freed by N CPUs degenerates to contending on a single lock, which is no worse than before. When more than one CPU are allocating files, even if they are always freed by different CPUs, there will be more parallelism than the single-lock case. Testing results: On a 2 socket, 8 core opteron, I measure the number of times the lock is taken to remove the file, the number of times it is removed by the same CPU that added it, and the number of times it is removed by the same node that added it. Booting: locks= 25049 cpu-hits= 23174 (92.5%) node-hits= 23945 (95.6%) kbuild -j16 locks=2281913 cpu-hits=2208126 (96.8%) node-hits=2252674 (98.7%) dbench 64 locks=4306582 cpu-hits=4287247 (99.6%) node-hits=4299527 (99.8%) So a file is removed from the same CPU it was added by over 90% of the time. It remains within the same node 95% of the time. Tim Chen ran some numbers for a 64 thread Nehalem system performing a compile. Throughput: 2.6.34-rc2 24.5, +patch 24.9. us/sys/idle/IO wait (in %): 2.6.34-rc2 51.25/28.25/17.25/3.25, +patch 53.75/18.5/19/8.75. So significantly less CPU time spent in kernel code, higher idle time and slightly higher throughput. Single threaded performance difference was within the noise of microbenchmarks. That is not to say penalty does not exist, the code is larger and more memory accesses required so it will be slightly slower. Cc: linux-kernel@vger.kernel.org Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Nick Piggin <npiggin@kernel.dk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-18 08:35:48 -04:00
Nick Piggin	ee2ffa0dfd	fs: cleanup files_lock locking fs: cleanup files_lock locking Lock tty_files with a new spinlock, tty_files_lock; provide helpers to manipulate the per-sb files list; unexport the files_lock spinlock. Cc: linux-kernel@vger.kernel.org Cc: Christoph Hellwig <hch@infradead.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Acked-by: Andi Kleen <ak@linux.intel.com> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Nick Piggin <npiggin@kernel.dk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-18 08:35:47 -04:00
Linus Torvalds	2069601b3f	Revert "fsnotify: store struct file not struct path" This reverts commit `3bcf3860a4` (and the accompanying commit `c1e5c95402` "vfs/fsnotify: fsnotify_close can delay the final work in fput" that was a horribly ugly hack to make it work at all). The 'struct file' approach not only causes that disgusting hack, it somehow breaks pulseaudio, probably due to some other subtlety with f_count handling. Fix up various conflicts due to later fsnotify work. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-08-12 14:23:04 -07:00
Tony Battersby	58939473ba	vfs: improve comment describing fget_light() Improve the description of fget_light(), which is currently incorrect about needing a prior refcnt (judging by the way it is actually used). Signed-off-by: Tony Battersby <tonyb@cybernetics.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-08-11 08:59:02 -07:00
Eric Paris	c1e5c95402	vfs/fsnotify: fsnotify_close can delay the final work in fput fanotify almost works like so: user context calls fsnotify_* function with a struct file; fsnotify takes a reference on the struct path; user context goes about its business; at some later point in time the fsnotify listener gets the struct path; fanotify listener calls dentry_open() to create a file which userspace can deal with; listener drops the reference on the struct path; at some later point the listener calls close() on its new file. With the switch from struct path to struct file this presents a problem for fput() and fsnotify_close(). fsnotify_close() is called when the filp has already reached 0 and __fput() wants to do its cleanup. The solution presented here is a bit odd. If an event is created from a struct file we take a reference on the file. We check however if the f_count was already 0 and if so we take an EXTRA reference EVEN THOUGH IT WAS ZERO. In __fput() (where we know the f_count hit 0 once) we check if the f_count is non-zero and if so we drop that 'extra' ref and return without destroying the file. Signed-off-by: Eric Paris <eparis@redhat.com>	2010-07-28 10:18:51 -04:00
Al Viro	d7065da038	get rid of the magic around f_count in aio __aio_put_req() plays sick games with file refcount. What it wants is fput() from atomic context; it's almost always done with f_count > 1, so they only have to deal with delayed work in rare cases when their reference happens to be the last one. Current code decrements f_count and if it hasn't hit 0, everything is fine. Otherwise it keeps a pointer to struct file (with zero f_count!) around and has delayed work do __fput() on it. Better way to do it: use atomic_long_add_unless( , -1, 1) instead of !atomic_long_dec_and_test(). IOW, decrement it only if it's not the last reference, leave refcount alone if it was. And use normal fput() in delayed work. I've made that atomic_long_add_unless call a new helper - fput_atomic(). Drops a reference to file if it's safe to do in atomic (i.e. if that's not the last one), tells if it had been able to do that. aio.c converted to it, __fput() use is gone. req->ki_file always contributes to refcount now. And __fput() became static. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-05-27 22:03:07 -04:00
Wu Fengguang	42e4960868	vfs: take f_lock on modifying f_mode after open time We'll introduce FMODE_RANDOM which will be runtime modified. So protect all runtime modification to f_mode with f_lock to avoid races. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: <stable@kernel.org> [2.6.33.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-03-06 11:26:25 -08:00
Al Viro	89068c576b	Take ima_file_free() to proper place. Hooks: Just Say No. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-02-07 03:07:29 -05:00
Roland Dreier	385e3ed4f0	alloc_file(): simplify handling of mnt_clone_write() errors When alloc_file() and init_file() were combined, the error handling of mnt_clone_write() was taken into alloc_file() in a somewhat obfuscated way. Since we don't use the error code for anything except warning, we might as well warn directly without an extra variable. Signed-off-by: Roland Dreier <rolandd@cisco.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-12-22 12:27:33 -05:00
Roland Dreier	73efc4681c	re-export alloc_file() Commit `3d1e4631` ("get rid of init_file()") removed the export of alloc_file() -- possibly inadvertently, since that commit mainly consisted of deleting the lines between the end of alloc_file() and the start of the code in init_file(). There is in fact one modular use of alloc_file() in the tree, in drivers/infiniband/core/uverbs_main.c, so re-add the export to fix: ERROR: "alloc_file" [drivers/infiniband/core/ib_uverbs.ko] undefined! when CONFIG_INFINIBAND_USER_ACCESS=m. Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Roland Dreier <rolandd@cisco.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-16 13:29:19 -08:00
Al Viro	0552f879d4	Untangling ima mess, part 1: alloc_file() There are 2 groups of alloc_file() callers: * ones that are followed by ima_counts_get * ones giving non-regular files So let's pull that ima_counts_get() into alloc_file(); it's a no-op in case of non-regular files. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-12-16 12:16:47 -05:00
Eric Paris	e81e3f4dca	fs: move get_empty_filp() definition to internal.h All users outside of fs/ of get_empty_filp() have been removed. This patch moves the definition from the include/ directory to internal.h so no new users crop up and removes the EXPORT_SYMBOL. I'd love to see open intents stop using it too, but that's a problem for another day and a smarter developer! Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-12-16 12:16:45 -05:00
Al Viro	2c48b9c455	switch alloc_file() to passing struct path ... and have the caller grab both mnt and dentry; kill leak in infiniband, while we are at it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-12-16 12:16:42 -05:00
Al Viro	3d1e463158	get rid of init_file() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-12-16 12:16:42 -05:00
Al Viro	732741274d	unexport get_empty_filp() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-12-16 12:16:41 -05:00
Mimi Zohar	6c21a7fb49	LSM: imbed ima calls in the security hooks Based on discussions on LKML and LSM, where there are consecutive security_ and ima_ calls in the vfs layer, move the ima_ calls to the existing security_ hooks. Signed-off-by: Mimi Zohar <zohar@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-10-25 12:22:48 +08:00
Alexey Dobriyan	8d65af789f	sysctl: remove "struct file *" argument of ->proc_handler It's unused. It isn't needed -- read or write flag is already passed and sysctl shouldn't care about the rest. It _was_ used in two places at arch/frv for some reason. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "David S. Miller" <davem@davemloft.net> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-24 07:21:04 -07:00
npiggin@suse.de	864d7c4c06	fs: move mark_files_ro into file_table.c This function walks the s_files lock, and operates primarily on the files in a superblock, so it better belongs here (eg. see also fs_may_remount_ro). [AV: ... and it shouldn't be static after that move] Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:02 -04:00
npiggin@suse.de	96029c4e09	fs: introduce mnt_clone_write This patch speeds up lmbench lat_mmap test by about another 2% after the first patch. Before: avg = 462.286 std = 5.46106 After: avg = 453.12 std = 9.58257 (50 runs of each, stddev gives a reasonable confidence) It does this by introducing mnt_clone_write, which avoids some heavyweight operations of mnt_want_write if called on a vfsmount which we know already has a write count; and mnt_want_write_file, which can call mnt_clone_write if the file is open for write. After these two patches, mnt_want_write and mnt_drop_write go from 7% on the profile down to 1.3% (including mnt_clone_write). [AV: mnt_want_write_file() should take file alone and derive mnt from it; not only all callers have that form, but that's the only mnt about which we know that it's already held for write if file is opened for write] Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:02 -04:00
Tero Roponen	a4e49cb69e	trivial: remove unused variable 'path' in alloc_file() 'struct path' is not used in alloc_file(). Signed-off-by: Tero Roponen <tero.roponen@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2009-03-30 15:22:03 +02:00
Linus Torvalds	8e9d208972	Merge branch 'bkl-removal' of git://git.lwn.net/linux-2.6 * 'bkl-removal' of git://git.lwn.net/linux-2.6: Rationalize fasync return values Move FASYNC bit handling to f_op->fasync() Use f_lock to protect f_flags Rename struct file->f_ep_lock	2009-03-26 16:14:02 -07:00
Jonathan Corbet	6849991490	Rename struct file->f_ep_lock This lock moves out of the CONFIG_EPOLL ifdef and becomes f_lock. For now, epoll remains the only user, but a future patch will use it to protect f_flags as well. Cc: Davide Libenzi <davidel@xmailserver.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2009-03-16 08:32:27 -06:00
James Morris	cb5629b10d	Merge branch 'master' into next Conflicts: fs/namei.c Manually merged per: diff --cc fs/namei.c index 734f2b5,bbc15c2..0000000 --- a/fs/namei.c +++ b/fs/namei.c @@@ -860,9 -848,8 +849,10 @@@ static int __link_path_walk(const char nd->flags |= LOOKUP_CONTINUE; err = exec_permission_lite(inode); if (err == -EAGAIN) - err = vfs_permission(nd, MAY_EXEC); + err = inode_permission(nd->path.dentry->d_inode, + MAY_EXEC); + if (!err) + err = ima_path_check(&nd->path, MAY_EXEC); if (err) break; @@@ -1525,14 -1506,9 +1509,14 @@@ int may_open(struct path *path, int acc flag &= ~O_TRUNC; } - error = vfs_permission(nd, acc_mode); + error = inode_permission(inode, acc_mode); if (error) return error; + - error = ima_path_check(&nd->path, ++ error = ima_path_check(path, + acc_mode & (MAY_READ | MAY_WRITE | MAY_EXEC)); + if (error) + return error; /* An append-only file must be opened in append mode for writing. */ Signed-off-by: James Morris <jmorris@namei.org>	2009-02-06 11:01:45 +11:00
Mimi Zohar	6146f0d5e4	integrity: IMA hooks This patch replaces the generic integrity hooks, for which IMA registered itself, with IMA integrity hooks in the appropriate places directly in the fs directory. Signed-off-by: Mimi Zohar <zohar@us.ibm.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>	2009-02-06 09:05:30 +11:00
Eric Dumazet	b6b3fdead2	filp_cachep can be static in fs/file_table.c Instead of creating the "filp" kmem_cache in vfs_caches_init(), we can do it a little bit later in files_init(), so that filp_cachep is static to fs/file_table.c Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-12-31 18:07:42 -05:00
David Howells	d76b0d9b2d	CRED: Use creds in file structs Attach creds to file structs and discard f_uid/f_gid. file_operations::open() methods (such as hppfs_open()) should use file->f_cred rather than current_cred(). At the moment file->f_cred will be current_cred() at this point. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: James Morris <jmorris@namei.org> Signed-off-by: James Morris <jmorris@namei.org>	2008-11-14 10:39:25 +11:00
David Howells	86a264abe5	CRED: Wrap current->cred and a few other accessors Wrap current->cred and a few other accessors to hide their actual implementation. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-11-14 10:39:18 +11:00
David Howells	b6dff3ec5e	CRED: Separate task security context from task_struct Separate the task security context from task_struct. At this point, the security data is temporarily embedded in the task_struct with two pointers pointing to it. Note that the Alpha arch is altered as it refers to (E)UID and (E)GID in entry.S via asm-offsets. With comment fixes Signed-off-by: Marc Dionne <marc.c.dionne@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-11-14 10:39:16 +11:00
Al Viro	233e70f422	saner FASYNC handling on file close As it is, all instances of ->release() for files that have ->fasync() need to remember to evict file from fasync lists; forgetting that creates a hole and we actually have a bunch that does forget. So let's keep our lives simple - let __fput() check FASYNC in file->f_flags and call ->fasync() there if it's been set. And lose that crap in ->release() instances - leaving it there is still valid, but we don't have to bother anymore. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-11-01 09:49:46 -07:00
Al Viro	aeb5d72706	[PATCH] introduce fmode_t, do annotations Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-10-21 07:47:06 -04:00
Al Viro	516e0cc564	[PATCH] f_count may wrap around make it atomic_long_t; while we are at it, get rid of useless checks in affs, hfs and hpfs - ->open() always has it equal to 1, ->release() - to 0. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-07-26 20:53:40 -04:00
Al Viro	9f3acc3140	[PATCH] split linux/file.h Initial splitoff of the low-level stuff; taken to fdtable.h Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-05-01 13:08:16 -04:00
Dave Hansen	ad775f5a8f	[PATCH] r/o bind mounts: debugging for missed calls There have been a few oopses caused by 'struct file's with NULL f_vfsmnts. There was also a set of potentially missed mnt_want_write()s from dentry_open() calls. This patch provides a very simple debugging framework to catch these kinds of bugs. It will WARN_ON() them, but should stop us from having any oopses or mnt_writer count imbalances. I'm quite convinced that this is a good thing because it found bugs in the stuff I was working on as soon as I wrote it. [hch: made it conditional on a debug option. But it's still a little bit too ugly] [hch: merged forced remount r/o fix from Dave and akpm's fix for the fix] Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Acked-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-04-19 00:29:28 -04:00
Dave Hansen	4a3fd211cc	[PATCH] r/o bind mounts: elevate write count for open()s This is the first really tricky patch in the series. It elevates the writer count on a mount each time a non-special file is opened for write. We used to do this in may_open(), but Miklos pointed out that __dentry_open() is used as well to create filps. This will cover even those cases, while a call in may_open() would not have. There is also an elevated count around the vfs_create() call in open_namei(). See the comments for more details, but we need this to fix a 'create, remount, fail r/w open()' race. Some filesystems forego the use of normal vfs calls to create struct files. Make sure that these users elevate the mnt writer count because they will get __fput(), and we need to make sure they're balanced. Acked-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-04-19 00:29:25 -04:00
Dave Hansen	aceaf78da9	[PATCH] r/o bind mounts: create helper to drop file write access If someone decides to demote a file from r/w to just r/o, they can use this same code as __fput(). NFS does just that, and will use this in the next patch. AV: drop write access in __fput() only after we evict from file list. Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Cc: Erez Zadok <ezk@cs.sunysb.edu> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J Bruce Fields" <bfields@fieldses.org> Acked-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-04-19 00:25:32 -04:00
Dave Hansen	430e285e08	[PATCH] fix up new filp allocators Some new uses of get_empty_filp() have crept in; switched to alloc_file() to make sure that pieces of initialization won't be missing. We really need to kill get_empty_filp(). [AV] fixed dentry leak on failure exit in anon_inode_getfd() Cc: Erez Zadok <ezk@cs.sunysb.edu> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J Bruce Fields" <bfields@fieldses.org> Acked-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-03-19 06:54:05 -04:00
Harvey Harrison	fc9b52cd8f	fs: remove fastcall, it is always empty [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:31 -08:00
Matthias Kaehlcke	cfdaf9e5f9	fs/file_table.c: use list_for_each_entry() instead of list_for_each() fs/file_table.c: use list_for_each_entry() instead of list_for_each() in fs_may_remount_ro() Signed-off-by: Matthias Kaehlcke <matthias.kaehlcke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:38 -07:00
Dave Hansen	ce8d2cdf3d	r/o bind mounts: filesystem helpers for custom 'struct file's Why do we need r/o bind mounts? This feature allows a read-only view into a read-write filesystem. In the process of doing that, it also provides infrastructure for keeping track of the number of writers to any given mount. This has a number of uses. It allows chroots to have parts of filesystems writable. It will be useful for containers in the future because users may have root inside a container, but should not be allowed to write to some filesystems. This also replaces patches that vserver has had out of the tree for several years. It allows security enhancement by making parts of your filesystem read-only (such as when you don't trust your FTP server), when you don't want to have entire new filesystems mounted, or when you want atime selectively updated. I've been using the following script to test that the feature is working as desired. It takes a directory and makes a regular bind and a r/o bind mount of it. It then performs some normal filesystem operations on the three directories, including ones that are expected to fail, like creating a file on the r/o mount. This patch: Some filesystems forego the vfs and may_open() and create their own 'struct file's. This patch creates a couple of helper functions which can be used by these filesystems, and will provide a unified place which the r/o bind mount code may patch. Also, rename an existing, static-scope init_file() to a less generic name. Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:43:04 -07:00
Denis Cheng	4975e45ff6	fs: use kmem_cache_zalloc instead Signed-off-by: Denis Cheng <crquan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:42:48 -07:00
Peter Zijlstra	52d9f3b409	lib: percpu_counter_sum_positive s/percpu_counter_sum/&_positive/ Because it's consistent with percpu_counter_read* Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:42:44 -07:00
Randy Dunlap	e63340ae6b	header cleaning: don't include smp_lock.h when not used Remove includes of <linux/smp_lock.h> where it is not used/needed. Suggested by Al Viro. Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc, sparc64, and arm (all 59 defconfigs). Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:07 -07:00
Josef "Jeff" Sipek	0f7fc9e4d0	[PATCH] VFS: change struct file to use struct path This patch changes struct file to use struct path instead of having independent pointers to struct dentry and struct vfsmount, and converts all users of f_{dentry,vfsmnt} in fs/ to use f_path.{dentry,mnt}. Additionally, it adds two #define's to make the transition easier for users of the f_dentry and f_vfsmnt. Signed-off-by: Josef "Jeff" Sipek <jsipek@cs.sunysb.edu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-08 08:28:41 -08:00
Eric W. Biederman	609d7fa956	[PATCH] file: modify struct fown_struct to use a struct pid File handles can be requested to send sigio and sigurg to processes. By tracking the destination processes using struct pid instead of pid_t we make the interface safe from all potential pid wrap around problems. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:14 -07:00
Theodore Ts'o	577c4eb09d	[PATCH] inode-diet: Move i_cdev into a union Move the i_cdev pointer in struct inode into a union. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-27 08:26:17 -07:00
Jörn Engel	6ab3d5624e	Remove obsolete #include <linux/config.h> Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2006-06-30 19:25:36 +02:00
Mingming Cao	0216bfcffe	[PATCH] percpu counter data type changes to support more than 2**31 ext3 free blocks counter The percpu counter data types are changed in this set of patches to support more users like ext3 who need more than 32 bit to store the free blocks total in the filesystem. - Generic percpu counters data type changes. The size of the global counter and local counter were explicitly specified using s64 and s32. The global counter is changed from long to s64, while the local counter is changed from long to s32, so we could avoid doing 64 bit update in most cases. - Users of the percpu counters are updated to make use of the new percpu_counter_init() routine now taking an additional parameter to allow users to pass the initial value of the global counter. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-23 07:43:06 -07:00
Benjamin LaHaise	5a6b7951bf	[PATCH] get_empty_filp tweaks, inline epoll_init_file() Eliminate a handful of cache references by keeping current in a register instead of reloading (helps x86) and avoiding the overhead of a function call. Inlining eventpoll_init_file() saves 24 bytes. Also reorder file initialization to make writes occur more sequentially. Signed-off-by: Benjamin LaHaise <bcrl@linux.intel.com> Cc: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-23 07:38:17 -08:00
Dipankar Sarma	529bf6be5c	[PATCH] fix file counting I have benchmarked this on an x86_64 NUMA system and see no significant performance difference on kernbench. Tested on both x86_64 and powerpc. The way we do file struct accounting is not very suitable for batched freeing. For scalability reasons, file accounting was constructor/destructor based. This meant that nr_files was decremented only when the object was removed from the slab cache. This is susceptible to slab fragmentation. With RCU based file structure, consequent batched freeing and a test program like Serge's, we just speed this up and end up with a very fragmented slab - llm22:~ # cat /proc/sys/fs/file-nr 587730 0 758844 At the same time, I see only 2000+ objects in filp cache. The following patch fixes this problem. This patch changes the file counting by removing the filp_count_lock. Instead we use a separate percpu counter, nr_files, for now and all accesses to it are through get_nr_files() api. In the sysctl handler for nr_files, we populate files_stat.nr_files before returning to user. Counting files as and when they are created and destroyed (as opposed to inside slab) allows us to correctly count open files with RCU. Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-08 14:14:01 -08:00
Randy Dunlap	16f7e0fe2e	[PATCH] capable/capability.h (fs/) fs: Use <linux/capability.h> where capable() is used. Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Acked-by: Tim Schmielau <tim@physik3.uni-rostock.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-01-11 18:42:13 -08:00
Nick Piggin	095975da26	[PATCH] rcu file: use atomic primitives Use atomic_inc_not_zero for rcu files instead of special case rcuref. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-01-08 20:13:48 -08:00
Pekka J Enberg	2109a2d1b1	[PATCH] mm: rename kmem_cache_s to kmem_cache This patch renames struct kmem_cache_s to kmem_cache so we can start using it instead of kmem_cache_t typedef. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-11-07 07:53:24 -08:00
Eric Dumazet	2f51201662	[PATCH] reduce sizeof(struct file) Now that RCU applied on 'struct file' seems stable, we can place f_rcuhead in a memory location that is no longer used at call_rcu(&f->f_rcuhead, file_free_rcu) time, to reduce the size of this critical kernel object. The trick I used is to move f_rcuhead and f_list into a union called f_u The callers are changed so that f_rcuhead becomes f_u.fu_rcuhead and f_list becomes f_u.f_list Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-10-30 17:37:19 -08:00
Dipankar Sarma	ab2af1f500	[PATCH] files: files struct with RCU Patch to eliminate struct files_struct.file_lock spinlock on the reader side and use rcu refcounting rcuref_xxx api for the f_count refcounter. The updates to the fdtable are done by allocating a new fdtable structure and setting files->fdt to point to the new structure. The fdtable structure is protected by RCU thereby allowing lock-free lookup. For fd arrays/sets that are vmalloced, we use keventd to free them since RCU callbacks can't sleep. A global list of fdtable to be freed is not scalable, so we use a per-cpu list. If keventd is already handling the current cpu's work, we use a timer to defer queueing of that work. Since the last publication, this patch has been re-written to avoid using explicit memory barriers and use rcu_assign_pointer(), rcu_dereference() primitives instead. This required that the fd information is kept in a separate structure (fdtable) and updated atomically. Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-09-09 13:57:55 -07:00
Eric Dumazet	2832e9366a	[PATCH] remove file.f_maxcount struct file cleanup: f_maxcount has a unique value (INT_MAX). Just use the hard-wired value. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-09-07 16:57:32 -07:00
Robert Love	0eeca28300	[PATCH] inotify inotify is intended to correct the deficiencies of dnotify, particularly its inability to scale and its terrible user interface: * dnotify requires the opening of one fd per each directory that you intend to watch. This quickly results in too many open files and pins removable media, preventing unmount. * dnotify is directory-based. You only learn about changes to directories. Sure, a change to a file in a directory affects the directory, but you are then forced to keep a cache of stat structures. * dnotify's interface to user-space is awful. Signals? inotify provides a more usable, simple, powerful solution to file change notification: * inotify's interface is a system call that returns a fd, not SIGIO. You get a single fd, which is select()-able. * inotify has an event that says "the filesystem that the item you were watching is on was unmounted." * inotify can watch directories or files. Inotify is currently used by Beagle (a desktop search infrastructure), Gamin (a FAM replacement), and other projects. See Documentation/filesystems/inotify.txt. Signed-off-by: Robert Love <rml@novell.com> Cc: John McCutchan <ttb@tentacle.dhs.org> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-07-12 20:38:38 -07:00
Kirill Korotaev	af4d2ecbf0	[PATCH] Fix of bogus file max limit messages This patch fixes incorrect and bogus kernel messages claiming that the file-max limit was reached when the allocation fails Signed-Off-By: Kirill Korotaev <dev@sw.ru> Signed-Off-By: Denis Lunev <den@sw.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:26 -07:00
Linus Torvalds	1da177e4c3	Linux-2.6.12-rc2 Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!	2005-04-16 15:20:36 -07:00

187 Commits