linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-16 00:52:01 +00:00

Author	SHA1	Message	Date
Pavel Emelyanov	eccba06891	gfs2: make gfs2_glock.gl_owner_pid be a struct pid * The gl_owner_pid field is used to get the lock owning task by its pid, so make it in a proper manner, i.e. by using the struct pid pointer and pid_task() function. The pid_task() becomes exported for the gfs2 module. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-07 08:42:06 -08:00
Pavel Emelyanov	b1e058da50	gfs2: make gfs2_holder.gh_owner_pid be a struct pid * The gl_owner_pid field is used to get the holder task by its pid and check whether the current is a holder, so make it in a proper manner, i.e. via the struct pid * manipulations. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-07 08:42:06 -08:00
Denis Cheng	30727174b6	dlm: add __init and __exit marks to init and exit functions it moves 365 bytes from .text to .init.text, and 30 bytes from .text to .exit.text, saves memory. Signed-off-by: Denis Cheng <crquan@gmail.com> Signed-off-by: David Teigland <teigland@redhat.com>	2008-02-06 23:41:22 -06:00
David Teigland	d292c0cc48	dlm: eliminate astparam type casting Put lkb_astparam in a union with a dlm_user_args pointer to eliminate a lot of type casting. Signed-off-by: David Teigland <teigland@redhat.com>	2008-02-06 23:27:04 -06:00
Eric Van Hensbergen	8a0dc95fd9	9p: transport API reorganization This merges the mux.c (including the connection interface) with trans_fd in preparation for transport API changes. Ultimately, trans_fd will need to be rewritten to clean it up and simplify the implementation, but this reorganization is viewed as the first step. Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2008-02-06 19:25:03 -06:00
Eric Van Hensbergen	14b8869ff4	9p: fix mmap to be read-only v9fs was allowing writable mmap which could lead to kernel BUG() cases. This sets the mmap function to generic_file_readonly_mmap which (correctly) returns an error to applications which open mmap for writing. Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2008-02-06 19:25:05 -06:00
Anthony Liguori	d199d652c5	9p: add support for sticky bit GDM gets unhappy if /var/gdm doesn't have the sticky bit set. This patch adds support for the sticky bit in much the same way setuid/setgid is supported. With this patch, I can launch X from a v9fs rootfs (although I quickly run out of fds in the server once gnome starts up). Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Acked-by: Eric Van Hensbergen <ericvh@gmail.com>	2008-02-06 19:25:06 -06:00
Eric Van Hensbergen	c55703d807	9p: fix bug in attach-per-user When a new user attached at a directory other than the root, he would end up in the parent directory of the cwd. This was due to a logic error in the code which attaches the user at the mount point and walks back to the cwd. This patch fixes that. Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2008-02-06 19:25:08 -06:00
Joel Becker	d24fbcda0c	ocfs2: Negotiate locking protocol versions. Currently, when ocfs2 nodes connect via TCP, they advertise their compatibility level. If the versions do not match, two nodes cannot speak to each other and they disconnect. As a result, this provides no forward or backwards compatibility. This patch implements a simple protocol negotiation at the dlm level by introducing a major/minor version number scheme for entities that communicate. Specifically, o2dlm has a major/minor version for interaction with o2dlm on other nodes, and ocfs2 itself has a major/minor version for interacting with the filesystem on other nodes. This will allow rolling upgrades of ocfs2 clusters when changes to the locking or network protocols can be done in a backwards compatible manner. In those cases, only the minor number is changed and the negotatied protocol minor is returned from dlm join. In the far less likely event that a required protocol change makes backwards compatibility impossible, we simply bump the major number. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-02-06 16:11:29 -08:00
Linus Torvalds	3e6bdf473f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86 * git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86: x86: fix deadlock, make pgd_lock irq-safe virtio: fix trivial build bug x86: fix mttr trimming x86: delay CPA self-test and repeat it x86: fix 64-bit sections generic: add __FINITDATA x86: remove suprious ifdefs from pageattr.c x86: mark the .rodata section also NX x86: fix iret exception recovery on 64-bit cpuidle: dubious one-bit signed bitfield in cpuidle.h x86: fix sparse warnings in powernow-k8.c x86: fix sparse error in traps_32.c x86: trivial sparse/checkpatch in quirks.c x86 ptrace: disallow null cs/ss MAINTAINERS: RDC R-321x SoC maintainer brk randomization: introduce CONFIG_COMPAT_BRK brk: check the lower bound properly x86: remove X2 workaround x86: make spurious fault handler aware of large mappings x86: make traps on entry code be debuggable in user space, 64-bit	2008-02-06 13:54:09 -08:00
Ingo Molnar	32a932332c	brk randomization: introduce CONFIG_COMPAT_BRK based on similar patch from: Pavel Machek <pavel@ucw.cz> Introduce CONFIG_COMPAT_BRK. If disabled then the kernel is free (but not obliged to) randomize the brk area. Heap randomization breaks ancient binaries, so we keep COMPAT_BRK enabled by default. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-02-06 22:39:44 +01:00
Jan Kara	bd1939de90	ext3: fix lock inversion in direct IO We cannot start transaction in ext3_direct_IO() and just let it last during the whole write because dio_get_page() acquires mmap_sem which ranks above transaction start (e.g. because we have dependency chain mmap_sem->PageLock->journal_start, or because we update atime while holding mmap_sem) and thus deadlocks could happen. We solve the problem by starting a transaction separately for each ext3_get_block() call. We could have a problem that we allocate a block and before its data are written out the machine crashes and thus we expose stale data. But that does not happen because for hole-filling generic code falls back to buffered writes and for file extension, we add inode to orphan list and thus in case of crash, journal replay will truncate inode back to the original size. [akpm@linux-foundation.org: build fix] Signed-off-by: Jan Kara <jack@suse.cz> Cc: <linux-ext4@vger.kernel.org> Cc: Zach Brown <zach.brown@oracle.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:21 -08:00
Mariusz Kozlowski	e1d7ae24a2	ext3: remove unused code from ext3_find_entry() Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:21 -08:00
Akinobu Mita	859cb93679	ext[234]: cleanup ext[234]_bg_num_gdb() Use ext[234]_bg_has_super() to remove duplicate code. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:21 -08:00
Akinobu Mita	fb01bfdac7	ext[234]: remove unused argument for ext[234]_find_goal() The argument chain for ext[234]_find_goal() is not used. This patch removes it and fixes comment as well. Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:21 -08:00
Akinobu Mita	197cd65acc	ext[234]: use ext[234]_get_group_desc() Use ext[234]_get_group_desc() to get group descriptor from group number. Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:21 -08:00
Akinobu Mita	144704e522	ext[234]: fix comment for nonexistent variable The comment in ext[234]_new_blocks() describes about "i". But there is no local variable called "i" in that scope. I guess it has been renamed to group_no. Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:21 -08:00
Aneesh Kumar K.V	1eca93f9ca	ext3: change the default behaviour on error ext3 file system was by default ignoring errors and continuing. This is not a good default as continuing on error could lead to file system corruption. Change the default to mark the file system readonly. Debian and ubuntu already does this as the default in their fstab. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: <linux-ext4@vger.kernel.org> Cc: Eric Sandeen <sandeen@redhat.com> Cc: Jan Kara <jack@ucw.cz> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:20 -08:00
Aneesh Kumar K.V	feda58d37a	ext3: return after ext3_error in case of failures This fixes some instances where we were continuing after calling ext3_error. ext3_error calls panic only if errors=panic mount option is set. So we need to make sure we return correctly after ext3_error call Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:20 -08:00
Adrian Bunk	533083836f	make jbd/journal.c:__journal_abort_hard() static __journal_abort_hard() can now become static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:20 -08:00
Andi Kleen	e86e14385d	BKL-removal: remove incorrect comment refering to lock_kernel() from jbd/jbd2 None of the callers of this function does actually take the BKL as far as I can see. So remove the comment refering to the BKL. Signed-off-by: Andi Kleen <ak@suse.de> Cc: <linux-ext4@vger.kernel.org> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:20 -08:00
Andi Kleen	d71cadd6bc	BKL-removal: remove incorrect BKL comment in ext2 No BKL used anywhere, so don't mention it. Signed-off-by: Andi Kleen <ak@suse.de> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:20 -08:00
Andi Kleen	14f9f7b28e	BKL-removal: convert ext2 over to use unlocked_ioctl I checked ext2_ioctl and could not find anything in there that would need the BKL. So convert it over to use unlocked_ioctl Signed-off-by: Andi Kleen <ak@suse.de> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:20 -08:00
Aneesh Kumar K.V	f762e9054f	ext3: add block bitmap validation When a new block bitmap is read from disk in read_block_bitmap() there are a few bits that should ALWAYS be set. In particular, the blocks given corresponding to block bitmap, inode bitmap and inode tables. Validate the block bitmap against these blocks. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:20 -08:00
Aneesh Kumar K.V	01584fa645	ext2: add block bitmap validation When a new block bitmap is read from disk in read_block_bitmap() there are a few bits that should ALWAYS be set. In particular, the blocks given corresponding to block bitmap, inode bitmap and inode tables. Validate the block bitmap against these blocks. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:20 -08:00
Miklos Szeredi	d12def1bcb	fuse: limit queued background requests Libfuse basically creates a new thread for each new request. This is fine for synchronous requests, which are naturally limited. However background requests (especially writepage) can cause a thread creation storm. To avoid this, limit the number of background requests available to userspace. This is done by introducing another queue for background requests, and a counter for the number of "active" requests, which are currently available for userspace. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:13 -08:00
Miklos Szeredi	b57d426445	fuse: save space in struct fuse_req Move the fields 'dentry' and 'vfsmount' into the request specific union, since these are only used for the RELEASE request. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:13 -08:00
Miklos Szeredi	0952b2a4a8	fuse: fix attribute caching after create Invalidate attributes on create, since st_ctime is updated. Reported by Szabolcs Szakacsits. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:13 -08:00
Eric Sandeen	af440f5292	ecryptfs: check for existing key_tfm at mount time Jeff Moyer pointed out that a mount; umount loop of ecryptfs, with the same cipher & other mount options, created a new ecryptfs_key_tfm_cache item each time, and the cache could grow quite large this way. Looking at this with mhalcrow, we saw that ecryptfs_parse_options() unconditionally called ecryptfs_add_new_key_tfm(), which is what was adding these items. Refactor ecryptfs_get_tfm_and_mutex_for_cipher_name() to create a new helper function, ecryptfs_tfm_exists(), which checks for the cipher on the cached key_tfm_list, and sets a pointer to it if it exists. This can then be called from ecryptfs_parse_options(), and new key_tfm's can be added only when a cached one is not found. With list locking changes suggested by akpm. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:13 -08:00
Trevor Highland	19e66a67e9	eCryptfs: change the type of cipher_code from u16 to u8 Only the lower byte of cipher_code is ever used, so it makes sense for its type to be u8. Signed-off-by: Trevor Highland <trevor.highland@gmail.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:13 -08:00
Michael Halcrow	25bd817403	eCryptfs: Minor fixes to printk messages The printk statements that result when the user does not have the proper key available could use some refining. Signed-off-by: Mike Halcrow <mhalcrow@us.ibm.com> Cc: Mike Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:12 -08:00
Eric Sandeen	2830bfd6cf	ecryptfs: remove debug as mount option, and warn if set via modprobe ecryptfs_debug really should not be a mount option; it is not per-mount, but rather sets a global "ecryptfs_verbosity" variable which affects all mounted filesysytems. It's already settable as a module load option, I think we can leave it at that. Also, if set, since secret values come out in debug messages, kick things off with a stern warning. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Acked-by: Mike Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:12 -08:00
Eric Sandeen	99db6e4a97	ecryptfs: make show_options reflect actual mount options Change ecryptfs_show_options to reflect the actual mount options in use. Note that this does away with the "dir=" output, which is not a valid mount option and appears to be unused. Mount options such as "ecryptfs_verbose" and "ecryptfs_xattr_metadata" are somewhat indeterminate for a given fs, but in any case the reported mount options can be used in a new mount command to get the same behavior. [akpm@linux-foundation.org: fix printk warning] Signed-off-by: Eric Sandeen <sandeen@redhat.com> Acked-by: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:12 -08:00
Trevor Highland	8e3a6f16ba	eCryptfs: set inode key only once per crypto operation There is no need to keep re-setting the same key for any given eCryptfs inode. This patch optimizes the use of the crypto API and helps performance a bit. Signed-off-by: Trevor Highland <trevor.highland@gmail.com> Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:12 -08:00
Michael Halcrow	cc11beffdf	eCryptfs: track header bytes rather than extents Remove internal references to header extents; just keep track of header bytes instead. Headers can easily span multiple pages with the recent persistent file changes. Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:12 -08:00
Adrian Bunk	7896b63182	fs/ecryptfs/: possible cleanups - make the following needlessly global code static: - crypto.c:ecryptfs_lower_offset_for_extent() - crypto.c:key_tfm_list - crypto.c:key_tfm_list_mutex - inode.c:ecryptfs_getxattr() - main.c:ecryptfs_init_persistent_file() - remove the no longer used mmap.c:ecryptfs_lower_page_cache - #if 0 the unused read_write.c:ecryptfs_read() Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:12 -08:00
Karsten Wiese	844fcc5396	make sys_poll() wait at least timeout ms schedule_timeout(jiffies) waits for at least jiffies - 1. Add 1 jiffie to the timeout_jiffies calculated in sys_poll() to wait at least timeout_msecs, like poll() manpage says. Signed-off-by: Karsten Wiese <fzu@wemgehoertderstaat.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:09 -08:00
Eric Dumazet	13f14b4d8b	Use ilog2() in fs/namespace.c We can use ilog2() in fs/namespace.c to compute hash_bits and hash_mask at compile time, not runtime. [akpm@linux-foundation.org: clean it all up] Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:09 -08:00
Thomas Bogendoerfer	b75cb06f72	partition: use DEFAULT_SGI_PARTITION for SGI_PARTION default Use DEFAULT_SGI_PARTITION for SGI_PARTION default Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:08 -08:00
Nick Piggin	a8e3eff466	ext2: xip check fix ext2 should not worry about checking sb->s_blocksize for XIP before the sb's blocksize actually gets set. Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:08 -08:00
Denis Cheng	bcf11cbecc	fs/reiserfs/xattr.c: use LIST_HEAD instead of LIST_HEAD_INIT Signed-off-by: Denis Cheng <crquan@gmail.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:08 -08:00
Jan Kara	941d2380e9	quota: improve inode list scanning in add_dquot_ref() We restarted scan of sb->s_inodes list whenever we had to drop inode_lock in add_dquot_ref(). This leads to overall quadratic running time and thus add_dquot_ref() can take several minutes when called on a life filesystem. We fix the problem by using the fact that inode cannot be removed from s_inodes list while we hold a reference to it and thus we can safely restart the scan if we don't drop the reference. Here we use the fact that inodes freshly added to s_inodes list are already guaranteed to have quotas properly initialized and the ordering of inodes on s_inodes list does not change so we cannot skip any inode. Thanks goes to Nick <gentuu@gmail.com> for analyzing the problem and testing the fix. [akpm@linux-foundation.org: iput(NULL) is legal] Signed-off-by: Jan Kara <jack@suse.cz> Cc: Nick <gentuu@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:07 -08:00
Nick Piggin	0d71bd5993	inotify: remove debug code The inotify debugging code is supposed to verify that the DCACHE_INOTIFY_PARENT_WATCHED scalability optimisation does not result in notifications getting lost nor extra needless locking generated. Unfortunately there are also some races in the debugging code. And it isn't very good at finding problems anyway. So remove it for now. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Robert Love <rlove@google.com> Cc: John McCutchan <ttb@tentacle.dhs.org> Cc: Jan Kara <jack@ucw.cz> Cc: Yan Zheng <yanzheng@21cn.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:07 -08:00
Nick Piggin	d599e36a9e	inotify: fix race There is a race between setting an inode's children's "parent watched" flag when placing the first watch on a parent, and instantiating new children of that parent: a child could miss having its flags set by set_dentry_child_flags, but then inotify_d_instantiate might still see !inotify_inode_watched. The solution is to set_dentry_child_flags after adding the watch. Locking is taken care of, because both set_dentry_child_flags and inotify_d_instantiate hold dcache_lock and child->d_locks. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Robert Love <rlove@google.com> Cc: John McCutchan <ttb@tentacle.dhs.org> Cc: Jan Kara <jack@ucw.cz> Cc: Yan Zheng <yanzheng@21cn.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:06 -08:00
Qi Yong	ba6f867f11	kill an unused PTR_ERR in bdev_cache_init() Signed-off-by: Qi Yong <qiyong@fc-cn.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:06 -08:00
Eric Dumazet	9cfe015aa4	get rid of NR_OPEN and introduce a sysctl_nr_open NR_OPEN (historically set to 10241024) actually forbids processes to open more than 10241024 handles. Unfortunatly some production servers hit the not so 'ridiculously high value' of 10241024 file descriptors per process. Changing NR_OPEN is not considered safe because of vmalloc space potential exhaust. This patch introduces a new sysctl (/proc/sys/fs/nr_open) wich defaults to 10241024, so that admins can decide to change this limit if their workload needs it. [akpm@linux-foundation.org: export it for sparc64] Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: "David S. Miller" <davem@davemloft.net> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:06 -08:00
Richard Knutsson	774ed22c21	reiserfs: complement va_start() with va_end(). Complement va_start() with va_end(). Signed-off-by: Richard Knutsson <ricknu-0@student.ltu.se> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:05 -08:00
Jan Kara	ece95912db	inotify: send IN_ATTRIB events when link count changes Currently, no notification event has been sent when inode's link count changed. This is inconvenient for the application in some cases: Suppose you have the following directory structure foo/test bar/ and you watch test. If someone does "mv foo/test bar/", you get event IN_MOVE_SELF and you know something has happened with the file "test". However if someone does "ln foo/test bar/test" and "rm foo/test" you get no inotify event for the file "test" (only directories "foo" and "bar" receive events). Furthermore it could be argued that link count belongs to file's metadata and thus IN_ATTRIB should be sent when it changes. The following patch implements sending of IN_ATTRIB inotify events when link count of the inode changes, i.e., when a hardlink to the inode is created or when it is removed. This event is sent in addition to all the events sent so far. In particular, when a last link to a file is removed, IN_ATTRIB event is sent in addition to IN_DELETE_SELF event. Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Morten Welinder <mwelinder@gmail.com> Cc: Robert Love <rlove@google.com> Cc: John McCutchan <ttb@tentacle.dhs.org> Cc: Steven French <sfrench@us.ibm.com> Cc: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:05 -08:00
Robert P. J. Day	14b30d6284	hfs: update comment to reflect actual init and exit routines Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:05 -08:00
Eric Sandeen	55581d018e	address hfs on-disk corruption robustness review comments Address Roman's review comments for the previously sent on-disk corruption hfs robustness patch. - use 0 as a failure value, rather than making a new macro HFS_BAD_KEYLEN, and use a switch statement instead of if's. - Add new fail: target to __hfs_brec_find to skip assignments using bad values when exiting with a failure. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Eric Sandeen <sandeen@redhat.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:05 -08:00
Robert P. J. Day	7c28cbaed6	NCPFS: update diagnostic strings to match routine names. Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Cc: Petr Vandrovec <VANDROVE@vc.cvut.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:05 -08:00
Akinobu Mita	797074e44d	fs: use list_for_each_entry_reverse and kill sb_entry Use list_for_each_entry_reverse for super_blocks list and remove unused sb_entry macro. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:05 -08:00
Akinobu Mita	e8462caa91	fs: use hlist_unhashed Use hlist_unhashed() instead of opencoded equivalent. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:04 -08:00
Michal Schmidt	07a154b2bb	proc: loadavg reading race The avenrun[] values are supposed to be protected by xtime_lock. loadavg_read_proc does not use it. Theoretically this may result in an occasional glitch when the value read from /proc/loadavg would be as much as 1<<11 times higher than it should be. Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:04 -08:00
Jiri Olsa	0321155926	fs: remove dead config CONFIG_HAS_COMPAT_EPOLL_EVENT symbol Remove dead config CONFIG_HAS_COMPAT_EPOLL_EVENT symbol. Signed-off-by: Jiri Olsa <olsajiri@gmail.com> Cc: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:03 -08:00
Adrian Bunk	7747cdb2f8	fs/eventfd.c should #include <linux/syscalls.h> Every file should include the headers containing the prototypes for its global functions (in this case sys_eventfd()). Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:03 -08:00
Adrian Bunk	7ec37dfd4d	fs/signalfd.c should #include <linux/syscalls.h> Every file should include the headers containing the prototypes for its global functions (in this case sys_signalfd()). Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:03 -08:00
Adrian Bunk	12c2ab5e8f	fs/utimes.c should #include <linux/syscalls.h> Every file should include the headers containing the prototypes for its global functions. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:03 -08:00
Adrian Bunk	011e3fcd1e	proper prototype for get_filesystem_list() Ad a proper prototype for migration_init() in include/linux/fs.h Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:02 -08:00
Jeff Layton	ce88cc5ed8	smbfs: fix calculation of kernel_recvmsg size parameter in smb_receive() smb_receive calls kernel_recvmsg with a size that's the minimum of the amount of buffer space in the kvec passed in or req->rq_rlen (which represents the length of the response). This does not take into account any data that was read in a request earlier pass through smb_receive. If the first pass through smb_receive receives some but not all of the response, then the next pass can call kernel_recvmsg with a size field that's too big. kernel_recvmsg can overrun into the next response, throwing off the alignment and making it unrecognizable. This causes messages like this to pop up in the ring buffer: smb_get_length: Invalid NBT packet, code=69 as well as other errors indicating that the response is unrecognizable. Typically this is seen on a smbfs mount under heavy I/O. This patch changes the code to use (req->rq_rlen - req->rq_bytes_recvd) instead instead of just req->rq_rlen, since that should represent the amount of unread data in the response. I think this is correct, but an ACK or NACK from someone more familiar with this code would be appreciated... Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:02 -08:00
Vegard Nossum	b4cf9c342a	FAT: Fix printk format strings This makes sure printk format strings contain no more than a single line. Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com> [the message was tweaked.] Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:02 -08:00
Adrian Bunk	f74596d079	proper show_interrupts() prototype Add a proper prototype for show_interrupts() in include/linux/interrupt.h Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:02 -08:00
Andries E. Brouwer	0b03cfb25f	MNT_UNBINDABLE fix Some time ago ( http://lkml.org/lkml/2007/6/19/128 ) I wrote about MNT_UNBINDABLE that it felt like a bug that it is not reset by "mount --make-private". Today I happened to see mount(8) and Documentation/sharedsubtree.txt and both document the version obtained by applying the little patch given in the above (and again below). So, the present kernel code is not according to specs and must be regarded as buggy. Specification in Documentation/sharedsubtree.txt: See state diagram: unbindable should become private upon make-private. Specification in mount(8): ... It's also possible to set up uni-directional propagation (with --make- slave), to make a mount point unavailable for --bind/--rbind (with --make-unbindable), and to undo any of these (with --make-private). Repeat of old fix-shared-subtrees-make-private.patch (due to Dirk Gerrits, René Gabriëls, Peter Kooijmans): Acked-by: Ram Pai <linuxram@us.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:02 -08:00
Dmitry Antipov	bcfbf84d40	SIGIO-driven I/O with inotify queues Add SIGIO-driven I/O for descriptors returned by inotify_init(). The thing may be enabled by convenient fcntl (fd, F_SETFL, O_ASYNC) call. Signed-off-by: Dmitry Antipov <antipov@dev.rtsoft.ru> Cc: Robert Love <rlove@google.com> Cc: John McCutchan <ttb@tentacle.dhs.org> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:00 -08:00
Aneesh Kumar K.V	14e11e106b	ext2: change the default behaviour on error ext2 file system was by default ignoring errors and continuing. This is not a good default as continuing on error could lead to file system corruption. Change the default to mark the file system readonly. Debian and ubuntu already does this as the default in their fstab. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: <linux-ext4@vger.kernel.org> Cc: Eric Sandeen <sandeen@redhat.com> Cc: Jan Kara <jack@ucw.cz> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:00 -08:00
Aneesh Kumar K.V	7f0adaeced	ext2: return after ext2_error in case of failures This fixes some instances where we were continuing after calling ext2_error. ext2_error call panic only if errors=panic mount option is set. So we need to make sure we return correctly after ext2_error call. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:00 -08:00
Yan Zheng	1c17d18e37	A potential bug in inotify_user.c Following comment is at fs/inotify_user.c:287 /* coalescing: drop this event if it is a dupe of the previous */ I think the previous event in the comment should be the last event in the link list. But inotify_dev_get_event return the first event in the list. In addition, it doesn't check whether the list is empty Signed-off-by: Yan Zheng<yanzheng@21cn.com> Acked-by: Robert Love <rlove@rlove.org> Cc: John McCutchan <ttb@tentacle.dhs.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:00 -08:00
Jan Engelhardt	19c561a60f	fs/fat/: refine chmod checks Prohibit mode changes in non-quiet mode that cannot be stored reliably with the on-disk format. Suppose a vfat filesystem is mounted with umask=0 and [not-quiet]. Then all files will have mode 0777. Trying to change the owner will fail, because fat does not know about owners or groups. chmod 0770, on the other hand, will succeed, even though fat does not know about the permission triplet [user/group/other]. So this patch changes fat's not-quiet behavior so that only UNIX modes are accepted that can be mapped lossless between the fat disk format and the local system. There is only one attribute, and that is the readonly attribute, which is mapped to the UNIX write permission bit(s). chmod 0555 is therefore valid (taking away the +w bits <=> setting the readonly attribute). Since chmod 0775 and chmod 0755 is an ambiguous case as to whether to set or clear the readonly bit, these modes are also denied. In quiet mode, chmod and chown will continue to "succeed" as they did before, meaning that a subsequent stat() will temporarily return the new mode as long as the inode is not reread from disk, and chown will silently do nothing, not even return the new uid/gid in stat(). Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:40:59 -08:00
David Teigland	e5dae548b0	dlm: proper types for asts and basts Use proper types for ast and bast functions, and use consistent type for ast param. Signed-off-by: David Teigland <teigland@redhat.com>	2008-02-06 00:35:45 -06:00
Al Viro	cb79f1998d	dlm: dlm/user.c input validation fixes a) in device_write(): add sentinel NUL byte, making sure that lspace.name will be NUL-terminated b) in compat_input() be keep it simple about the amounts of data we are copying. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David Teigland <teigland@redhat.com>	2008-02-04 01:30:19 -06:00
Al Viro	043b19cdc0	dlm: fix dlm_dir_lookup() handling of too long names ... those can happen and BUG() from DLM_ASSERT() in allocate_direntry() is not a good way to handle them. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David Teigland <teigland@redhat.com>	2008-02-04 01:30:19 -06:00
Al Viro	a9cc915928	dlm: fix overflows when copying from ->m_extra to lvb Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David Teigland <teigland@redhat.com>	2008-02-04 01:29:13 -06:00
Al Viro	ef58bccab7	dlm: make find_rsb() fail gracefully when namelen is too large We can get there from receive_request() and dlm_recover_master_copy() with namelen too large if incoming request is invalid; BUG() from DLM_ASSERT() in allocate_rsb() is a bit excessive reaction to that and in case of dlm_recover_master_copy() we would actually oops before that while calculating hash of up to 64Kb worth of data - with data actually being 64 _bytes_ in kmalloc()'ed struct. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David Teigland <teigland@redhat.com>	2008-02-04 01:26:31 -06:00
Al Viro	a5dd06313d	dlm: receive_rcom_lock_args() overflow check Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David Teigland <teigland@redhat.com>	2008-02-04 01:25:58 -06:00
Al Viro	ae773d0b74	dlm: verify that places expecting rcom_lock have packet long enough Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David Teigland <teigland@redhat.com>	2008-02-04 01:25:09 -06:00
Al Viro	cd9df1aac3	dlm: validate data in dlm_recover_directory() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David Teigland <teigland@redhat.com>	2008-02-04 01:24:52 -06:00
Al Viro	02ed16b64d	dlm: missing length check in check_config() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David Teigland <teigland@redhat.com>	2008-02-04 01:24:20 -06:00
Al Viro	4007685c6e	dlm: use proper type for ->ls_recover_buf Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David Teigland <teigland@redhat.com>	2008-02-04 01:24:07 -06:00
Al Viro	93ff2971e9	dlm: do not byteswap rcom_config Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David Teigland <teigland@redhat.com>	2008-02-04 01:23:43 -06:00
Al Viro	163a1859ec	dlm: do not byteswap rcom_lock Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David Teigland <teigland@redhat.com>	2008-02-04 01:23:14 -06:00
Al Viro	eef7d739c2	dlm: dlm_process_incoming_buffer() fixes * check that length is large enough to cover the non-variable part of message or rcom resp. (after checking that it's large enough to cover the header, of course). * kill more pointless casts Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David Teigland <teigland@redhat.com>	2008-02-04 01:22:42 -06:00
Al Viro	8b0d8e03f8	dlm: use proper C for dlm/requestqueue stuff (and fix alignment bug) a) don't cast the pointer to dlm_header , we use it as dlm_message anyway. b) we copy the message into a queue element, then pass the pointer to copy to dlm_receive_message_saved(); declare it properly to make sure that we have the right alignment. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David Teigland <teigland@redhat.com>	2008-02-04 01:21:32 -06:00
David Woodhouse	c1f3ee120b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git	2008-02-03 18:30:32 +11:00
Jens Axboe	8084870854	splice: always updated atime in direct splice Andre Majorel <aym-xunil@teaser.fr> points out that if we only updated the atime when we transfer some data, we deviate from the standard of always updating the atime. So change splice to always call file_accessed() even if splice_direct_to_actor() didn't transfer any data. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-02-01 09:26:32 +01:00
Linus Torvalds	75659ca0c1	Merge branch 'task_killable' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc * 'task_killable' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc: (22 commits) Remove commented-out code copied from NFS NFS: Switch from intr mount option to TASK_KILLABLE Add wait_for_completion_killable Add wait_event_killable Add schedule_timeout_killable Use mutex_lock_killable in vfs_readdir Add mutex_lock_killable Use lock_page_killable Add lock_page_killable Add fatal_signal_pending Add TASK_WAKEKILL exit: Use task_is_* signal: Use task_is_* sched: Use task_contributes_to_load, TASK_ALL and TASK_NORMAL ptrace: Use task_is_* power: Use task_is_* wait: Use TASK_NORMAL proc/base.c: Use task_is_* proc/array.c: Use TASK_REPORT perfmon: Use task_is_* ... Fixed up conflicts in NFS/sunrpc manually..	2008-02-01 11:45:47 +11:00
Linus Torvalds	f389e9fcec	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm: (21 commits) dlm: static initialization improvements dlm: clean ups dlm: Sanity check namelen before copying it dlm: keep cached master rsbs during recovery dlm: change error message to debug dlm: fix possible use-after-free dlm: limit dir lookup loop dlm: reject normal unlock when lock is waiting for lookup dlm: validate messages before processing dlm: reject messages from non-members dlm: another call to confirm_master in receive_request_reply dlm: recover locks waiting for overlap replies dlm: clear ast_type when removing from astqueue dlm: use fixed errno values in messages dlm: swap bytes for rcom lock reply dlm: align midcomms message buffer dlm: close othercons dlm: use dlm prefix on alloc and free functions dlm: don't print common non-errors dlm: proper prototypes ...	2008-01-31 09:29:31 +11:00
Denis Cheng	0fe410d3f3	dlm: static initialization improvements also change name_prefix from char pointer to char array. Signed-off-by: Denis Cheng <crquan@gmail.com> Signed-off-by: David Teigland <teigland@redhat.com>	2008-01-30 11:04:43 -06:00
David Teigland	dbcfc34733	dlm: clean ups A couple small clean-ups. Remove unnecessary wrapper-functions in rcom.c, and remove unnecessary casting and an unnecessary ASSERT in util.c. Signed-off-by: David Teigland <teigland@redhat.com>	2008-01-30 11:04:43 -06:00
Patrick Caulfeld	2a79289e87	dlm: Sanity check namelen before copying it The 32/64 compatibility code in the DLM does not check the validity of the lock name length passed into it, so it can easily overwrite memory if the value is rubbish (as early versions of libdlm can cause with unlock calls, it doesn't zero the field). This patch restricts the length of the name to the amount of data actually passed into the call. Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2008-01-30 11:04:43 -06:00
David Teigland	85f0379aa0	dlm: keep cached master rsbs during recovery To prevent the master of an rsb from changing rapidly, an unused rsb is kept on the "toss list" for a period of time to be reused. The toss list was being cleared completely for each recovery, which is unnecessary. Much of the benefit of the toss list can be maintained if nodes keep rsb's in their toss list that they are the master of. These rsb's need to be included when the resource directory is rebuilt during recovery. Signed-off-by: David Teigland <teigland@redhat.com>	2008-01-30 11:04:43 -06:00
David Teigland	594199ebaa	dlm: change error message to debug The invalid lockspace messages are normal and can appear relatively often. They should be suppressed without debugging enabled. Signed-off-by: David Teigland <teigland@redhat.com>	2008-01-30 11:04:43 -06:00
David Teigland	ce5246b972	dlm: fix possible use-after-free The dlm_put_lkb() can free the lkb and its associated ua structure, so we can't depend on using the ua struct after the put. Signed-off-by: David Teigland <teigland@redhat.com>	2008-01-30 11:04:42 -06:00
David Teigland	755b5eb8ba	dlm: limit dir lookup loop In a rare case we may need to repeat a local resource directory lookup due to a race with removing the rsb and removing the resdir record. We'll never need to do more than a single additional lookup, though, so the infinite loop around the lookup can be removed. In addition to being unnecessary, the infinite loop is dangerous since some other unknown condition may appear causing the loop to never break. Signed-off-by: David Teigland <teigland@redhat.com>	2008-01-30 11:04:42 -06:00
David Teigland	42dc1601a9	dlm: reject normal unlock when lock is waiting for lookup Non-forced unlocks should be rejected if the lock is waiting on the rsb_lookup list for another lock to establish the master node. Signed-off-by: David Teigland <teigland@redhat.com>	2008-01-30 11:04:42 -06:00
David Teigland	c54e04b00f	dlm: validate messages before processing There was some hit and miss validation of messages that has now been cleaned up and unified. Before processing a message, the new validate_message() function checks that the lkb is the appropriate type, process-copy or master-copy, and that the message is from the correct nodeid for the the given lkb. Other checks and assertions on the lkb type and nodeid have been removed. The assertions were particularly bad since they would panic the machine instead of just ignoring the bad message. Although other recent patches have made processing old message unlikely, it still may be possible for an old message to be processed and caught by these checks. Signed-off-by: David Teigland <teigland@redhat.com>	2008-01-30 11:04:42 -06:00
David Teigland	46b43eed70	dlm: reject messages from non-members Messages from nodes that are no longer members of the lockspace should be ignored. When nodes are removed from the lockspace, recovery can sometimes complete quickly enough that messages arrive from a removed node after recovery has completed. When processed, these messages would often cause an error message, and could in some cases change some state, causing problems. Signed-off-by: David Teigland <teigland@redhat.com>	2008-01-30 11:04:42 -06:00
David Teigland	aec64e1be2	dlm: another call to confirm_master in receive_request_reply When a failed request (EBADR or ENOTBLK) is unlocked/canceled instead of retried, there may be other lkb's waiting on the rsb_lookup list for it to complete. A call to confirm_master() is needed to move on to the next waiting lkb since the current one won't be retried. Signed-off-by: David Teigland <teigland@redhat.com>	2008-01-30 11:04:42 -06:00
David Teigland	601342ce02	dlm: recover locks waiting for overlap replies When recovery looks at locks waiting for replies, it fails to consider locks that have already received a reply for their first remote operation, but not received a reply for secondary, overlapping unlock/cancel. The appropriate stub reply needs to be called for these waiters. Appears when we start doing recovery in the presence of a many overlapping unlock/cancel ops. Signed-off-by: David Teigland <teigland@redhat.com>	2008-01-30 11:04:42 -06:00
David Teigland	8a358ca8e7	dlm: clear ast_type when removing from astqueue The lkb_ast_type field indicates whether the lkb is on the astqueue list. When clearing locks for a process, lkb's were being removed from the astqueue list without clearing the field. If release_lockspace then happened immediately afterward, it could try to remove the lkb from the list a second time. Appears when process calls libdlm dlm_release_lockspace() which first closes the ls dev triggering clear_proc_locks, and then removes the ls (a write to control dev) causing release_lockspace(). Signed-off-by: David Teigland <teigland@redhat.com>	2008-01-30 11:04:42 -06:00
David Teigland	861e2369e9	dlm: use fixed errno values in messages Some errno values differ across platforms. So if we return things like -EINPROGRESS from one node it can get misinterpreted or rejected on another one. This patch fixes up the errno values passed on the wire so that they match the x86 ones (so as not to break the protocol), and re-instates the platform-specific ones at the other end. Many thanks to Fabio for testing this patch. Initial patch from Patrick. Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com> Signed-off-by: David Teigland <teigland@redhat.com>	2008-01-30 11:04:42 -06:00
Fabio M. Di Nitto	550283e30c	dlm: swap bytes for rcom lock reply DLM_RCOM_LOCK_REPLY messages need byte swapping. Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com> Signed-off-by: David Teigland <teigland@redhat.com>	2008-01-30 11:04:41 -06:00
Fabio M. Di Nitto	e7847d35ac	dlm: align midcomms message buffer gcc does not guarantee that an auto buffer is 64bit aligned. This change allows sparc64 to work. Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com> Signed-off-by: David Teigland <teigland@redhat.com>	2008-01-30 11:04:25 -06:00
Andi Kleen	612a95b4e0	x86: remove iBCS support ibcs2 support has never been supported on 2.6 kernels as far as I know, and if it has it must have been an external patch. Anyways, if anybody applies an external patch they could as well readd the ibcs checking code to the ELF loader in the same patch. But there is no reason to keep this code running in all Linux kernels. This will save at least two strcmps each ELF execution. No deprecation period because it could not have been used anyway. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-01-30 13:33:32 +01:00
Roland McGrath	b9d36d5d00	x86: compat_binfmt_elf Kconfig This adds Kconfig and Makefile bits to build fs/compat_binfmt_elf.c, just added. Each arch that wants to use this file needs to add a "select COMPAT_BINFMT_ELF" line in its Kconfig bits that enable COMPAT. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-01-30 13:31:46 +01:00
Roland McGrath	2f79e48ae2	x86: compat_binfmt_elf This adds fs/compat_binfmt_elf.c, a wrapper around fs/binfmt_elf.c for 32-bit ELF support on 64-bit kernels. It can replace all the hand-rolled versions of this that each 32/64 arch has, which are all about the same. To use this, an arch's asm/elf.h has to define at least a few compat_* macros that parallel the various macros that fs/binfmt_elf.c uses for native support. There is no attempt to deal with compat macros for the core dump format support. To use this file, the arch has to define compat_gregset_t for linux/elfcore-compat.h and #define CORE_DUMP_USE_REGSET. The 32-bit compatible formats should come automatically from task_user_regset_view called on a 32-bit task. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-01-30 13:31:46 +01:00
Roland McGrath	4206d3aa19	elf core dump: notes user_regset This modifies the ELF core dump code under #ifdef CORE_DUMP_USE_REGSET. It changes nothing when this macro is not defined. When it's #define'd by some arch header (e.g. asm/elf.h), the arch must support the user_regset (linux/regset.h) interface for reading thread state. This provides an alternate version of note segment writing that is based purely on the user_regset interfaces. When CORE_DUMP_USE_REGSET is set, the arch need not define macros such as ELF_CORE_COPY_REGS and ELF_ARCH. All that information is taken from the user_regset data structures. The core dumps come out exactly the same if arch's definitions for its user_regset details are correct. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-01-30 13:31:45 +01:00
Roland McGrath	3aba481fc9	elf core dump: notes reorg This pulls out the code for writing the notes segment of an ELF core dump into separate functions. This cleanly isolates into one cluster of functions everything that deals with the note formats and the hooks into arch code to fill them. The top-level elf_core_dump function itself now deals purely with the generic ELF format and the memory segments. This only moves code around into functions that can be inlined away. It should not change any behavior at all. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-01-30 13:31:44 +01:00
Nick Piggin	95c354fe9f	spinlock: lockbreak cleanup The break_lock data structure and code for spinlocks is quite nasty. Not only does it double the size of a spinlock but it changes locking to a potentially less optimal trylock. Put all of that under CONFIG_GENERIC_LOCKBREAK, and introduce a __raw_spin_is_contended that uses the lock data itself to determine whether there are waiters on the lock, to be used if CONFIG_GENERIC_LOCKBREAK is not set. Rename need_lockbreak to spin_needbreak, make it use spin_is_contended to decouple it from the spinlock implementation, and make it typesafe (rwlocks do not have any need_lockbreak sites -- why do they even get bloated up with that break_lock then?). Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-01-30 13:31:20 +01:00
Harvey Harrison	56c4da454d	core: remove last users of empty FASTCALL macro FASTCALL is always empty after the x86 removal. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-01-30 13:31:17 +01:00
Andrew Morton	bb1ad8205b	x86: PIE executable randomization, checkpatch fixes #39: FILE: arch/ia64/ia32/binfmt_elf32.c:229: +elf32_map (struct file filep, unsigned long addr, struct elf_phdr eppnt, int prot, int type, unsigned long unused) WARNING: no space between function name and open parenthesis '(' #39: FILE: arch/ia64/ia32/binfmt_elf32.c:229: +elf32_map (struct file filep, unsigned long addr, struct elf_phdr eppnt, int prot, int type, unsigned long unused) WARNING: line over 80 characters #67: FILE: arch/x86/kernel/sys_x86_64.c:80: + new_begin = randomize_range(begin, begin + 0x02000000, 0); ERROR: use tabs not spaces #110: FILE: arch/x86/kernel/sys_x86_64.c:185: + ^I mm->cached_hole_size = 0;$ ERROR: use tabs not spaces #111: FILE: arch/x86/kernel/sys_x86_64.c:186: + ^I^Imm->free_area_cache = mm->mmap_base;$ ERROR: use tabs not spaces #112: FILE: arch/x86/kernel/sys_x86_64.c:187: + ^I}$ ERROR: use tabs not spaces #141: FILE: arch/x86/kernel/sys_x86_64.c:216: + ^I^I/* remember the largest hole we saw so far /$ ERROR: use tabs not spaces #142: FILE: arch/x86/kernel/sys_x86_64.c:217: + ^I^Iif (addr + mm->cached_hole_size < vma->vm_start)$ ERROR: use tabs not spaces #143: FILE: arch/x86/kernel/sys_x86_64.c:218: + ^I^I mm->cached_hole_size = vma->vm_start - addr;$ ERROR: use tabs not spaces #157: FILE: arch/x86/kernel/sys_x86_64.c:232: + ^Imm->free_area_cache = TASK_UNMAPPED_BASE;$ ERROR: need a space before the open parenthesis '(' #291: FILE: arch/x86/mm/mmap_64.c:101: + } else if(mmap_is_legacy()) { WARNING: braces {} are not necessary for single statement blocks #302: FILE: arch/x86/mm/mmap_64.c:112: + if (current->flags & PF_RANDOMIZE) { + mm->mmap_base += ((long)rnd) << PAGE_SHIFT; + } WARNING: line over 80 characters #314: FILE: fs/binfmt_elf.c:48: +static unsigned long elf_map (struct file , unsigned long, struct elf_phdr , int, int, unsigned long); WARNING: no space between function name and open parenthesis '(' #314: FILE: fs/binfmt_elf.c:48: +static unsigned long elf_map (struct file , unsigned long, struct elf_phdr *, int, int, unsigned long); WARNING: line over 80 characters #429: FILE: fs/binfmt_elf.c:438: + eppnt, elf_prot, elf_type, total_size); ERROR: need space after that ',' (ctx:VxV) #480: FILE: fs/binfmt_elf.c:939: + elf_prot, elf_flags,0); ^ total: 9 errors, 7 warnings, 461 lines checked Your patch has style problems, please review. If any of these errors are false positives report them to the maintainer, see CHECKPATCH in MAINTAINERS. Please run checkpatch prior to sending patches Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Jakub Jelinek <jakub@redhat.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-01-30 13:31:07 +01:00
Jiri Kosina	cc503c1b43	x86: PIE executable randomization main executable of (specially compiled/linked -pie/-fpie) ET_DYN binaries onto a random address (in cases in which mmap() is allowed to perform a randomization). The code has been extraced from Ingo's exec-shield patch http://people.redhat.com/mingo/exec-shield/ [akpm@linux-foundation.org: fix used-uninitialsied warning] [kamezawa.hiroyu@jp.fujitsu.com: fixed ia32 ELF on x86_64 handling] Signed-off-by: Jiri Kosina <jkosina@suse.cz> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Roland McGrath <roland@redhat.com> Cc: Jakub Jelinek <jakub@redhat.com> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-01-30 13:31:07 +01:00
Jiri Kosina	c1d171a002	x86: randomize brk Randomize the location of the heap (brk) for i386 and x86_64. The range is randomized in the range starting at current brk location up to 0x02000000 offset for both architectures. This, together with pie-executable-randomization.patch and pie-executable-randomization-fix.patch, should make the address space randomization on i386 and x86_64 complete. Arjan says: This is known to break older versions of some emacs variants, whose dumper code assumed that the last variable declared in the program is equal to the start of the dynamically allocated memory region. (The dumper is the code where emacs effectively dumps core at the end of it's compilation stage; this coredump is then loaded as the main program during normal use) iirc this was 5 years or so; we found this way back when I was at RH and we first did the security stuff there (including this brk randomization). It wasn't all variants of emacs, and it got fixed as a result (I vaguely remember that emacs already had code to deal with it for other archs/oses, just ifdeffed wrongly). It's a rare and wrong assumption as a general thing, just on x86 it mostly happened to be true (but to be honest, it'll break too if gcc does something fancy or if the linker does a non-standard order). Still its something we should at least document. Note 2: afaik it only broke the emacs build. I'm not 100% sure about that (it IS 5 years ago) though. [ akpm@linux-foundation.org: deuglification ] Signed-off-by: Jiri Kosina <jkosina@suse.cz> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Roland McGrath <roland@redhat.com> Cc: Jakub Jelinek <jakub@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-01-30 13:30:40 +01:00
Trond Myklebust	3fbd67ad61	NFSv4: Iterate through all nfs_clients when the server recalls a delegation The same delegation may have been handed out to more than one nfs_client. Ensure that if a recall occurs, we return all instances. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:06:12 -05:00
Trond Myklebust	57bfa89171	NFSv4: Deal more correctly with duplicate delegations If a (broken?) server hands out two different delegations for the same file, then we should return one of them. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:06:12 -05:00
Trond Myklebust	6f23e3872c	NFS: Fix a potential race between umount and nfs_access_cache_shrinker() Thanks to Yawei Niu for spotting the race. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:06:12 -05:00
Trond Myklebust	e6f8107595	NFS: Add an asynchronous delegreturn operation for use in nfs_clear_inode Otherwise, there is a potential deadlock if the last dput() from an NFSv4 close() or other asynchronous operation leads to nfs_clear_inode calling the synchronous delegreturn. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:06:12 -05:00
Benny Halevy	99fadcd764	nfs: convert NFS_*(inode) helpers to static inline Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:06:11 -05:00
Benny Halevy	3a10c30acc	nfs: obliterate NFS_FLAGS macro use NFS_I(inode)->flags instead Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:06:11 -05:00
Chuck Lever	fc6014771b	NFS: Address memory leaks in the NFS client mount option parser David Howells noticed that repeating the same mount option twice during an NFS mount request can result in orphaned memory in certain cases. Only the client_address and mount_server.hostname strings are initialized in the mount parsing loop, so those appear to be the only two pointers that might be written over by repeating a mount option. The strings in the nfs_server section of the nfs_parsed_mount_data structure are set only once after the options are parsed, thus these are not susceptible to being overwritten. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:06:11 -05:00
J. Bruce Fields	3d1c550874	nfs4: allow nfsv4 acls on non-regular-files The rfc doesn't give any reason it shouldn't be possible to set an attribute on a non-regular file. And if the server supports it, then it shouldn't be up to us to prevent it. Thanks to Erez for the report and Trond for further analysis. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Tested-by: Erez Zadok <ezk@cs.sunysb.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:06:10 -05:00
Trond Myklebust	f3c391e89c	NFS: Optimise away the sigmask code in aio/dio reads and writes There are no interruptible waits for asynchronous RPC tasks, so we don't need to wrap calls to rpc_run_task() with an rpc_clnt_sigmask/rpc_clnt_unsigmask pair. Instead we can wrap the wait_for_completion_interruptible() in nfs_direct_wait(). This means that we completely optimise away sigmask setting for the case of non-blocking aio/dio. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:06:10 -05:00
Trond Myklebust	65fdf7d264	NLM: Fix a bogus 'return' in nlmclnt_rpc_release Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:06:08 -05:00
Chuck Lever	883bb163f8	NLM: Introduce an arguments structure for nlmclnt_init() Clean up: pass 5 arguments to nlmclnt_init() in a structure similar to the new nfs_client_initdata structure. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2008-01-30 02:06:07 -05:00
Chuck Lever	1093a60ef3	NLM/NFS: Use cached nlm_host when calling nlmclnt_proc() Now that each NFS mount point caches its own nlm_host structure, it can be passed to nlmclnt_proc() for each lock request. By pinning an nlm_host for each mount point, we trade the overhead of looking up or creating a fresh nlm_host struct during every NLM procedure call for a little extra memory. We also restrict the nlmclnt_proc symbol to limit the use of this call to in-tree modules. Note that nlm_lookup_host() (just removed from the client's per-request NLM processing) could also trigger an nlm_host garbage collection. Now client-side nlm_host garbage collection occurs only during NFS mount processing. Since the NFS client now holds a reference on these nlm_host structures, they wouldn't have been affected by garbage collection anyway. Given that nlm_lookup_host() reorders the global nlm_host chain after every successful lookup, and that a garbage collection could be triggered during the call, we've removed a significant amount of per-NLM-request CPU processing overhead. Sidebar: there are only a few remaining references to the internals of NFS inodes in the client-side NLM code. The only references I found are related to extracting or comparing the inode's file handle via NFS_FH(). One is in nlmclnt_grant(); the other is in nlmclnt_setlockargs(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:06:07 -05:00
Chuck Lever	9289e7f91a	NFS: Invoke nlmclnt_init during NFS mount processing Cache an appropriate nlm_host structure in the NFS client's mount point metadata for later use. Note that there is no need to set NFS_MOUNT_NONLM in the error case -- if nfs_start_lockd() returns a non-zero value, its callers ensure that the mount request fails outright. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:06:07 -05:00
Chuck Lever	52c4044d00	NLM: Introduce external nlm_host set-up and tear-down functions We would like to remove the per-lock-operation nlm_lookup_host() call from nlmclnt_proc(). The new architecture pins an nlm_host structure to each NFS client superblock that has the "lock" mount option set. The NFS client passes in the pinned nlm_host structure during each call to nlmclnt_proc(). NFS client unmount processing "puts" the nlm_host so it can be garbage- collected later. This patch introduces externally callable NLM functions that handle mount-time nlm_host set up and tear-down. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:06:06 -05:00
Chuck Lever	cab6fc1b77	lockd: Eliminate harmless mixed sign comparison in nlmdbg_cookie2a() The cookie->len field is unsigned, so the loop index variable in nlmdbg_cookie2a() should also be unsigned. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:06:02 -05:00
Chuck Lever	3d509e5454	NFS: nfs_write_end clean up Clean up: commit `4899f9c8` added nfs_write_end(), which introduces a conditional expression that returns an unsigned integer in one arm and a signed integer in the other. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:06:02 -05:00
Chuck Lever	bf4285e75c	NFS: Fix minor mixed sign comparison in NFS client's write logic Clean up: PAGE_CACHE_SIZE is unsigned, and nfs_pageio_init() takes a size_t. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:06:01 -05:00
Chuck Lever	d24aae41b4	NFS: Use size_t for storing name lengths Clean up: always use the same type when handling buffer lengths. As a bonus, this prevents a mixed sign comparison in idmap_lookup_name. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:06:01 -05:00
Chuck Lever	a661b77fc1	NFS: Fix use of copy_to_user() in idmap_pipe_upcall The idmap_pipe_upcall() function expects the copy_to_user() function to return a negative error value if the call fails, but copy_to_user() returns an unsigned long number of bytes that couldn't be copied. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:06:01 -05:00
Chuck Lever	369af0f116	NFS: Clean up fs/nfs/idmap.c Clean up white space damage and use standard kernel coding conventions for return statements. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:06:00 -05:00
Trond Myklebust	59dca3b28c	NFS: Fix the 'proto=' mount option Currently, if you have a server mounted using networking protocol, you cannot specify a different value using the 'proto=' option on another mountpoint. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:06:00 -05:00
Trond Myklebust	331702337f	NFS: Support per-mountpoint timeout parameters. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:59 -05:00
Trond Myklebust	7a3e3e18e4	NFS: Ensure that we respect NFS_MAX_TCP_TIMEOUT It isn't sufficient just to limit timeout->to_initval, we also need to limit to_maxval. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:59 -05:00
Trond Myklebust	69dd716c5f	NFSv4: Add socket proto argument to setclientid Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:58 -05:00
Chuck Lever	3c7c7e4812	NFS: Pull covers off IPv6 address parsing Now that the needed IPv6 infrastructure is in place, allow the NFS client's IP address parser to generate AF_INET6 addresses. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:57 -05:00
Chuck Lever	4c56801770	NFS: Support non-IPv4 addresses in nfs_parsed_mount_data Replace the nfs_server and mount_server address fields in the nfs_parsed_mount_data structure with a "struct sockaddr_storage" instead of a "struct sockaddr_in". Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:57 -05:00
Chuck Lever	9412b92772	NFS: Refactor mount option address parsing into separate function Refactor the logic to parse incoming text-based IP addresses. Use the in4_pton() function instead of the older in_aton(), following the lead of the in-kernel CIFS client. Later we'll add IPv6 address parsing using the matching in6_pton() function. For now we can't allow IPv6 address parsing: we must expand the size of the address storage fields in the nfs_parsed_mount_options struct before we can parse and store IPv6 addresses. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:56 -05:00
Chuck Lever	338320345b	NFS: Remove the NIPQUAD from nfs_try_mount In the name of address family compatibility, we can't have the NIP_FMT and NIPQUAD macros in nfs_try_mount(). Instead, we can make use of an unused mount option to display the mount server's hostname. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:56 -05:00
Chuck Lever	6677d09513	NFS: Adjust nfs_clone_mount structure to store "struct sockaddr " Change the addr field in the nfs_clone_mount structure to store a "struct sockaddr " to support non-IPv4 addresses in the NFS client. Note this is mostly a cosmetic change, and does not actually allow referrals using IPv6 addresses. The existing referral code assumes that the server returns a string that represents an IPv4 address. This code needs to support hostnames and IPv6 addresses as well as IPv4 addresses, thus it will need to be reorganized completely (to handle DNS resolution in user space). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:56 -05:00
Chuck Lever	dcecae0ff4	NFS: Change nfs4_set_client() to accept struct sockaddr * Adjust the arguments and callers of nfs4_set_client() to pass a "struct sockaddr " instead of a "struct sockaddr_in " to support non-IPv4 addresses in the NFS client. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:56 -05:00
Chuck Lever	d7422c472b	NFS: Change nfs_get_client() to take sockaddr * Adjust arguments and callers of nfs_get_client() to pass a "struct sockaddr " instead of "struct sockaddr_in " to support non-IPv4 addresses. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:55 -05:00
Chuck Lever	ff052645c9	NFS: Change nfs_find_client() to take "struct sockaddr " Adjust arguments and callers of nfs_find_client() to pass a "struct sockaddr " instead of "struct sockaddr_in *" to support non-IPv4 addresses. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net> Trond: Also fix up protocol version number argument in nfs_find_client() to use the correct u32 type. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:55 -05:00
Chuck Lever	c1d3586656	NFS: Change cb_recallargs to pass "struct sockaddr " instead of sockaddr_in Change the addr field in the cb_recallargs struct to a "struct sockaddr " to support non-IPv4 addresses. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:55 -05:00
Chuck Lever	671beed7e2	NFS: Change cb_getattrargs to pass "struct sockaddr " instead of sockaddr_in Change the addr field in the cb_getattrargs struct to a "struct sockaddr " to support non-IPv4 addresses. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:54 -05:00
Chuck Lever	6e4cffd7b2	NFS: Expand server address storage in nfs_client struct Prepare for managing larger addresses in the NFS client by widening the nfs_client struct's cl_addr field. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net> (Modified to work with the new parameters for nfs_alloc_client) Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:54 -05:00
Trond Myklebust	3b0d3f93d0	NFS: Add support for AF_INET6 addresses in __nfs_find_client() Introduce AF_INET6-specific address checking to __nfs_find_client(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:54 -05:00
Chuck Lever	0d0f0c192d	NFS: Set default port for NFSv4, with support for AF_INET6 Create a helper function to set the default NFS port for NFSv4 mount points. The helper supports both AF_INET and AF_INET6 family addresses. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:54 -05:00
Chuck Lever	04dcd6e3ac	NFS: Make setting a port number agostic We'll need to set the port number of an AF_INET or AF_INET6 address in several places in fs/nfs/super.c, so introduce a helper that can manage this for us. We put this helper to immediate use. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:53 -05:00
Chuck Lever	cdcd7f9abc	NFS: Verify IPv6 addresses properly Add support to nfs_verify_server_address for recognizing AF_INET6 addresses. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:53 -05:00
Chuck Lever	fd00a8ff8e	NFS: Add support for AF_INET6 addresses in nfs_compare_super() Refactor nfs_compare_super() and add AF_INET6 support. Replace the generic memcmp() to document explicitly what parts of the addresses must match in this check, and make the comparison independent of the lengths of both addresses. A side benefit is both tests are more computationally efficient than a memcmp(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:53 -05:00
Chuck Lever	3f43c6667a	NFS: Address a couple of nits in nfs_follow_referral() Clean up: fix an outdated block comment, and address a comparison between a signed and unsigned integer. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:52 -05:00
Chuck Lever	1d98fe6717	NFS: Move dprintks from callback.c to callback_proc.c Clean up: The client side peer address is available in callback_proc.c, so move a dprintk out of fs/nfs/callback.c and into fs/nfs/callback_proc.c. This is more consistent with other debugging messages, and the proc routines have more information about each request to display. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:52 -05:00
Chuck Lever	5d8515caeb	NFS: eliminate NIPQUAD(clp->cl_addr.sin_addr) To ensure the NFS client displays IPv6 addresses properly, replace address family-specific NIPQUAD() invocations with a call to the RPC client to get a formatted string representing the remote peer's address. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:52 -05:00
Chuck Lever	d4d3c50749	NFS: Enable NFS client to generate CLIENTID strings with IPv6 addresses We recently added methods to RPC transports that provide string versions of the remote peer address information. Convert the NFSv4 SETCLIENTID procedure to use those methods instead of building the client ID out of whole cloth. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:51 -05:00
Chuck Lever	cc38bac3a0	NFS: Ensure NFSv4 SETCLIENTID send buffer is large enough Ensure that the RPC buffer size specified for NFSv4 SETCLIENTID procedures matches what we are encoding into the buffer. See the definition of struct nfs4_setclientid {} and the encode_setclientid() function. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:51 -05:00
Trond Myklebust	40c553193d	NFS: Remove the redundant nfs_client->cl_nfsversion We can get the same information from the rpc_ops structure instead. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:49 -05:00
Trond Myklebust	c81468a1a7	NFS: Clean up the nfs_find_client function. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:48 -05:00
Trond Myklebust	3a498026ee	NFS: Clean up the nfs_client initialisation Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:48 -05:00
Trond Myklebust	bfc69a4566	NFS: define a function to update nfsi->cache_change_attribute Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:47 -05:00
Chuck Lever	5cce428d95	NFS: Remove an unneeded check in decode_compound_header_arg() Clean up: The header tag length is unsigned, so checking that it is less than zero is unnecessary. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:46 -05:00
Chuck Lever	d45273ed6f	NFS: Clean up address comparison in __nfs_find_client() The address comparison in the __nfs_find_client() function is deceptive. It uses a memcmp() to check a pair of u32 fields for equality. Not only is this inefficient, but usually memcmp() is used for comparing two whole sockaddr_in's (which includes comparisons of the address family and port number), so it's easy to mistake the comparison here for a whole sockaddr comparison, which it isn't. So for clarity and efficiency, we replace the memcmp() with a simple test for equality between the two s_addr fields. This should have no behavioral effect. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:46 -05:00
Chuck Lever	6a0ed1de8e	NFS: Clean up: copy hostname with kstrndup during mount processing Clean up: mount option parsing uses kstrndup in several places, rather than using kzalloc. Replace the few remaining uses of kzalloc with kstrndup, for consistency. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:46 -05:00
Chuck Lever	e887cbcf91	NFS: Remove support for the 'mountprog' option Remove the mount option that allows users to specify an alternate mountd program number. The client hasn't support setting an alternate mountd program number for a very long time. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:46 -05:00
Chuck Lever	ad879cef85	NFS: Remove support for the 'nfsprog' option Remove the mount option that allows users to specify an alternate NFS program number. The client hasn't support setting an alternate NFS program number for a very long time. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:45 -05:00
Chuck Lever	0eb2574121	NFS: Ensure that NFS version 4 mounts use NFS_PORT if nfsport wasn't set Text-based mount option parsing introduced a minor regression in the behavior of NFS version 4 mounts. NFS version 4 is not supposed to require a running rpcbind service on the server in order for a mount to succeed. In other words, if the mount options don't specify a port number, the port number is supposed to default to 2049. For earlier versions of NFS, the default port number was zero in order to cause the RPC client to autobind to the server's NFS service. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:45 -05:00
Chuck Lever	28c494c5c8	NFS: Prevent nfs_getattr() hang during heavy write workloads POSIX requires that ctime and mtime, as reported by the stat(2) call, reflect the activity of the most recent write(2). To that end, nfs_getattr() flushes pending dirty writes to a file before doing a GETATTR to allow the NFS server to set the file's size, ctime, and mtime properly. However, nfs_getattr() can be starved when a constant stream of application writes to a file prevents nfs_wb_nocommit() from completing. This usually results in hangs of programs doing a stat against an NFS file that is being written. "ls -l" is a common victim of this behavior. To prevent starvation, hold the file's i_mutex in nfs_getattr() to freeze applications writes temporarily so the client can more quickly obtain clean values for a file's size, mtime, and ctime. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:45 -05:00
Chuck Lever	464ad6b1ad	NFS: Change sign of some loop indices in nfs4xdr.c Nit: Eliminate some mixed sign comparisons in loop indices. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:44 -05:00
Chuck Lever	bcecff77a9	NFS: Use unsigned intermediates for manipulating header lengths (NFSv4 XDR) Clean up: prevent length underflow and mixed sign comparison when unmarshalling NFS version 4 getacl, readdir, and readlink replies. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:44 -05:00
Chuck Lever	c957c526ef	NFS: Use unsigned intermediates for manipulating header lengths (NFSv3 XDR) Clean up: prevent length underflow and mixed sign comparisons when unmarshalling NFS version 3 read, readdir, and readlink replies. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:44 -05:00
Chuck Lever	6232dbbcff	NFS: Use unsigned intermediates for manipulating header lengths (NFSv2 XDR) Clean up: prevent length underflow and mixed sign comparisons when unmarshalling NFS version 2 read, readdir, and readlink replies. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:44 -05:00
Chuck Lever	8a8c74bf94	NFS: Ensure nfs_wcc_update_inode always converts file size to loff_t The nfs_wcc_update_inode() function omits logic to convert the type of the NFS on-the-wire value of a file's size (__u64) to the type of file size value stored in struct inode (loff_t, which is signed). Everywhere else in the NFS client I checked already correctly converts the file size type. This effects only very large files. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:43 -05:00
Trond Myklebust	0773769191	NFS/SUNRPC: Convert users of rpc_init_task+rpc_execute to rpc_run_task() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:39 -05:00
Trond Myklebust	5138fde011	NFS/SUNRPC: Convert all users of rpc_call_setup() Replace use of rpc_call_setup() with rpc_init_task(), and in cases where we need to initialise task->tk_action, with rpc_call_start(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:32 -05:00
Trond Myklebust	bdc7f021f3	NFS: Clean up the (commit\|read\|write)_setup() callback routines Move the common code for setting up the nfs_write_data and nfs_read_data structures into fs/nfs/read.c, fs/nfs/write.c and fs/nfs/direct.c. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:32 -05:00
Trond Myklebust	3ff7576dda	SUNRPC: Clean up the initialisation of priority queue scheduling info. We want the default scheduling priority (priority == 0) to remain RPC_PRIORITY_NORMAL. Also ensure that the priority wait queue scheduling is per process id instead of sometimes being per thread, and sometimes being per inode. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:30 -05:00
Trond Myklebust	c970aa85e7	SUNRPC: Clean up rpc_run_task Make it use the new task initialiser structure instead of acting as a wrapper. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:30 -05:00
Trond Myklebust	84115e1cd4	SUNRPC: Cleanup of rpc_task initialisation Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:30 -05:00
Steve Dickson	ef818a28fa	NFS: Stop sillyname renames and unmounts from racing Added an active/deactive mechanism to the nfs_server structure allowing async operations to hold off umount until the operations are done. Signed-off-by: Steve Dickson <steved@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:24 -05:00
Trond Myklebust	2f74c0a056	NFSv4: Clean up the OPEN/CLOSE serialisation code Reduce the time spent locking the rpc_sequence structure by queuing the nfs_seqid only when we are ready to take the lock (when calling nfs_wait_on_sequence). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:24 -05:00
Trond Myklebust	acee478afc	NFS: Clean up the write request locking. Ensure that we set/clear NFS_PAGE_TAG_LOCKED when the nfs_page is hashed. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:24 -05:00
Trond Myklebust	8b1f9ee56e	NFS: Optimise nfs_vm_page_mkwrite() The current model locks the page twice for no good reason. Optimise by inlining the parts of nfs_write_begin()/nfs_write_end() that we care about. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:23 -05:00
Trond Myklebust	77f111929d	NFS: Ensure that we eject stale inodes as soon as possible Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:22 -05:00
Trond Myklebust	d45b9d8baf	NFS: Handle -ENOENT errors in unlink()/rmdir()/rename() If the server returns an ENOENT error, we still need to do a d_delete() in order to ensure that the dentry is deleted. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:22 -05:00
Trond Myklebust	609005c319	NFS: Sillyrename: in the case of a race, check aliases are really positive In nfs_do_call_unlink() we check that we haven't raced, and that lookup() hasn't created an aliased dentry to our sillydeleted dentry. If somebody has deleted the file on the server and the lookup() resulted in a negative dentry, then ignore... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:22 -05:00
Trond Myklebust	fccca7fc6a	NFS: Fix a sillyrename race... Ensure that readdir revalidates its data cache after blocking on sillyrename. Also fix a typo in nfs_do_call_unlink(): swap the ^= for an \|=. The result is the same, since we've already checked that the flag is unset, but it makes the code more readable. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:21 -05:00
Patrick Caulfeld	39bd4177dd	dlm: close othercons This patch addresses a problem introduced with the last round of lowcomms patches where the 'othercon' connections do not get freed when the DLM shuts down. This results in the error message "slab error in kmem_cache_destroy(): cache `dlm_conn': Can't free all objects" and the DLM cannot be restarted without a system reboot. See bz#428119 Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com> Signed-off-by: David Teigland <teigland@redhat.com>	2008-01-29 17:17:32 -06:00
David Teigland	52bda2b5ba	dlm: use dlm prefix on alloc and free functions The dlm functions in memory.c should use the dlm_ prefix. Also, use kzalloc/kfree directly for dlm_direntry's, removing the wrapper functions. Signed-off-by: David Teigland <teigland@redhat.com>	2008-01-29 17:17:19 -06:00
David Teigland	11b2498ba7	dlm: don't print common non-errors Change log_error() to log_debug() for conditions that can occur in large number in normal operation. Signed-off-by: David Teigland <teigland@redhat.com>	2008-01-29 17:17:08 -06:00
Adrian Bunk	e028398da7	dlm: proper prototypes This patch adds a proper prototype for some functions in fs/dlm/dlm_internal.h Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David Teigland <teigland@redhat.com>	2008-01-29 17:16:52 -06:00
Lon Hohberger	6bd8fedaa1	dlm: bind connections from known local address when using TCP A common problem occurs when multiple IP addresses within the same subnet are assigned to the same NIC. If we make a connection attempt to another address on the same subnet as one of those addresses, the connection attempt will not necessarily be routed from the address we want. In the case of the DLM, the other nodes will quickly drop the connection attempt, causing problems. This patch makes the DLM bind to the local address it acquired from the cluster manager when using TCP prior to making a connection, obviating the need for administrators to "fix" their systems or use clever routing tricks. Signed-off-by: Lon Hohberger <lhh@redhat.com> Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2008-01-29 16:44:25 -06:00
Jens Axboe	9e97198dbf	splice: fix problem with atime not being updated A bug report on nfsd that states that since it was switched to use splice instead of sendfile, the atime was no longer being updated on the input file. do_generic_mapping_read() does this when accessing the file, make splice do it for the direct splice handler. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-01-29 21:55:20 +01:00
Linus Torvalds	0ba6c33bcd	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.25 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.25: (1470 commits) [IPV6] ADDRLABEL: Fix double free on label deletion. [PPP]: Sparse warning fixes. [IPV4] fib_trie: remove unneeded NULL check [IPV4] fib_trie: More whitespace cleanup. [NET_SCHED]: Use nla_policy for attribute validation in ematches [NET_SCHED]: Use nla_policy for attribute validation in actions [NET_SCHED]: Use nla_policy for attribute validation in classifiers [NET_SCHED]: Use nla_policy for attribute validation in packet schedulers [NET_SCHED]: sch_api: introduce constant for rate table size [NET_SCHED]: Use typeful attribute parsing helpers [NET_SCHED]: Use typeful attribute construction helpers [NET_SCHED]: Use NLA_PUT_STRING for string dumping [NET_SCHED]: Use nla_nest_start/nla_nest_end [NET_SCHED]: Propagate nla_parse return value [NET_SCHED]: act_api: use PTR_ERR in tcf_action_init/tcf_action_get [NET_SCHED]: act_api: use nlmsg_parse [NET_SCHED]: act_api: fix netlink API conversion bug [NET_SCHED]: sch_netem: use nla_parse_nested_compat [NET_SCHED]: sch_atm: fix format string warning [NETNS]: Add namespace for ICMP replying code. ...	2008-01-29 22:54:01 +11:00
Linus Torvalds	5ea293a904	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild * git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild: (79 commits) Remove references to "make dep" kconfig: document use of HAVE_* Introduce new section reference annotations tags: __ref, __refdata, __refconst kbuild: warn about ld added unique sections kbuild: add verbose option to Section mismatch reporting in modpost kconfig: tristate choices with mixed tristate and boolean values asm-generic/vmlix.lds.h: simplify __mem{init,exit}* dependencies remove __attribute_used__ kbuild: support ARCH=x86 in buildtar kconfig: remove "enable" kbuild: simplified warning report in modpost kbuild: introduce a few helpers in modpost kbuild: use simpler section mismatch warnings in modpost kbuild: link vmlinux.o before kallsyms passes kbuild: introduce new option to enhance section mismatch analysis Use separate sections for __dev/__cpu/__mem code/data compiler.h: introduce __section() all archs: consolidate init and exit sections in vmlinux.lds.h kbuild: check section names consistently in modpost kbuild: introduce blacklisting in modpost ...	2008-01-29 22:46:14 +11:00
Linus Torvalds	8cd226ca3f	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (50 commits) jbd2: sparse pointer use of zero as null jbd2: Use round-jiffies() function for the "5 second" ext4/jbd2 wakeup jbd2: Mark jbd2 slabs as SLAB_TEMPORARY jbd2: add lockdep support ext4: Use the ext4_ext_actual_len() helper function ext4: fix uniniatilized extent splitting error ext4: Check for return value from sb_set_blocksize ext4: Add stripe= option to /proc/mounts ext4: Enable the multiblock allocator by default ext4: Add multi block allocator for ext4 ext4: Add new functions for searching extent tree ext4: Add ext4_find_next_bit() ext4: fix up EXT4FS_DEBUG builds ext4: Fix ext4_show_options to show the correct mount options. ext4: Add EXT4_IOC_MIGRATE ioctl ext4: Add inode version support in ext4 vfs: Add 64 bit i_version support ext4: Add the journal checksum feature jbd2: jbd2 stats through procfs ext4: Take read lock during overwrite case. ...	2008-01-29 22:43:38 +11:00
Mingming Cao	4019191be7	jbd2: sparse pointer use of zero as null Get rid of sparse related warnings from places that use integer as NULL pointer. (Ported from upstream ext3/jbd changes.) Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-01-28 23:58:27 -05:00
Mingming Cao	db857da336	jbd2: Use round-jiffies() function for the "5 second" ext4/jbd2 wakeup While "every 5 seconds" doesn't sound as a problem, there can be many of these (and these timers do add up over all the kernel). The "5 second" wakeup isn't really timing sensitive; in addition even with rounding it'll still happen every 5 seconds (with the exception of the very first time, which is likely to be rounded up to somewhere closer to 6 seconds) (Ported from similar JBD patch made by Arjan van de Ven to fs/jbd/transaction.c) Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Andrew Morton <akpm@osdl.org> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-01-28 23:58:27 -05:00
Mingming Cao	77160957e2	jbd2: Mark jbd2 slabs as SLAB_TEMPORARY This patch marks slab allocations by jbd2 as short-lived in support of Mel Gorman's "Group short-lived and reclaimable kernel allocations" patch. (Ported from similar changes made to fs/jbd/journal.c and fs/jbd/revoke.c in Mel's patch.) Cc: Mel Gorman <mel@csn.ul.ie> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-01-28 23:58:27 -05:00
Mingming Cao	7b7510662f	jbd2: add lockdep support Ported from similar patch for the jbd layer. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-01-28 23:58:27 -05:00
Aneesh Kumar K.V	b939e3766e	ext4: Use the ext4_ext_actual_len() helper function ext4 uses the high bit of the extent length to encode whether the extent is intialized or not. The helper function ext4_ext_get_actual_len should be used to get the actual length of the extent. This addresses the kernel bug documented here: http://bugzilla.kernel.org/show_bug.cgi?id=9732 kernel BUG at fs/ext4/extents.c:1056! .... Call Trace: [<ffffffff88366073>] :ext4dev:ext4_ext_get_blocks+0x5ba/0x8c1 [<ffffffff81053c91>] lock_release_holdtime+0x27/0x49 [<ffffffff812748f6>] _spin_unlock+0x17/0x20 [<ffffffff883400a6>] :jbd2:start_this_handle+0x4e0/0x4fe [<ffffffff88366564>] :ext4dev:ext4_fallocate+0x175/0x39a [<ffffffff81053c91>] lock_release_holdtime+0x27/0x49 [<ffffffff81056480>] __lock_acquire+0x4e7/0xc4d [<ffffffff81053c91>] lock_release_holdtime+0x27/0x49 [<ffffffff810a8de7>] sys_fallocate+0xe4/0x10d [<ffffffff8100c043>] tracesys+0xd5/0xda Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-01-28 23:58:27 -05:00
Dmitry Monakhov	dbf9d7da33	ext4: fix uniniatilized extent splitting error Fix bug reported by Dmitry Monakhov caused by lost error code Testcase: blksize = 0x1000; fd = open(argv[1], O_RDWR\|O_CREAT, 0700); unsigned long long sz = 0x10000000UL; /* allocating big blocks chunk / syscall(__NR_fallocate, fd, 0, 0UL, sz) / grab all other available filesystem space / tfd = open("tmp", O_RDWR\|O_CREAT\|O_DIRECT, 0700); while( write(tfd, buf, 4096) > 0); / loop untill ENOSPC / fsync(fd); / just in case / while (pos < sz) { / each seek+ write operation result in splits uninitialized extent in three extents. Splitting may result in new extent allocation which probably will fail because of ENOSPC/ lseek(fd, blksize2 -1, SEEK_CUR); if ((ret = write(fd, 'a', 1)) != 1) exit(1); pos += blksize * 2; } Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-01-28 23:58:27 -05:00
Aneesh Kumar K.V	ce40733ce9	ext4: Check for return value from sb_set_blocksize sb_set_blocksize validates whether the specfied block size can be used by the file system. Make sure we fail mounting the file system if the blocksize specfied cannot be used. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com>	2008-01-28 23:58:27 -05:00
Miklos Szeredi	cb45bbe44b	ext4: Add stripe= option to /proc/mounts Add stripe= option to /proc/mounts for ext4 filesystems. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Mingming Cao <cmm@us.ibm.com>	2008-01-28 23:58:27 -05:00
Aneesh Kumar K.V	3dbd0ede4d	ext4: Enable the multiblock allocator by default Enable the multiblock allocator by default. Fix ext4_show_options() so if it is not enabled, the nomballoc option included in /proc/mounts. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-01-28 23:58:26 -05:00
Alex Tomas	c9de560ded	ext4: Add multi block allocator for ext4 Signed-off-by: Alex Tomas <alex@clusterfs.com> Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-01-29 00:19:52 -05:00
Alex Tomas	1988b51e47	ext4: Add new functions for searching extent tree Add the functions ext4_ext_search_left() and ext4_ext_search_right(), which are used by mballoc during ext4_ext_get_blocks to decided whether to merge extent information. Signed-off-by: Alex Tomas <alex@clusterfs.com> Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Johann Lombardi <johann@clusterfs.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-01-28 23:58:27 -05:00
Eric Sandeen	c549a95d40	ext4: fix up EXT4FS_DEBUG builds Builds with EXT4FS_DEBUG defined (to enable ext4_debug()) fail without these changes. Clean up some format warnings too. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com>	2008-01-28 23:58:27 -05:00
Aneesh Kumar K.V	aa22df2cc8	ext4: Fix ext4_show_options to show the correct mount options. We need to look at the default value and make sure the mount options are not set via default value before showing them via ext4_show_options Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2008-01-28 23:58:27 -05:00
Aneesh Kumar K.V	c14c6fd5c5	ext4: Add EXT4_IOC_MIGRATE ioctl The below patch add ioctl for migrating ext3 indirect block mapped inode to ext4 extent mapped inode. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2008-01-28 23:58:26 -05:00
Jean Noel Cordenner	25ec56b518	ext4: Add inode version support in ext4 This patch adds 64-bit inode version support to ext4. The lower 32 bits are stored in the osd1.linux1.l_i_version field while the high 32 bits are stored in the i_version_hi field newly created in the ext4_inode. This field is incremented in case the ext4_inode is large enough. A i_version mount option has been added to enable the feature. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Kalpak Shah <kalpak@clusterfs.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Jean Noel Cordenner <jean-noel.cordenner@bull.net>	2008-01-28 23:58:27 -05:00
Jean Noel Cordenner	7a224228ed	vfs: Add 64 bit i_version support The i_version field of the inode is changed to be a 64-bit counter that is set on every inode creation and that is incremented every time the inode data is modified (similarly to the "ctime" time-stamp). The aim is to fulfill a NFSv4 requirement for rfc3530. This first part concerns the vfs, it converts the 32-bit i_version in the generic inode to a 64-bit, a flag is added in the super block in order to check if the feature is enabled and the i_version is incremented in the vfs. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Jean Noel Cordenner <jean-noel.cordenner@bull.net> Signed-off-by: Kalpak Shah <kalpak@clusterfs.com>	2008-01-28 23:58:27 -05:00
Girish Shilamkar	818d276ceb	ext4: Add the journal checksum feature The journal checksum feature adds two new flags i.e JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT and JBD2_FEATURE_COMPAT_CHECKSUM. JBD2_FEATURE_CHECKSUM flag indicates that the commit block contains the checksum for the blocks described by the descriptor blocks. Due to checksums, writing of the commit record no longer needs to be synchronous. Now commit record can be sent to disk without waiting for descriptor blocks to be written to disk. This behavior is controlled using JBD2_FEATURE_ASYNC_COMMIT flag. Older kernels/e2fsck should not be able to recover the journal with _ASYNC_COMMIT hence it is made incompat. The commit header has been extended to hold the checksum along with the type of the checksum. For recovery in pass scan checksums are verified to ensure the sanity and completeness(in case of _ASYNC_COMMIT) of every transaction. Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Girish Shilamkar <girish@clusterfs.com> Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com>	2008-01-28 23:58:27 -05:00
Johann Lombardi	8e85fb3f30	jbd2: jbd2 stats through procfs The patch below updates the jbd stats patch to 2.6.20/jbd2. The initial patch was posted by Alex Tomas in December 2005 (http://marc.info/?l=linux-ext4&m=113538565128617&w=2). It provides statistics via procfs such as transaction lifetime and size. Sometimes, investigating performance problems, i find useful to have stats from jbd about transaction's lifetime, size, etc. here is a patch for review and inclusion probably. for example, stats after creation of 3M files in htree directory: [root@bob ~]# cat /proc/fs/jbd/sda/history R/C tid wait run lock flush log hndls block inlog ctime write drop close R 261 8260 2720 0 0 750 9892 8170 8187 C 259 750 0 4885 1 R 262 20 2200 10 0 770 9836 8170 8187 R 263 30 2200 10 0 3070 9812 8170 8187 R 264 0 5000 10 0 1340 0 0 0 C 261 8240 3212 4957 0 R 265 8260 1470 0 0 4640 9854 8170 8187 R 266 0 5000 10 0 1460 0 0 0 C 262 8210 2989 4868 0 R 267 8230 1490 10 0 4440 9875 8171 8188 R 268 0 5000 10 0 1260 0 0 0 C 263 7710 2937 4908 0 R 269 7730 1470 10 0 3330 9841 8170 8187 R 270 0 5000 10 0 830 0 0 0 C 265 8140 3234 4898 0 C 267 720 0 4849 1 R 271 8630 2740 20 0 740 9819 8170 8187 C 269 800 0 4214 1 R 272 40 2170 10 0 830 9716 8170 8187 R 273 40 2280 0 0 3530 9799 8170 8187 R 274 0 5000 10 0 990 0 0 0 where, R - line for transaction's life from T_RUNNING to T_FINISHED C - line for transaction's checkpointing tid - transaction's id wait - for how long we were waiting for new transaction to start (the longest period journal_start() took in this transaction) run - real transaction's lifetime (from T_RUNNING to T_LOCKED lock - how long we were waiting for all handles to close (time the transaction was in T_LOCKED) flush - how long it took to flush all data (data=ordered) log - how long it took to write the transaction to the log hndls - how many handles got to the transaction block - how many blocks got to the transaction inlog - how many blocks are written to the log (block + descriptors) ctime - how long it took to checkpoint the transaction write - how many blocks have been written during checkpointing drop - how many blocks have been dropped during checkpointing close - how many running transactions have been closed to checkpoint this one all times are in msec. [root@bob ~]# cat /proc/fs/jbd/sda/info 280 transaction, each upto 8192 blocks average: 1633ms waiting for transaction 3616ms running transaction 5ms transaction was being locked 1ms flushing data (in ordered mode) 1799ms logging transaction 11781 handles per transaction 5629 blocks per transaction 5641 logged blocks per transaction Signed-off-by: Johann Lombardi <johann.lombardi@bull.net> Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com>	2008-01-28 23:58:27 -05:00
Aneesh Kumar K.V	4df3d265bf	ext4: Take read lock during overwrite case. When we are overwriting a file and not actually allocating new file system blocks we need to take only the read lock on i_data_sem. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2008-01-28 23:58:29 -05:00
Aneesh Kumar K.V	0e855ac8b1	ext4: Convert truncate_mutex to read write semaphore. We are currently taking the truncate_mutex for every read. This would have performance impact on large CPU configuration. Convert the lock to read write semaphore and take read lock when we are trying to read the file. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2008-01-28 23:58:26 -05:00
Aneesh Kumar K.V	c278bfeceb	ext4: Make ext4_get_blocks_wrap take the truncate_mutex early. When doing a migrate from ext3 to ext4 inode we need to make sure the test for inode type and walking inode data happens inside lock. To make this happen move truncate_mutex early before checking the i_flags. This actually should enable us to remove the verify_chain(). Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2008-01-28 23:58:27 -05:00
Mariusz Kozlowski	01f4adc044	ext4: remove unused code from ext4_find_entry() The unused code found in ext3_find_entry() is also present (and still unused) in the ext4_find_entry() code. This patch removes it. Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-01-28 23:58:27 -05:00
Aneesh Kumar K.V	221879c927	ext4: Check for the correct error return from ext4_ext_get_blocks returns negative values on error. We should check for <= 0 Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2008-01-28 23:58:27 -05:00
Jan Kara	f5a7a6b0d9	jbd2: Fix assertion failure in fs/jbd2/checkpoint.c Before we start committing a transaction, we call __journal_clean_checkpoint_list() to cleanup transaction's written-back buffers. If this call happens to remove all of them (and there were already some buffers), __journal_remove_checkpoint() will decide to free the transaction because it isn't (yet) a committing transaction and soon we fail some assertion - the transaction really isn't ready to be freed :). We change the check in __journal_remove_checkpoint() to free only a transaction in T_FINISHED state. The locking there is subtle though (as everywhere in JBD ;(). We use j_list_lock to protect the check and a subsequent call to __journal_drop_transaction() and do the same in the end of journal_commit_transaction() which is the only place where a transaction can get to T_FINISHED state. Probably I'm too paranoid here and such locking is not really necessary - checkpoint lists are processed only from log_do_checkpoint() where a transaction must be already committed to be processed or from __journal_clean_checkpoint_list() where kjournald itself calls it and thus transaction cannot change state either. Better be safe if something changes in future... Signed-off-by: Jan Kara <jack@suse.cz> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2008-01-28 23:58:27 -05:00
Aneesh Kumar K.V	abcb2947c9	ext4: add block bitmap validation When a new block bitmap is read from disk in read_block_bitmap() there are a few bits that should ALWAYS be set. In particular, the blocks given corresponding to block bitmap, inode bitmap and inode tables. Validate the block bitmap against these blocks. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2008-01-28 23:58:27 -05:00
Aneesh Kumar K.V	389d1b083c	Add buffer head related helper functions Add buffer head related helper function bh_uptodate_or_lock and bh_submit_read which can be used by file system Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2008-01-28 23:58:26 -05:00
Aneesh Kumar K.V	bb4f397a1a	ext4: Change the default behaviour on error ext4 file system was by default ignoring errors and continuing. This is not a good default as continuing on error could lead to file system corruption. Change the default to mark the file system readonly. Debian and ubuntu already does this as the default in their fstab. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com>	2008-01-28 23:58:26 -05:00
Eric Sandeen	e7c9559300	ext4: fix oops on corrupted ext4 mount When mounting an ext4 filesystem with corrupted s_first_data_block, things can go very wrong and oops. Because blocks_count in ext4_fill_super is a u64, and we must use do_div, the calculation of db_count is done differently than on ext4. If first_data_block is corrupted such that it is larger than ext4_blocks_count, for example, then the intermediate blocks_count value may go negative, but sign-extend to a very large value: blocks_count = (ext4_blocks_count(es) - le32_to_cpu(es->s_first_data_block) + EXT4_BLOCKS_PER_GROUP(sb) - 1); This is then assigned to s_groups_count which is an unsigned long: sbi->s_groups_count = blocks_count; This may result in a value of 0xFFFFFFFF which is then used to compute db_count: db_count = (sbi->s_groups_count + EXT4_DESC_PER_BLOCK(sb) - 1) / EXT4_DESC_PER_BLOCK(sb); and in this case db_count will wind up as 0 because the addition overflows 32 bits. This in turn causes the kmalloc for group_desc to be of 0 size: sbi->s_group_desc = kmalloc(db_count * sizeof (struct buffer_head *), GFP_KERNEL); and eventually in ext4_check_descriptors, dereferencing sbi->s_group_desc[desc_block] will result in a NULL pointer dereference. The simplest test seems to be to sanity check s_first_data_block, EXT4_BLOCKS_PER_GROUP, and ext4_blocks_count values to be sure their combination won't result in a bad intermediate value for blocks_count. We could just check for db_count == 0, but catching it at the root cause seems like it provides more info. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Mingming Cao <cmm@us.ibm.com>	2008-01-28 23:58:27 -05:00
Adrian Bunk	07620f69ef	ext4/super.c: fix #ifdef's (CONFIG_EXT4_* -> CONFIG_EXT4DEV_*) Based on a report by Robert P. J. Day. Signed-off-by: Adrian Bunk <bunk@kernel.org>	2008-01-28 23:58:27 -05:00
Aneesh Kumar K.V	cb47dce791	ext4: Return after ext4_error in case of failures This fix some instances where we were continuing after calling ext4_error. ext4_error call panic only if errors=panic mount option is set. So we need to make sure we return correctly after ext4_error call Reported by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2008-01-28 23:58:27 -05:00
Aneesh Kumar K.V	fe7fdc37b5	ext3: Fix the max file size for ext3 file system. The max file size for ext3 file system is now calculated with hardcoded 4K block size. The patch fixes it to be calculated with the right block size. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2008-01-28 23:58:26 -05:00
Aneesh Kumar K.V	902be4c5ef	ext2: Fix the max file size for ext2 file system. The max file size for ext2 file system is now calculated with hardcoded 4K block size. The patch fixes it to be calculated with the right block size. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2008-01-28 23:58:26 -05:00
Eric Sandeen	e2b4657453	ext4: store maxbytes for bitmapped files and return EFBIG as appropriate Calculate & store the max offset for bitmapped files, and catch too-large seeks, truncates, and writes in ext4, shortening or rejecting as appropriate. Signed-off-by: Eric Sandeen <sandeen@redhat.com>	2008-01-28 23:58:27 -05:00
Eric Sandeen	19295529db	ext4: export iov_shorten from kernel for ext4's use Export iov_shorten() from kernel so that ext4 can truncate too-large writes to bitmapped files. Signed-off-by: Eric Sandeen <sandeen@redhat.com>	2008-01-28 23:58:27 -05:00
Eric Sandeen	cd2291a463	ext4: different maxbytes functions for bitmap & extent files use 2 different maxbytes functions for bitmapped & extent-based files. Signed-off-by: Eric Sandeen <sandeen@redhat.com>	2008-01-28 23:58:27 -05:00
Aneesh Kumar K.V	8180a5627d	ext4: Support large files This patch converts ext4_inode i_blocks to represent total blocks occupied by the inode in file system block size. Earlier the variable used to represent this in 512 byte block size. This actually limited the total size of the file. The feature is enabled transparently when we write an inode whose i_blocks cannot be represnted as 512 byte units in a 48 bit variable. inode flag EXT4_HUGE_FILE_FL Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2008-01-28 23:58:27 -05:00
Aneesh Kumar K.V	0fc1b45147	ext4: Add support for 48 bit inode i_blocks. Use the __le16 l_i_reserved1 field of the linux2 struct of ext4_inode to represet the higher 16 bits for i_blocks. With this change max_file size becomes (2*48 -1 ) 512 bytes. We add a RO_COMPAT feature to the super block to indicate that inode have i_blocks represented as a split 48 bits. Super block with this feature set cannot be mounted read write on a kernel with CONFIG_LSF disabled. Super block flag EXT4_FEATURE_RO_COMPAT_HUGE_FILE Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2008-01-28 23:58:26 -05:00
Aneesh Kumar K.V	a48380f769	ext4: Rename i_dir_acl to i_size_high Rename ext4_inode.i_dir_acl to i_size_high drop ext4_inode_info.i_dir_acl as it is not used Rename ext4_inode.i_size to ext4_inode.i_size_lo Add helper function for accessing the ext4_inode combined i_size. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2008-01-28 23:58:27 -05:00
Aneesh Kumar K.V	7973c0c19e	ext4: Rename i_file_acl to i_file_acl_lo Rename i_file_acl to i_file_acl_lo. This helps in finding bugs where we use i_file_acl instead of the combined i_file_acl_lo and i_file_acl_high Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2008-01-28 23:58:27 -05:00
Aneesh Kumar K.V	1d03ec984c	ext4: Fix sparse warnings. Fix sparse warnings related to static functions and local variables. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2008-01-28 23:58:27 -05:00
Aneesh Kumar K.V	99e6f829a8	ext4: Introduce ext4_update__feature Introduce ext4_update__feature and use them instead of opencoding. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2008-01-28 23:58:27 -05:00
Avantika Mathur	2aa9fc4c40	ext4: fixes block group number being set to a negative value This patch fixes various places where the group number is set to a negative value. Signed-off-by: Avantika Mathur <mathur@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-01-28 23:58:27 -05:00
Avantika Mathur	fd2d42912f	ext4: add ext4_group_t, and change all group variables to this type. In many places variables for block group are of type int, which limits the maximum number of block groups to 2^31. Each block group can have up to 2^15 blocks, with a 4K block size, and the max filesystem size is limited to 2^31 * (2^15 * 2^12) = 2^58 -- or 256 PB This patch introduces a new type ext4_group_t, of type unsigned long, to represent block group numbers in ext4. All occurrences of block group variables are converted to type ext4_group_t. Signed-off-by: Avantika Mathur <mathur@us.ibm.com>	2008-01-28 23:58:27 -05:00
Eric Sandeen	bba907433b	ext4 extents: remove unneeded casts There are many casts in extents.c which are not needed, as the variables are already the type of the cast, or are being promoted for no particular reason in printk's. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com>	2008-01-28 23:58:27 -05:00
Aneesh Kumar K.V	725d26d3f0	ext4: Introduce ext4_lblk_t This patch adds a new data type ext4_lblk_t to represent the logical file blocks. This is the preparatory patch to support large files in ext4 The follow up patch with convert the ext4_inode i_blocks to represent the number of blocks in file system block size. This changes makes it possible to have a block number 2**32 -1 which will result in overflow if the block number is represented by signed long. This patch convert all the block number to type ext4_lblk_t which is typedef to __u32 Also remove dead code ext4_ext_walk_space Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com>	2008-01-28 23:58:27 -05:00
Jan Kara	a72d7f834e	ext4: Avoid rec_len overflow with 64KB block size With 64KB blocksize, a directory entry can have size 64KB which does not fit into 16 bits we have for entry lenght. So we store 0xffff instead and convert value when read from / written to disk. The patch also converts some places to use ext4_next_entry() when we are changing them anyway. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mingming Cao <cmm@us.ibm.com>	2008-01-28 23:58:27 -05:00
Takashi Sato	afc7cbca5b	ext4: Support large blocksize up to PAGESIZE This patch set supports large block size(>4k, <=64k) in ext4, just enlarging the block size limit. But it is NOT possible to have 64kB blocksize on ext4 without some changes to the directory handling code. The reason is that an empty 64kB directory block would have a rec_len == (__u16)2^16 == 0, and this would cause an error to be hit in the filesystem. The proposed solution is treat 64k rec_len with a an impossible value like rec_len = 0xffff to handle this. The Patch-set consists of the following 2 patches. [1/2] ext4: enlarge blocksize - Allow blocksize up to pagesize [2/2] ext4: fix rec_len overflow - prevent rec_len from overflow with 64KB blocksize Now on 64k page ppc64 box runs with this patch set we could create a 64k block size ext4dev, and able to handle empty directory block. Signed-off-by: Takashi Sato <sho@tnes.nec.co.jp> Signed-off-by: Mingming Cao <cmm@us.ibm.com>	2008-01-28 23:58:27 -05:00
Joel Becker	6b11d8179d	ocfs2: Fix userspace ABI breakage in sysfs The userspace ABI of ocfs2's internal cluster stack (o2cb) was broken by commit `c60b717879` "kset: convert ocfs2 to use kset_create". Specifically, the '/sys/o2cb' kset was moved to '/sys/fs/o2cb'. This breaks all ocfs2 tools and renders the filesystem unmountable. This fix moves '/sys/o2cb' back where it belongs. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-28 19:10:23 -08:00
Denis V. Lunev	b7c6ba6eb1	[NETNS]: Consolidate kernel netlink socket destruction. Create a specific helper for netlink kernel socket disposal. This just let the code look better and provides a ground for proper disposal inside a namespace. Signed-off-by: Denis V. Lunev <den@openvz.org> Tested-by: Alexey Dobriyan <adobriyan@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:07 -08:00
Denis V. Lunev	e5d69b9f4a	[ATM]: Oops reading net/atm/arp cat /proc/net/atm/arp causes the NULL pointer dereference in the get_proc_net+0xc/0x3a. This happens as proc_get_net believes that the parent proc dir entry contains struct net. Fix this assumption for "net/atm" case. The problem is introduced by the commit c0097b07abf5f92ab135d024dd41bd2aada1512f from Eric W. Biederman/Daniel Lezcano. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:01:36 -08:00
Denis V. Lunev	e372c41401	[NET]: Consolidate net namespace related proc files creation. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:28 -08:00
Jens Axboe	bbdfc2f706	[SPLICE]: Don't assume regular pages in splice_to_pipe() Allow caller to pass in a release function, there might be other resources that need releasing as well. Needed for network receive. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:30 -08:00
Adrian Bunk	3ff6eecca4	remove __attribute_used__ Remove the deprecated __attribute_used__. [Introduce __section in a few places to silence checkpatch /sam] Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Sam Ravnborg <sam@ravnborg.org>	2008-01-28 23:21:18 +01:00
WANG Cong	7491a76b23	FS: Remove dead code Remove dead code in smbfs makefile. Cc: Al Viro <viro@www.linux.org.uk> Cc: Tim Shimmin <xfs-masters@oss.sgi.com> Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Sam Ravnborg <sam@ravnborg.org>	2008-01-28 23:14:37 +01:00
Linus Torvalds	8d01eddf29	Merge branch 'for-2.6.25' of git://git.kernel.dk/linux-2.6-block * 'for-2.6.25' of git://git.kernel.dk/linux-2.6-block: block: implement drain buffers __bio_clone: don't calculate hw/phys segment counts block: allow queue dma_alignment of zero blktrace: Add blktrace ioctls to SCSI generic devices	2008-01-29 08:51:56 +11:00
Jens Axboe	0871714e08	cfq-iosched: relax IOPRIO_CLASS_IDLE restrictions Currently you must be root to set idle io prio class on a process. This is due to the fact that the idle class is implemented as a true idle class, meaning that it will not make progress if someone else is requesting disk access. Unfortunately this means that it opens DOS opportunities by locking down file system resources, hence it is root only at the moment. This patch relaxes the idle class a little, by removing the truly idle part (which entals a grace period with associated timer). The modifications make the idle class as close to zero impact as can be done while still guarenteeing progress. This means we can relax the root only criteria as well. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-01-28 11:38:15 +01:00
Jens Axboe	d38ecf935f	io context sharing: preliminary support Detach task state from ioc, instead keep track of how many processes are accessing the ioc. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-01-28 10:50:31 +01:00
Jens Axboe	fd0928df98	ioprio: move io priority from task_struct to io_context This is where it belongs and then it doesn't take up space for a process that doesn't do IO. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-01-28 10:50:29 +01:00
Jens Axboe	5d84070ee0	__bio_clone: don't calculate hw/phys segment counts If the users sets a new ->bi_bdev on the bio after __bio_clone() has returned it, the "segment counts valid" flag still remains even though it may be different with the new target. So don't calculate segment counts in __bio_clone(). Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-01-28 10:04:46 +01:00
Linus Torvalds	ef3f2de2b5	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: [CIFS] DFS build fixes [CIFS] DFS support: provide shrinkable mounts [CIFS] Do not log path names in lookup errors [CIFS] DFS support patchset: Added mountdata [CIFS] Forgot to add two new files from previous commit [CIFS] DNS name resolution helper upcall for cifs [CIFS] fix checkpatch warnings in fs/cifs/inode.c [CIFS] hold ses sem on tcp session reconnect during mount [CIFS] Allow setting mode via cifs acl [CIFS] fix unicode string alignment in SPNEGO setup [CIFS] cifs_partialpagewrite() cleanup [CIFS] use krb5 session key from first SMB session after a NegProt [CIFS] redo existing session setup if needed in cifs_mount [CIFS] Only dump SPNEGO key if CONFIG_CIFS_DEBUG2 is set [CIFS] fix SetEA failure to some Samba versions	2008-01-26 23:01:20 -08:00
Linus Torvalds	9b73e76f3c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (200 commits) [SCSI] usbstorage: use last_sector_bug flag universally [SCSI] libsas: abstract STP task status into a function [SCSI] ultrastor: clean up inline asm warnings [SCSI] aic7xxx: fix firmware build [SCSI] aacraid: fib context lock for management ioctls [SCSI] ch: remove forward declarations [SCSI] ch: fix device minor number management bug [SCSI] ch: handle class_device_create failure properly [SCSI] NCR5380: fix section mismatch [SCSI] sg: fix /proc/scsi/sg/devices when no SCSI devices [SCSI] IB/iSER: add logical unit reset support [SCSI] don't use __GFP_DMA for sense buffers if not required [SCSI] use dynamically allocated sense buffer [SCSI] scsi.h: add macro for enclosure bit of inquiry data [SCSI] sd: add fix for devices with last sector access problems [SCSI] fix pcmcia compile problem [SCSI] aacraid: add Voodoo Lite class of cards. [SCSI] aacraid: add new driver features flags [SCSI] qla2xxx: Update version number to 8.02.00-k7. [SCSI] qla2xxx: Issue correct MBC_INITIALIZE_FIRMWARE command. ...	2008-01-25 17:19:08 -08:00
Linus Torvalds	29bd17af7d	Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: (31 commits) ocfs2: clean up bh null checks ocfs2: document access rules for blocked_lock_list configfs: file.c fix possible recursive locking configfs: dir.c fix possible recursive locking configfs: Remove EXPERIMENTAL ocfs2: bump version number ocfs2/dlm: Clear joining_node on hearbeat node down ocfs2: convert byte order of constant instead of variable ocfs2: Update default cluster timeouts ocfs2: printf fixes ocfs2: Use generic_file_llseek ocfs2: Safer read_inline_data() ocfs2: Silence false lockdep warnings [PATCH 2/2] ocfs2: cluster aware flock() [PATCH 1/2] ocfs2: add flock lock type ocfs2: Local alloc window size changeable via mount option ocfs2: Support commit= mount option ocfs2: Add missing permission checks [PATCH 2/2] ocfs2: Implement group add for online resize [PATCH 1/2] ocfs2: Add group extend for online resize ...	2008-01-25 17:11:13 -08:00
Mark Fasheh	2fe5c1d7eb	ocfs2: clean up bh null checks If we know a buffer_head is non-null, then brelse() is unnecessary and put_bh() can be used instead. Also, an explicit check for NULL is unnecessary when using brelse(). This patch only covers buffer_head_io.c and resize.c, which have recently added code which exhibits this problem. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 15:05:48 -08:00
Mark Fasheh	7ec373cf33	ocfs2: document access rules for blocked_lock_list ocfs2_super->blocked_lock_list and ocfs2_super->blocked_lock_count have some usage restrictions which aren't immediately obvious to anyone reading the code. It's a good idea to document this so that we avoid making costly mistakes in the future. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 15:05:48 -08:00
Joonwoo Park	116ba5d5ea	configfs: file.c fix possible recursive locking configfs_register_subsystem() with default_groups triggers recursive locking. it seems that mutex_lock_nested is needed. ============================================= [ INFO: possible recursive locking detected ] 2.6.24-rc6 #145 --------------------------------------------- swapper/1 is trying to acquire lock: (&sb->s_type->i_mutex_key#3){--..}, at: [<c40c9a9e>] configfs_add_file+0x2e/0x70 but task is already holding lock: (&sb->s_type->i_mutex_key#3){--..}, at: [<c40ca985>] configfs_register_subsystem+0x55/0x130 other info that might help us debug this: 1 lock held by swapper/1: #0: (&sb->s_type->i_mutex_key#3){--..}, at: [<c40ca985>] configfs_register_subsystem+0x55/0x130 stack backtrace: Pid: 1, comm: swapper Not tainted 2.6.24-rc6 #145 [<c40053ba>] show_trace_log_lvl+0x1a/0x30 [<c4005e82>] show_trace+0x12/0x20 [<c400687e>] dump_stack+0x6e/0x80 [<c404ec72>] __lock_acquire+0xe62/0x1120 [<c404efb2>] lock_acquire+0x82/0xa0 [<c43fda88>] mutex_lock_nested+0x98/0x2e0 [<c40c9a9e>] configfs_add_file+0x2e/0x70 [<c40c9b0c>] configfs_create_file+0x2c/0x40 [<c40ca639>] configfs_attach_item+0x139/0x220 [<c40ca734>] configfs_attach_group+0x14/0x140 [<c40ca7e9>] configfs_attach_group+0xc9/0x140 [<c40ca9f6>] configfs_register_subsystem+0xc6/0x130 [<c45c8186>] init_netconsole+0x2b6/0x300 [<c45a75f2>] kernel_init+0x142/0x320 [<c4004fb3>] kernel_thread_helper+0x7/0x14 ======================= Signed-off-by: Joonwoo Park <joonwpark81@gmail.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 15:05:47 -08:00
Joonwoo Park	ba611edfe4	configfs: dir.c fix possible recursive locking configfs_register_subsystem() with default_groups triggers recursive locking. it seems that mutex_lock_nested is needed. ============================================= [ INFO: possible recursive locking detected ] 2.6.24-rc6 #141 --------------------------------------------- swapper/1 is trying to acquire lock: (&sb->s_type->i_mutex_key#3){--..}, at: [<c40ca76f>] configfs_attach_group+0x4f/0x190 but task is already holding lock: (&sb->s_type->i_mutex_key#3){--..}, at: [<c40ca9d5>] configfs_register_subsystem+0x55/0x130 other info that might help us debug this: 1 lock held by swapper/1: #0: (&sb->s_type->i_mutex_key#3){--..}, at: [<c40ca9d5>] configfs_register_subsystem+0x55/0x130 stack backtrace: Pid: 1, comm: swapper Not tainted 2.6.24-rc6 #141 [<c40053ba>] show_trace_log_lvl+0x1a/0x30 [<c4005e82>] show_trace+0x12/0x20 [<c400687e>] dump_stack+0x6e/0x80 [<c404ec72>] __lock_acquire+0xe62/0x1120 [<c404efb2>] lock_acquire+0x82/0xa0 [<c43fdad8>] mutex_lock_nested+0x98/0x2e0 [<c40ca76f>] configfs_attach_group+0x4f/0x190 [<c40caa46>] configfs_register_subsystem+0xc6/0x130 [<c45c8186>] init_netconsole+0x2b6/0x300 [<c45a75f2>] kernel_init+0x142/0x320 [<c4004fb3>] kernel_thread_helper+0x7/0x14 ======================= Signed-off-by: Joonwoo Park <joonwpark81@gmail.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 15:05:47 -08:00
Joel Becker	02ac0499c0	configfs: Remove EXPERIMENTAL configfs has been alive and kicking for a while now. It underpins some non-EXPERIMENTAL subsystems, such as OCFS2's cluster stack. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 15:05:47 -08:00
Mark Fasheh	0e5ae03203	ocfs2: bump version number Bump the printed version to 1.5.0. This helps us quickly identify which version of Ocfs2 a bug filer is running. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 15:05:46 -08:00
Tao Ma	2d4b1cbb44	ocfs2/dlm: Clear joining_node on hearbeat node down Currently the process of dlm join contains 2 steps: query join and assert join. After query join, the joined node will set its joining_node. So if the joining node happens to panic before the 2nd step, the joined node will fail to clear its joining_node flag because that node isn't in the domain map. It at least cause 2 problems. 1. All the new join request will fail. So no new node can mount the volume. 2. The joined node can't umount the volume since during the umount process it has to wait for the joining_node to be unknown. So the umount will be hanged. The solution is to clear the joining_node before we check the domain map. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 15:05:46 -08:00
Marcin Slusarz	4092d49f70	ocfs2: convert byte order of constant instead of variable Convert byte order of constant instead of variable it will be done at compile time vs run time. Remove unused le32_and_cpu. Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 15:05:46 -08:00
Sunil Mushran	17104683d2	ocfs2: Update default cluster timeouts Lots of people are having trouble with the default timeouts, which are too low. These new values are derived from an informal survey taken on ocfs2-users, as well as data from bug reports. This should reduce the amount of cluster disconnects and subsequent fencing seen during normal workloads. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 15:05:45 -08:00
Jan Kara	634bf74d1e	ocfs2: printf fixes Explicitely convert loff_t to long long in printf. Just for sure... Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 15:05:45 -08:00
Jan Kara	32c3c0e2e5	ocfs2: Use generic_file_llseek We should use generic_file_llseek() and not default_llseek() so that s_maxbytes gets properly checked when seeking. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 15:05:45 -08:00
Jan Kara	d2849fb294	ocfs2: Safer read_inline_data() In ocfs2_read_inline_data() we should store file size in loff_t. Although the file size should fit in 32 bits we cannot be sure in case filesystem is corrupted. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 15:05:44 -08:00
Jan Kara	5fa0613ea5	ocfs2: Silence false lockdep warnings Create separate lockdep lock classes for system file's i_mutexes. They are used to guard allocations and similar things and thus rank differently than i_mutex of a regular file or directory. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 15:05:44 -08:00
Mark Fasheh	53fc622b9e	[PATCH 2/2] ocfs2: cluster aware flock() Hook up ocfs2_flock(), using the new flock lock type in dlmglue.c. A new mount option, "localflocks" is added so that users can revert to old functionality as need be. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 15:05:43 -08:00
Mark Fasheh	cf8e06f1a8	[PATCH 1/2] ocfs2: add flock lock type This adds a new dlmglue lock type which is intended to back flock() requests. Since these locks are driven from userspace, usage rules are much more liberal than the typical Ocfs2 internal cluster lock. As a result, we can't make use of most dlmglue features - lock caching and lock level optimizations in particular. Additionally, userspace is free to deadlock itself, so we have to deal with that in the same way as the rest of the kernel - by allowing a signal to abort a lock request. In order to keep ocfs2_cluster_lock() complexity down, ocfs2_file_lock() does it's own dlm coordination. We still use the same helper functions though, so duplicated code is kept to a minimum. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 15:05:43 -08:00
Sunil Mushran	2fbe8d1ebe	ocfs2: Local alloc window size changeable via mount option Local alloc is a performance optimization in ocfs2 in which a node takes a window of bits from the global bitmap and then uses that for all small local allocations. This window size is fixed to 8MB currently. This patch allows users to specify the window size in MB including disabling it by passing in 0. If the number specified is too large, the fs will use the default value of 8MB. mount -o localalloc=X /dev/sdX /mntpoint Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 15:05:43 -08:00
Mark Fasheh	d147b3d630	ocfs2: Support commit= mount option Mostly taken from ext3. This allows the user to set the jbd commit interval, in seconds. The default of 5 seconds stays the same, but now users can easily increase the commit interval. Typically, this would be increased in order to benefit performance at the expense of data-safety. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 15:05:42 -08:00
Mark Fasheh	0957f00796	ocfs2: Add missing permission checks Check that an online resize is being driven by a user with permission to change system resource limits. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 15:05:19 -08:00
Tao Ma	7909f2bf83	[PATCH 2/2] ocfs2: Implement group add for online resize This patch adds the ability for a userspace program to request that a properly formatted cluster group be added to the main allocation bitmap for an Ocfs2 file system. The request is made via an ioctl, OCFS2_IOC_GROUP_ADD. On a high level, this is similar to ext3, but we use a different ioctl as the structure which has to be passed through is different. During an online resize, tunefs.ocfs2 will format any new cluster groups which must be added to complete the resize, and call OCFS2_IOC_GROUP_ADD on each one. Kernel verifies that the core cluster group information is valid and then does the work of linking it into the global allocation bitmap. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 15:04:24 -08:00
Tao Ma	d659072f73	[PATCH 1/2] ocfs2: Add group extend for online resize This patch adds the ability for a userspace program to request an extend of last cluster group on an Ocfs2 file system. The request is made via ioctl, OCFS2_IOC_GROUP_EXTEND. This is derived from EXT3_IOC_GROUP_EXTEND, but is obviously Ocfs2 specific. tunefs.ocfs2 would call this for an online-resize operation if the last cluster group isn't full. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 14:53:35 -08:00
Tao Ma	e9d578a8f2	ocfs2: Initalize bitmap_cpg of ocfs2_super to be the maximum. This value is initialized from global_bitmap->id2.i_chain.cl_cpg. If there is only 1 group, it will be equal to the total clusters in the volume. So as for online resize, it should change for all the nodes in the cluster. It isn't easy and there is no corresponding lock for it. bitmap_cpg is only used in 2 areas: 1. Check whether the suballoc is too large for us to allocate from the global bitmap, so it is little used. And now the suballoc size is 2048, it rarely meet this situation and the check is almost useless. 2. Calculate which group a cluster belongs to. We use it during truncate to figure out which cluster group an extent belongs too. But we should be OK if we increase it though as the cluster group calculated shouldn't change and we only ever have a small bitmap_cpg on file systems with a single cluster group. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 14:48:54 -08:00
Mark Fasheh	1252c434e3	ocfs2: Documentation update Remove 'readpages' from the list in ocfs2.txt. Instead of having two identical lists, I just removed the list in the OCFS2 section of fs/Kconfig and added a pointer to Documentation/filesystems/ocfs2.txt. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 14:48:42 -08:00
Mark Fasheh	628a24f5bd	ocfs2: Readpages support Add ->readpages support to Ocfs2. This is rather trivial - all it required is a small update to ocfs2_get_block (for mapping full extents via b_size) and an ocfs2_readpages() function which partially mirrors ocfs2_readpage(). Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 14:48:12 -08:00
Mark Fasheh	e63aecb651	ocfs2: Rename ocfs2_meta_[un]lock Call this the "inode_lock" now, since it covers both data and meta data. This patch makes no functional changes. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 14:46:01 -08:00
Mark Fasheh	c934a92d05	ocfs2: Remove data locks The meta lock now covers both meta data and data, so this just removes the now-redundant data lock. Combining locks saves us a round of lock mastery per inode and one less lock to ping between nodes during read/write. We don't lose much - since meta locks were always held before a data lock (and at the same level) ordered writeout mode (the default) ensured that flushing for the meta data lock also pushed out data anyways. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 14:45:57 -08:00
Mark Fasheh	f1f540688e	ocfs2: Add data downconvert worker to inode lock In order to extend inode lock coverage to inode data, we use the same data downconvert worker with only a small modification to only do work for regular files. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 14:45:54 -08:00
Mark Fasheh	34d024f843	ocfs2: Remove mount/unmount votes The node maps that are set/unset by these votes are no longer relevant, thus we can remove the mount and umount votes. Since those are the last two remaining votes, we can also remove the entire vote infrastructure. The vote thread has been renamed to the downconvert thread, and the small amount of functionality related to managing it has been moved into fs/ocfs2/dlmglue.c. All references to votes have been removed or updated. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 14:45:34 -08:00
Mark Fasheh	6f7b056ea9	ocfs2: Remove fs dependency on ocfs2_heartbeat module Now that the dlm exposes domain information to us, we don't need generic node up / node down callbacks. And since the DLM is only telling us when a node goes down unexpectedly, we no longer need to optimize away node down callbacks via the umount map. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 14:36:40 -08:00
Mark Fasheh	6561168cb4	ocfs2_dlm: Call node eviction callbacks from heartbeat handler With this, a dlm client can take advantage of the group protocol in the dlm to get full notification whenever a node within the dlm domain leaves unexpectedly. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 14:36:40 -08:00
Linus Torvalds	0008bf5440	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched * git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched: (96 commits) sched: keep total / count stats in addition to the max for sched, futex: detach sched.h and futex.h sched: fix: don't take a mutex from interrupt context sched: print backtrace of running tasks too printk: use ktime_get() softlockup: fix signedness sched: latencytop support sched: fix goto retry in pick_next_task_rt() timers: don't #error on higher HZ values sched: monitor clock underflows in /proc/sched_debug sched: fix rq->clock warps on frequency changes sched: fix, always create kernel threads with normal priority debug: clean up kernel/profile.c sched: remove the !PREEMPT_BKL code sched: make PREEMPT_BKL the default debug: track and print last unloaded module in the oops trace debug: show being-loaded/being-unloaded indicator for modules sched: rt-watchdog: fix .rlim_max = RLIM_INFINITY sched: rt-group: reduce rescheduling hrtimer: unlock hrtimer_wakeup ...	2008-01-25 13:42:32 -08:00
Linus Torvalds	2d94dfc8c3	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6: mount options: fix jfs JFS: simplify types to get rid of sparse warning JFS: FIx one more plain integer as NULL pointer warning JFS: Remove defconfig ptr comparison to 0 JFS: use DIV_ROUND_UP where appropriate Remove unnecessary kmalloc casts in the jfs filesystem JFS is missing a memory barrier JFS: Make sure special inode data is written after journal is flushed JFS: clear PAGECACHE_TAG_DIRTY for no-write pages	2008-01-25 12:20:32 -08:00
Arjan van de Ven	9745512ce7	sched: latencytop support LatencyTOP kernel infrastructure; it measures latencies in the scheduler and tracks it system wide and per process. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-01-25 21:08:34 +01:00
Paul E. McKenney	e260be673a	Preempt-RCU: implementation This patch implements a new version of RCU which allows its read-side critical sections to be preempted. It uses a set of counter pairs to keep track of the read-side critical sections and flips them when all tasks exit read-side critical section. The details of this implementation can be found in this paper - http://www.rdrop.com/users/paulmck/RCU/OLSrtRCU.2006.08.11a.pdf and the article- http://lwn.net/Articles/253651/ This patch was developed as a part of the -rt kernel development and meant to provide better latencies when read-side critical sections of RCU don't disable preemption. As a consequence of keeping track of RCU readers, the readers have a slight overhead (optimizations in the paper). This implementation co-exists with the "classic" RCU implementations and can be switched to at compiler. Also includes RCU tracing summarized in debugfs. [ akpm@linux-foundation.org: build fixes on non-preempt architectures ] Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com> Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com> Reviewed-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-01-25 21:08:24 +01:00
Linus Torvalds	b47711bfbc	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6: selinux: make mls_compute_sid always polyinstantiate security/selinux: constify function pointer tables and fields security: add a secctx_to_secid() hook security: call security_file_permission from rw_verify_area security: remove security_sb_post_mountroot hook Security: remove security.h include from mm.h Security: remove security_file_mmap hook sparse-warnings (NULL as 0). Security: add get, set, and cloning of superblock security information security/selinux: Add missing "space"	2008-01-25 08:44:29 -08:00
Linus Torvalds	e07dd2ad30	Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (56 commits) [GFS2] Allow journal recovery on read-only mount [GFS2] Lockup on error [GFS2] Fix page_mkwrite truncation race path [GFS2] Fix typo [GFS2] Fix write alloc required shortcut calculation [GFS2] gfs2_alloc_required performance [GFS2] Remove unneeded i_spin [GFS2] Reduce inode size by moving i_alloc out of line [GFS2] Fix assert in log code [GFS2] Fix problems relating to execution of files on GFS2 [GFS2] Initialize extent_list earlier [GFS2] Allow page migration for writeback and ordered pages [GFS2] Remove unused variable [GFS2] Fix log block mapper [GFS2] Minor correction [GFS2] Eliminate the no longer needed sd_statfs_mutex [GFS2] Incremental patch to fix compiler warning [GFS2] Function meta_read optimization [GFS2] Only fetch the dinode once in block_map [GFS2] Reorganize function gfs2_glmutex_lock ...	2008-01-25 08:39:18 -08:00
Steve French	366781c196	[CIFS] DFS build fixes Also includes a few minor changes suggested by Christoph Signed-off-by: Steve French <sfrench@us.ibm.com>	2008-01-25 10:12:41 +00:00
Abhijith Das	7bc5c414fe	[GFS2] Allow journal recovery on read-only mount This patch allows gfs2 to perform journal recovery even if it is mounted read-only. Strictly speaking, a read-only mount should not be writing to the filesystem, but we do this only to perform journal recovery. A read-only mount will fail if we don't recover the dirty journal. Also, when gfs2 is used as a root filesystem, it will be mounted read-only before being mounted read-write during the boot sequence. A failed read-only mount will panic the machine during bootup. Signed-off-by: Abhijith Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:21:22 +00:00
Bob Peterson	1b8177ec1e	[GFS2] Lockup on error I spotted this bug while I was digging around. Looks like it could cause a lockup in some rare error condition. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:21:04 +00:00
Steven Whitehouse	b7fe2e391e	[GFS2] Fix page_mkwrite truncation race path There was a bug in the truncation/invalidation race path for ->page_mkwrite for gfs2. It ought to return 0 so that the effect is the same as if the page was truncated at any of the other points at which the page_lock is dropped. This will result in the restart of the whole page fault path. If it was due to a real truncation (as opposed to an invalidate because we let a glock go) then the ->fault path will pick that up when it gets called again. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:20:15 +00:00
Bob Peterson	3e5cd0877e	[GFS2] Fix typo This patch fixes a minor typo. Surprisingly, it still compiled. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:19:51 +00:00
Steven Whitehouse	1af535727b	[GFS2] Fix write alloc required shortcut calculation The comparison was being made against the wrong quantity. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:19:28 +00:00
Bob Peterson	0522053519	[GFS2] gfs2_alloc_required performance This is a small I/O performance enhancement to gfs2. (Actually, it is a rework of an earlier version I got wrong). The idea here is to check if the write extends past the last block in the file. If so, the function can save itself a lot of time and trouble because it knows an allocate will be required. Benchmarks like iozone should see better performance. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:19:03 +00:00
Bob Peterson	598278bd48	[GFS2] Remove unneeded i_spin This patch removes a vestigial variable "i_spin" from the gfs2_inode structure. This not only saves us memory (>300000 of these in memory for the oom test) it also saves us time because we don't have to spend time initializing it (i.e. slightly better performance). Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:18:44 +00:00
Steven Whitehouse	6dbd822487	[GFS2] Reduce inode size by moving i_alloc out of line It is possible to reduce the size of GFS2 inodes by taking the i_alloc structure out of the gfs2_inode. This patch allocates the i_alloc structure whenever its needed, and frees it afterward. This decreases the amount of low memory we use at the expense of requiring a memory allocation for each page or partial page that we write. A quick test with postmark shows that the overhead is not measurable and I also note that OCFS2 use the same approach. In the future I'd like to solve the problem by shrinking down the size of the members of the i_alloc structure, but for now, this reduces the immediate problem of using too much low-memory on x86 and doesn't add too much overhead. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:18:25 +00:00
Steven Whitehouse	ac39aadd04	[GFS2] Fix assert in log code Although the values were all being calculated correctly, there was a race in the assert due to the way it was using atomic variables. This changes the value we assert on so that we get the same effect by testing a different variable. This prevents the assert triggering when it shouldn't. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:18:03 +00:00
Steven Whitehouse	9656b2c14c	[GFS2] Fix problems relating to execution of files on GFS2 This patch fixes a couple of problems which affected the execution of files on GFS2. The first is that there was a corner case where inodes were not always uptodate at the point at which permissions checks were being carried out, this was resulting in refusal of execute permission, but only on the first lookup, subsequent requests worked correctly. The second was a problem relating to incorrect updating of file sizes which was introduced with the write_begin/end code for GFS2 a little while back. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Abhijith Das <adas@redhat.com>	2008-01-25 08:17:31 +00:00
Bob Peterson	0811a127cb	[GFS2] Initialize extent_list earlier Here is a patch for the latest upstream GFS2 code: The journal extent map needs to be initialized sooner than it currently is. Otherwise failed mount attempts (e.g. not enough journals, etc.) may panic trying to access the uninitialized list. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:17:04 +00:00
Steven Whitehouse	e5d9dc278c	[GFS2] Allow page migration for writeback and ordered pages To improve performance on NUMA, we use the VM's standard page migration for writeback and ordered pages. Probably we could also do the same for journaled data, but that would need a careful audit of the code, so will be the subject of a later patch. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:16:41 +00:00
Steven Whitehouse	65a6290998	[GFS2] Remove unused variable The go_drop_th function is never called or referenced. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:16:19 +00:00
Steven Whitehouse	ff91cc9bb4	[GFS2] Fix log block mapper A missing offset in the calculation. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:15:58 +00:00
Bob Peterson	fa3742fa85	[GFS2] Minor correction This is a small correction to my previously posted patch1. It just changes a divide to a shift. It's faster and doesn't introduce odd dependencies on 32-bit compiles. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:15:37 +00:00
Bob Peterson	c3f60b6e3a	[GFS2] Eliminate the no longer needed sd_statfs_mutex This patch eliminates the unneeded sd_statfs_mutex mutex but preserves the ordering as discussed. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:15:16 +00:00
Bob Peterson	b3513fca7e	[GFS2] Incremental patch to fix compiler warning Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:14:53 +00:00
Bob Peterson	15c7cee799	[GFS2] Function meta_read optimization This patch optimizes function gfs2_meta_read. Basically, gfs2_meta_wait was being called regardless of whether a disk read was requested. This just pulls that wait into the if that triggers the read. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:14:33 +00:00
Bob Peterson	b0d5fd3074	[GFS2] Only fetch the dinode once in block_map Function gfs2_block_map was often looking up the disk inode twice. This optimizes it so that only does it once. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:14:13 +00:00
Bob Peterson	398bbe6832	[GFS2] Reorganize function gfs2_glmutex_lock This patch optimizes the function gfs2_glmutex_lock. The basic theory is: Why bother initializing a holder, setting up wait bits and then waiting on them, if you know the glock can be yours. So the holder stuff is placed inside the if checking if the glock is locked. This one needs careful scrutiny because changing anything to do with locking should strike terror into one's heart. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:13:52 +00:00
Bob Peterson	5fdc2eeb5d	[GFS2] Run through full bitmaps quicker in gfs2_bitfit I eliminated the passing of an unused parameter into gfs2_bitfit called rgd. This also changes the gfs2_bitfit code that searches for free (or used) blocks. Before, the code was trying to check for bytes that indicated 4 blocks in the undesired state. The problem is, it was spending more time trying to do this than it actually was saving. This version only optimizes the case where we're looking for free blocks, and it checks a machine word at a time. So on 32-bit machines, it will check 32-bits (16 blocks) and on 64-bit machines, it will check 64-bits (32 blocks) at a time. The compiler optimizes that quite well and we save some time, especially when running through full bitmaps (like the bitmaps allocated for the journals). There's probably a more elegant or optimized way to do this, but I haven't thought of it yet. I'm open to suggestions. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:13:31 +00:00
Bob Peterson	0d0868bde3	[GFS2] Get rid of useless "found" variable in quota.c This just eliminates an unused variable from the quota code. Not likely to be a time saver. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:13:01 +00:00
Bob Peterson	da6dd40d59	[GFS2] Journal extent mapping This patch saves a little time when gfs2 writes to the journals by keeping a mapping between logical and physical blocks on disk. That's better than constantly looking up indirect pointers in buffers, when the journals are several levels of indirection (which they typically are). Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:11:46 +00:00
Bob Peterson	e9e1ef2b6e	[GFS2] Remove function gfs2_get_block This patch is just a cleanup. Function gfs2_get_block() just calls function gfs2_block_map reversing the last two parameters. By reversing the parameters, gfs2_block_map() may be called directly and function gfs2_get_block may be eliminated altogether. Since this function is done for every block operation, this streamlines the code and makes it a little bit more efficient. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:08:25 +00:00
David Teigland	2066b58b0a	[GFS2] use pid for plock owner for nfs clients The fl_owner is that of lockd when posix locks arrive from nfs clients, so it can't be used to distinguish between lock holders. Use fl_pid as owner instead; it's the pid of the process on the nfs client. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:08:23 +00:00
Steven Whitehouse	dbee2199c3	[GFS2] Remove unused variable Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:08:20 +00:00
Abhijith Das	292c8c14ca	[GFS2] patch to check for recursive lock requests in gfs2_rename code path A certain scenario in the rename code path triggers a kernel BUG() because it accidentally does recursive locking The first lock is requested to unlink an already existing inode (replacing a file) and the second lock is requested when the destination directory needs to alloc some space. It is rare that these two events happen during the same rename call, and even more rare that these two instances try to lock the same rgrp. It is, however, possible. https://bugzilla.redhat.com/show_bug.cgi?id=404711 Signed-off-by: Abhijith Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:08:18 +00:00
Wendy Cheng	c97bfe4351	[GFS2] Remove lock methods for lock_nolock protocol GFS2 supports two modes of locking - lock_nolock for single node filesystem and lock_dlm for cluster mode locking. The gfs2 lock methods are removed from file operation table for lock_nolock protocol. This would allow VFS to handle posix lock and flock logics just like other in-tree filesystems without duplication. Signed-off-by: S. Wendy Cheng <wcheng@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:08:15 +00:00
Fabio M. Di Nitto	bcd405599f	[GFS2] Remove unrequired code Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:08:13 +00:00
Fabio Massimo Di Nitto	6a69a23f7d	[GFS2] Fix build warnings Hi Steven, Steven Whitehouse wrote: > Hi, > > Now in the -nmw git tree. Thanks, > > Steve. > > On Wed, 2007-11-21 at 11:54 -0600, Ryan O'Hara wrote: this patch introduces a bunch of build warnings by leaving around struct inode *inode = &ip->i_inode; The patch in attachment cleans them up. Please apply. Signed-off-by: Fabio Massimo Di Nitto <fabbione@ubuntu.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:08:11 +00:00
Ryan O'Hara	002ef1dc63	[GFS2] remove unnecessary permission checks Remove read/write permission() checks from xattr operations. VFS layer is already handling permission for xattrs via the xattr_permission() call, so there is no need for gfs2 to check permissions. Futhermore, using permission() for SELinux xattrs ops is incorrect. Signed-off-by: Ryan O'Hara <rohara@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:08:08 +00:00
Fabio Massimo Di Nitto	1a2781cfa5	[GFS2] Fix runtime issue with UP kernels The issue is indeed UP vs SMP and it is totally random. spin_is_locked() is a bad assertion because there is no correct answer on UP. on UP spin_is_locked() has to return either one value or another, always. This means that in my setup I am lucky enough to trigger the issue and your you are lucky enough not to. the patch in attachment removes the bogus calls to BUG_ON and according to David (in CC and thanks for the long explanation on the problem) we can rely upon things like lockdep to find problem that might be trying to catch. Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:08:06 +00:00
David Teigland	00c134756c	[GFS2] tidy up error message Print error with log_error() to be consistent with others. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:08:04 +00:00
Fabio Massimo Di Nitto	0b7580c786	[GFS2] Check for installation of mount helpers for DLM mounts The patch is a fix to abort mount if the mount.gfs* and possible umount.* are missing from /sbin. While we do what we can to guarantee that they are installed properly in userland (CVS HEAD), we want to make sure that mount still aborts properly. The only sign of missing helpers is that lock_dlm will receive no mount options at all. According to David the problem does not exist for lock_nolock as the helpers are not required. The patch has been tested for both gfs and gfs2 and it works as expected. The lack of mount.gfs* will generate an error that is propagated to mount: oot@node1:~# mount -t gfs2 /dev/nbd2 /mnt/ mount: wrong fs type, bad option, bad superblock on /dev/nbd2, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg \| tail or so [ 3513.303346] GFS2: fsid=: Trying to join cluster "lock_dlm", "gutsy:gfs2" [ 3513.304546] DLM/GFS2/GFS ERROR: (u)mount helpers are not installed properly! [ 3513.306290] GFS2: fsid=: can't mount proto=lock_dlm, table=gutsy:gfs2, hostdata= You might want to notice that it will also avoid mount to hang or fail silently or with strange errors that will require the cluster to reboot/restart before you can actually mount the filesystem again. Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:08:01 +00:00
Steven Whitehouse	e35b921185	[GFS2] Don't periodically update the jindex We only care about the content of the jindex in two cases, one is when we mount the fs and the other is when we need to recover another journal. In both cases we have to update the jindex anyway, so there is no point in updating it periodically between times, so this removes it to simplify gfs2_logd. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:07:59 +00:00
Steven Whitehouse	ec69b18883	[GFS2] Move gfs2_logd into log.c This means that we can mark gfs2_ail1_empty static and prepares the way for further changes. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:07:56 +00:00
Steven Whitehouse	fd041f0b40	[GFS2] Use atomic_t for journal free blocks counter This patch changes the counter which keeps track of the free blocks in the journal to an atomic_t in preparation for the following patch which will update the log reservation code. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:07:54 +00:00
Steven Whitehouse	2bcd610d2f	[GFS2] Don't add glocks to the journal The only reason for adding glocks to the journal was to keep track of which locks required a log flush prior to release. We add a flag to the glock to allow this check to be made in a simpler way. This reduces the size of a glock (by 12 bytes on i386, 24 on x86_64) and means that we can avoid extra work during the journal flush. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:07:52 +00:00
David Teigland	8cbc434247	[GFS2] check kthread_should_stop when waiting Use wait_event_interruptible() in the lock_dlm thread instead of an open coded equivalent, and include a kthread_should_stop() check in the wait test so we don't miss a kthread_stop(). Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:07:49 +00:00
Bob Peterson	c7227e4642	[GFS2] Given device ID rather than s_id in "id" sysfs file This patch changes the /sys/fs/gfs2/<s_id>/id file to give the device id "major:minor" rather than the s_id. That enables gfs2_tool to match devices properly (by id, not name) when locating the tuning files. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:07:47 +00:00
Steven Whitehouse	e589665eb9	[GFS2] Remove flags no longer required The HIF_MUTEX and HIF_PROMOTE flags were set on the glock holders depending upon which of the two waiters lists they were going to be queued upon. They were then tested when the holders were taken off the lists to ensure that the right type of holder was being dequeued. Since we are already using separate lists, there doesn't seem a lot of point having these flags as well, and since setting them and testing them is in the fast path for locking and unlocking glock, this patch removes them. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:07:44 +00:00
Steven Whitehouse	3042a2ccd6	[GFS2] Reorder writeback for glock sync Previously we were doing (write data, wait for data, write metadata, wait for metadata). After this patch we so (write metadata, write data, wait for data, wait for metadata) which should be more efficient. Also I noticed that the drop_bh and xmote_bh functions were almost identical. In fact the only difference was a single test, and that test is such that in the drop_bh case, it would always evaluate to the correct result. As such we can use the xmote_bh functions in all the places where we were using the drop_bh function and remove the drop_bh functions. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:07:42 +00:00
Steven Whitehouse	52d4c74b08	[GFS2] Add sync_page to metadata address space operations This set of address space operations was missing a sync_page operation. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:07:40 +00:00
Steven Whitehouse	c2932e03db	[GFS2] Remove "reclaim limit" This call to reclaim glocks is not needed, and in particular we don't want it in the fast path for locking glocks. The limit was entirely arbitrary anyway and we can't expect users to adjust things like this, the remaining code will do the right thing on its own. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:07:37 +00:00
Steven Whitehouse	60b0d08779	[GFS2] Remove unused variables These haven't been used for some time, remove them. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:07:35 +00:00
Steven Whitehouse	47e83b5091	[GFS2] Use correct include file in ops_address.c Something changed in the upstream kernel, and it needs this one-liner to allow ops_address.c to build. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:07:32 +00:00
Steven Whitehouse	c41d4f09f1	[GFS2] Don't hold page lock when starting transaction This is an addendum to the new AOPs work which moves the point at which we take the page lock so that we don't get it until the last possible moment. This resolves a conflict between starting transactions and the page lock. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:07:30 +00:00
Steven Whitehouse	b8e7cbb65b	[GFS2] Add writepages for GFS2 jdata This patch resolves a lock ordering issue where we had been getting a transaction lock in the wrong order with respect to the page lock. By using writepages rather than just writepage, it is then possible to start a transaction before locking the page, and thus matching the locking order elsewhere in the code. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:07:28 +00:00
Steven Whitehouse	9ff8ec32e5	[GFS2] Split gfs2_writepage into three cases This patch splits gfs2_writepage into separate functions for each of the three cases: writeback, ordered and journalled. As a result it becomes a lot easier to see what each one is doing. The common code is moved into gfs2_writepage_common. This fixes a performance bug where we were doing more work than strictly required in the ordered write case. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:07:25 +00:00
Steven Whitehouse	5561093e2c	[GFS2] Introduce gfs2_set_aops() Just like ext3 we now have three sets of address space operations to cover the cases of writeback, ordered and journalled data writes. This means that the individual operations can now become less complicated as we are able to remove some of the tests for file data mode from the code. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:07:23 +00:00
Steven Whitehouse	bf36a71316	[GFS2] Add gfs2_is_writeback() This adds a function "gfs2_is_writeback()" along the lines of the existing "gfs2_is_jdata()" in order to clean up the code and make the various tests for the inode mode more obvious. It also fixes the PageChecked() logic where we were resetting the flag too early in the case of an error path. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:07:21 +00:00
Steven Whitehouse	e7e36f1435	[GFS2] Remove unused field in struct gfs2_inode Removes a field that is not used. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:07:18 +00:00
Steven Whitehouse	f91a0d3e24	[GFS2] Remove useless i_cache from inodes The i_cache was designed to keep references to the indirect blocks used during block mapping so that they didn't have to be looked up continually. The idea failed because there are too many places where the i_cache needs to be freed, and this has in the past been the cause of many bugs. In addition there was no performance benefit being gained since the disk blocks in question were cached anyway. So this patch removes it in order to simplify the code to prepare for other changes which would otherwise have had to add further support for this feature. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:07:16 +00:00
Steven Whitehouse	3cc3f710ce	[GFS2] Use ->page_mkwrite() for mmap() This cleans up the mmap() code path for GFS2 by implementing the page_mkwrite function for GFS2. We are thus able to use the generic filemap_fault function for our ->fault() implementation. This now means that shared writable mappings will be much more efficiently shared across the cluster if there is a reasonable proportion of read activity (the greater proportion, the better). As a side effect, it also reduces the size of the code, removes special cases from readpage and readpages, and makes the code path easier to follow. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:07:13 +00:00
Steven Whitehouse	51ff87bdd9	[GFS2] Clean up internal read function As requested by Christoph, this patch cleans up GFS2's internal read function so that it no longer uses the do_generic_mapping_read function. This function is obsolete and GFS2 is the last user of it. As a side effect the internal read code gets smaller and easier to read and gfs2_readpage is split into two. One function has the locking and the other function has the rest of the logic. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Christoph Hellwig <hch@infradead.org>	2008-01-25 08:07:11 +00:00
Wendy Cheng	cc7e79b168	[GFS2] Handle multiple glock demote requests Fix a race condition where multiple glock demote requests are sent to a node back-to-back. This patch does a check inside handle_callback() to see whether a demote request is in progress. If true, it sets a flag to make sure run_queue() will loop again to handle the new request, instead of erronously setting gl_demote_state to a different state. Signed-off-by: S. Wendy Cheng <wcheng@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:07:09 +00:00
Greg Kroah-Hartman	197b12d679	Kobject: convert fs/* from kobject_unregister() to kobject_put() There is no need for kobject_unregister() anymore, thanks to Kay's kobject cleanup changes, so replace all instances of it with kobject_put(). Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-01-24 20:40:40 -08:00
Greg Kroah-Hartman	f9cb074bff	Kobject: rename kobject_init_ng() to kobject_init() Now that the old kobject_init() function is gone, rename kobject_init_ng() to kobject_init() to clean up the namespace. Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-01-24 20:40:38 -08:00
Kay Sievers	edfaa7c365	Driver core: convert block from raw kobjects to core devices This moves the block devices to /sys/class/block. It will create a flat list of all block devices, with the disks and partitions in one directory. For compatibility /sys/block is created and contains symlinks to the disks. /sys/class/block \|-- sda -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda \|-- sda1 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda1 \|-- sda10 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda10 \|-- sda5 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda5 \|-- sda6 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda6 \|-- sda7 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda7 \|-- sda8 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda8 \|-- sda9 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda9 `-- sr0 -> ../../devices/pci0000:00/0000:00:1f.2/host1/target1:0:0/1:0:0:0/block/sr0 /sys/block/ \|-- sda -> ../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda `-- sr0 -> ../devices/pci0000:00/0000:00:1f.2/host1/target1:0:0/1:0:0:0/block/sr0 Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-01-24 20:40:36 -08:00
Greg Kroah-Hartman	a5815ddf26	Kobject: convert fs/char_dev.c to use kobject_init/add_ng() This converts the code to use the new kobject functions, cleaning up the logic in doing so. Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-01-24 20:40:31 -08:00
Greg Kroah-Hartman	901195ed7f	Kobject: change GFS2 to use kobject_init_and_add Stop using kobject_register, as this way we can control the sending of the uevent properly, after everything is properly initialized. Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-01-24 20:40:26 -08:00
Greg Kroah-Hartman	6e90aa972d	kobject: convert ecryptfs to use kobject_create Using a kset for this trivial directory is an overkill. Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Mike Halcrow <mhalcrow@us.ibm.com> Cc: Phillip Hellewell <phillip@hellewell.homeip.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-01-24 20:40:24 -08:00
Greg Kroah-Hartman	0ff21e4663	kobject: convert kernel_kset to be a kobject kernel_kset does not need to be a kset, but a much simpler kobject now that we have kobj_attributes. We also rename kernel_kset to kernel_kobj to catch all users of this symbol with a build error instead of an easy-to-ignore build warning. Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-01-24 20:40:24 -08:00
Greg Kroah-Hartman	830d3cfb16	kset: convert block_subsys to use kset_create Dynamically create the kset instead of declaring it statically. We also rename block_subsys to block_kset to catch all users of this symbol with a build error instead of an easy-to-ignore build warning. Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-01-24 20:40:23 -08:00
Greg Kroah-Hartman	c60b717879	kset: convert ocfs2 to use kset_create Dynamically create the kset instead of declaring it statically. Also use the new kobj_attribute which cleans up this file a _lot_. Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-01-24 20:40:23 -08:00
Kay Sievers	000f2a4d8c	Driver Core: kill subsys_attribute and default sysfs ops Remove the no longer needed subsys_attributes, they are all converted to the more sensical kobj_attributes. There is no longer a magic fallback in sysfs attribute operations, all kobjects which create simple attributes need explicitely a ktype assigned, which tells the core what was intended here. Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-01-24 20:40:22 -08:00
Greg Kroah-Hartman	af6370ea92	ecryptfs: remove version_str file from sysfs This file violates the one-value-per-file sysfs rule. If you all want it added back, please do something like a per-feature file to show what is present and what isn't. Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Mike Halcrow <mhalcrow@us.ibm.com> Cc: Phillip Hellewell <phillip@hellewell.homeip.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-01-24 20:40:18 -08:00
Kay Sievers	386f275f5d	Driver Core: switch all dynamic ksets to kobj_sysfs_ops Switch all dynamically created ksets, that export simple attributes, to kobj_attribute from subsys_attribute. Struct subsys_attribute will be removed. Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Cc: Mike Halcrow <mhalcrow@us.ibm.com> Cc: Phillip Hellewell <phillip@hellewell.homeip.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-01-24 20:40:18 -08:00
Greg Kroah-Hartman	bd35b93d80	kset: convert kernel_subsys to use kset_create Dynamically create the kset instead of declaring it statically. We also rename kernel_subsys to kernel_kset to catch all users of this symbol with a build error instead of an easy-to-ignore build warning. Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-01-24 20:40:14 -08:00
Greg Kroah-Hartman	d405936b32	kset: convert dlm to use kset_create Dynamically create the kset instead of declaring it statically. Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-01-24 20:40:14 -08:00
Greg Kroah-Hartman	136a27507f	kset: convert gfs2 dlm to use kset_create Dynamically create the kset instead of declaring it statically. Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-01-24 20:40:14 -08:00
Greg Kroah-Hartman	9bec101a0c	kset: convert gfs2 to use kset_create Dynamically create the kset instead of declaring it statically. Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-01-24 20:40:13 -08:00
Greg Kroah-Hartman	00d2666623	kobject: convert main fs kobject to use kobject_create This also renames fs_subsys to fs_kobj to catch all current users with a build error instead of a build warning which can easily be missed. Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-01-24 20:40:13 -08:00
Greg Kroah-Hartman	917e865df7	kset: convert ecryptfs to use kset_create Dynamically create the kset instead of declaring it statically. Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Mike Halcrow <mhalcrow@us.ibm.com> Cc: Phillip Hellewell <phillip@hellewell.homeip.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-01-24 20:40:13 -08:00
Greg Kroah-Hartman	3794491d0c	kobject: convert configfs to use kobject_create We don't need a kset here, a simple kobject will do just fine, so dynamically create the kobject and use it. Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-01-24 20:40:11 -08:00
Greg Kroah-Hartman	191e186bd0	kobject: convert debugfs to use kobject_create We don't need a kset here, a simple kobject will do just fine, so dynamically create the kobject and use it. Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-01-24 20:40:11 -08:00
Greg Kroah-Hartman	5c89e17e9c	kobject: convert fuse to use kobject_create We don't need a kset here, a simple kobject will do just fine, so dynamically create the kobject and use it. Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-01-24 20:40:11 -08:00
Greg Kroah-Hartman	4ff6abff83	kobject: get rid of kobject_add_dir kobject_create_and_add is the same as kobject_add_dir, so drop kobject_add_dir. Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-01-24 20:40:11 -08:00
Greg Kroah-Hartman	3514faca19	kobject: remove struct kobj_type from struct kset We don't need a "default" ktype for a kset. We should set this explicitly every time for each kset. This change is needed so that we can make ksets dynamic, and cleans up one of the odd, undocumented assumption that the kset/kobject/ktype model has. This patch is based on a lot of help from Kay Sievers. Nasty bug in the block code was found by Dave Young <hidave.darkstar@gmail.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Dave Young <hidave.darkstar@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-01-24 20:40:10 -08:00
Jiri Slaby	d7b3788965	sysfs: remove SPIN_LOCK_UNLOCKED SPIN_LOCK_UNLOCKED is deprecated, use DEFINE_SPINLOCK instead Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Tejun Heo <teheo@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-01-24 20:40:08 -08:00
Kay Sievers	2f90a85180	sysfs: create optimal relative symlink targets Instead of walking from the source down to the root of sysfs, and back to the target, we stop at the first directory the source and the target share. This link: /devices/pci0000:00/0000:00:1d.7/usb1/1-0:1.0/ep_81 pointed to: ../../../../../devices/pci0000:00/0000:00:1d.0/usb2/2-0:1.0/usb_endpoint/usbdev2.1_ep81 now it just points to: usb_endpoint/usbdev1.1_ep81 Thanks to Denis Cheng for bringing this up, and sending the initial patch. CC: Denis Cheng <crquan@gmail.com> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-01-24 20:40:08 -08:00
Greg Kroah-Hartman	30a468b1c1	ecryptfs: clean up attribute mess It isn't that hard to add simple kset attributes, so don't go through all the gyrations of creating your own object type and show and store functions. Just use the functions that are already present. This makes things much simpler. Note, the version_str string violates the "one value per file" rule for sysfs. I suggest changing this now (individual files per type supported is one suggested way.) Cc: Michael A. Halcrow <mahalcro@us.ibm.com> Cc: Michael C. Thompson <mcthomps@us.ibm.com> Cc: Tyler Hicks <tyhicks@ou.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-01-24 20:40:08 -08:00
Kay Sievers	62ca879256	coda: convert struct class_device to struct device Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Cc: Tony Jones <tonyj@suse.de> Cc: Jan Harkes <jaharkes@cs.cmu.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-01-24 20:40:05 -08:00
Jean Delvare	9fd5b1c906	sysfs: Fix a copy-n-paste typo in comment Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-01-24 20:40:04 -08:00
Igor Mammedov	6d5ae0deb1	[CIFS] DFS support: provide shrinkable mounts Signed-off-by: Igor Mammedov <niallain@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2008-01-25 03:28:31 +00:00
James Morris	c43e259cc7	security: call security_file_permission from rw_verify_area All instances of rw_verify_area() are followed by a call to security_file_permission(), so just call the latter from the former. Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-01-25 11:29:52 +11:00
Miklos Szeredi	5c5e32ceeb	mount options: fix jfs Add iocharset= and errors= options to /proc/mounts for jfs filesystems. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>	2008-01-24 16:13:21 -06:00
Linus Torvalds	4784b11c4f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6: [SPARC]: Constify function pointer tables. [SPARC64]: Fix section error in sparcspkr [SPARC64]: Fix of section mismatch warnings.	2008-01-23 18:46:25 -08:00
James Bottomley	d4acd722b7	[SCSI] sysfs: add filter function to groups This patch allows the various users of attribute_groups to selectively allow the appearance of group attributes. The primary consumer of this will be the transport classes in which we currently have elaborate attribute selection algorithms to do this same thing. Acked-by: Greg KH <greg@kroah.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2008-01-23 11:29:18 -06:00
James Bottomley	11f24fbdf5	[SCSI] sysfs: fix the sysfs_add_file_to_group interfaces I can't see a reason why these shouldn't work on every group. However, they only seem to work on named groups. This patch allows the group functions to work on anonymous groups (those with NULL names). Acked-by: Tejun Heo <htejun@gmail.com> Acked-by: Kay Sievers <kay.sievers@vrfy.org> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2008-01-23 11:29:17 -06:00
Jan Engelhardt	872e2be7c4	[SPARC]: Constify function pointer tables. Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-22 18:29:20 -08:00
Johann Felix Soden	889c94a14e	Fix file references in documentation and Kconfig Fix typo in arch/powerpc/boot/flatdevtree_env.h. There is no Documentation/networking/ixgbe.txt. README.cycladesZ is now in Documentation/. wavelan.p.h is now in drivers/net/wireless/. HFS.txt is now Documentation/filesystems/hfs.txt. OSS-files are now in sound/oss/. Signed-off-by: Johann Felix Soden <johfel@users.sourceforge.net> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-01-22 10:43:36 -08:00
Steve French	ed2b91701d	[CIFS] Do not log path names in lookup errors Andi Kleen noticed that we were logging access denied errors (which is noisy in the dmesg log, and not needed to be logged) and that we were logging path names on that an other errors (e.g. EIO) which we should not be doing. CC: Andi Kleen <ak@suse.de> Signed-off-by: Steve French <sfrench@us.ibm.com>	2008-01-20 00:30:29 +00:00
Jonas Bonn	f63dcda197	jbd: do not try lock_acquire after handle made invalid This likely fixes the oops in __lock_acquire reported as: http://www.kerneloops.org/raw.php?rawid=2753&msgid= http://www.kerneloops.org/raw.php?rawid=2749&msgid= In these reported oopses, start_this_handle is returning -EROFS. Signed-off-by: Jonas Bonn <jonas.bonn@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-01-17 15:38:59 -08:00
Eric Sandeen	46a39c1cd5	hfs: fix coverity-found null deref Fix potential null deref introduced by commit `cf05946250` http://bugzilla.kernel.org/show_bug.cgi?id=9748 Signed-off-by: Eric Sandeen <sandeen@redhat.com> Cc: Roman Zippel <zippel@linux-m68k.org> Reported-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-01-17 15:38:58 -08:00
Tejun Heo	456ef1553c	sysfs: fix bugs in sysfs_rename/move_dir() sysfs_rename/move_dir() have the following bugs. - On dentry lookup failure, kfree() is called on ERR_PTR() value. - sysfs_move_dir() has an extra dput() on success path. Fix them. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-01-16 09:54:03 -08:00
Tejun Heo	e49452c677	sysfs: make sysfs_lookup() return ERR_PTR(-ENOENT) on failed lookup sysfs tries to keep dcache a strict subset of sysfs_dirent tree by shooting down dentries when a node is removed, that is, no negative dentry for sysfs. However, the lookup function returned NULL and thus created negative dentries when the target node didn't exist. Make sysfs_lookup() return ERR_PTR(-ENOENT) on lookup failure. This fixes the NULL dereference bug in sysfs_get_dentry() discovered by bluetooth rfcomm device moving around. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-01-16 09:54:03 -08:00
Linus Torvalds	c23f72cae9	Revert "writeback: introduce writeback_control.more_io to indicate more io" This reverts commit `2e6883bdf4`, as requested by Fengguang Wu. It's not quite fully baked yet, and while there are patches around to fix the problems it caused, they should get more testing. Says Fengguang: "I'll resend them both for -mm later on, in a more complete patchset". See http://bugzilla.kernel.org/show_bug.cgi?id=9738 for some of this discussion. Requested-by: Fengguang Wu <wfg@mail.ustc.edu.cn> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-01-14 21:21:29 -08:00
Oleg Nesterov	a98fdcef94	fix the "remove task_ppid_nr_ns" commit Commit `84427eaef1` (remove task_ppid_nr_ns) moved the task_tgid_nr_ns(task->real_parent) outside of lock_task_sighand(). This is wrong, ->real_parent could be freed/reused. Both ->parent/real_parent point to nothing after __exit_signal() because we remove the child from ->children list, and thus the child can't be reparented when its parent exits. rcu_read_lock() protects ->parent/real_parent, but _only_ if we know it was valid before we take rcu lock. Revert this part of the patch. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-01-14 13:23:00 -08:00
NeilBrown	ba67a39efd	knfsd: Allow NFSv2/3 WRITE calls to succeed when krb5i etc is used. When RPCSEC/GSS and krb5i is used, requests are padded, typically to a multiple of 8 bytes. This can make the request look slightly longer than it really is. As of `f34b95689d` "The NFSv2/NFSv3 server does not handle zero length WRITE request correctly", the xdr decode routines for NFSv2 and NFSv3 reject requests that aren't the right length, so krb5i (for example) WRITE requests can get lost. This patch relaxes the appropriate test and enhances the related comment. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Cc: Peter Staubach <staubach@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-01-13 09:57:57 -08:00
Roland McGrath	84427eaef1	remove task_ppid_nr_ns task_ppid_nr_ns is called in three places. One of these should never have called it. In the other two, using it broke the existing semantics. This was presumably accidental. If the function had not been there, it would have been much more obvious to the eye that those patches were changing the behavior. We don't need this function. In task_state, the pid of the ptracer is not the ppid of the ptracer. In do_task_stat, ppid is the tgid of the real_parent, not its pid. I also moved the call outside of lock_task_sighand, since it doesn't need it. In sys_getppid, ppid is the tgid of the real_parent, not its pid. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-01-13 09:56:43 -08:00
Linus Torvalds	974a9f0b47	Use access mode instead of open flags to determine needed permissions Way back when (in commit `834f2a4a15`, aka "VFS: Allow the filesystem to return a full file pointer on open intent" to be exact), Trond changed the open logic to keep track of the original flags to a file open, in order to pass down the the intent of a dentry lookup to the low-level filesystem. However, when doing that reorganization, it changed the meaning of namei_flags, and thus inadvertently changed the test of access mode for directories (and RO filesystem) to use the wrong flag. So fix those test back to use access mode ("acc_mode") rather than the open flag ("flag"). Issue noticed by Bill Roman at Datalight. Reported-and-tested-by: Bill Roman <bill.roman@datalight.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Al Viro <viro@ZenIV.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-01-12 14:47:58 -08:00
Linus Torvalds	460c54b779	Merge branch 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6 * 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6: [XFS] fix unaligned access in readdir	2008-01-11 11:49:31 -08:00
Christoph Hellwig	aea6ad0ce5	[XFS] fix unaligned access in readdir This patch should fix the issue seen on Alpha with unaligned accesses in the new readdir code. By aligning each dirent to sizeof(u64) we'll avoid unaligned accesses. To make doubly sure we're not hitting problems also rearrange struct hack_dirent to avoid holes. SGI-PV: 975411 SGI-Modid: xfs-linux-melb:xfs-kern:30302a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-01-11 18:05:04 +11:00
Igor Mammedov	e6ab15827e	[CIFS] DFS support patchset: Added mountdata Also cifs_fs_type was made not static for ussage in dfs code. Signed-off-by: Igor Mammedov <niallain@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2008-01-11 01:49:48 +00:00
Dave Kleikamp	967c9ec4ec	JFS: simplify types to get rid of sparse warning jfs_metapage.c was using uints and unsigned ints inconsistently when regular ints suffice. Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>	2008-01-10 16:04:25 -06:00

... 6 7 8 9 10 ...

8028 Commits