Use snprintf(buf, PAGE_SIZE, ...) instead of sprintf for sysfs show
methods. Per instructions in Documentation/filesystems/sysfs.txt
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
xfs_splice_write() failed to update the on disk inode size when extending
the so when the file was closed the range extended by splice was truncated
off. Hence any region of a file written to by splice would end up as a
hole full of zeros.
SGI-PV: 955939
SGI-Modid: xfs-linux-melb:xfs-kern:26920a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: David Chatterton <chatz@sgi.com>
__blockdev_direct_IO for the DIO_OWN_LOCKING case for direct I/O reads
since it drops and reacquires the i_mutex while holding the iolock and
this violates the locking order.
SGI-PV: 955696
SGI-Modid: xfs-linux-melb:xfs-kern:26898a
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: David Chatterton <chatz@sgi.com>
The fix for recent ENOSPC deadlocks introduced certain limitations on
allocations. The fix could cause xfssyncd to loop endlessly if we did not
leave some space free for the allocator to work correctly. Basically, we
needed to ensure that we had at least 4 blocks free for an AG free list
and a block for the inode bmap btree at all times.
However, this did not take into account the fact that each AG has a free
list that needs 4 blocks. Hence any filesystem with more than one AG could
cause oversubscription of free space and make xfssyncd spin forever trying
to allocate space needed for AG freelists that was not available in the
AG.
The following patch reserves space for the free lists in all AGs plus the
inode bmap btree which prevents oversubscription. It also prevents those
blocks from being reported as free space (as they can never be used) and
makes the SMP in-core superblock accounting code and the reserved block
ioctl respect this requirement.
SGI-PV: 955674
SGI-Modid: xfs-linux-melb:xfs-kern:26894a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: David Chatterton <chatz@sgi.com>
CIFS had one path in which dentry was instantiated before the corresponding
inode metadata was filled in.
Fixes Redhat bugzilla bug #163493
Signed-off-by: Steve French <sfrench@us.ibm.com>
Acked-by: Eric Paris <eparis@redhat.com>
Acked-by: Dave Kleikamp <shaggy@austin.ibm.com>
Adds kernel-doc for alloc_super() type in fs/super.c.
Signed-off-by: Henrik Kretzschmar <henne@nachtwindheim.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Ass a comment explaining the slightly odd construct used to
pass error values back.
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
As per Jan Engelhardt's follow up emails, here are a few small
fixes which were missed earlier.
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
As per Jan Engelhardt's request, some unused code is removed and
some consts added in the quota code.
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
As per Jan Engelhardt's comments, removed some unused code and
removed some brackets which were not required.
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
As per Jan Engelhardt's request and also a few of my own. It has
been possible to add a few most const to the code as a result of
the change in gfs2_ea_name2type.
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
As per Jan Engelhardt's request, I've added a ',' to the end of
each of the multi-line structures which didn't already have
one (most already did).
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
As per Jan Engelhardt's comments, this should make all the headers
compile on their own by including and/or declaring structures
early.
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
As per comments from Jan Engelhardt, remove redundant casts, redundant
endian conversions, add a smattering of const and rewrite the
dirent_next function in order to avoid as many casts as possible.
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Introduce a couple of new constants which make the NFS filehandle
sizes that GFS2 uses a bit clearer. Also fix one or two minor
issues as per Jan Engelhardt's sixth email.
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
As per Jan Engelhardt's fifth email. This has most of the changes
recommended, which is the removal of casts which are not required,
some indenting fixes and similar.
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
As per the remainder of Jan Engelhardt's fourth email comments,
remove an cast thats not required. Also tidy up the "limit" code
in stuck_releasepage().
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Use const in endian conversion and printing of on-disk structures.
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
As per Jan Engelhardt's fourth email, this is the first part of the
change set with a few minor style points.
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The remains of the changes for Jan Engelhardt's third email. Remove
a cast and tidy up gfs2_inode_attr_in.
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This makes all fixed size types have consistent names.
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
As per Jan Engelhardt's third set of comments, this make various
code style changes and moves the structures from format.h into
super.c, which was the only place that format.h was actually used.
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
As per Jan Engelhardt's second email, this removes some unused code,
and fixes up indenting in various places.
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Missed a place where I forgot to convert kfree() to kmem_cache_free() as
part of jbd-manage-its-own-slab changes.
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
As per comments from Jan Engelhardt <jengelh@linux01.gwdg.de> this
updates the copyright message to say "version" in full rather than
"v.2". Also incore.h has been updated to remove forward structure
declarations which are not required.
The gfs2_quota_lvb structure has now had endianess annotations added
to it. Also quota.c has been updated so that we now store the
lvb data locally in endian independant format to avoid needing
a structure in host endianess too. As a result the endianess
conversions are done as required at various points and thus the
conversion routines in lvb.[ch] are no longer required. I've
moved the one remaining constant in lvb.h thats used into lm.h
and removed the unused lvb.[ch].
I have not changed the HIF_ constants. That is left to a later patch
which I hope will unify the gh_flags and gh_iflags fields of the
struct gfs2_holder.
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Check if the FORCEFREE flag has been provided from user space. If so, set
the force option to dlm_release_lockspace() so that any remaining locks
will be freed.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch fixes three main bugs. Firstly the direct i/o get_block
was returning the wrong return code in certain cases. Secondly, the
GFS2's releasepage function was not dealing with cases when clean,
ordered buffers were found still queued on a transaction (which can
happen depending on the ordering of journal flushes). Thirdly, the
journaling code itself needed altering to take account of the
after effects of removing the clean ordered buffers from the transactions
before a journal flush.
The releasepage bug did also show up under "normal" buffered i/o
as well, so its not just a fix for direct i/o. In fact its not
normally used in the direct i/o path at all, except when flushing
existing buffers after performing a direct i/o write, but that was
the code path that led us to spot this.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This adds the superblock as a key for glock lookups. Since the glocks
are already stored in a per-superblock table, this has no effect at
the moment. Later on this will change though.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
We can take advantage of the slab allocator to ensure that all the list
heads and the spinlock (plus one or two other fields) are initialised
by slab to speed up allocation of glocks.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Remove the unused sync feature from glocks. This is currently done by
calling the required functions to sync pages/blocks directly so this
code isn't needed.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
For all the usual reasons of enforcing correctness and potentially
reducing code size, this patch makes the glock operations const.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
conversion.
Since bma.conv is a char and XFS_BMAPI_CONVERT is 0x1000, bma.conv was
always assigned zero. Spotted by the GNU C compiler (SVN version).
SGI-PV: 947312
SGI-Modid: xfs-linux-melb:xfs-kern:26887a
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Nathan Scott <nathans@sgi.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
[CIFS] Do not send Query All EAs SMB when mount option nouser_xattr
[CIFS] endian errors in lanman protocol support
[CIFS] Fix oops in cifs_close due to unitialized lock sem and list in
[CIFS] Fix oops when negotiating lanman and no password specified
[CIFS]
[CIFS] Allow cifsd to suspend if connection is lost
[CIFS] Make midState usage more consistent
[CIFS] spinlock protect read of last srv response time in timeout path
[CIFS] Do not time out posix brl requests when using new posix setfileinfo
None of the other /proc/meminfo lines have a space in the identifier. This
post-2.6.17 addition has the potential to break existing parsers, so use an
underscore instead (like Committed_AS).
Cc: Christoph Lameter <clameter@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This fixes the locking error noticed by lockdep:
=============================================
[ INFO: possible recursive locking detected ]
---------------------------------------------
init/1 is trying to acquire lock:
(&sighand->siglock){....}, at: [<c047a78a>] flush_old_exec+0x3ae/0x859
but task is already holding lock:
(&sighand->siglock){....}, at: [<c047a77a>] flush_old_exec+0x39e/0x859
other info that might help us debug this:
2 locks held by init/1:
#0: (tasklist_lock){..--}, at: [<c047a76a>] flush_old_exec+0x38e/0x859
#1: (&sighand->siglock){....}, at: [<c047a77a>] flush_old_exec+0x39e/0x859
stack backtrace:
[<c04051e1>] show_trace_log_lvl+0x54/0xfd
[<c040579d>] show_trace+0xd/0x10
[<c04058b6>] dump_stack+0x19/0x1b
[<c043b33a>] __lock_acquire+0x773/0x997
[<c043bacf>] lock_acquire+0x4b/0x6c
[<c060630b>] _spin_lock+0x19/0x28
[<c047a78a>] flush_old_exec+0x3ae/0x859
[<c0498053>] load_elf_binary+0x4aa/0x1628
[<c0479cab>] search_binary_handler+0xa7/0x24e
[<c047b577>] do_execve+0x15b/0x1f9
[<c04022b4>] sys_execve+0x29/0x4d
[<c0403faf>] syscall_call+0x7/0xb
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
reiserfs seems to have another locking level layer for the i_mutex due to the
xattrs-are-a-directory thing.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
JBD currently allocates commit and frozen buffers from slabs. With
CONFIG_SLAB_DEBUG, its possible for an allocation to cross the page
boundary causing IO problems.
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=200127
So, instead of allocating these from regular slabs - manage allocation from
its own slabs and disable slab debug for these slabs.
[akpm@osdl.org: cleanups]
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix two compile failures in eventpoll.c code which would happen if
DEBUG_EPOLL is bigger than zero.
Signed-off-by: Masoud Sharbiani <masouds@google.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
1) When we allocated last fragment in ufs_truncate, we read page, check
if block mapped to address, and if not trying to allocate it. This is
wrong behaviour, fragment may be NOT allocated, but mapped, this
happened because of "block map" function not checked allocated fragment
or not, it just take address of the first fragment in the block, add
offset of fragment and return result, this is correct behaviour in
almost all situation except call from ufs_truncate.
2) Almost all implementation of UFS, which I can investigate have such
"defect": if you have full disk, and try truncate file, for example 3GB
to 2MB, and have hole in this region, truncate return -ENOSPC. I tried
evade from this problem, but "block allocation" algorithm is tied to
right value of i_lastfrag, and fix of this corner case may slow down of
ordinaries scenarios, so this patch makes behavior of "truncate"
operations similar to what other UFS implementations do.
Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
On UFS, this scenario:
open(O_TRUNC)
lseek(1024 * 1024 * 80)
write("A")
lseek(1024 * 2)
write("A")
may cause access to invalid address.
This happened because of "goal" is calculated in wrong way in block
allocation path, as I see this problem exists also in 2.4.
We use construction like this i_data[lastfrag], i_data array of pointers to
direct blocks, indirect and so on, it has ceratain size ~20 elements, and
lastfrag may have value for example 40000.
Also this patch fixes related to handling such scenario issues, wrong
zeroing metadata, in case of block(not fragment) allocation, and wrong goal
calculation, when we allocate block
Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
To handle the earlier bogus ENOSPC error caused by filesystem full of block
reservation, current code falls back to non block reservation, starts to
allocate block(s) from the goal allocation block group as if there is no
block reservation.
Current code needs to re-load the corresponding block group descriptor for
the initial goal block group in this case. The patch fixes this.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Mounting an ext2 filesystem with zero s_inodes_per_group will cause a
divide error.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Mounting a (corrupt) minix filesystem with zero s_zmap_blocks
gives a spectacular crash on my 2.6.17.8 system, no doubt
because minix/inode.c does an unconditional
minix_set_bit(0,sbi->s_zmap[0]->b_data);
[akpm@osdl.org: make labels conistent while we're there]
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
On Wed, 2006-08-09 at 07:57 +0200, Rolf Eike Beer wrote:
> =============================================
> [ INFO: possible recursive locking detected ]
> ---------------------------------------------
> parted/7929 is trying to acquire lock:
> (&bdev->bd_mutex){--..}, at: [<c105eb8d>] __blkdev_put+0x1e/0x13c
>
> but task is already holding lock:
> (&bdev->bd_mutex){--..}, at: [<c105eec6>] do_open+0x72/0x3a8
>
> other info that might help us debug this:
> 1 lock held by parted/7929:
> #0: (&bdev->bd_mutex){--..}, at: [<c105eec6>] do_open+0x72/0x3a8
> stack backtrace:
> [<c1003aad>] show_trace_log_lvl+0x58/0x15b
> [<c100495f>] show_trace+0xd/0x10
> [<c1004979>] dump_stack+0x17/0x1a
> [<c102dee5>] __lock_acquire+0x753/0x99c
> [<c102e3b0>] lock_acquire+0x4a/0x6a
> [<c1204501>] mutex_lock_nested+0xc8/0x20c
> [<c105eb8d>] __blkdev_put+0x1e/0x13c
> [<c105ecc4>] blkdev_put+0xa/0xc
> [<c105f18a>] do_open+0x336/0x3a8
> [<c105f21b>] blkdev_open+0x1f/0x4c
> [<c1057b40>] __dentry_open+0xc7/0x1aa
> [<c1057c91>] nameidata_to_filp+0x1c/0x2e
> [<c1057cd1>] do_filp_open+0x2e/0x35
> [<c1057dd7>] do_sys_open+0x38/0x68
> [<c1057e33>] sys_open+0x16/0x18
> [<c1002845>] sysenter_past_esp+0x56/0x8d
OK, I'm having a look here; its all new to me so bear with me.
blkdev_open() calls
do_open(bdev, ...,BD_MUTEX_NORMAL) and takes
mutex_lock_nested(&bdev->bd_mutex, BD_MUTEX_NORMAL)
then something fails, and we're thrown to:
out_first: where
if (bdev != bdev->bd_contains)
blkdev_put(bdev->bd_contains) which is
__blkdev_put(bdev->bd_contains, BD_MUTEX_NORMAL) which does
mutex_lock_nested(&bdev->bd_contains->bd_mutex, BD_MUTEX_NORMAL) <--- lockdep trigger
When going to out_first, dbev->bd_contains is either bdev or whole, and
since we take the branch it must be whole. So it seems to me the
following patch would be the right one:
[akpm@osdl.org: compile fix]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The current sun disklabel code uses a signed int for the sector count.
When partitions larger than 1 TB are used, the cast to a sector_t causes
the partition sizes to be invalid:
# cat /proc/paritions | grep sdan
66 112 2146435072 sdan
66 115 9223372036853660736 sdan3
66 120 9223372036853660736 sdan8
This patch switches the sector count to an unsigned int to fix this.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch allows the simultaneous mounting of gfs2meta and gfs2
filesystems. A restriction however is that a gfs2meta fs may only be
mounted if its corresponding gfs2 filesystem is also mounted. Also, a
gfs2 filesystem cannot be unmounted before its gfs2meta filesystem.
Signed-off-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
When a new lockspace was being created, the recoverd thread was being
started for it before the lockspace was added to the global list of
lockspaces. The new thread was looking up the lockspace in the global
list and sometimes not finding it due to the race with the original thread
adding it to the list. We need to add the lockspace to the global list
before starting the thread instead of after, and if the new thread can't
find the lockspace for some reason, it should return an error.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
log_refund() incorrectly assumed that if a transaction had been touched, it
always committed buffers to the incore log. Thus, when you got around to
flushing the log, you would need one more block than you committed, to account
for the header. So it automatically set reserved to 1, which had the effect of
making sdp->sd_log_blks_reserved one greater when you got to gfs2_log_flush().
However, if you don't actually commit anything to the incore log between
flushes, you don't need the header, because you aren't writing anything out.
With this patch, log_refund() only increments reservered to account for the
header if something has been committed since the last flush.
Signed-off-by: Benjamin E. Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
I noticed the gfs2_scand seemed to be taking a lot of CPU,
so in order to cut that down a bit, here is a patch. Firstly
the type of a glock is a constant during its lifetime, so that
its possible to check this without needing locking. I've moved
the (common) case of testing for an inode glock outside of
the glmutex lock.
Also there was a mutex left over from when the glock cache was
master of the inode cache. That isn't required any more so I've
removed that too.
There is probably scope for further speed ups in the future
in this area.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This should clarify the logic in gfs2_releasepage() relating to
error handling as well as making the response to errors a bit
more graceful.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The check in open_exec() for inode->i_mode & 0111 has been made
redundant by the fix to permission().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
(cherry picked from 1d3741c5d991686699f100b65b9956f7ee7ae0ae commit)
The check in prepare_binfmt() for inode->i_mode & 0111 is redundant,
since open_exec() will already have done that.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
(cherry picked from 822dec482ced07af32c378cd936d77345786572b commit)
Currently, the access() call will return incorrect information on NFS if
there exists an ACL that grants execute access to the user on a regular
file. The reason the information is incorrect is that the VFS overrides
this execute access in open_exec() by checking (inode->i_mode & 0111).
This patch propagates the VFS execute bit check back into the generic
permission() call.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
(cherry picked from 64cbae98848c4c99851cb0a405f0b4982cd76c1e commit)
This is needed in order to handle any NFS4ERR_DELAY errors that might be
returned by the server. It also ensures that we map the NFSv4 errors before
they are returned to userland.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
(cherry picked from 71c12b3f0abc7501f6ed231a6d17bc9c05a238dc commit)
Check the bounds of length specifiers more thoroughly in the XDR decoding of
NFS4 readdir reply data.
Currently, if the server returns a bitmap or attr length that causes the
current decode point pointer to wrap, this could go undetected (consider a
small "negative" length on a 32-bit machine).
Also add a check into the main XDR decode handler to make sure that the amount
of data is a multiple of four bytes (as specified by RFC-1014). This makes
sure that we can do u32* pointer subtraction in the NFS client without risking
an undefined result (the result is undefined if the pointers are not correctly
aligned with respect to one another).
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
(cherry picked from 5861fddd64a7eaf7e8b1a9997455a24e7f688092 commit)
The problem is that we may be caching writes that would extend the file and
create a hole in the region that we are reading. In this case, we need to
detect the eof from the server, ensure that we zero out the pages that
are part of the hole and mark them as up to date.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
(cherry picked from 856b603b01b99146918c093969b6cb1b1b0f1c01 commit)
nlm_traverse_files() is not allowed to hold the nlm_file_mutex while calling
nlm_inspect file, since it may end up calling nlm_release_file() when
releaseing the blocks.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
(cherry picked from e558d3cde986e04f68afe8c790ad68ef4b94587a commit)
rpc_unlink() and rpc_rmdir() will dput the dentry reference for you.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
(cherry picked from a05a57effa71a1f67ccbfc52335c10c8b85f3f6a commit)
I'm trying to speeding up mkdir(2) for network file systems. A typical
mkdir(2) calls two inode_operations: lookup and mkdir. The lookup
operation would fail with ENOENT in common case. I think it is unnecessary
because the subsequent mkdir operation can check it. In case of creat(2),
lookup operation is called with the LOOKUP_CREATE flag, so individual
filesystem can omit real lookup. e.g. nfs_lookup().
Here is a sample patch which uses LOOKUP_CREATE and O_EXCL on mkdir,
symlink and mknod. This uses the gadget for creat(2).
And here is the result of a benchmark on NFSv3.
mkdir(2) 10,000 times:
original 50.5 sec
patched 29.0 sec
Signed-off-by: ASANO Masahiro <masano@tnes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
(cherry picked from fab7bf44449b29f9d5572a5dd8adcf7c91d5bf0f commit)
nfs_wb_page() waits on request completion and, as a result, is not safe to be
called from nfs_release_page() invoked by VM scanner as part of GFP_NOFS
allocation. Fix possible deadlock by analyzing gfp mask and refusing to
release page if __GFP_FS is not set.
Signed-off-by: Nikita Danilov <danilov@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
(cherry picked from 374d969debfb290bafcb41d28918dc6f7e43ce31 commit)
When there are no locks on a resource, the recover_locks() function fails
to clear the NEW_MASTER flag by going directly to out, missing the line
that clears the flag.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
When a status reply is sent for a lockspace that doesn't yet exist, the
message sequence number from the sender was not being copied into the
reply causing the sender to ignore the reply.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The down-conversion optimization was resulting in the lkb flags being
cleared because the stub message reply had no flags value set. Copy the
current flags into the stub message so they'll be copied back into the lkb
as part of processing the fake reply. Also add an assertion to catch this
error more directly if it exists elsewhere.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Oh, and here's (hopefully) the last of these ua_tmp patches. I think I've
caught all the paths now. Sorry it didn't make the last one.
Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch fixes bz#203444 where the LKSB was lost during userland conversion
operations
Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
UDF code is not really ready to handle extents larger that 1GB. This is
the easy way to forbid creating those.
Also truncation code did not count with the case when there are no
extents in the file and we are extending the file.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
A list_del should have been a list_del_init in lops.c which was
resulting in incorrect status returns from list_empty().
Signed-off-by: Steven Whitheouse <swhiteho@redhat.com>
Introduce new function dlm_dump_rsb() to call within assertions instead of
dlm_print_rsb(). The new function dumps info about all locks on the rsb
in addition to rsb details.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
I know nothing about io scheduler, but I suspect set_task_ioprio() is not safe.
current_io_context() initializes "struct io_context", then sets ->io_context.
set_task_ioprio() running on another cpu may see the changes out of order, so
->set_ioprio(ioc) may use io_context which was not initialized properly.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Jens Axboe <axboe@suse.de>
From include/linux/sched.h:
* Careful: do_each_thread/while_each_thread is a double loop so
* 'break' will not work as expected - use goto instead.
*/
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Jens Axboe <axboe@suse.de>
This fixes a memory leak of struct gfs2_bufdata and also some
problems in the ordered write handling code. It needs a bit
more testing, but I believe that the reference counting of
ordered write buffers should now be correct.
This is aimed at fixing Red Hat bugzilla: #201028 and #201082
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch removes some obvious dead code spotted by the Coverity
checker.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Josh Boyer <jwboyer@gmail.com>
le16 compared to host-endian constant
u8 fed to le32_to_cpu()
le16 compared to host-endian constant
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steve French <sfrench@us.ibm.com>
fcntl(F_SETSIG) no longer works on leases because
lease_release_private_callback() gets called as the lease is copied in
order to initialise it.
The problem is that lease_alloc() performs an unnecessary initialisation,
which sets the lease_manager_ops. Avoid the problem by allocating the
target lease structure using locks_alloc_lock().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Don't let fuse_readpages leave the @pages list not empty when exiting
on error.
[akpm@osdl.org: kernel-doc fixes]
Signed-off-by: Alexander Zarochentsev <zam@namesys.com>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Eric says:
> I saw an oops down this path when trying to create a new file on a UDF
> filesystem which was internally marked as readonly, but mounted rw:
>
> udf_create
> udf_new_inode
> new_inode
> alloc_inode
> udf_alloc_inode
> udf_new_block
> returns EIO due to readonlyness
> iput (on error)
I ran into the same issue today, but when listing a directory with
invalid/corrupt entries:
udf_lookup
udf_iget
get_new_inode_fast
alloc_inode
udf_alloc_inode
__udf_read_inode
fails for any reason
iput (on error)
...
The following patch to udf_alloc_inode() should take care of both (and
other similar) cases, but I've only tested it with udf_lookup().
Signed-off-by: Dan Bastone <dan@pwienterprises.com>
Cc: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Don't use NULL as a printf control string. Fixes bug #6889.
Cc: Ralph Corderoy <ralph@inputplus.co.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Allow Windows blocking locks to be cancelled via a
CANCEL_LOCK call. TODO - restrict this to servers
that support NT_STATUS codes (Win9x will probably
not support this call).
Signed-off-by: Jeremy Allison <jra@samba.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
(cherry picked from 570d4d2d895569825d0d017d4e76b51138f68864 commit)
Make cifsd allow us to suspend if it has lost the connection with a server
Ref: http://bugzilla.kernel.org/show_bug.cgi?id=6811
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Steve French <sfrench@us.ibm.com>
(cherry picked from 27bd6cd87b0ada66515ad49bc346d77d1e9d3e05 commit)
Although harmless, we were sometimes treating midState like it contained
flags but they are exclusive states, and this makes that more clear.
Signed-off-by: Jeremy Allison <jra@samba.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
(cherry picked from 586c057c3a68dd6ae0f3ba94fbf76798b1558074 commit)
Signed-off-by: Jeremy Allison <jra@samba.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
(cherry picked from b33a3f55e54fd210fc043eafcf83728b03bc9e02 commit)
request and do not time out slow requests to a server that is still responding
well to other threads
Suggested by jra of Samba team
Signed-off-by: Steve French <sfrench@us.ibm.com>
(cherry picked from 89b57148115479eef074b8d3f86c4c86c96ac969 commit)
Doing the kmap() while holding the spinlock was causing recursive spinlock
problems. It seems the kmap was scheduling, although there was no warning
as I'd expect. Patrick, do we need locking around the kmap?
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
recovery.c add a brelse to deal with gfs2_replay_read_block being called
twice on the same block.
add a dput to drop the ref count on the root inode.
This was causing lingering glocks and thus causing
a mount failure to hang.
Fix a endian conversion macro that was was swizzling
16bits when it should have been swizzling 32.
Signed-off-by: Russell Cattelan <cattelan@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
We recently fixed an out-of-space deadlock in XFS, and part of that fix
involved the addition of the XFS_ALLOC_FLAG_FREEING flag to some of the
space allocator calls to indicate they're freeing space, not allocating
it. There was a missed xfs_alloc_fix_freelist condition test that did not
correctly test "flags". The same test would also test an uninitialised
structure field (args->userdata) and depending on its value either would
or would not return early with a critical buffer pointer set to NULL.
This fixes that up, adds asserts to several places to catch future botches
of this nature, and skips sections of xfs_alloc_fix_freelist that are
irrelevent for the space-freeing case.
SGI-PV: 955303
SGI-Modid: xfs-linux-melb:xfs-kern:26743a
Signed-off-by: Nathan Scott <nathans@sgi.com>
When recoveries are aborted by other recoveries we can get replies to
status or names requests that we've given up on. This can cause problems
if we're making another request and receive an old reply. Add a sequence
number to status/names requests and reject replies that don't match. A
field already exists for the seq number that's used in other message
types.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
To aid debugging, it's useful to be able to see what nodeid the dlm is
waiting on for a message reply.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
When the debug buffer has filled up, break from the loop and return the
correct number of bytes that have been written.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
When we abort one recovery to do another, break out of the ping_members()
routine more quickly, and wake up the dlm_recoverd thread more quickly
instead of waiting for it to time out.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Print the violating name length in the assertion.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
In some cases we can enter write page without there being buffers
attached to the page. In this case the function to add gfs2_bufdata
to the buffers fails sliently causing further failures down the
stack.
This fix ensures that we always add buffers in writepage if they
didn't already exist (mmap is one way to trigger this).
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch fixes the userland DLM unlock code so that it correctly returns the
address of the userland lock status block in its completion AST.
It fixes bug #201348
Patrick
Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Record the most recently used allocation group on the allocation context, so
that subsequent allocations can attempt to optimize for contiguousness.
Local alloc especially should benefit from this as the current chain search
tends to let it spew across the disk.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Try to catch corrupted group descriptors with some stronger checks placed in
a couple of strategic locations. Detect a failed resizefs and refuse to
allocate past what bitmap i_clusters allows.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
We were storing cluster count on the ocfs2_super structure, but never
actually using it so remove that. Also, we don't want to populate the
uptodate cache with the unlocked block read - it is technically safe as is,
but we should change it for correctness.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
This patch removes the unused EXPORT_SYMBOL_GPL(dlm_migrate_lockres).
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
If a process requests a lock cancel but the lock has been remotely granted
already then there is no need to send the cancel message.
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
This can race with other ast notification, which can cause bad status values
to propagate into the unlock ast.
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Properly ignore LVB flags during a PR downconvert. This avoids an illegal
lvb update.
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
When the result of a posix lock request is read it needs to be matched up
with the correct waiting request. The owner field needs to be used in the
comparison since more than one process may be waiting for locks on the
same file.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Use the gfs2_ prefix on the register/unregister functions for the lock
modules. The gfs_ prefix was left from an old idea on how to share these
with gfs1.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
I saw an oops down this path when trying to create a new file on a UDF
filesystem which was internally marked as readonly, but mounted rw:
udf_create
udf_new_inode
new_inode
alloc_inode
udf_alloc_inode
udf_new_block
returns EIO due to readonlyness
iput (on error)
udf_put_inode
udf_discard_prealloc
udf_next_aext
udf_current_aext
udf_get_fileshortad
OOPS
the udf_discard_prealloc() path was examining uninitialized fields of the
udf inode.
udf_discard_prealloc() already has this code to short-circuit the discard
path if no extents are preallocated:
if (UDF_I_ALLOCTYPE(inode) == ICBTAG_FLAG_AD_IN_ICB ||
inode->i_size == UDF_I_LENEXTENTS(inode))
{
return;
}
so if we initialize UDF_I_LENEXTENTS(inode) = 0 earlier in udf_new_inode,
we won't try to free the (not) preallocated blocks, since this will match
the i_size = 0 set when the inode was initialized.
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
reiserfs_write_full_page does zero bytes in the file past eof, but it may
call get_block on those buffers as well. On machines where the page size
is larger than the blocksize, this can result in mmaped files incorrectly
growing up to a block boundary during writepage.
The fix is to avoid calling get_block for any blocks that are entirely past
eof
Signed-off-by: Chris Mason <mason@suse.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In bugzilla #6941, Jens Kilian reported:
"The function befs_utf2nls (in fs/befs/linuxvfs.c) writes a 0 byte past the
end of a block of memory allocated via kmalloc(), leading to memory
corruption. This happens only for filenames which are pure ASCII and a
multiple of 4 bytes in length. [...]
Without DEBUG_SLAB, this leads to further corruption and hard lockups; I
believe this is the bug which has made kernels later than 2.6.8 unusable
for me. (This must be due to changes in memory management, the bug has
been in the BeFS driver since the time it was introduced (AFAICT).)
Steps to reproduce:
Create a directory (in BeOS, naturally :-) with files named, e.g.,
"1", "22", "333", "4444", ... Mount it in Linux and do an "ls" or "find""
This patch implements the suggested fix. Credits to Jens Kilian for
debugging the problem and finding the right fix.
Signed-off-by: Diego Calleja <diegocg@gmail.com>
Cc: Jens Kilian <jjk@acm.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ufs_get_locked_page is called twice in ufs code, one time in ufs_truncate
path(we allocated last block), and another time when fragments are
reallocated. In ideal world in the second case on allocation/free block
layer we should not know that things like `truncate' exists, but now with
such crutch like ufs_get_locked_page we can (or should?) skip truncated
pages.
Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
As discussed earlier:
http://lkml.org/lkml/2006/6/28/136
this patch fixes such issue:
`ufs_get_locked_page' takes page from cache
after that `vmtruncate' takes page and deletes it from cache
`ufs_get_locked_page' locks page, and reports about EIO error.
Also because of find_lock_page always return valid page or NULL, we have no
need to check it if page not NULL.
Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Mmapped files were able to trigger a lock ordering bug. Private
maps do not need to take the glock so early on. Shared maps do
unfortunately, however we can get around that by adding a flag
into the flags for the struct gfs2_file. This only works because
we are taking an exclusive lock at this point, so we know that
nobody else can be racing with us.
Fixes Red Hat bugzilla: #201196
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
We never actually set the b_done field any more; it's always zero.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
(cherry picked from af8412d4283ef91356e65e0ed9b025b376aebded commit)
nfs_writedata_free() and nfs_readdata_free() can now become static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
(cherry picked from 5e1ce40f0c3c8f67591aff17756930d7a18ceb1a commit)
In one of the error paths of nfs_path, it may return with dcache_lock still
held; fix this by adding and using a new error path Elong_unlock which unlocks
dcache_lock.
Signed-off-by: Josh Triplett <josh@freedesktop.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
(cherry picked from f4b90b43677fb23297c56802c3056fc304f988d9 commit)
When an object is created via a symlink into an audited directory, audit misses
the event due to not having collected the inode data for the directory. Modify
__audit_inode_child() to copy the parent inode data if a parent wasn't found in
audit_names[].
Signed-off-by: Amy Griffis <amy.griffis@hp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
When the specified path is an existing file or when it is a symlink, audit
collects the wrong inode number, which causes it to miss the open() event.
Adding a second hook to the open() path fixes this.
Also add audit_copy_inode() to consolidate some code.
Signed-off-by: Amy Griffis <amy.griffis@hp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This was a nasty bug which resulted in corruption of hash tables
in the directory code with larger directories. We forgot to
increment a pointer in the read/write routines internal to the
directory code.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Based on a bug report from Russ Ross <russruss@gmail.com>
According to the spec:
"The remove request asks the file server both to remove the file
represented by fid and to clunk the fid, even if the remove fails."
but the Linux client seems to expect the fid to be valid after a failed
remove attempt. Specifically, I'm getting this behavior when attempting to
remove a non-empty directory.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Russ Ross <russross@gmail.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
For files other than IFREG, nobh option doesn't make sense. Modifications
to them are journalled and needs buffer heads to do that. Without this
patch, we get kernel oops in page_buffers().
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Commit 7b2fd697427e73c81d5fa659efd91bd07d303b0e in the historical GIT tree
stopped calling the readdir member of a file_operations struct with the big
kernel lock held, and fixed up all the readdir functions to do their own
locking. However, that change added calls to unlock_kernel() in
vxfs_readdir, but no call to lock_kernel(). Fix this by adding a call to
lock_kernel().
Signed-off-by: Josh Triplett <josh@freedesktop.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It is entirely possible (though rare) that jiffies half-wraps around, while a
dentry/inode remains in the cache. This could mean that the dentry/inode is
not invalidated for another half wraparound-time.
To get around this problem, use 64-bit jiffies. The only problem with this is
that dentry->d_time is 32 bits on 32-bit archs. So use d_fsdata as the high
32 bits. This is an ugly hack, but far simpler, than having to allocate
private data just for this purpose.
Since 64-bit jiffies can be assumed never to wrap around, simple comparison
can be used, and a zero time value can represent "invalid".
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
An attribute and entry timeout of zero should mean, that the entity is
invalidated immediately after the operation. Previously invalidation only
happened at the next clock tick.
Reported and tested by Craig Davies.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ufs_symlink, in one of its error paths, calls unlock_kernel without ever
having called lock_kernel(); fix this by creating and jumping to a new
label out_notlocked rather than the out label used after calling
lock_kernel().
Signed-off-by: Josh Triplett <josh@freedesktop.org>
Cc: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If efs_symlink_readpage hits the -ENAMETOOLONG error path, it will call
unlock_kernel without ever having called lock_kernel(); fix this by
creating and jumping to a new label fail_notlocked rather than the fail
label used after calling lock_kernel().
Signed-off-by: Josh Triplett <josh@freedesktop.org>
Cc: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Commit 398c53a757702e1e3a7a2c24860c7ad26acb53ed (in the historical GIT
tree) moved the lock_kernel() in coda_open after the allocation of a
coda_file_info struct, but left an unlock_kernel() in the allocation
failure error path; remove it.
Signed-off-by: Josh Triplett <josh@freedesktop.org>
Acked-by: Jan Harkes <jaharkes@cs.cmu.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is a real deadlock, a nice complex one:
(warning: long explanation follows so that Andrew can have a complete
patch description)
it's an ABCDA deadlock:
A iprune_mutex
B inode->inotify_mutex
C ih->mutex
D dev->ev_mutex
The AB relationship comes straight from invalidate_inodes()
int invalidate_inodes(struct super_block * sb)
{
int busy;
LIST_HEAD(throw_away);
mutex_lock(&iprune_mutex);
spin_lock(&inode_lock);
inotify_unmount_inodes(&sb->s_inodes);
where inotify_umount_inodes() takes the
mutex_lock(&inode->inotify_mutex);
The BC relationship comes directly from inotify_find_update_watch():
s32 inotify_find_update_watch(struct inotify_handle *ih, struct inode *inode,
u32 mask)
{
...
mutex_lock(&inode->inotify_mutex);
mutex_lock(&ih->mutex);
The CD relationship comes from inotify_rm_wd:
inotify_rm_wd does
mutex_lock(&inode->inotify_mutex);
mutex_lock(&ih->mutex)
and then calls inotify_remove_watch_locked() which calls
notify_dev_queue_event() which does
mutex_lock(&dev->ev_mutex);
(this strictly is a BCD relationship)
The DA relationship comes from the most interesting part:
[<ffffffff8022d9f2>] shrink_icache_memory+0x42/0x270
[<ffffffff80240dc4>] shrink_slab+0x11d/0x1c9
[<ffffffff802b5104>] try_to_free_pages+0x187/0x244
[<ffffffff8020efed>] __alloc_pages+0x1cd/0x2e0
[<ffffffff8025e1f8>] cache_alloc_refill+0x3f8/0x821
[<ffffffff8020a5e5>] kmem_cache_alloc+0x85/0xcb
[<ffffffff802db027>] kernel_event+0x2e/0x122
[<ffffffff8021d61c>] inotify_dev_queue_event+0xcc/0x140
inotify_dev_queue_event schedules a kernel_event which does a
kmem_cache_alloc( , GFP_KERNEL) which may try to shrink slabs, including
the inode cache .. which then takes iprune_mutex.
And voila, there is an AB, a BC, a CD relationship (even a direct BCD),
and also now a DA relationship -> a circular type AB-BA deadlock but
involving 4 locks.
The solution is simple: kernel_event() is NOT allowed to use GFP_KERNEL,
but must use GFP_NOFS to not cause recursion into the VFS.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Robert Love <rml@novell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Enable mac partition table support per default also for a powermac config.
Signed-off-by: Olaf Hering <olh@suse.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We can immediately bail from invalidate_bdev() if the blockdev has no
pagecache.
This solves the huge IPI storms which hald is causing on the big ia64
machines when it polls CDROM drives.
Acked-by: Jes Sorensen <jes@sgi.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A recent commit (7fc90ec93a) moved the
call to nfsd_setuser out of the 'find a dentry for a filehandle' branch
of fh_verify so that it would always be called.
This had the unfortunately side-effect of moving *after* the call to
decode_fh, so the prober fsuid was not set when nfsd_acceptable was called,
the 'permission' check did the wrong thing.
This patch moves the nfsd_setuser call back where it was, and add as call
in the other branch of the if.
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The inode number out of an NFS file handle gets passed eventually to
ext3_get_inode_block() without any checking. If ext3_get_inode_block()
allows it to trigger an error, then bad filehandles can have unpleasant
effect - ext3_error() will usually cause a forced read-only remount, or a
panic if `errors=panic' was used.
So remove the call to ext3_error there and put a matching check in
ext3/namei.c where inode numbers are read off storage.
[akpm@osdl.org: fix off-by-one error]
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: <stable@kernel.org>
Cc: "Stephen C. Tweedie" <sct@redhat.com>
Cc: Eric Sandeen <esandeen@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We need to use fl_owner instead of fl_pid to track the owner of a posix
lock. Pass the owner value out to user space where cluster plocks are
managed.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Tidy up some files and remove an unused routine in meta_io.h. Also
added a bit of extra debugging in meta_io.h.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
flags from iclog buffers before submitting them for writing.
SGI-PV: 954772
SGI-Modid: xfs-linux-melb:xfs-kern:26605a
Signed-off-by: Nathan Scott <nathans@sgi.com>
Before putting them into struct statfs they should be endian-swapped.
SGI-PV: 954580
SGI-Modid: xfs-linux-melb:xfs-kern:26550a
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nathan Scott <nathans@sgi.com>
This means that we don't need to create a special inode just to contain
a struct address_space in order to read a single disk block. Instead
we read the disk block directly. Its slightly faster, and uses slightly
less memory, but the real reason for doing this is that it removes a
special case from the glock code.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
jfs_quota_read/write are very near duplicates of ext2_quota_read/write.
Cleaned up jfs_get_block as long as I had to change it to be non-static.
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
> I think you must have an old version of the base kernel as well?
> i_private no longer exists in struct inode, so you'll have to use
> something else,
I have that patch in my stack but didn't send it; for some reason I
thought it was already changed in your git tree.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
On Wed, Jul 26, 2006 at 10:47:14AM +0100, Steven Whitehouse wrote:
> Hi,
>
> I've applied all the patches you sent, but they don't build:
Argh, sorry about that... when I fixed these a long time ago they somehow
never got included in the quilt patches. I mistakenly assumed the quilt
patches matched the source I had in front of me.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The remaining routines in page.c were all only used in one other
file, so they are now moved into the files where they are referenced
and made static. Thus page.[ch] are no longer required.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Tidy up gfs2_unstuffer_page by:
a) Moving it into bmap.c
b) Making it static
c) Calling it directly from gfs2_unstuff_dinode
d) Updating all callers of gfs2_unstuff_dinode due to one less
required argument.
It doesn't change the behaviour at all.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The loop through all waiting locks in recover_waiters can potentially be
long, so we should schedule explicitly.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The loop in grant_after_purge is intended to find all rsb's in each hash
bucket that have the LOCKS_PURGED flag set. The loop was quitting the
current bucket after finding just one rsb instead of going until there are
no more.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
If a node becomes the new master of an rsb during recovery, the
LOCKS_PURGED flag needs to be set on it so that any waiting/converting
locks will try to be granted.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Display more information from debugfs, particularly locks waiting for
a master lookup or operations waiting for a remote reply.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
As per comments received, alter the GFS2 direct I/O path so that
it uses the standard read functions "out of the box". Needs a
small change to one of the VFS functions. This reduces the size
of the code quite a lot and also removes the need for one new export.
Some more work remains to be done, but this is the bones of the
thing.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
traced the "umount hang due to spurious glock" issue that I was having
with gfs2meta. It's in the do_gfs2_set_flags function, which does a
gfs2_holder_init as well as a gfs2_glock_nq_init (increases ref count by
2 instead of 1).
Signed-off-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Typo causes the error value from the wrong lock to be checked.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
User NOQUEUE lock requests to a remote node that failed with -EAGAIN were
never being removed from a process's list of locks.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
On Thu, Jul 13, 2006 at 10:48:00PM -0700, Andrew Morton wrote:
>...
> Changes since 2.6.18-rc1-mm1:
>...
> git-gfs2.patch
>...
> git trees.
>...
This patch removes the unused EXPORT_SYMBOL_GPL(dlm_lvb_operations).
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Fix an endian coversion bug in log.c spotted by Kevin Anderson.
Cc: Kevin Anderson <kanderso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Fix a use after free bug in dir.c spotted by Kevin Anderson.
Cc: Kevin Anderson <kanderso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This just turns off chmod() on the /proc/<pid>/ files, since there is no
good reason to allow it, and had we disallowed it originally, the nasty
/proc race exploit wouldn't have been possible.
The other patches already fixed the problem chmod() could cause, so this
is really just some final mop-up..
This particular version is based off a patch by Eugene and Marcel which
had much better naming than my original equivalent one.
Signed-off-by: Eugene Teo <eteo@redhat.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Export I/O delays seen by a task through /proc/<tgid>/stats for use in top
etc.
Note that delays for I/O done for swapping in pages (swapin I/O) is clubbed
together with all other I/O here (this is not the case in the netlink
interface where the swapin I/O is kept distinct)
[akpm@osdl.org: printk warning fix]
Signed-off-by: Shailabh Nagar <nagar@watson.ibm.com>
Signed-off-by: Balbir Singh <balbir@in.ibm.com>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Peter Chubb <peterc@gelato.unsw.edu.au>
Cc: Erich Focht <efocht@ess.nec.de>
Cc: Levent Serinol <lserinol@gmail.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
On systems with block devices containing a slash (virtual dasd, cciss,
etc), reiserfs will fail to initialize /proc/fs/reiserfs/<dev> due to it
being interpreted as a subdirectory. The generic block device code changes
the / to ! for use in the sysfs tree. This patch uses that convention.
Tested by making dm devices use dm/<number> rather than dm-<number>
[akpm@osdl.org: name variables consistently]
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2.6.16 leaks like hell. While testing, I found massive leakage
(reproduced in openvz) in:
*filp
*size-4096
And 1 object leaks in
*size-32
*size-64
*size-128
It is the fix for the first one. filp leaks in the bowels of namei.c.
Seems, size-4096 is file table leaking in expand_fdtables.
I have no idea what are the rest and why they show only accompanying
another leaks. Some debugging structs?
[akpm@osdl.org, Trond: remove the IS_ERR() check]
Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: <stable@kernel.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We have a bad interaction with both the kernel and user space being able
to change some of the /proc file status. This fixes the most obvious
part of it, but I expect we'll also make it harder for users to modify
even their "own" files in /proc.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This changes the way the dlm handles user locks. The core dlm is now
aware of user locks so they can be dealt with more efficiently. There is
no more dlm_device module which previously managed its own duplicate copy
of every user lock.
Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Update the NFS filehandles so that they contain the file type.
Signed-off-by: Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
We're supposed to go the next power of two if nfds==nr.
Of `nr', not of `nfsd'.
Spotted by Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Address a potential 'larger than buffer size' memory access by
clear_user(). Without this patch, this call to clear_user() can attempt to
clear too many (tsz) bytes resulting in a wrong (-EFAULT) return code by
read_kcore().
Signed-off-by: Adam B. Jerome <abj@novell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
sysfs has a different i_mutex lock order behavior for i_mutex than the
other filesystems; sysfs i_mutex is called in many places with subsystem
locks held. At the same time, many of the VFS locking rules do not apply
to sysfs at all (cross directory rename for example). To untangle this
mess (which gives false positives in lockdep), we're giving sysfs inodes
their own class for i_mutex.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When found, it is obvious. nfds calculated when allocating fdsets is
rewritten by calculation of size of fdtable, and when we are unlucky, we
try to free fdsets of wrong size.
Found due to OpenVZ resource management (User Beancounters).
Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: Kirill Korotaev <dev@openvz.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A missing initialisation when creating a new on disk inode.
Signed-off-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
We must not call GFP_KERNEL memory allocations while we
are holding the log lock (read or write) since that may
trigger a log flush resulting in a deadlock.
Eventually we need to fix the locking in log.c, for now
this solves the problem at the expense of freeing up memory
as fast as we would like to. This needs to be revisited
later on.
Cc: Kevin Anderson <kanderso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This adds a generation number for the eventual use of NFS to the
ondisk inode. Its backward compatible with the current code since
it doesn't really matter what the generation number is to start with,
and indeed since its set to zero, due to it being taken from padding
in both the inode and rgrp header, it should be fine.
The eventual plan is to use this rather than no_formal_ino in the
NFS filehandles. At that point no_formal_ino will be unused.
At the same time we also add a releasepages call back to the
"normal" address space for gfs2 inodes. Also I've removed a
one-linrer function thats not required any more.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Add an nfs4 operations count array to nfsd_stats structure. The count is
incremented in nfsd4_proc_compound() where all the operations are handled
by the nfsv4 server. This count of individual nfsv4 operations is also
entered into /proc filesystem.
Signed-off-by: Shankar Anand<shanand@novell.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
These functions no longer exist; remove their declarations.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There's a fairly obvious infinite loop in there.
Also, use roundup_pow_of_two() rather than open-coding stuff.
Cc: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add coredump capability for the ELF-FDPIC binfmt.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Move the roundup() macro from binfmt_elf.c into linux/kernel.h as it's
generally useful.
[akpm@osdl.org: nuke all the other implementations]
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Adjust the ELF-FDPIC binfmt driver to conform much more to the CodingStyle,
silly though it may be.
Further changes:
(*) Drop the casts to long for addresses in kdebug() statements (they're
unsigned long already).
(*) Use extra variables to avoid expressions longer than 80 chars by splitting
the statement into multiple statements and letting the compiler optimise
them back together.
(*) Eliminate duplicate call of ksize() when working out how much space was
actually allocated for the stack.
(*) Discard the commented-out load_shlib prototype and op pointer as this will
not be supported in ELF-FDPIC for the foreseeable future.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix execution through the FDPIC binfmt of programs stored on ramfs by
preventing the ramfs mmap() returning successfully on a private mapping of
a ramfs file. This causes NOMMU mmap to make a copy of the mapped portion
of the file and map that instead.
This could be improved by granting direct mapping access to read-only
private mappings for which the data is stored on a contiguous run of pages.
However, this is only likely to be the case if the file was extended with
truncate before being written.
ramfs is left to map the file directly for shared mappings so that SYSV IPC
and POSIX shared memory both still work.
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix FDPIC compile errors.
(akpm: we suspect it fixes a warning)
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Sometimes, applications need below call to be successful although
"/mnt/hugepages/file1" doesn't exist.
fd = open("/mnt/hugepages/file1", O_CREAT|O_RDWR, 0755);
*addr = mmap(NULL, 0x1024*1024*256, PROT_NONE, 0, fd, 0);
As for regular pages (or files), above call does work, but as for huge
pages, above call would fail because hugetlbfs_file_mmap would fail if
(!(vma->vm_flags & VM_WRITE) && len > inode->i_size).
This capability on huge page is useful on ia64 when the process wants to
protect one area on region 4, so other threads couldn't read/write this
area. A famous JVM (Java Virtual Machine) implementation on IA64 needs the
capability.
Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Hugh Dickins <hugh@veritas.com>
[ Expand-on-mmap semantics again... this time matching normal fs's. wli ]
Acked-by: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch marks an unused export as EXPORT_UNUSED_SYMBOL.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Change the partition code in fs/partitions/check.c to initialize a newly
detected partition's policy field with that of the containing block device
(see patch below).
My reasoning is that function set_disk_ro() in block/genhd.c modifies the
policy field (read-only indicator) of a disk and all contained partitions.
When a partition is detected after the call to set_disk_ro(), the policy
field of this partition will currently not inherit the disk's policy field.
This behavior poses a problem in cases where a block device can be
'logically de- and reactivated' like e.g. the s390 DASD driver because
partition detection may run after the policy field has been modified.
Signed-off-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com>
Acked-by: Al Viro <viro@ftp.linux.org.uk>
Makes-sense-to: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When write() extends a file(i_size is increased) and fsync() is called,
change of inode must be written to journaling area through fsync().
But,currently the i_trans_id is not correctly updated when i_size is
increased. So fsync() does not kick the journal writer.
Reiserfs_file_write() already updates the transaction when blocks are
allocated, but the case when i_size increases and new blocks are not added
is not correctly treated.
Following patch fix this bug.
Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <mason@suse.com>
Cc: Hans Reiser <reiser@namesys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This fixes a bug where we were releasing a page incorrectly
sometimes when reading a stuffed file. This fixes the bug
that Kevin reported when using Xen.
Cc: Kevin Anderson <kanderso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Several issues noticed/fixed:
- We cannot reliably block in link_pipe() while holding both input and output
mutexes. So do preparatory checks before locking down both mutexes and doing
the link.
- The ipipe->nrbufs vs i check was bad, because we could have dropped the
ipipe lock in-between. This causes us to potentially look at unknown
buffers if we were racing with someone else reading this pipe.
Signed-off-by: Jens Axboe <axboe@suse.de>
This patch makes the needlessly global jffs2_obsolete_node_frag()
static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
This really is the correct fix this time. We just ignore all
glocks associated with inodes until the inodes are pushed
from the inode cache. At that point the glocks are queued for
reclaim, so we don't need to do it here.
Also fix one or two other minor bugs.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
In the case when compiling via a symlink tree, we want to ensure that the
close-to-open GETATTR call is applied only to the final file, and not to
the symlink.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The introduction of the FLUSH_INVALIDATE argument to nfs_sync_inode_wait()
does not clear the nr_unstable page state counter for pages that are being
released.
Also fix a longstanding similar bug when nfs_commit_list() fails.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Under certain circumstances the glock scanning logic would
demote locks which ought not to have been selected for
demotion.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Use FL_ACCESS flag to test and/or wait for local locks before we try
requesting a lock from the server
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Use the new behaviour of {flock,posix}_file_lock(F_UNLCK) to determine if
we held a lock, and only send the RPC request to the server if this was the
case.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Change posix_lock_file_conf(), and flock_lock_file() so that if called
with an F_UNLCK argument, and the FL_EXISTS flag they will indicate
whether or not any locks were actually freed by returning 0 or -ENOENT.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Change our one existing old-style lock initialiser to a new-style
one. This allows the lock validator to work as intended.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
cleanup: remove task_t and convert all the uses to struct task_struct. I
introduced it for the scheduler anno and it was a mistake.
Conversion was mostly scripted, the result was reviewed and all
secondary whitespace and style impact (if any) was fixed up by hand.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Teach special (recursive) locking code to the lock validator.
Effects on non-lockdep kernels:
- the introduction of the following function variants:
extern struct block_device *open_partition_by_devnum(dev_t, unsigned);
extern int blkdev_put_partition(struct block_device *);
static int
blkdev_get_whole(struct block_device *bdev, mode_t mode, unsigned flags);
which on non-lockdep are the same as open_by_devnum(), blkdev_put()
and blkdev_get().
- a subclass parameter to do_open(). [unused on non-lockdep]
- a subclass parameter to __blkdev_put(), which is a new internal
function for the main blkdev_put*() functions. [parameter unused
on non-lockdep kernels, except for two sanity check WARN_ON()s]
these functions carry no semantical difference - they only express
object dependencies towards the lockdep subsystem.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The s_umount rwsem needs to be classified as per-superblock since it's
perfectly legit to keep multiple of those recursively in the VFS locking
rules.
Has no effect on non-lockdep kernels.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Teach special (per-filesystem) locking code to the lock validator.
Minimal effect on non-lockdep kernels: one extra parameter to alloc_super().
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The quota code plays interesting games with the lock ordering; to quote Jan:
| i_mutex of inode containing quota file is acquired after all other
| quota locks. i_mutex of all other inodes is acquired before quota
| locks. Quota code makes sure (by resetting inode operations and
| setting special flag on inode) that noone tries to enter quota code
| while holding i_mutex on a quota file...
The good news is that all of this special case i_mutex grabbing happens in the
(per filesystem) low level quota write function. For this special case we
need a new I_MUTEX_* nesting level, since this just entirely outside any of
the regular VFS locking rules for i_mutex. I trust Jan on his blue eyes that
this is not ever going to deadlock; and based on that the patch below is what
it takes to inform lockdep of these very interesting new locking rules.
The new locking rule for the I_MUTEX_QUOTA nesting level is that this is the
deepest possible level of nesting for i_mutex, and that this only should be
used in quota write (and possibly read) function of filesystems. This makes
the lock ordering of the I_MUTEX_* levels:
I_MUTEX_PARENT -> I_MUTEX_CHILD -> I_MUTEX_NORMAL -> I_MUTEX_QUOTA
Has no effect on non-lockdep kernels.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Jan Kara <jack@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NTFS uses lots of type-opaque objects which acquire their true identity
runtime - so the lock validator needs to be helped in a couple of places to
figure out object types.
Many thanks to Anton Altaparmakov for giving lots of explanations about NTFS
locking rules.
Has no effect on non-lockdep kernels.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Teach special (recursive) locking code to the lock validator. Has no effect
on non-lockdep kernels.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Teach special (recursive) locking code to the lock validator. Has no effect
on non-lockdep kernels.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Teach special (rwsem-in-irq) locking code to the lock validator. Has no
effect on non-lockdep kernels.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Locking init improvement:
- introduce and use __SPIN_LOCK_UNLOCKED for array initializations,
to pass in the name string of locks, used by debugging
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>