Commit Graph

27750 Commits

Author SHA1 Message Date
David Howells
1acf0af9b9 VFS: Fix the banner comment on lookup_open()
Since commit 197e37d9, the banner comment on lookup_open() no longer matches
what the function returns.  It used to return a struct file pointer or NULL and
now it returns an integer and is passed the struct file pointer it is to use
amongst its arguments.  Update the comment to reflect this.

Also add a banner comment to atomic_open().

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:34:57 +04:00
Al Viro
312b63fba9 don't pass nameidata * to vfs_create()
all we want is a boolean flag, same as the method gets now

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:34:50 +04:00
Al Viro
ebfc3b49a7 don't pass nameidata to ->create()
boolean "does it have to be exclusive?" flag is passed instead;
Local filesystem should just ignore it - the object is guaranteed
not to be there yet.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:34:47 +04:00
Al Viro
72bd866a01 fs/namei.c: don't pass nameidata to __lookup_hash() and lookup_real()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:34:40 +04:00
Al Viro
00cd8dd3bf stop passing nameidata to ->lookup()
Just the flags; only NFS cares even about that, but there are
legitimate uses for such argument.  And getting rid of that
completely would require splitting ->lookup() into a couple
of methods (at least), so let's leave that alone for now...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:34:32 +04:00
Al Viro
201f956e43 fs/namei.c: don't pass namedata to lookup_dcache()
just the flags...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:34:25 +04:00
Al Viro
4ce16ef3fe fs/namei.c: don't pass nameidata to d_revalidate()
since the method wrapped by it doesn't need that anymore...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:34:21 +04:00
Al Viro
0b728e1911 stop passing nameidata * to ->d_revalidate()
Just the lookup flags.  Die, bastard, die...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:34:14 +04:00
Al Viro
fa3c56bbda fs/nfs/dir.c: switch to passing nd->flags instead of nd wherever possible
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:34:07 +04:00
Al Viro
facc3530fb nfs_lookup_verify_inode() - nd is *always* non-NULL here
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:34:02 +04:00
Al Viro
93420b40bb switch nfs_lookup_check_intent() away from nameidata
just pass the flags

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:57 +04:00
Al Viro
02e5180d99 do_dentry_open(): take initialization of file->f_path to caller
... and get rid of a couple of arguments and a pointless reassignment
in finish_open() case.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:54 +04:00
Al Viro
2a027e7a18 fold __dentry_open() into its sole caller
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:52 +04:00
Al Viro
96b7e579ad switch do_dentry_open() to returning int
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:49 +04:00
Al Viro
e45198a6ac make finish_no_open() return int
namely, 1 ;-)  That's what we want to return from ->atomic_open()
instances after finish_no_open().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:45 +04:00
Al Viro
2675a4eb6a fs/namei.c: get do_last() and friends return int
Same conventions as for ->atomic_open().  Trimmed the
forest of labels a bit, while we are at it...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:43 +04:00
Al Viro
30d9049474 kill struct opendata
Just pass struct file *.  Methods are happier that way...
There's no need to return struct file * from finish_open() now,
so let it return int.  Next: saner prototypes for parts in
namei.c

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:39 +04:00
Al Viro
a4a3bdd778 kill opendata->{mnt,dentry}
->filp->f_path is there for purpose...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:37 +04:00
Al Viro
d95852777b make ->atomic_open() return int
Change of calling conventions:
old		new
NULL		1
file		0
ERR_PTR(-ve)	-ve

Caller *knows* that struct file *; no need to return it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:35 +04:00
Al Viro
3d8a00d209 don't modify od->filp at all
make put_filp() conditional on flag set by finish_open()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:33 +04:00
Al Viro
47237687d7 ->atomic_open() prototype change - pass int * instead of bool *
... and let finish_open() report having opened the file via that sucker.
Next step: don't modify od->filp at all.

[AV: FILE_CREATE was already used by cifs; Miklos' fix folded]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:31 +04:00
Miklos Szeredi
a8277b9baa vfs: move O_DIRECT check to common code
Perform open_check_o_direct() in a common place in do_last after opening the
file.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:28 +04:00
Miklos Szeredi
f60dc3db6e vfs: do_last(): clean up retry
Move the lookup retry logic to the bottom of the function to make the normal
case simpler to read.

Reported-by: David Howells <dhowells@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:26 +04:00
Miklos Szeredi
77d660a8a8 vfs: do_last(): clean up bool
Consistently use bool for boolean values in do_last().

Reported-by: David Howells <dhowells@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:24 +04:00
Miklos Szeredi
e83db16722 vfs: do_last(): clean up labels
Reported-by: David Howells <dhowells@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:21 +04:00
Miklos Szeredi
aa4caadb70 vfs: do_last(): clean up error handling
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:20 +04:00
Miklos Szeredi
015c3bbcd8 vfs: remove open intents from nameidata
All users of open intents have been converted to use ->atomic_{open,create}.

This patch gets rid of nd->intent.open and related infrastructure.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:18 +04:00
Miklos Szeredi
e43ae79c54 9p: implement i_op->atomic_open()
Add an ->atomic_open implementation which replaces the atomic open+create
operation implemented via ->create.  No functionality is changed.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:17 +04:00
Miklos Szeredi
2d83bde9a1 ceph: implement i_op->atomic_open()
Add an ->atomic_open implementation which replaces the atomic lookup+open+create
operation implemented via ->lookup and ->create operations.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:16 +04:00
Miklos Szeredi
3819219b59 ceph: remove unused arg from ceph_lookup_open()
What was the purpose of this?

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:16 +04:00
Miklos Szeredi
d2c127197d cifs: implement i_op->atomic_open()
Add an ->atomic_open implementation which replaces the atomic lookup+open+create
operation implemented via ->lookup and ->create operations.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Steve French <sfrench@samba.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:15 +04:00
Miklos Szeredi
c8ccbe032f fuse: implement i_op->atomic_open()
Add an ->atomic_open implementation which replaces the atomic open+create
operation implemented via ->create.  No functionality is changed.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:14 +04:00
Miklos Szeredi
eda72afb9e nfs: don't use intents for checking atomic open
is_atomic_open() is now only used by nfs4_lookup_revalidate() to check whether
it's okay to skip normal revalidation.

It does a racy check for mount read-onlyness and falls back to normal
revalidation if the open would fail.  This makes little sense now that this
function isn't used for determining whether to actually open the file or not.

The d_mountpoint() check still makes sense since it is an indication that we
might be following a mount and so open may not revalidate the dentry.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:12 +04:00
Miklos Szeredi
50de348c36 nfs: don't use nd->intent.open.flags
Instead check LOOKUP_EXCL in nd->flags, which is basically what the open intent
flags were used for.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:10 +04:00
Miklos Szeredi
8867fe5899 nfs: clean up ->create in nfs_rpc_ops
Don't pass nfs_open_context() to ->create().  Only the NFS4 implementation
needed that and only because it wanted to return an open file using open
intents.  That task has been replaced by ->atomic_open so it is not necessary
anymore to pass the context to the create rpc operation.

Despite nfs4_proc_create apparently being okay with a NULL context it Oopses
somewhere down the call chain.  So allocate a context here.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:08 +04:00
Miklos Szeredi
0dd2b474d0 nfs: implement i_op->atomic_open()
Replace NFS4 specific ->lookup implementation with ->atomic_open impelementation
and use the generic nfs_lookup for other lookups.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:06 +04:00
Miklos Szeredi
d18e9008c3 vfs: add i_op->atomic_open()
Add a new inode operation which is called on the last component of an open.
Using this the filesystem can look up, possibly create and open the file in one
atomic operation.  If it cannot perform this (e.g. the file type turned out to
be wrong) it may signal this by returning NULL instead of an open struct file
pointer.

i_op->atomic_open() is only called if the last component is negative or needs
lookup.  Handling cached positive dentries here doesn't add much value: these
can be opened using f_op->open().  If the cached file turns out to be invalid,
the open can be retried, this time using ->atomic_open() with a fresh dentry.

For now leave the old way of using open intents in lookup and revalidate in
place.  This will be removed once all the users are converted.

David Howells noticed that if ->atomic_open() opens the file but does not create
it, handle_truncate() will be called on it even if it is not a regular file.
Fix this by checking the file type in this case too.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:04 +04:00
Miklos Szeredi
54ef487241 vfs: lookup_open(): expand lookup_hash()
Copy __lookup_hash() into lookup_open().  The next patch will insert the atomic
open call just before the real lookup.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:02 +04:00
Miklos Szeredi
d58ffd35c1 vfs: add lookup_open()
Split out lookup + maybe create from do_last().  This is the part under i_mutex
protection.

The function is called lookup_open() and returns a filp even though the open
part is not used yet.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:01 +04:00
Miklos Szeredi
7157486541 vfs: do_last(): common slow lookup
Make the slow lookup part of O_CREAT and non-O_CREAT opens common.

This allows atomic_open to be hooked into the slow lookup part.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:00 +04:00
Miklos Szeredi
b6183df7b2 vfs: do_last(): separate O_CREAT specific code
Check O_CREAT on the slow lookup paths where necessary.  This allows the rest to
be shared with plain open.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:59 +04:00
Miklos Szeredi
37d7fffc9c vfs: do_last(): inline lookup_slow()
Copy lookup_slow() into do_last().

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:58 +04:00
Al Viro
6d7b5aaed7 namei.c: let follow_link() do put_link() on failure
no need for kludgy "set cookie to ERR_PTR(...) because we failed
before we did actual ->follow_link() and want to suppress put_link()",
no pointless check in put_link() itself.

Callers checked if follow_link() has failed anyway; might as well
break out of their loops if that happened, without bothering
to call put_link() first.

[AV: folded fixes from hch]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:56 +04:00
Al Viro
1d674107ea coda: use list_for_each_entry
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:56 +04:00
Al Viro
b3d9b7a3c7 vfs: switch i_dentry/d_alias to hlist
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:55 +04:00
Al Viro
9f713878f2 ext4: get rid of open-coded d_find_any_alias()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:54 +04:00
Al Viro
a614a092bf ocfs2: use list_for_each_entry in ocfs2_find_local_alias()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:53 +04:00
Al Viro
12447c4039 affs: unobfuscate affs_fix_dcache()
and add a comment on what it's doing

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:52 +04:00
Al Viro
3084ee95f0 affs: get rid of open-coded list_for_each_entry()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:52 +04:00
Al Viro
7968ce12e9 adfs: don't bother with ->i_dentry in ->destroy_inode()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:50 +04:00
Al Viro
e6f9f8d029 cifs: don't bother with ->i_dentry in ->destroy_inode()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:49 +04:00
Al Viro
63a44583f3 qnx6: don't bother with ->i_dentry in inode-freeing callback
we'll initialize it in inode_init_always() when we allocate that
object again.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:48 +04:00
Al Viro
6ce6e24e72 get rid of magic in proc_namespace.c
don't rely on proc_mounts->m being the first field; container_of()
is there for purpose.  No need to bother with ->private, while
we are at it - the same container_of will do nicely.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:48 +04:00
Al Viro
f7a99c5b7c get rid of ->mnt_longterm
it's enough to set ->mnt_ns of internal vfsmounts to something
distinct from all struct mnt_namespace out there; then we can
just use the check for ->mnt_ns != NULL in the fast path of
mntput_no_expire()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:47 +04:00
Julia Lawall
d187663ef2 fs/direct-io.c: adjust suspicious bit operation
READ is 0, so the result of the bit-and operation is 0.  Rewrite with == as
done elsewhere in the same file.

This problem was found using Coccinelle (http://coccinelle.lip6.fr/).

Signed-off-by: Julia Lawall <julia@diku.dk>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:46 +04:00
Artem Bityutskiy
3dd847820d affs: get rid of affs_sync_super
This patch makes affs stop using the VFS '->write_super()' method along with
the 's_dirt' superblock flag, because they are on their way out.

The whole "superblock write-out" VFS infrastructure is served by the
'sync_supers()' kernel thread, which wakes up every 5 (by default) seconds and
writes out all dirty superblocks using the '->write_super()' call-back.  But the
problem with this thread is that it wastes power by waking up the system every
5 seconds, even if there are no diry superblocks, or there are no client
file-systems which would need this (e.g., btrfs does not use
'->write_super()'). So we want to kill it completely and thus, we need to make
file-systems to stop using the '->write_super()' VFS service, and then remove
it together with the kernel thread.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:45 +04:00
Artem Bityutskiy
a215fef7ed affs: introduce VFS superblock object back-reference
Add an 'sb' VFS superblock back-reference to the 'struct affs_sb_info' data
structure - we will need to find the VFS superblock from a 'struct
affs_sb_info' object in the next patch, so this change is jut a preparation.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:45 +04:00
Artem Bityutskiy
a837107439 affs: stop using lock_super
The VFS's 'lock_super()' and 'unlock_super()' calls are deprecated and unwanted
and just wait for a brave knight who'd kill them. This patch makes AFFS stop
using them and use the buffer-head's own lock instead.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:44 +04:00
Artem Bityutskiy
e0471c8d8a affs: re-structure superblock locking a bit
AFFS wants to serialize the superblock (the root block in AFFS terms) updates
and uses 'lock_super()/unlock_super()' for these purposes. This patch pushes the
locking down to the 'affs_commit_super()' from the callers.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:43 +04:00
Artem Bityutskiy
0164b1a32e affs: remove useless superblock writeout on remount
We do not need to write out the superblock from '->remount_fs()' because
VFS has already called '->sync_fs()' by this time and the superblock has
already been written out. Thus, remove the 'affs_write_super()'
infocation from 'affs_remount()'.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:42 +04:00
Artem Bityutskiy
c9753b1d20 affs: remove useless superblock writeout on unmount
We do not need to write out the superblock from '->put_super()' because VFS has
already called '->sync_fs()' by this time and the superblock has already been
written out. Thus, remove the 'affs_commit_super()' infocation from
'affs_put_super()'.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:42 +04:00
Artem Bityutskiy
bc86256d2e affs: stop setting bm_flags
AFFS stores values '1' and '2' in 'bm_flags', and I fail to see any logic when
it prefers one or another. AFFS writes '1' only from '->put_super()', while
'->sync_fs()' and '->write_super()' store value '2'.  So on the first glance,
it looks like we want to have '1' if we unmount.  However, this does not really
happen in these cases:
  1. superblock is written via 'write_super()' then we unmount;
  2. we re-mount R/O, then unmount.
which are quite typical.

I could not find good documentation describing this field, except of one random
piece of documentation in the internet which says that -1 means that the root
block is valid, which is not consistent with what we have in the Linux AFFS
driver.

Jan Kara commented on this: "I have some vague recollection that on Amiga
boolean was usually encoded as: 0 == false, ~0 == -1 == true. But it has been
ages..."

Thus, my conclusion is that value of '1' is as good as value of '2' and we can
just always use '2'. An Jan Kara suggested to go further: "generally bm_flags
handling looks strange. If they are 0, we mount fs read only and thus cannot
change them.  If they are != 0, we write 2 there. So IMHO if you just removed
bm_flags setting, nothing will really happen."

So this patch removes the bm_flags setting completely. This makes the "clean"
argument of the 'affs_commit_super()' function unneeded, so it is also removed.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:41 +04:00
Christoph Hellwig
1632dcc93f xfs: do not call xfs_bdstrat_cb in xfs_buf_iodone_callbacks
xfs_bdstrat_cb only adds a check for a shutdown filesystem over
xfs_buf_iorequest, but xfs_buf_iodone_callbacks just checked for a shut down
filesystem a little earlier.  In addition the shutdown handling in
xfs_bdstrat_cb is not very suitable for this caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-13 13:09:49 -05:00
Christoph Hellwig
40a9b7963d xfs: prevent recursion in xfs_buf_iorequest
If the b_iodone handler is run in calling context in xfs_buf_iorequest we
can run into a recursion where xfs_buf_iodone_callbacks keeps calling back
into xfs_buf_iorequest because an I/O error happened, which keeps calling
back into xfs_buf_iorequest.  This chain will usually not take long
because the filesystem gets shut down because of log I/O errors, but even
over a short time it can cause stack overflows if run on the same context.

As a short term workaround make sure we always call the iodone handler in
workqueue context.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-13 13:09:39 -05:00
Dave Chinner
aa292847b9 xfs: don't defer metadata allocation to the workqueue
Almost all metadata allocations come from shallow stack usage
situations. Avoid the overhead of switching the allocation to a
workqueue as we are not in danger of running out of stack when
making these allocations. Metadata allocations are already marked
through the args that are passed down, so this is trivial to do.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reported-by: Mel Gorman <mgorman@suse.de>
Tested-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-13 13:09:27 -05:00
Dave Chinner
e3a746f5aa xfs: really fix the cursor leak in xfs_alloc_ag_vextent_near
The current cursor is reallocated when retrying the allocation, so
the existing cursor needs to be destroyed in both the restart and
the failure cases.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Tested-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-13 13:09:18 -05:00
Linus Torvalds
4264e6a263 NFS client bugfixes for Linux 3.5
- Fix an NFSv4 mount regression
 - Fix O_DIRECT list manipulation snafus
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJQADo8AAoJEGcL54qWCgDyu8wP/2entLv4jYgM7zp+/kOBhps6
 RpAv57YYv86pLk5gxoZg/JJs3z4QzS3CjMRXK11CEzRMt1NUqJfI6587TIl11DYW
 0UJcrAQo4X3EgNR7CCKAJaHUmYLayjUgquL8OjdBTuXCrpJzGdHcnBa71oF3xaEb
 jJhsA0W+qqBpa0295HH8sdHkURX1+46OdSd47+Dg+PZurkr0CRhjZo+DX+AkaSsU
 blM+pYUgu20NFaNUEfBphib3XMnnCGXc84g3gqjeOldRMijJGv1VcmhfsmJwmyLA
 1mImwZ2HDzySVLrbA/D7yeddZfXuTaf8krFmyP07U6I7hIEXCAEr16DmqCQCF1UB
 ppzZM9lu8f6df9tIlmjdA/hGOfs5OSQzD1L9Irn8xpRIWYmg+mrxXSzIG6wZE8zu
 n1EURW5CrLrlz2d7rdhqA+Y/GIkAvIO0/iBbJnvkMUdMPsd9/ikFPkHGzMhsK1KN
 AxEi2r+n0e8Hwaueu0EP685QjrEjOyLDdKIsAzNy/UEGN4v297UoW8ZyYnrzEIJG
 mQg6l4ke34zuw3AxoR3X4SuW329rGpG7x0pXNwqJG292T9jki334dPzJCy1LbaGl
 o1g+Z0BcyMe9VC7tRwOFEiGsbcd/OYPRbVVhu/3RlrWuVvj46QDvVpQ7+1pOr2s/
 4Xu70AlKw3B02ge621+p
 =bX2K
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.5-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 - Fix an NFSv4 mount regression
 - Fix O_DIRECT list manipulation snafus

* tag 'nfs-for-3.5-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFSv4: Fix an NFSv4 mount regression
  NFS: Fix list manipulation snafus in fs/nfs/direct.c
2012-07-13 10:58:45 -07:00
Dave Jones
8d657eb3b4 Remove easily user-triggerable BUG from generic_setlease
This can be trivially triggered from userspace by passing in something unexpected.

    kernel BUG at fs/locks.c:1468!
    invalid opcode: 0000 [#1] SMP
    RIP: 0010:generic_setlease+0xc2/0x100
    Call Trace:
      __vfs_setlease+0x35/0x40
      fcntl_setlease+0x76/0x150
      sys_fcntl+0x1c6/0x810
      system_call_fastpath+0x1a/0x1f

Signed-off-by: Dave Jones <davej@redhat.com>
Cc: stable@kernel.org # 3.2+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-13 10:50:23 -07:00
Jeff Moyer
91f68c89d8 block: fix infinite loop in __getblk_slow
Commit 080399aaaf ("block: don't mark buffers beyond end of disk as
mapped") exposed a bug in __getblk_slow that causes mount to hang as it
loops infinitely waiting for a buffer that lies beyond the end of the
disk to become uptodate.

The problem was initially reported by Torsten Hilbrich here:

    https://lkml.org/lkml/2012/6/18/54

and also reported independently here:

    http://www.sysresccd.org/forums/viewtopic.php?f=13&t=4511

and then Richard W.M.  Jones and Marcos Mello noted a few separate
bugzillas also associated with the same issue.  This patch has been
confirmed to fix:

    https://bugzilla.redhat.com/show_bug.cgi?id=835019

The main problem is here, in __getblk_slow:

        for (;;) {
                struct buffer_head * bh;
                int ret;

                bh = __find_get_block(bdev, block, size);
                if (bh)
                        return bh;

                ret = grow_buffers(bdev, block, size);
                if (ret < 0)
                        return NULL;
                if (ret == 0)
                        free_more_memory();
        }

__find_get_block does not find the block, since it will not be marked as
mapped, and so grow_buffers is called to fill in the buffers for the
associated page.  I believe the for (;;) loop is there primarily to
retry in the case of memory pressure keeping grow_buffers from
succeeding.  However, we also continue to loop for other cases, like the
block lying beond the end of the disk.  So, the fix I came up with is to
only loop when grow_buffers fails due to memory allocation issues
(return value of 0).

The attached patch was tested by myself, Torsten, and Rich, and was
found to resolve the problem in call cases.

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Reported-and-Tested-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Reviewed-by: Josh Boyer <jwboyer@redhat.com>
Cc: Stable <stable@vger.kernel.org>  # 3.0+
[ Jens is on vacation, taking this directly  - Linus ]
--
Stable Notes: this patch requires backport to 3.0, 3.2 and 3.3.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-13 08:36:35 -07:00
Mathias Krause
0143fc5e9f udf: avoid info leak on export
For type 0x51 the udf.parent_partref member in struct fid gets copied
uninitialized to userland. Fix this by initializing it to 0.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-07-13 11:21:21 +02:00
Mathias Krause
fe685aabf7 isofs: avoid info leak on export
For type 1 the parent_offset member in struct isofs_fid gets copied
uninitialized to userland. Fix this by initializing it to 0.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-07-13 11:21:21 +02:00
Liu Bo
10983f2e8d Btrfs: fix typo in convert_extent_bit
It should be convert_extent_bit.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-07-12 11:27:34 +02:00
Steven J. Magnani
5d8ecbbc28 fat: fix non-atomic NFS i_pos read
fat_encode_fh() can fetch an invalid i_pos value on systems where 64-bit
accesses are not atomic.  Make it use the same accessor as the rest of the
FAT code.

Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-11 16:04:47 -07:00
Bob Liu
fea9f718b3 fs: ramfs: file-nommu: add SetPageUptodate()
There is a bug in the below scenario for !CONFIG_MMU:

 1. create a new file
 2. mmap the file and write to it
 3. read the file can't get the correct value

Because

  sys_read() -> generic_file_aio_read() -> simple_readpage() -> clear_page()

which causes the page to be zeroed.

Add SetPageUptodate() to ramfs_nommu_expand_for_mapping() so that
generic_file_aio_read() do not call simple_readpage().

Signed-off-by: Bob Liu <lliubbo@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greg Ungerer <gerg@uclinux.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-11 16:04:47 -07:00
Luis Henriques
a4e08d001f ocfs2: fix NULL pointer dereference in __ocfs2_change_file_space()
As ocfs2_fallocate() will invoke __ocfs2_change_file_space() with a NULL
as the first parameter (file), it may trigger a NULL pointer dereferrence
due to a missing check.

Addresses http://bugs.launchpad.net/bugs/1006012

Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Reported-by: Bret Towe <magnade@gmail.com>
Tested-by: Bret Towe <magnade@gmail.com>
Cc: Sunil Mushran <sunil.mushran@oracle.com>
Acked-by: Joel Becker <jlbec@evilplan.org>
Acked-by: Mark Fasheh <mfasheh@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-11 16:04:43 -07:00
Dong Aisheng
475d009429 of: Improve prom_update_property() function
prom_update_property() currently fails if the property doesn't
actually exist yet which isn't what we want. Change to add-or-update
instead of update-only, then we can remove a lot duplicated lines.

Suggested-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Dong Aisheng <dong.aisheng@linaro.org>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-07-11 15:26:51 +10:00
Trond Myklebust
f1daf666dd NFSv4: Fix an NFSv4 mount regression
The helper nfs_fs_mount() will always call nfs4_try_mount with the
mount_info->fill_super argument pointing to nfs_fill_super, which is
NFSv2/v3 only.
Fix is to have nfs4_try_mount replace it with nfs4_fill_super.

The regression was introduced by commit c40f8d1d (NFS: Create a common
fs_mount() function)

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-10 13:25:39 -04:00
Jan Kara
57b9655d01 udf: Improve table length check to avoid possible overflow
When a partition table length is corrupted to be close to 1 << 32, the
check for its length may overflow on 32-bit systems and we will think
the length is valid. Later on the kernel can crash trying to read beyond
end of buffer. Fix the check to avoid possible overflow.

CC: stable@vger.kernel.org
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-07-10 18:02:17 +02:00
Jan Kara
44f4f729e7 ext3: Check return value of blkdev_issue_flush()
blkdev_issue_flush() can fail. Make sure the error gets properly propagated.

Signed-off-by: Jan Kara <jack@suse.cz>
2012-07-09 23:40:46 +02:00
Jan Kara
349ecd6a3c jbd: Check return value of blkdev_issue_flush()
blkdev_issue_flush() can fail. Make sure the error gets properly propagated.

Signed-off-by: Jan Kara <jack@suse.cz>
2012-07-09 23:38:36 +02:00
Jan Kara
17dc59ba41 udf: Do not decrement i_blocks when freeing indirect extent block
Indirect extent block is not accounted in i_blocks during allocation
thus we should not decrement i_blocks when we are freeing such block
during truncation.

Reported-by: Steve Nickel <snickel58@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-07-09 13:24:21 +02:00
Jan Kara
bff943af6f udf: Fix memory leak when mounting
When we are mounting filesystem, we can load one partition table before
finding out that we cannot complete processing of logical volume descriptor
and trying the reserve descriptor. Free the table properly before trying
the reserve descriptor.

Signed-off-by: Jan Kara <jack@suse.cz>
2012-07-09 12:03:12 +02:00
Wanlong Gao
e124a32043 ext2: cleanup the confused goto label
Cleanup the confused goto label, since the big lock has been removed.

Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-07-09 12:03:12 +02:00
Ashish Sangwan
a0e589b485 UDF: Remove unnecessary variable "offset" from udf_fill_inode
The variable "offset" is not needed. Remove it.

Signed-off-by: Ashish Sangwan <ashish.sangwan2@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-07-09 12:03:12 +02:00
Artem Bityutskiy
db8109ef98 udf: stop using s_dirt
The UDF file-system does not need the 's_dirt' superblock flag because it does
not define the 'write_super()' method. This flag was set to 1 in few places and
set to 0 in '->sync_fs()' and was basically useless. Stop using it because it
is on its way out.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-07-09 12:03:11 +02:00
Eric Sandeen
f007dbf8e5 ext3: force ro mount if ext3_setup_super() fails
If ext3_setup_super() fails i.e. due to a too-high revision,
the error is logged in dmesg but the fs is not mounted RO as
indicated.

Tested by:

[164152.114551] EXT3-fs (sdb6): error: revision level too high, forcing read-only mode
/dev/sdb6 /mnt/test2 ext3 rw,seclabel,relatime,errors=continue,user_xattr,acl,barrier=1,data=ordered 0 0
                          ^^

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-07-09 12:03:11 +02:00
Jeff Liu
f3da93105b quota: fix checkpatch.pl warning by replacing <asm/uaccess.h> with <linux/uaccess.h>
checkpatch.pl warns:

"WARNING: Use #include <linux/uaccess.h> instead of <asm/uaccess.h>"

Below patch fixes it.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-07-09 12:03:11 +02:00
Trond Myklebust
4035c2487f NFS: Fix list manipulation snafus in fs/nfs/direct.c
Fix 2 bugs in nfs_direct_write_reschedule:

 - The request needs to be removed from the 'reqs' list before it can
   be added to 'failed'.
 - Fix an infinite loop if the 'failed' list is non-empty.

Reported-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-08 10:32:08 -04:00
Linus Torvalds
332a2e1244 vfs: make O_PATH file descriptors usable for 'fchdir()'
We already use them for openat() and friends, but fchdir() also wants to
be able to use O_PATH file descriptors.  This should make it comparable
to the O_SEARCH of Solaris.  In particular, O_PATH allows you to access
(not-quite-open) a directory you don't have read persmission to, only
execute permission.

Noticed during development of multithread support for ksh93.

Reported-by: ольга крыжановская <olga.kryzhanovska@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@kernel.org    # O_PATH introduced in 3.0+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-07 17:19:02 -07:00
Linus Torvalds
26c439d400 Fixes an incorrect access mode check when preparing to open a file in the lower
filesystem. This isn't an urgent fix, but it is simple and the check was
 obviously incorrect.
 
 Also fixes a couple important bugs in the eCryptfs miscdev interface. These
 changes are low risk due to the small number of users that use the miscdev
 interface. I was able to keep the changes minimal and I have some cleaner, more
 complete changes queued up for the next merge window that will build on these
 patches.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABCgAGBQJP92LsAAoJENaSAD2qAscKCWkP/3BWpv0AS0fnrZPniXv/+vjf
 gdV4NcQhE/86VsQ7CtZS7jqfSVTzm+YTta9BTKj6jWZuGUZGcjXsZdyMpleBZukh
 TvRSW3HKCRtC8XNHzle3YUukD1o465nMEiCUQOYcWjAa3in7cZTiFU+3S2Unn5UF
 yh2Slfzjxkl2EUHEbcBiBayzaMH2gqwAvRR4sjM0P175m/jjDF6pDGT5vc0skvcP
 kLzFr/3Ia9BW1nU0yblTtSNcHzYV8GTJVEpj1NR7q59x2gVJubF6hBDtbZdaaGK0
 rYlKV+w9mRwzUCuVdb4zPCa9EGrbqH4gYvIWsCW+R0zoK57rfIRolQVYEglGE2TU
 K3HHL6UOsPASZCQqhi+K+tCmYtZaCfeMhDRgxyDOaxS4rQ6dy+XO6f9zM30qw1UB
 QHeVEQl7bM0IpByCcjVbuNJT4zTlW7xmsLm/pbGv60UBdZpqaUZptEBEpgUFjq30
 shgNLlHHWvelhf52gbff+ytCHf+IDVPT/Q2aGjhC2fgqWiQno44vR88gtMQz6b7g
 4yEL7t0TqBB9jCBu/ikTITGpRH5S149e3oYGm2P/+YYZUGlw0Gf9N6TBkctJFSg/
 /vk6aobMnjfxmeM80xOKey5Y1zDis660sgt1hX8NVAuo4hp7VQfWGhEZ8lYqzCzP
 aJci4ZXaDzwXx6UCC5w2
 =TEei
 -----END PGP SIGNATURE-----

Merge tag 'ecryptfs-3.5-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs

Pull eCryptfs fixes from Tyler Hicks:
 "Fixes an incorrect access mode check when preparing to open a file in
  the lower filesystem.  This isn't an urgent fix, but it is simple and
  the check was obviously incorrect.

  Also fixes a couple important bugs in the eCryptfs miscdev interface.
  These changes are low risk due to the small number of users that use
  the miscdev interface.  I was able to keep the changes minimal and I
  have some cleaner, more complete changes queued up for the next merge
  window that will build on these patches."

* tag 'ecryptfs-3.5-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
  eCryptfs: Gracefully refuse miscdev file ops on inherited/passed files
  eCryptfs: Fix lockdep warning in miscdev operations
  eCryptfs: Properly check for O_RDONLY flag before doing privileged open
2012-07-06 15:32:18 -07:00
Tyler Hicks
8dc6780587 eCryptfs: Gracefully refuse miscdev file ops on inherited/passed files
File operations on /dev/ecryptfs would BUG() when the operations were
performed by processes other than the process that originally opened the
file. This could happen with open files inherited after fork() or file
descriptors passed through IPC mechanisms. Rather than calling BUG(), an
error code can be safely returned in most situations.

In ecryptfs_miscdev_release(), eCryptfs still needs to handle the
release even if the last file reference is being held by a process that
didn't originally open the file. ecryptfs_find_daemon_by_euid() will not
be successful, so a pointer to the daemon is stored in the file's
private_data. The private_data pointer is initialized when the miscdev
file is opened and only used when the file is released.

https://launchpad.net/bugs/994247

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Tested-by: Sasha Levin <levinsasha928@gmail.com>
2012-07-06 15:51:12 -05:00
Linus Torvalds
1b7fa4c271 Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2
Pull ocfs2 fixes from Joel Becker.

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
  aio: make kiocb->private NUll in init_sync_kiocb()
  ocfs2: Fix bogus error message from ocfs2_global_read_info
  ocfs2: for SEEK_DATA/SEEK_HOLE, return internal error unchanged if ocfs2_get_clusters_nocache() or ocfs2_inode_lock() call failed.
  ocfs2: use spinlock irqsave for downconvert lock.patch
  ocfs2: Misplaced parens in unlikley
  ocfs2: clear unaligned io flag when dio fails
2012-07-06 10:04:39 -07:00
Linus Torvalds
064ea1ae80 Merge git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French.

* git://git.samba.org/sfrench/cifs-2.6:
  cifs: when server doesn't set CAP_LARGE_READ_X, cap default rsize at MaxBufferSize
  cifs: fix parsing of password mount option
2012-07-06 10:02:12 -07:00
Linus Torvalds
5eecb9cc90 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs updates from Chris Mason:
 "I held off on my rc5 pull because I hit an oops during log recovery
  after a crash.  I wanted to make sure it wasn't a regression because
  we have some logging fixes in here.

  It turns out that a commit during the merge window just made it much
  more likely to trigger directory logging instead of full commits,
  which exposed an old bug.

  The new backref walking code got some additional fixes.  This should
  be the final set of them.

  Josef fixed up a corner where our O_DIRECT writes and buffered reads
  could expose old file contents (not stale, just not the most recent).
  He and Liu Bo fixed crashes during tree log recover as well.

  Ilya fixed errors while we resume disk balancing operations on
  readonly mounts."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: run delayed directory updates during log replay
  Btrfs: hold a ref on the inode during writepages
  Btrfs: fix tree log remove space corner case
  Btrfs: fix wrong check during log recovery
  Btrfs: use _IOR for BTRFS_IOC_SUBVOL_GETFLAGS
  Btrfs: resume balance on rw (re)mounts properly
  Btrfs: restore restriper state on all mounts
  Btrfs: fix dio write vs buffered read race
  Btrfs: don't count I/O statistic read errors for missing devices
  Btrfs: resolve tree mod log locking issue in btrfs_next_leaf
  Btrfs: fix tree mod log rewind of ADD operations
  Btrfs: leave critical region in btrfs_find_all_roots as soon as possible
  Btrfs: always put insert_ptr modifications into the tree mod log
  Btrfs: fix tree mod log for root replacements at leaf level
  Btrfs: support root level changes in __resolve_indirect_ref
  Btrfs: avoid waiting for delayed refs when we must not
2012-07-05 13:06:25 -07:00
Greg Kroah-Hartman
6fbfd0592e Merge v3.5-rc5 into driver-core-next
This picks up the big printk fixes, and resolves a merge issue with:
	drivers/extcon/extcon_gpio.c

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-05 08:25:34 -07:00
Jan Kara
a4564ead76 ocfs2: Fix bogus error message from ocfs2_global_read_info
'status' variable in ocfs2_global_read_info() is always != 0 when leaving the
function because it happens to contain number of read bytes. Thus we always log
error message although everything is OK. Since all error cases properly call
mlog_errno() before jumping to out_err, there's no reason to call mlog_errno()
on exit at all. This is a fallout of c1e8d35e (conversion of mlog_exit()
calls).

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Joel Becker <jlbec@evilplan.org>
2012-07-03 23:27:17 -07:00
Jeff Liu
65622e647b ocfs2: for SEEK_DATA/SEEK_HOLE, return internal error unchanged if ocfs2_get_clusters_nocache() or ocfs2_inode_lock() call failed.
Hello,

Since ENXIO only means "offset beyond EOF" for SEEK_DATA/SEEK_HOLE,
Hence we should return the internal error unchanged if ocfs2_inode_lock() or
ocfs2_get_clusters_nocache() call failed rather than ENXIO.
Otherwise, it will confuse the user applications when they trying to understand the root cause.

Thanks Dave for pointing this out.

Thanks,
-Jeff

Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Joel Becker <jlbec@evilplan.org>
2012-07-03 23:27:16 -07:00
Srinivas Eeda
a75e9ccabd ocfs2: use spinlock irqsave for downconvert lock.patch
When ocfs2dc thread holds dc_task_lock spinlock and receives soft IRQ it
deadlock itself trying to get same spinlock in ocfs2_wake_downconvert_thread.
Below is the stack snippet.

The patch disables interrupts when acquiring dc_task_lock spinlock.

	ocfs2_wake_downconvert_thread
	ocfs2_rw_unlock
	ocfs2_dio_end_io
	dio_complete
	.....
	bio_endio
	req_bio_endio
	....
	scsi_io_completion
	blk_done_softirq
	__do_softirq
	do_softirq
	irq_exit
	do_IRQ
	ocfs2_downconvert_thread
	[kthread]

Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Signed-off-by: Joel Becker <jlbec@evilplan.org>
2012-07-03 23:27:15 -07:00
roel
16865b7c42 ocfs2: Misplaced parens in unlikley
Fix misplaced parentheses

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Joel Becker <jlbec@evilplan.org>
2012-07-03 23:27:13 -07:00
Junxiao Bi
3e5d3c35a6 ocfs2: clear unaligned io flag when dio fails
The unaligned io flag is set in the kiocb when an unaligned
dio is issued, it should be cleared even when the dio fails,
or it may affect the following io which are using the same
kiocb.

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Joel Becker <jlbec@evilplan.org>
2012-07-03 23:26:50 -07:00
Tyler Hicks
60d65f1f07 eCryptfs: Fix lockdep warning in miscdev operations
Don't grab the daemon mutex while holding the message context mutex.
Addresses this lockdep warning:

 ecryptfsd/2141 is trying to acquire lock:
  (&ecryptfs_msg_ctx_arr[i].mux){+.+.+.}, at: [<ffffffffa029c213>] ecryptfs_miscdev_read+0x143/0x470 [ecryptfs]

 but task is already holding lock:
  (&(*daemon)->mux){+.+...}, at: [<ffffffffa029c2ec>] ecryptfs_miscdev_read+0x21c/0x470 [ecryptfs]

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #1 (&(*daemon)->mux){+.+...}:
        [<ffffffff810a3b8d>] lock_acquire+0x9d/0x220
        [<ffffffff8151c6da>] __mutex_lock_common+0x5a/0x4b0
        [<ffffffff8151cc64>] mutex_lock_nested+0x44/0x50
        [<ffffffffa029c5d7>] ecryptfs_send_miscdev+0x97/0x120 [ecryptfs]
        [<ffffffffa029b744>] ecryptfs_send_message+0x134/0x1e0 [ecryptfs]
        [<ffffffffa029a24e>] ecryptfs_generate_key_packet_set+0x2fe/0xa80 [ecryptfs]
        [<ffffffffa02960f8>] ecryptfs_write_metadata+0x108/0x250 [ecryptfs]
        [<ffffffffa0290f80>] ecryptfs_create+0x130/0x250 [ecryptfs]
        [<ffffffff811963a4>] vfs_create+0xb4/0x120
        [<ffffffff81197865>] do_last+0x8c5/0xa10
        [<ffffffff811998f9>] path_openat+0xd9/0x460
        [<ffffffff81199da2>] do_filp_open+0x42/0xa0
        [<ffffffff81187998>] do_sys_open+0xf8/0x1d0
        [<ffffffff81187a91>] sys_open+0x21/0x30
        [<ffffffff81527d69>] system_call_fastpath+0x16/0x1b

 -> #0 (&ecryptfs_msg_ctx_arr[i].mux){+.+.+.}:
        [<ffffffff810a3418>] __lock_acquire+0x1bf8/0x1c50
        [<ffffffff810a3b8d>] lock_acquire+0x9d/0x220
        [<ffffffff8151c6da>] __mutex_lock_common+0x5a/0x4b0
        [<ffffffff8151cc64>] mutex_lock_nested+0x44/0x50
        [<ffffffffa029c213>] ecryptfs_miscdev_read+0x143/0x470 [ecryptfs]
        [<ffffffff811887d3>] vfs_read+0xb3/0x180
        [<ffffffff811888ed>] sys_read+0x4d/0x90
        [<ffffffff81527d69>] system_call_fastpath+0x16/0x1b

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
2012-07-03 16:34:10 -07:00
Tyler Hicks
9fe79d7600 eCryptfs: Properly check for O_RDONLY flag before doing privileged open
If the first attempt at opening the lower file read/write fails,
eCryptfs will retry using a privileged kthread. However, the privileged
retry should not happen if the lower file's inode is read-only because a
read/write open will still be unsuccessful.

The check for determining if the open should be retried was intended to
be based on the access mode of the lower file's open flags being
O_RDONLY, but the check was incorrectly performed. This would cause the
open to be retried by the privileged kthread, resulting in a second
failed open of the lower file. This patch corrects the check to
determine if the open request should be handled by the privileged
kthread.

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Dan Carpenter <dan.carpenter@oracle.com>
2012-07-03 16:34:09 -07:00
Linus Torvalds
a3da2c6913 Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block bits from Jens Axboe:
 "As vacation is coming up, thought I'd better get rid of my pending
  changes in my for-linus branch for this iteration.  It contains:

   - Two patches for mtip32xx.  Killing a non-compliant sysfs interface
     and moving it to debugfs, where it belongs.

   - A few patches from Asias.  Two legit bug fixes, and one killing an
     interface that is no longer in use.

   - A patch from Jan, making the annoying partition ioctl warning a bit
     less annoying, by restricting it to !CAP_SYS_RAWIO only.

   - Three bug fixes for drbd from Lars Ellenberg.

   - A fix for an old regression for umem, it hasn't really worked since
     the plugging scheme was changed in 3.0.

   - A few fixes from Tejun.

   - A splice fix from Eric Dumazet, fixing an issue with pipe
     resizing."

* 'for-linus' of git://git.kernel.dk/linux-block:
  scsi: Silence unnecessary warnings about ioctl to partition
  block: Drop dead function blk_abort_queue()
  block: Mitigate lock unbalance caused by lock switching
  block: Avoid missed wakeup in request waitqueue
  umem: fix up unplugging
  splice: fix racy pipe->buffers uses
  drbd: fix null pointer dereference with on-congestion policy when diskless
  drbd: fix list corruption by failing but already aborted reads
  drbd: fix access of unallocated pages and kernel panic
  xen/blkfront: Add WARN to deal with misbehaving backends.
  blkcg: drop local variable @q from blkg_destroy()
  mtip32xx: Create debugfs entries for troubleshooting
  mtip32xx: Remove 'registers' and 'flags' from sysfs
  blkcg: fix blkg_alloc() failure path
  block: blkcg_policy_cfq shouldn't be used if !CONFIG_CFQ_GROUP_IOSCHED
  block: fix return value on cfq_init() failure
  mtip32xx: Remove version.h header file inclusion
  xen/blkback: Copy id field when doing BLKIF_DISCARD.
2012-07-03 15:45:10 -07:00
Jeff Layton
ec01d738a1 cifs: when server doesn't set CAP_LARGE_READ_X, cap default rsize at MaxBufferSize
When the server doesn't advertise CAP_LARGE_READ_X, then MS-CIFS states
that you must cap the size of the read at the client's MaxBufferSize.
Unfortunately, testing with many older servers shows that they often
can't service a read larger than their own MaxBufferSize.

Since we can't assume what the server will do in this situation, we must
be conservative here for the default. When the server can't do large
reads, then assume that it can't satisfy any read larger than its
MaxBufferSize either.

Luckily almost all modern servers can do large reads, so this won't
affect them. This is really just for older win9x and OS/2 era servers.
Also, note that this patch just governs the default rsize. The admin can
always override this if he so chooses.

Cc: <stable@vger.kernel.org> # 3.2
Reported-by: David H. Durgee <dhdurgee@acm.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steven French <sfrench@w500smf.(none)>
2012-07-03 12:54:42 -05:00
Chris Mason
b6305567e7 Btrfs: run delayed directory updates during log replay
While we are resolving directory modifications in the
tree log, we are triggering delayed metadata updates to
the filesystem btrees.

This commit forces the delayed updates to run so the
replay code can find any modifications done.  It stops
us from crashing because the directory deleltion replay
expects items to be removed immediately from the tree.

Signed-off-by: Chris Mason <chris.mason@fusionio.com>
cc: stable@kernel.org
2012-07-02 15:39:19 -04:00
Josef Bacik
7fd1a3f73f Btrfs: hold a ref on the inode during writepages
We can race with unlink and not actually be able to do our igrab in
btrfs_add_ordered_extent.  This will result in all sorts of problems.
Instead of doing the complicated work to try and handle returning an error
properly from btrfs_add_ordered_extent, just hold a ref to the inode during
writepages.  If we cannot grab a ref we know we're freeing this inode anyway
and can just drop the dirty pages on the floor, because screw them we're
going to invalidate them anyway.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-07-02 15:39:18 -04:00
Josef Bacik
bdb7d303b3 Btrfs: fix tree log remove space corner case
The tree log stuff can have allocated space that we end up having split
across a bitmap and a real extent.  The free space code does not deal with
this, it assumes that if it finds an extent or bitmap entry that the entire
range must fall within the entry it finds.  This isn't necessarily the case,
so rework the remove function so it can handle this case properly.  This
fixed two panics the user hit, first in the case where the space was
initially in a bitmap and then in an extent entry, and then the reverse
case.  Thanks,

Reported-and-tested-by: Shaun Reich <sreich@kde.org>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-07-02 15:39:18 -04:00
Liu Bo
6bf02314d9 Btrfs: fix wrong check during log recovery
When we're evicting an inode during log recovery, we need to ensure that the inode
is not in orphan state any more, which means inode's run_time flags has _no_
BTRFS_INODE_HAS_ORPHAN_ITEM.  Thus, the BUG_ON was triggered because of a wrong
check for the flags.

Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-07-02 15:39:17 -04:00
Alexander Block
d3a94048c9 Btrfs: use _IOR for BTRFS_IOC_SUBVOL_GETFLAGS
We used the wrong ioctl macro for the getflags ioctl before.
As we don't have the set/getflags ioctls in the user space ioctl.h
at the moment, it's safe to fix it now.

Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Alexander Block <ablock84@googlemail.com>
2012-07-02 15:39:17 -04:00
Ilya Dryomov
2b6ba629b5 Btrfs: resume balance on rw (re)mounts properly
This introduces btrfs_resume_balance_async(), which, given that
restriper state was recovered earlier by btrfs_recover_balance(),
resumes balance in btrfs-balance kthread.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2012-07-02 15:39:17 -04:00
Ilya Dryomov
68310a5e42 Btrfs: restore restriper state on all mounts
Fix a bug that triggered asserts in btrfs_balance() in both normal and
resume modes -- restriper state was not properly restored on read-only
mounts.  This factors out resuming code from btrfs_restore_balance(),
which is now also called earlier in the mount sequence to avoid the
problem of some early writes getting the old profile.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2012-07-02 15:39:16 -04:00
Josef Bacik
c3473e8300 Btrfs: fix dio write vs buffered read race
Miao pointed out there's a problem with mixing dio writes and buffered
reads.  If the read happens between us invalidating the page range and
actually locking the extent we can bring in pages into page cache.  Then
once the write finishes if somebody tries to read again it will just find
uptodate pages and we'll read stale data.  So we need to lock the extent and
check for uptodate bits in the range.  If there are uptodate bits we need to
unlock and invalidate again.  This will keep this race from happening since
we will hold the extent locked until we create the ordered extent, and then
teh read side always waits for ordered extents.  There was also a race in
how we updated i_size, previously we were relying on the generic DIO stuff
to adjust the i_size after the DIO had completed, but this happens outside
of the extent lock which means reads could come in and not see the updated
i_size.  So instead move this work into where we create the extents, and
then this way the update ordered i_size stuff works properly in the endio
handlers.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-07-02 15:36:23 -04:00
Stefan Behrens
597a60fade Btrfs: don't count I/O statistic read errors for missing devices
It is normal behaviour of the low level btrfs function btrfs_map_bio()
to complete a bio with -EIO if the device is missing, instead of just
preventing the bio creation in an earlier step.
This used to cause I/O statistic read error increments and annoying
printk_ratelimited messages. This commit fixes the issue.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Reported-by: Carey Underwood <cwillu@cwillu.com>
2012-07-02 15:36:23 -04:00
Linus Torvalds
221d3ebf3a Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull UDF fixes from Jan Kara:
 "Make UDF more robust in presence of corrupted filesystem"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  udf: Fortify loading of sparing table
  udf: Avoid run away loop when partition table length is corrupted
  udf: Use 'ret' instead of abusing 'i' in udf_load_logicalvol()
2012-06-28 11:43:45 -07:00
Linus Torvalds
9a7c6b73c4 Fix the debugfs regression - we never enable it because incorrect
'IS_ENABLED()' macro usage: should be 'IS_ENABLED(CONFIG_DEBUG_FS)',
 but we had 'IS_ENABLED(DEBUG_FS)'. Also fix incorrect assertion.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJP7Hx+AAoJECmIfjd9wqK04FEP/3NC0qMlleuEcHv9AFwOJ1PB
 rn1PDjz6kuEjZ1/xhGFoboBOUj577qyzIP6IvG7MqSH66Yc8gBF+sPbKWWglx7wB
 i2y7/Fi4lHM3w59GfvnOhdI7McklhPyl3R183MZ3EBJk4V0LPi2rXsl7G5puLNgG
 XkJuOjXLXZPgyeMR+DlBsoaaxBMihnh/pdpUAyLER1cQdzQwCzba82tNrMgnCp7i
 TIFTPtn+LmEQyHcqXx5ub/FV6BiEUXIJbkKlp5Ajqyh/olSNtGdCHRrP5MXpU2kI
 DmtyMvp+3PBHxrUYQjXT6uerL9uXhIUyRv49qO0tS68fUg44JCFflmPkoV9qEvvl
 ADbOqklx1DWdVCiZdhXWe1GFhf6U+TOoUyeiIzGIy0fIIlycNl915F4LzxVUqQKm
 yoqouEvzqd1LIAsopakF2DDIKoK6ViWmHBkuN04B+u+iab4DC4aX3vgxq+Ie8mqA
 0QNIamovk/2MR2665XhbARu0yDSEmGZvD6dkuSAgXIxjw7tdvvlY7pkSouWTOSR0
 fbqPrhgbRON+mT4Fcrb5dMq+PAiOTw5kp7az90+U6i1oLm5TY8CaHt3UvmdspOM/
 UeHRhLR8o/RhfnvnexiOxIWtQGCX+CCuePe9oN/fQabj9dfvKI1iR+zoyheFLwiT
 HxaU6I7oVYQVkeifNJVV
 =iZTv
 -----END PGP SIGNATURE-----

Merge tag 'upstream-3.5-rc5' of git://git.infradead.org/linux-ubifs

Pull ubi/ubifs fixes from Artem Bityutskiy:
 "Fix the debugfs regression - we never enable it because incorrect
  'IS_ENABLED()' macro usage: should be 'IS_ENABLED(CONFIG_DEBUG_FS)',
  but we had 'IS_ENABLED(DEBUG_FS)'.  Also fix incorrect assertion."

* tag 'upstream-3.5-rc5' of git://git.infradead.org/linux-ubifs:
  UBI: correct usage of IS_ENABLED()
  UBIFS: correct usage of IS_ENABLED()
  UBIFS: fix assertion
2012-06-28 11:41:43 -07:00
Jan Kara
1df2ae31c7 udf: Fortify loading of sparing table
Add sanity checks when loading sparing table from disk to avoid accessing
unallocated memory or writing to it.

Signed-off-by: Jan Kara <jack@suse.cz>
2012-06-28 19:31:09 +02:00
Jan Kara
adee11b208 udf: Avoid run away loop when partition table length is corrupted
Check provided length of partition table so that (possibly maliciously)
corrupted partition table cannot cause accessing data beyond current buffer.

Signed-off-by: Jan Kara <jack@suse.cz>
2012-06-28 19:30:58 +02:00
Jan Kara
cb14d340ef udf: Use 'ret' instead of abusing 'i' in udf_load_logicalvol()
Signed-off-by: Jan Kara <jack@suse.cz>
2012-06-28 19:30:40 +02:00
Masatake YAMATO
44b8db1386 GFS2: Fixing double brelse'ing bh allocated in gfs2_meta_read when EIO occurs
This patch fixes buffer_head double free in following code path:

gfs2_block_map
=> gfs2_meta_inode_buffer
 => gfs2_meta_indirect_buffer
  => gfs2_meta_read
=> release_metapath

gfs2_block_map calls gfs2_meta_inode_buffer with &mp.mp_bh[0]
as an argument. mp.mp_bh are filled with zero at the beginning
of gfs2_block_map.

If gfs2_meta_inode_buffer returns non-zero value, gfs2_block_map
calls release_metapath to free buffers chained to mp.mp_bh.
release_metapath checks each slot of mp.mp_bh[i] and
free(with brelse) unless the slot is filled with NULL.

&mp.mp_bh[0] passed to gfs2_meta_inode_buffer is filled at
gfs2_meta_read. gfs2_meta_read is filled a buffer allocated with
gfs2_getbuf even if EIO occurs. When EIO occurs, the allocated buffer
is brelse'ed though the pointer(wrong poiner) points the brelse'ed is
passed back to caller via an argument bhp.

gfs2_meta_indirect_buffer, the caller also pass the wrong pointer
to its caller with EIO. Finally gfs2_block_map gets both EIO and
&mp.mp_bh[0] filled with the wrong pointer. release_metapath
calls brelse again on the wrong pointer.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-06-28 15:35:47 +01:00
Jan Schmidt
d42244a0d3 Btrfs: resolve tree mod log locking issue in btrfs_next_leaf
With the tree mod log, we may end up with two roots (the current root and a
rewinded version of it) both pointing to two leaves, l1 and l2, of which l2
had already been cow-ed in the current transaction. If we don't rewind any
tree blocks, we cannot have two roots both pointing to an already cowed tree
block.

Now there is btrfs_next_leaf, which has a leaf locked and wants a lock on
the next (right) leaf. And there is push_leaf_left, which has a (cowed!)
leaf locked and wants a lock on the previous (left) leaf.

In order to solve this dead lock situation, we use try_lock in
btrfs_next_leaf (only in case it's called with a tree mod log time_seq
paramter) and if we fail to get a lock on the next leaf, we give up our lock
on the current leaf and retry from the very beginning.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-06-27 16:34:40 +02:00
Jan Schmidt
19956c7e94 Btrfs: fix tree mod log rewind of ADD operations
When a MOD_LOG_KEY_ADD operation is rewinded, we remove the key from the
tree block. If its not the last key, removal involves a move operation.
This move operation was explicitly done before this commit.

However, at insertion time, there's a move operation before the actual
addition to make room for the new key, which is recorded in the tree mod
log as well. This means, we must drop the move operation when rewinding the
add operation, because the next operation we'll be rewinding will be the
corresponding MOD_LOG_MOVE_KEYS operation.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-06-27 16:34:40 +02:00
Jan Schmidt
155725c9c0 Btrfs: leave critical region in btrfs_find_all_roots as soon as possible
When delayed refs exist, btrfs_find_all_roots used to hold the delayed ref
mutex way longer than actually required. We ought to drop it immediately
after we're done collecting all the delayed refs.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-06-27 16:34:39 +02:00
Jan Schmidt
c3e0696523 Btrfs: always put insert_ptr modifications into the tree mod log
Several callers of insert_ptr set the tree_mod_log parameter to 0 to avoid
addition to the tree mod log. In fact, we need all of those operations. This
commit simply removes the additional parameter and makes addition to the
tree mod log unconditional.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-06-27 16:34:39 +02:00
Jan Schmidt
28da9fb446 Btrfs: fix tree mod log for root replacements at leaf level
For the tree mod log, we don't log any operations at leaf level. If the root
is at the leaf level (i.e. the tree consists only of the root), then
__tree_mod_log_oldest_root will find a ROOT_REPLACE operation in the log
(because we always log that one no matter which level), but no other
operations.

With this patch __tree_mod_log_oldest_root exits cleanly instead of
BUGging in this situation. get_old_root checks if its really a root at leaf
level in case we don't have any operations and WARNs if this assumption
breaks.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-06-27 16:34:38 +02:00
Jan Schmidt
9345457f4a Btrfs: support root level changes in __resolve_indirect_ref
With the tree mod log, we can have a tree that's two levels high, but
btrfs_search_old_slot may still return a path with the tree root at level
one instead. __resolve_indirect_ref must care for this and accept parents in
a lower level than expected.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-06-27 16:34:38 +02:00
Jan Schmidt
8ca78f3eda Btrfs: avoid waiting for delayed refs when we must not
We track two conditions to decide if we should sleep while waiting for more
delayed refs, the number of delayed refs (num_refs) and the first entry in
the list of blockers (first_seq).

When we suspect staleness, we save num_refs and do one more cycle. If
nothing changes, we then save first_seq for later comparison and do
wait_event. We ought to save first_seq the very same moment we're saving
num_refs. Otherwise we cannot be sure that nothing has changed and we might
start waiting when we shouldn't, which could lead to starvation.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-06-27 16:34:35 +02:00
Brian Norris
2d4cf5ae12 UBIFS: correct usage of IS_ENABLED()
Commit "818039c UBIFS: fix debugfs-less systems support" fixed one
regression but introduced a different regression - the debugfs is now always
compiled out. Root cause: IS_ENABLED() arguments should be used with the
CONFIG_* prefix.

Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2012-06-27 14:22:15 +03:00
Greg Kroah-Hartman
bcc66c0b88 Merge 3.5-rc4 into staging-next
This picks up the staging changes made in 3.5-rc4 so that everyone can sync up
properly.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-25 09:31:00 -07:00
Linus Torvalds
002b758b6d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph fixes from Sage Weil:
 "There are a couple of fixes from Yan for bad pointer dereferences in
  the messenger code and when fiddling with page->private after page
  migration, a fix from Alex for a use-after-free in the osd client
  code, and a couple fixes for the message refcounting and shutdown
  ordering."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  libceph: flush msgr queue during mon_client shutdown
  rbd: Clear ceph_msg->bio_iter for retransmitted message
  libceph: use con get/put ops from osd_client
  libceph: osd_client: don't drop reply reference too early
  ceph: check PG_Private flag before accessing page->private
2012-06-22 17:47:08 -07:00
Linus Torvalds
369c4f542f Fixes for 3.5-rc
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABAgAGBQJP435OAAoJENaLyazVq6ZOms0P/38KYwNpgGgoeO57ZNXtGXen
 C98aa0IwFkjNUFPIogJD4e6gcxfxI9d+626xFZpnkoIEXXpEco5xexjBSIRfg3d7
 rNB4HgyQvybJrmRimqyIonTq5DVhQNnlxfYLJKtpM8dhSodNF3YGWmXMcXRSvoZO
 D1gMXJxTCoZJ5HjVMFRUOfgKX8RYrM7zNmrzMnefUOkuyFN2Dll7ZxGil05GKnQn
 IYds1DvRnyge8o4b7tRUrI50YM3j/w1HgtEDuquQ6UWvOCdbivpPH/+JaP5yD6kQ
 H39UBmi+2orC5m8AbDGGaeNuWC844emsLHDkZ63YTrDlEPwo7XZZQ7q8PgKsfqWz
 wzOUj9K0VAmcRmPjLsgwktsZYr+tyNYW+fYz6NzlSTtBK8fTyTPiyTNEJ4fEA42O
 poRbwH1yoM0YLyII+I+dOb2gk7mOsJa7QMYUE+Art6QWapfUOvQEaIewIEoMyJL9
 5Tl4jAZzso0aHmz/72s7CgV7j3gUrOCKGklPYIe5pGt5504CUxypjF2E01cqV5hJ
 QNz3LuC8Koraj787DZqY/w7Kk+SNsyzBOidlgy7hgjJEmNrsXAtweyGXH1gKOF3G
 ZstBsrgCacgPi+1ixJNJBCDbnM0p6XUZhqFT76aV78CaMgKfslzbgS9qFeCEBS2h
 kMvFaTHC9obmAGikOD4f
 =QLde
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-Jun-21-2012' of git://oss.sgi.com/xfs/xfs

Pull XFS fixes from Ben Myers:
 - Fix stale data exposure with unwritten extents
 - Fix a warning in xfs_alloc_vextent with ODEBUG
 - Fix overallocation and alignment of pages for xfs_bufs
 - Fix a cursor leak
 - Fix a log hang
 - Fix a crash related to xfs_sync_worker
 - Rename xfs log structure from struct log to struct xlog so we can use
   crash dumps effectively

* tag 'for-linus-Jun-21-2012' of git://oss.sgi.com/xfs/xfs:
  xfs: rename log structure to xlog
  xfs: shutdown xfs_sync_worker before the log
  xfs: Fix overallocation in xfs_buf_allocate_memory()
  xfs: fix allocbt cursor leak in xfs_alloc_ag_vextent_near
  xfs: check for stale inode before acquiring iflock on push
  xfs: fix debug_object WARN at xfs_alloc_vextent()
  xfs: xfs_vm_writepage clear iomap_valid when !buffer_uptodate (REV2)
2012-06-22 11:07:55 -07:00
Linus Torvalds
636040b4ed NFS client bugfixes for Linux 3.5
Fixes include:
 - Fix a write hang due to an uninitalised variable when !defined(CONFIG_NFS_V4)
 - Address upcall races in the legacy NFSv4 idmapper
 - Remove an O_DIRECT refcounting issue
 - Fix a pNFS refcounting bug when the file layout metadata server is also
   acting as a data server
 - Fix a pNFS module loading race.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJP46PEAAoJEGcL54qWCgDyClwP/RlcSUAgTeFo3VFcedMdZqKN
 cdYzzyT2r0rzxtEOkdE1aFqukspMTx6cU83opHYJKYP66stkx98JW0+LVcsg8vtm
 SKjYZRAM/xsZVo+m8E3iQ9Z7K0kn1W/+OSwzJO7arIqo++fV8aiGn4+Gpgx9SrWS
 FU5iC7p1LThOZks3Nis0VLbLDpS058vRJgyfCzTS1NyjABHOOEYb6JhhkYeXLH7G
 vn0QRGXyq2sGxUYhR3BNWdRMn9XV5p5mOUoVjLxPBV84gm7wIjYKOGJHQRSn4Ew/
 CEYAvBksGpT7ifJflkzgg8acVSvuq7HacMpHj9O9SpT5aesvVuhcm3pxUN1YJo6m
 WNRj3kqgc6eCuIbiA+ENZuHIsLDzOFw3H4RhKCPj7C9HG88nFGIhrGJ9OIIut1AF
 X81L5aTox3UASZXuieZ0dAqVyTH7n288SSTzYaYy5O++4cW4hqZt7wQegr8Sk6b9
 8zrWXkLjTNGFZo3mAhlgZf5qV3UYt/yNCk9U/1JxvH+1tPTvfYpqavdXusMJ03rn
 7z4LQxwD93YhkiD8NNGDHoBoZesmE3E0ucug+Cb1wLeT0b0C9ChOYdAphQkXxkNl
 lJxN4TfoBCgwwQx88Z/UilNvIGffJwVZzRgX6y//WACPssCdM5S0Zlb8nGIGb98G
 J2uFwuqP0WNhMPSbg+Wj
 =uEPz
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.5-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 - Fix a write hang due to an uninitalised variable when
   !defined(CONFIG_NFS_V4)
 - Address upcall races in the legacy NFSv4 idmapper
 - Remove an O_DIRECT refcounting issue
 - Fix a pNFS refcounting bug when the file layout metadata server is
   also acting as a data server
 - Fix a pNFS module loading race.

* tag 'nfs-for-3.5-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFS: Force the legacy idmapper to be single threaded
  NFS: Initialise commit_info.rpc_out when !defined(CONFIG_NFS_V4)
  NFS: Fix a refcounting issue in O_DIRECT
  NFSv4.1: Fix a race in set_pnfs_layoutdriver
  NFSv4.1: Fix umount when filelayout DS is also the MDS
2012-06-21 16:05:43 -07:00
Linus Torvalds
8874e812fe Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "This is a small pull with btrfs fixes.  The biggest of the bunch is
  another fix for the new backref walking code.

  We're still hammering out one btrfs dio vs buffered reads problem, but
  that one will have to wait for the next rc."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: delay iput with async extents
  Btrfs: add a missing spin_lock
  Btrfs: don't assume to be on the correct extent in add_all_parents
  Btrfs: introduce btrfs_next_old_item
2012-06-21 13:41:07 -07:00
Mark Tinguely
f7bdf03a99 xfs: rename log structure to xlog
Rename the XFS log structure to xlog to help crash distinquish it from the
other logs in Linux.

Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-06-21 14:21:11 -05:00
Ben Myers
8866fc6fa5 xfs: shutdown xfs_sync_worker before the log
Revert commit 1307bbd, which uses the s_umount semaphore to provide
exclusion between xfs_sync_worker and unmount, in favor of shutting down
the sync worker before freeing the log in xfs_log_unmount.  This is a
cleaner way of resolving the race between xfs_sync_worker and unmount
than using s_umount.

Signed-off-by: Ben Myers <bpm@sgi.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2012-06-21 14:20:48 -05:00
Jan Kara
59c84ed0dd xfs: Fix overallocation in xfs_buf_allocate_memory()
Commit de1cbee which removed b_file_offset in favor of b_bn introduced a bug
causing xfs_buf_allocate_memory() to overestimate the number of necessary
pages. The problem is that xfs_buf_alloc() sets b_bn to -1 and thus effectively
every buffer is straddling a page boundary which causes
xfs_buf_allocate_memory() to allocate two pages and use vmalloc() for access
which is unnecessary.

Dave says xfs_buf_alloc() doesn't need to set b_bn to -1 anymore since the
buffer is inserted into the cache only after being fully initialized now.
So just make xfs_buf_alloc() fill in proper block number from the beginning.

CC: David Chinner <dchinner@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-06-21 14:20:36 -05:00
Dave Chinner
76d095388b xfs: fix allocbt cursor leak in xfs_alloc_ag_vextent_near
When we fail to find an matching extent near the requested extent
specification during a left-right distance search in
xfs_alloc_ag_vextent_near, we fail to free the original cursor that
we used to look up the XFS_BTNUM_CNT tree and hence leak it.

Reported-by: Chris J Arges <chris.j.arges@canonical.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-06-21 14:20:20 -05:00
Brian Foster
9a3a5dab63 xfs: check for stale inode before acquiring iflock on push
An inode in the AIL can be flush locked and marked stale if
a cluster free transaction occurs at the right time. The
inode item is then marked as flushing, which causes xfsaild
to spin and leaves the filesystem stalled. This is
reproduced by running xfstests 273 in a loop for an
extended period of time.

Check for stale inodes before the flush lock. This marks
the inode as pinned, leads to a log flush and allows the
filesystem to proceed.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-06-21 14:20:06 -05:00
Josef Bacik
cb77fcd885 Btrfs: delay iput with async extents
There is some concern that these iput()'s could be the final iputs and could
induce lockups on people waiting on writeback.  This would happen in the
rare case that we don't create ordered extents because of an error, but it
is theoretically possible and we already have a mechanism to deal with this
so just make them delayed iputs to negate any worry.

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-06-21 07:19:36 -04:00
Josef Bacik
e18fca7342 Btrfs: add a missing spin_lock
When fixing up the locking in the delayed ref destruction work I accidently
broke the locking myself ;(.  Add back a spin_lock that should be there and
we are now all set.  Thanks,
Btrfs: add a missing spin_lock

When fixing up the locking in the delayed ref destruction work I accidently
broke the locking myself ;(.  Add back a spin_lock that should be there and
we are now all set.  Thanks,

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-06-21 07:19:35 -04:00
Alexander Block
69bca40d41 Btrfs: don't assume to be on the correct extent in add_all_parents
add_all_parents did assume that path is already at a correct extent data
item, which may not be true in case of data extents that were partly
rewritten and splitted.

We need to check if we're on a matching extent for every item and only
for the ones after the first. The loop is changed to do this now.

This patch also fixes a bug introduced with commit 3b127fd8 "Btrfs:
remove obsolete btrfs_next_leaf call from __resolve_indirect_ref".
The removal of next_leaf did sometimes result in slot==nritems when
the above described case happens, and thus resulting in invalid values
(e.g. wanted_obejctid) in add_all_parents (leading to missed backrefs
or even crashes).

Signed-off-by: Alexander Block <ablock84@googlemail.com>
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-06-21 07:19:34 -04:00
Alexander Block
1c8f52a5e9 Btrfs: introduce btrfs_next_old_item
We introduce btrfs_next_old_item that uses btrfs_next_old_leaf instead
of btrfs_next_leaf.

btrfs_next_item is also changed to simply call btrfs_next_old_item with
time_seq being 0.

Signed-off-by: Alexander Block <ablock84@googlemail.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-06-21 07:19:34 -04:00
Anton Vorontsov
1e6a9e5625 pstore/ram_core: Better ECC size checking
- Instead of exploiting unsigned overflows (which doesn't work for all
  sizes), use straightforward checking for ECC total size not exceeding
  initial buffer size;

- Printing overflowed buffer_size is not informative. Instead, print
  ecc_size and buffer_size;

- No need for buffer_size argument in persistent_ram_init_ecc(),
  we can address prz->buffer_size directly.

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-20 16:15:22 -07:00
Anton Vorontsov
beeb94321a pstore/ram_core: Proper checking for post_init errors (e.g. improper ECC size)
We will implement variable-sized ECC buffers soon, so post_init routine
might fail much more likely, so we'd better check for its errors.

To make error handling simple, modify persistent_ram_free() to it be safe
at all times.

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-20 16:15:22 -07:00
Anton Vorontsov
90b58d9690 pstore/ram: Fix error handling during przs allocation
persistent_ram_new() returns ERR_PTR() value on errors, so during
freeing of the przs we should check for both NULL and IS_ERR() entries,
otherwise bad things will happen.

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-20 16:15:22 -07:00
Anton Vorontsov
924d37118f pstore/ram: Probe as early as possible
Registering the platform driver before module_init allows us to log oopses
that happen during device probing.

This requires changing module_init to postcore_initcall, and switching
from platform_driver_probe to platform_driver_register because the
platform device is not registered when the platform driver is registered;
and because we use driver_register, now can't use create_bundle() (since
it will try to register the same driver once again), so we have to switch
to platform_device_register_data().

Also, some __init -> __devinit changes were needed.

Overall, the registration logic is now much clearer, since we have only
one driver registration point, and just an optional dummy device, which
is created from the module parameters.

Suggested-by: Colin Cross <ccross@android.com>
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-20 16:15:22 -07:00
Linus Torvalds
bc259adc9b staging tree fixes for 3.5-rc4
Here are a number of small fixes for the drivers/staging tree, as well as iio
 and pstore drivers (which came from the staging tree in the 3.5-rc1 merge).
 All of these are tiny, but resolve issues that people have been reporting.
 
 There's also a documentation update to reflect what the iio drivers really are
 doing, which is good to get straightened out.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.18 (GNU/Linux)
 
 iEYEABECAAYFAk/iNeAACgkQMUfUDdst+ynNVwCdHCj6smC2JUbvN34gACNrpsYY
 WggAoJzQn9mQhwq0pa/ZTpaUOvCFZ39L
 =hDkC
 -----END PGP SIGNATURE-----

Merge tag 'staging-3.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging tree fixes from Greg Kroah-Hartman:
 "Here are a number of small fixes for the drivers/staging tree, as well
  as iio and pstore drivers (which came from the staging tree in the
  3.5-rc1 merge).  All of these are tiny, but resolve issues that people
  have been reporting.

  There's also a documentation update to reflect what the iio drivers
  really are doing, which is good to get straightened out.

  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"

* tag 'staging-3.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: r8712u: Add new USB IDs
  staging: gdm72xx: Release netlink socket properly
  iio: drop wrong reference from Kconfig
  pstore/inode: Make pstore_fill_super() static
  pstore/ram: Should zap persistent zone on unlink
  pstore/ram_core: Factor persistent_ram_zap() out of post_init()
  pstore/ram_core: Do not reset restored zone's position and size
  pstore/ram: Should update old dmesg buffer before reading
  staging:iio:ad7298: Fix linker error due to missing IIO kfifo buffer
  Revert "staging: usbip: bugfix for stack corruption on 64-bit architectures"
  staging: usbip: bugfix for stack corruption on 64-bit architectures
  staging/comedi: fix build for USB not enabled
  staging: omapdrm: fix crash when freeing bad fb
  staging:iio:ad7606: Re-add missing scale attribute
  iio: Fix potential use after free
  staging:iio: remove num_interrupt_lines from documentation
  iio: documentation: Add out_altvoltage and friends
2012-06-20 15:15:03 -07:00
Linus Torvalds
fe80352460 Driver core and printk fixes for 3.5-rc4
Here are some fixes for 3.5-rc4 that resolve the kmsg problems that
 people have reported showing up after the printk and kmsg changes went
 into 3.5-rc1.  There are also a smattering of other tiny fixes for the
 extcon and hyper-v drivers that people have reported.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.18 (GNU/Linux)
 
 iEYEABECAAYFAk/iNQcACgkQMUfUDdst+yklTQCfZCXFlhA43bZo/8Joqd2pLIIW
 2uoAoMze0SlfJeN6Qu7yY0P+qV/f/pc3
 =UNFY
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-3.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core and printk fixes from Greg Kroah-Hartman:
 "Here are some fixes for 3.5-rc4 that resolve the kmsg problems that
  people have reported showing up after the printk and kmsg changes went
  into 3.5-rc1.  There are also a smattering of other tiny fixes for the
  extcon and hyper-v drivers that people have reported.

  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"

* tag 'driver-core-3.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  extcon: max8997: Add missing kfree for info->edev in max8997_muic_remove()
  extcon: Set platform drvdata in gpio_extcon_probe() and fix irq leak
  extcon: Fix wrong index in max8997_extcon_cable[]
  kmsg - kmsg_dump() fix CONFIG_PRINTK=n compilation
  printk: return -EINVAL if the message len is bigger than the buf size
  printk: use mutex lock to stop syslog_seq from going wild
  kmsg - kmsg_dump() use iterator to receive log buffer content
  vme: change maintainer e-mail address
  Extcon: Don't try to create duplicate link names
  driver core: fixup reversed deferred probe order
  printk: Fix alignment of buf causing crash on ARM EABI
  Tools: hv: verify origin of netlink connector message
2012-06-20 15:14:28 -07:00
Linus Torvalds
a2a2609c97 Merge branch 'akpm' (Andrew's patch-bomb)
* emailed from Andrew Morton <akpm@linux-foundation.org>: (21 patches)
  mm/memblock: fix overlapping allocation when doubling reserved array
  c/r: prctl: Move PR_GET_TID_ADDRESS to a proper place
  pidns: find_new_reaper() can no longer switch to init_pid_ns.child_reaper
  pidns: guarantee that the pidns init will be the last pidns process reaped
  fault-inject: avoid call to random32() if fault injection is disabled
  Viresh has moved
  get_maintainer: Fix --help warning
  mm/memory.c: fix kernel-doc warnings
  mm: fix kernel-doc warnings
  mm: correctly synchronize rss-counters at exit/exec
  mm, thp: print useful information when mmap_sem is unlocked in zap_pmd_range
  h8300: use the declarations provided by <asm/sections.h>
  h8300: fix use of extinct _sbss and _ebss
  xtensa: use the declarations provided by <asm/sections.h>
  xtensa: use "test -e" instead of bashism "test -a"
  xtensa: replace xtensa-specific _f{data,text} by _s{data,text}
  memcg: fix use_hierarchy css_is_ancestor oops regression
  mm, oom: fix and cleanup oom score calculations
  nilfs2: ensure proper cache clearing for gc-inodes
  thp: avoid atomic64_read in pmd_read_atomic for 32bit PAE
  ...
2012-06-20 14:41:57 -07:00
Konstantin Khlebnikov
4fe7efdbdf mm: correctly synchronize rss-counters at exit/exec
do_exit() and exec_mmap() call sync_mm_rss() before mm_release() does
put_user(clear_child_tid) which can update task->rss_stat and thus make
mm->rss_stat inconsistent.  This triggers the "BUG:" printk in check_mm().

Let's fix this bug in the safest way, and optimize/cleanup this later.

Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-06-20 14:39:36 -07:00
Ryusuke Konishi
fbb24a3a91 nilfs2: ensure proper cache clearing for gc-inodes
A gc-inode is a pseudo inode used to buffer the blocks to be moved by
garbage collection.

Block caches of gc-inodes must be cleared every time a garbage collection
function (nilfs_clean_segments) completes.  Otherwise, stale blocks
buffered in the caches may be wrongly reused in successive calls of the GC
function.

For user files, this is not a problem because their gc-inodes are
distinguished by a checkpoint number as well as an inode number.  They
never buffer different blocks if either an inode number, a checkpoint
number, or a block offset differs.

However, gc-inodes of sufile, cpfile and DAT file can store different data
for the same block offset.  Thus, the nilfs_clean_segments function can
move incorrect block for these meta-data files if an old block is cached.
I found this is really causing meta-data corruption in nilfs.

This fixes the issue by ensuring cache clear of gc-inodes and resolves
reported GC problems including checkpoint file corruption, b-tree
corruption, and the following warning during GC.

  nilfs_palloc_freev: entry number 307234 already freed.
  ...

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: <stable@vger.kernel.org>	[2.6.37+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-06-20 14:39:35 -07:00