Commit Graph

25912 Commits

Author SHA1 Message Date
KAMEZAWA Hiroyuki
43d2b11324 tracepoint: add tracepoints for debugging oom_score_adj
oom_score_adj is used for guarding processes from OOM-Killer.  One of
problem is that it's inherited at fork().  When a daemon set oom_score_adj
and make children, it's hard to know where the value is set.

This patch adds some tracepoints useful for debugging. This patch adds
3 trace points.
  - creating new task
  - renaming a task (exec)
  - set oom_score_adj

To debug, users need to enable some trace pointer. Maybe filtering is useful as

# EVENT=/sys/kernel/debug/tracing/events/task/
# echo "oom_score_adj != 0" > $EVENT/task_newtask/filter
# echo "oom_score_adj != 0" > $EVENT/task_rename/filter
# echo 1 > $EVENT/enable
# EVENT=/sys/kernel/debug/tracing/events/oom/
# echo 1 > $EVENT/enable

output will be like this.
# grep oom /sys/kernel/debug/tracing/trace
bash-7699  [007] d..3  5140.744510: oom_score_adj_update: pid=7699 comm=bash oom_score_adj=-1000
bash-7699  [007] ...1  5151.818022: task_newtask: pid=7729 comm=bash clone_flags=1200011 oom_score_adj=-1000
ls-7729  [003] ...2  5151.818504: task_rename: pid=7729 oldcomm=bash newcomm=ls oom_score_adj=-1000
bash-7699  [002] ...1  5175.701468: task_newtask: pid=7730 comm=bash clone_flags=1200011 oom_score_adj=-1000
grep-7730  [007] ...2  5175.701993: task_rename: pid=7730 oldcomm=bash newcomm=grep oom_score_adj=-1000

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-10 16:30:44 -08:00
Johannes Weiner
e3a41a5ba9 btrfs: pass __GFP_WRITE for buffered write page allocations
Tell the page allocator that pages allocated for a buffered write are
expected to become dirty soon.

Signed-off-by: Johannes Weiner <jweiner@redhat.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Shaohua Li <shaohua.li@intel.com>
Cc: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-10 16:30:44 -08:00
Konstantin Khlebnikov
5f8aefd44e mm: account reaped page cache on inode cache pruning
Inode cache pruning indirectly reclaims page-cache by invalidating mapping
pages.  Let's account them into reclaim-state to notice this progress in
memory reclaimer.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-10 16:30:42 -08:00
Linus Torvalds
54c2c5761f Ext4 commits for 3.3 merge window
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABCAAGBQJPDIkfAAoJENNvdpvBGATwWhwQAJ/rVsDAVSH7rbELW7jjdRRw
 9vsDRbuCkBKsAi+r1nsfpjGnYTux1BnVU8BMGO8DnL6e43r9ihnm5o78amM9m8aL
 EDvCqogIY3hzi31gsuIs43DBPs5LwjPPQemgbeE+m/Zbpno/Ju+fDHL286Uvitj3
 X22cC691/Ce7vXhnTPS2+p2fZtT17kvwmkxkVyTs3E3gXcKKtwI3WUrwyF1iwCXF
 +1WlKt1/YLdUs+OXPNNBNvLN5nF3b4UYOU7duQXDiY6ubro666VFIbRrIWVMRDFx
 o4bySo3RBie4AAOJq3IALPDFdRhnkk/qxpBkWvgwqDnMiusogJP6yWh9RdR8OB07
 8nrkobxFwkkDnOFU4KaVjVmPtLmeq4ksiQfDFPnYRg/SQ2+BsdoxV8jpnZvugzKV
 XN+ABWqNucBuM/HoeinJF/UgT+nxpTr2QvH5ueNdvS3Kb9YsbrgJ/fX1941DWzMv
 r9P//ZUOueSGhqUB0IopvXSrjAbRMF1SjuAP9oLEJZkJ/+/dCPMohJTyusZ8qAaY
 MPb1QrQQej5dOb2jqe7BUe0eNKPT9eaZxCW3+hF5QMtlcLlJbHhDL5ijyyWJvTAZ
 GZh5RwdwANxOEfl0atXZcniBvxD2FA6hi2P/d4nBXGa3EkZRk5bDsCj12cPGPoGa
 irKRs7QRBiZGLYi0+W2a
 =3oE5
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Ext4 commits for 3.3 merge window

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (32 commits)
  ext4: fix undefined behavior in ext4_fill_flex_info()
  ext4: make more symbols static
  ext4: make local symbol ext4_initxattrs static
  jbd2: fix hung processes in jbd2_journal_lock_updates()
  ext4: reserve new feature flag codepoints
  ext4: Report max_batch_time option correctly
  ext4: add missing ext4_resize_end on error paths
  ext4: let ext4_group_add() use common code
  ext4: let ext4_group_extend() use common code
  ext4: add new online resize interface
  ext4: add a new function which adds a flex group to a fs
  ext4: add a new function which allocates bitmaps and inode tables
  ext4: pass verify_reserved_gdb() the number of group decriptors
  ext4: add a function which updates the super block during online resizing
  ext4: add a function which sets up a block group descriptors of a flex bg
  ext4: add a function which sets up group blocks of a flex bg
  ext4: add a structure which will be used by 64bit-resize interface
  ext4: add a function which adds a new group descriptors to a fs
  ext4: add a function which extends a group without checking parameters
  ext4: use proper little-endian bitops
  ...
2012-01-10 15:51:48 -08:00
Linus Torvalds
609eac1c15 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  fs/9p: iattr_valid flags are kernel internal flags map them to 9p values.
  fs/9p: We should not allocate a new inode when creating hardlines.
  fs/9p: v9fs_stat2inode should update suid/sgid bits.
  9p: Reduce object size with CONFIG_NET_9P_DEBUG
  fs/9p: check schedule_timeout_interruptible return value

Fix up trivial conflicts in fs/9p/{vfs_inode.c,vfs_inode_dotl.c} due to
debug messages having changed to use p9_debug() on one hand, and the
changes for umode_t on the other.
2012-01-10 15:09:01 -08:00
Linus Torvalds
57eccf1c2a Merge branch 'nfs-for-3.3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
* 'nfs-for-3.3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFSv4: Change the default setting of the nfs4_disable_idmapping parameter
  NFSv4: Save the owner/group name string when doing open
  NFS: Remove pNFS bloat from the generic write path
  pnfs-obj: Must return layout on IO error
  pnfs-obj: pNFS errors are communicated on iodata->pnfs_error
  NFS: Cache state owners after files are closed
  NFS: Clean up nfs4_find_state_owners_locked()
  NFSv4: include bitmap in nfsv4 get acl data
  nfs: fix a minor do_div portability issue
  NFSv4.1: cleanup comment and debug printk
  NFSv4.1: change nfs4_free_slot parameters for dynamic slots
  NFSv4.1: cleanup init and reset of session slot tables
  NFSv4.1: fix backchannel slotid off-by-one bug
  nfs: fix regression in handling of context= option in NFSv4
  NFS - fix recent breakage to NFS error handling.
  NFS: Retry mounting NFSROOT
  SUNRPC: Clean up the RPCSEC_GSS service ticket requests
2012-01-10 14:57:40 -08:00
Linus Torvalds
5c395ae703 Merge branch 'linux-next' of git://git.infradead.org/ubifs-2.6
* 'linux-next' of git://git.infradead.org/ubifs-2.6:
  UBI: fix use-after-free on error path
  UBI: fix missing scrub when there is a bit-flip
  UBIFS: Use kmemdup rather than duplicating its implementation
2012-01-10 14:57:19 -08:00
Linus Torvalds
49d41bae46 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
  dlm: add recovery callbacks
  dlm: add node slots and generation
  dlm: move recovery barrier calls
  dlm: convert rsb list to rb_tree
2012-01-10 14:55:55 -08:00
Al Viro
b3f2a92447 hfsplus: creation of hidden dir on mount can fail
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-10 17:48:52 -05:00
Linus Torvalds
7b3480f8b7 MTD pull for 3.3
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iEYEABECAAYFAk8Mq/MACgkQdwG7hYl686PeFACfZCgbdDWD9A/JL+i1RMfExVu6
 Pi0An3Hmc3PTCp0yQ21KtcKhpF9CAMEu
 =NfuL
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-3.3' of git://git.infradead.org/mtd-2.6

MTD pull for 3.3

* tag 'for-linus-3.3' of git://git.infradead.org/mtd-2.6: (113 commits)
  mtd: Fix dependency for MTD_DOC200x
  mtd: do not use mtd->block_markbad directly
  logfs: do not use 'mtd->block_isbad' directly
  mtd: introduce mtd_can_have_bb helper
  mtd: do not use mtd->suspend and mtd->resume directly
  mtd: do not use mtd->lock, unlock and is_locked directly
  mtd: do not use mtd->sync directly
  mtd: harmonize mtd_writev usage
  mtd: do not use mtd->lock_user_prot_reg directly
  mtd: mtd->write_user_prot_reg directly
  mtd: do not use mtd->read_*_prot_reg directly
  mtd: do not use mtd->get_*_prot_info directly
  mtd: do not use mtd->read_oob directly
  mtd: mtdoops: do not use mtd->panic_write directly
  romfs: do not use mtd->get_unmapped_area directly
  mtd: do not use mtd->get_unmapped_area directly
  mtd: do use mtd->point directly
  mtd: introduce mtd_has_oob helper
  mtd: mtdcore: export symbols cleanup
  mtd: clean-up the default_mtd_writev function
  ...

Fix up trivial edit/remove conflict in drivers/staging/spectra/lld_mtd.c
2012-01-10 13:45:22 -08:00
Sergey Senozhatsky
ace8577aeb block_dev: Suppress bdev_cache_init() kmemleak warninig
Kmemleak reports the following warning in bdev_cache_init()
[    0.003738] kmemleak: Object 0xffff880153035200 (size 256):
[    0.003823] kmemleak:   comm "swapper/0", pid 0, jiffies 4294667299
[    0.003909] kmemleak:   min_count = 1
[    0.003988] kmemleak:   count = 0
[    0.004066] kmemleak:   flags = 0x1
[    0.004144] kmemleak:   checksum = 0
[    0.004224] kmemleak:   backtrace:
[    0.004303]      [<ffffffff814755ac>] kmemleak_alloc+0x21/0x3e
[    0.004446]      [<ffffffff811100ba>] kmem_cache_alloc+0xca/0x1dc
[    0.004592]      [<ffffffff811371b1>] alloc_vfsmnt+0x1f/0x198
[    0.004736]      [<ffffffff811375c5>] vfs_kern_mount+0x36/0xd2
[    0.004879]      [<ffffffff8113929a>] kern_mount_data+0x18/0x32
[    0.005025]      [<ffffffff81ab9075>] bdev_cache_init+0x51/0x81
[    0.005169]      [<ffffffff81ab8abf>] vfs_caches_init+0x101/0x10d
[    0.005313]      [<ffffffff81a9bae3>] start_kernel+0x344/0x383
[    0.005456]      [<ffffffff81a9b2a7>] x86_64_start_reservations+0xae/0xb2
[    0.005602]      [<ffffffff81a9b3ad>] x86_64_start_kernel+0x102/0x111
[    0.005747]      [<ffffffffffffffff>] 0xffffffffffffffff
[    0.008653] kmemleak: Trying to color unknown object at 0xffff880153035220 as Grey
[    0.008754] Pid: 0, comm: swapper/0 Not tainted 3.3.0-rc0-dbg-04200-g8180888-dirty #888
[    0.008856] Call Trace:
[    0.008934]  [<ffffffff81118704>] ? find_and_get_object+0x44/0x118
[    0.009023]  [<ffffffff81118fe6>] paint_ptr+0x57/0x8f
[    0.009109]  [<ffffffff81475935>] kmemleak_not_leak+0x23/0x42
[    0.009195]  [<ffffffff81ab9096>] bdev_cache_init+0x72/0x81
[    0.009282]  [<ffffffff81ab8abf>] vfs_caches_init+0x101/0x10d
[    0.009368]  [<ffffffff81a9bae3>] start_kernel+0x344/0x383
[    0.009466]  [<ffffffff81a9b2a7>] x86_64_start_reservations+0xae/0xb2
[    0.009555]  [<ffffffff81a9b140>] ? early_idt_handlers+0x140/0x140
[    0.009643]  [<ffffffff81a9b3ad>] x86_64_start_kernel+0x102/0x111

due to attempt to mark pointer to `struct vfsmount' as a gray object, which
is embedded into `struct mount' returned from alloc_vfsmnt().

Make `bd_mnt' static, avoiding need to tell kmemleak to mark it gray, as
suggested by Al Viro.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-10 13:08:55 -05:00
Miklos Szeredi
eaf5f90735 fix shrink_dcache_parent() livelock
Two (or more) concurrent calls of shrink_dcache_parent() on the same dentry may
cause shrink_dcache_parent() to loop forever.

Here's what appears to happen:

1 - CPU0: select_parent(P) finds C and puts it on dispose list, returns 1

2 - CPU1: select_parent(P) locks P->d_lock

3 - CPU0: shrink_dentry_list() locks C->d_lock
   dentry_kill(C) tries to lock P->d_lock but fails, unlocks C->d_lock

4 - CPU1: select_parent(P) locks C->d_lock,
         moves C from dispose list being processed on CPU0 to the new
dispose list, returns 1

5 - CPU0: shrink_dentry_list() finds dispose list empty, returns

6 - Goto 2 with CPU0 and CPU1 switched

Basically select_parent() steals the dentry from shrink_dentry_list() and thinks
it found a new one, causing shrink_dentry_list() to think it's making progress
and loop over and over.

One way to trigger this is to make udev calls stat() on the sysfs file while it
is going away.

Having a file in /lib/udev/rules.d/ with only this one rule seems to the trick:

ATTR{vendor}=="0x8086", ATTR{device}=="0x10ca", ENV{PCI_SLOT_NAME}="%k", ENV{MATCHADDR}="$attr{address}", RUN+="/bin/true"

Then execute the following loop:

while true; do
        echo -bond0 > /sys/class/net/bonding_masters
        echo +bond0 > /sys/class/net/bonding_masters
        echo -bond1 > /sys/class/net/bonding_masters
        echo +bond1 > /sys/class/net/bonding_masters
done

One fix would be to check all callers and prevent concurrent calls to
shrink_dcache_parent().  But I think a better solution is to stop the
stealing behavior.

This patch adds a new dentry flag that is set when the dentry is added to the
dispose list.  The flag is cleared in dentry_lru_del() in case the dentry gets a
new reference just before being pruned.

If the dentry has this flag, select_parent() will skip it and let
shrink_dentry_list() retry pruning it.  With select_parent() skipping those
dentries there will not be the appearance of progress (new dentries found) when
there is none, hence shrink_dcache_parent() will not loop forever.

Set the flag is also set in prune_dcache_sb() for consistency as suggested by
Linus.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-10 13:06:32 -05:00
Sage Weil
2ff179e650 ceph: avoid iput() while holding spinlock in ceph_dir_fsync
ceph_mdsc_put_request() can call iput(), which can sleep.  Don't do that.

Fixes: #1812
Signed-off-by: Sage Weil <sage@newdream.net>
2012-01-10 08:57:02 -08:00
Sage Weil
ee6b1baf67 ceph: avoid useless dget/dput in encode_fh
Nothing we do here sleeps, so just do it under d_lock and avoid the dget/
dput entirely.

Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-01-10 08:57:00 -08:00
Yehuda Sadeh
b8cd952b51 ceph: dereference pointer after checking for NULL
moved dereference after BUG_ON

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2012-01-10 08:56:59 -08:00
Sage Weil
3d8eb7a94e ceph: remove unnecessary d_fsdata conditional checks
We now set d_fsdata unconditionally on all dentries prior to setting up
the d_ops, so all of these checks are unnecessary.

Signed-off-by: Sage Weil <sage@newdream.net>
2012-01-10 08:56:56 -08:00
Theodore Ts'o
ff9cb1c4ee Merge branch 'for_linus' into for_linus_merged
Conflicts:
	fs/ext4/ioctl.c
2012-01-10 11:54:07 -05:00
Xi Wang
d50f2ab6f0 ext4: fix undefined behavior in ext4_fill_flex_info()
Commit 503358ae01 ("ext4: avoid divide by
zero when trying to mount a corrupted file system") fixes CVE-2009-4307
by performing a sanity check on s_log_groups_per_flex, since it can be
set to a bogus value by an attacker.

	sbi->s_log_groups_per_flex = sbi->s_es->s_log_groups_per_flex;
	groups_per_flex = 1 << sbi->s_log_groups_per_flex;

	if (groups_per_flex < 2) { ... }

This patch fixes two potential issues in the previous commit.

1) The sanity check might only work on architectures like PowerPC.
On x86, 5 bits are used for the shifting amount.  That means, given a
large s_log_groups_per_flex value like 36, groups_per_flex = 1 << 36
is essentially 1 << 4 = 16, rather than 0.  This will bypass the check,
leaving s_log_groups_per_flex and groups_per_flex inconsistent.

2) The sanity check relies on undefined behavior, i.e., oversized shift.
A standard-confirming C compiler could rewrite the check in unexpected
ways.  Consider the following equivalent form, assuming groups_per_flex
is unsigned for simplicity.

	groups_per_flex = 1 << sbi->s_log_groups_per_flex;
	if (groups_per_flex == 0 || groups_per_flex == 1) {

We compile the code snippet using Clang 3.0 and GCC 4.6.  Clang will
completely optimize away the check groups_per_flex == 0, leaving the
patched code as vulnerable as the original.  GCC keeps the check, but
there is no guarantee that future versions will do the same.

Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
2012-01-10 11:51:10 -05:00
Al Viro
f4947fbce2 coda: switch coda_cnode_make() to sane API as well, clean coda_lookup()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-10 11:13:16 -05:00
Al Viro
0b2c4e39c0 coda: deal correctly with allocation failure from coda_cnode_makectl()
lookup should fail with ENOMEM, not silently make dentry negative.
Switched to saner calling conventions, while we are at it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-10 11:13:13 -05:00
Linus Torvalds
e4e11180df Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  vfs: new helper - d_make_root()
  dcache: use a dispose list in select_parent
  ceph: d_alloc_root() may fail
  ext4: fix failure exits
  isofs: inode leak on mount failure
2012-01-09 17:37:37 -08:00
Al Viro
adc0e91ab1 vfs: new helper - d_make_root()
d_alloc_root() with iput() in case of allocation failure...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-09 19:23:45 -05:00
Dave Chinner
b48f03b319 dcache: use a dispose list in select_parent
select_parent currently abuses the dentry cache LRU to provide
cleanup features for child dentries that need to be freed. It moves
them to the tail of the LRU, then tells shrink_dcache_parent() to
calls __shrink_dcache_sb to unconditionally move them to a dispose
list (as DCACHE_REFERENCED is ignored). __shrink_dcache_sb() has to
relock the dentries to move them off the LRU onto the dispose list,
but otherwise does not touch the dentries that select_parent() moved
to the tail of the LRU. It then passses the dispose list to
shrink_dentry_list() which tries to free the dentries.

IOWs, the use of __shrink_dcache_sb() is superfluous - we can build
exactly the same list of dentries for disposal directly in
select_parent() and call shrink_dentry_list() instead of calling
__shrink_dcache_sb() to do that. This means that we avoid long holds
on the lru lock walking the LRU moving dentries to the dispose list
We also avoid the need to relock each dentry just to move it off the
LRU, reducing the numebr of times we lock each dentry to dispose of
them in shrink_dcache_parent() from 3 to 2 times.

Further, we remove one of the two callers of __shrink_dcache_sb().
This also means that __shrink_dcache_sb can be moved into back into
prune_dcache_sb() and we no longer have to handle referenced
dentries conditionally, simplifying the code.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-09 19:22:52 -05:00
Al Viro
3c5184ef12 ceph: d_alloc_root() may fail
... and ceph_init_dentry(NULL) will oops

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-09 16:36:12 -05:00
Al Viro
94bf608a18 ext4: fix failure exits
a) leaking root dentry is bad
b) in case of failed ext4_mb_init() we don't want to do ext4_mb_release()
c) OTOH, in the same case we *do* want ext4_ext_release()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-09 15:57:20 -05:00
Linus Torvalds
ac69e09280 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  ext2/3/4: delete unneeded includes of module.h
  ext{3,4}: Fix potential race when setversion ioctl updates inode
  udf: Mark LVID buffer as uptodate before marking it dirty
  ext3: Don't warn from writepage when readonly inode is spotted after error
  jbd: Remove j_barrier mutex
  reiserfs: Force inode evictions before umount to avoid crash
  reiserfs: Fix quota mount option parsing
  udf: Treat symlink component of type 2 as /
  udf: Fix deadlock when converting file from in-ICB one to normal one
  udf: Cleanup calling convention of inode_getblk()
  ext2: Fix error handling on inode bitmap corruption
  ext3: Fix error handling on inode bitmap corruption
  ext3: replace ll_rw_block with other functions
  ext3: NULL dereference in ext3_evict_inode()
  jbd: clear revoked flag on buffers before a new transaction started
  ext3: call ext3_mark_recovery_complete() when recovery is really needed
2012-01-09 12:51:21 -08:00
Linus Torvalds
9e203936ea Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd
* 'for-linus' of git://git.open-osd.org/linux-open-osd:
  ore: Must support none-PAGE-aligned IO
  ore: fix BUG_ON, too few sgs when reading
  ore: Fix crash in case of an IO error.
  ore: FIX breakage when MISC_FILESYSTEMS is not set
2012-01-09 12:51:01 -08:00
Linus Torvalds
993ecff81a Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
  xfs: fix endian conversion issue in discard code
2012-01-09 12:50:15 -08:00
Linus Torvalds
55b81e6f27 Merge branch 'usb-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
* 'usb-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (232 commits)
  USB: Add USB-ID for Multiplex RC serial adapter to cp210x.c
  xhci: Clean up 32-bit build warnings.
  USB: update documentation for usbmon
  usb: usb-storage doesn't support dynamic id currently, the patch disables the feature to fix an oops
  drivers/usb/class/cdc-acm.c: clear dangling pointer
  drivers/usb/dwc3/dwc3-pci.c: introduce missing kfree
  drivers/usb/host/isp1760-if.c: introduce missing kfree
  usb: option: add ZD Incorporated HSPA modem
  usb: ch9: fix up MaxStreams helper
  USB: usb-skeleton.c: cleanup open_count
  USB: usb-skeleton.c: fix open/disconnect race
  xhci: Properly handle COMP_2ND_BW_ERR
  USB: remove dead code from suspend/resume path
  USB: add quirk for another camera
  drivers: usb: wusbcore: Fix dependency for USB_WUSB
  xhci: Better debugging for critical host errors.
  xhci: Be less verbose during URB cancellation.
  xhci: Remove debugging about ring structure allocation.
  xhci: Remove debugging about toggling cycle bits.
  xhci: Remove debugging for individual transfers.
  ...
2012-01-09 12:09:47 -08:00
Linus Torvalds
21a2cb565a Merge branch 'char-misc-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
* 'char-misc-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  isl29020: Remove a redundant semi-colon from return statement
  BMP085: Remove redundant semi-colon from return statement
  drivers:misc: ti-st: DEBUG uart, baud rate mods
  drivers:misc: ti-st: flush UART upon fw failure
  drivers:misc: ti-st: protect registrations
  char_dev.c: fix up some whitespace errors
  s390: tape_class.h: remove kobj_map.h inclusion
  misc: ad525x_dpot: Add support for SPI module device table matching
2012-01-09 12:08:59 -08:00
Trond Myklebust
074b1d12fe NFSv4: Change the default setting of the nfs4_disable_idmapping parameter
Now that the use of numeric uids/gids is officially sanctioned in
RFC3530bis, it is time to change the default here to 'enabled'.

By doing so, we ensure that NFSv4 copies the behaviour of NFSv3 when we're
using the default AUTH_SYS authentication (i.e. when the client uses the
numeric uids/gids as authentication tokens), so that when new files are
created, they will appear to have the correct user/group.
It also fixes a number of backward compatibility issues when migrating
from NFSv3 to NFSv4 on a platform where the server uses different uid/gid
mappings than the client.

Note also that this setting has been successfully tested against servers
that do not support numeric uids/gids at several Connectathon/Bakeathon
events at this point, and the fall back to using string names/groups has
been shown to work well in all those test cases.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-09 14:22:27 -05:00
Artem Bityutskiy
800ffd3496 mtd: do not use mtd->block_markbad directly
Instead, use the new 'mtd_can_have_bb()', or just rely on 'mtd_block_markbad()'
return code, which will be -EOPNOTSUPP if bad blocks are not supported.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09 18:26:26 +00:00
Artem Bityutskiy
d58b27ed58 logfs: do not use 'mtd->block_isbad' directly
Instead, use the new 'mtd_can_have_bb()' helper.

Cc: Jörn Engel <joern@logfs.org>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09 18:26:25 +00:00
Artem Bityutskiy
327cf2922b mtd: do not use mtd->sync directly
This patch teaches 'mtd_sync()' to do nothing when the MTD driver does
not have the '->sync()' method, which allows us to remove all direct
'mtd->sync' accesses.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09 18:26:21 +00:00
Artem Bityutskiy
1dbebd3256 mtd: harmonize mtd_writev usage
This patch makes the 'mtd_writev()' function more usable and logical. We first
teach it to fall-back to the 'default_mtd_writev()' function if the MTD driver
does not define its own '->writev()' method. Then we make block2mtd and JFFS2
just 'mtd_writev()' instead of 'default_mtd_writev()' function. This means we
can now stop exporting 'default_mtd_writev()' and instead, export
'mtd_writev()'. This is much cleaner and more logical, as well as allows us to
get read of another direct 'mtd->writev' access in JFFS2.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09 18:26:19 +00:00
Artem Bityutskiy
4991e7251e romfs: do not use mtd->get_unmapped_area directly
Remove direct usage of mtd->get_unmapped_area. Instead, just call
'mtd_get_unmapped_area()' which will return -EOPNOTSUPP if the function
is not implemented, and then test for this code.

We also translate -EOPNOTSUPP to -ENOSYS because this return code is
probably part of the kernel ABI which we do not want to break.

Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09 18:26:12 +00:00
Artem Bityutskiy
10934478e4 mtd: do use mtd->point directly
Remove direct usage of the "mtd->point" function pointer. Instead,
test the mtd_point() return code for '-EOPNOTSUPP'.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09 18:26:09 +00:00
Artem Bityutskiy
4ccf2f1349 jffs: remove custom mtd_fake_writev function
Instead, use 'default_mtd_writev()' function which MTD provides.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09 18:26:04 +00:00
Artem Bityutskiy
5942ddbc50 mtd: introduce mtd_block_markbad interface
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09 18:25:48 +00:00
Artem Bityutskiy
7086c19d07 mtd: introduce mtd_block_isbad interface
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09 18:25:47 +00:00
Artem Bityutskiy
85f2f2a809 mtd: introduce mtd_sync interface
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09 18:25:35 +00:00
Artem Bityutskiy
b0a31f7b2a mtd: introduce mtd_writev interface
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09 18:25:34 +00:00
Artem Bityutskiy
a2cc5ba075 mtd: introduce mtd_write_oob interface
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09 18:25:24 +00:00
Artem Bityutskiy
fd2819bbc9 mtd: introduce mtd_read_oob interface
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09 18:25:23 +00:00
Artem Bityutskiy
eda95cbf75 mtd: introduce mtd_write interface
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09 18:25:20 +00:00
Artem Bityutskiy
329ad399a9 mtd: introduce mtd_read interface
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09 18:25:19 +00:00
Artem Bityutskiy
04c601bfa4 mtd: introduce mtd_get_unmapped_area interface
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09 18:25:18 +00:00
Artem Bityutskiy
7219778ad9 mtd: introduce mtd_unpoint interface
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09 18:25:17 +00:00
Artem Bityutskiy
d35ea200c0 mtd: introduce mtd_point interface
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09 18:25:15 +00:00
Artem Bityutskiy
7e1f0dc055 mtd: introduce mtd_erase interface
This patch is part of a patch-set which changes the MTD interface
from 'mtd->func()' form to 'mtd_func()' form. We need this because
we want to add common code to to all drivers in the mtd core level,
which is impossible with the current interface when MTD clients
call driver functions like 'read()' or 'write()' directly.

At this point we just introduce a new inline wrapper function, but
later some of them are expected to gain more code. E.g., the input
parameters check should be moved to the wrappers rather than be
duplicated at many drivers.

This particular patch introduced the 'mtd_erase()' interface. The
following patches add all the other interfaces one by one.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09 18:25:11 +00:00
Artem Bityutskiy
48d3610268 logfs: rename functions starting with mtd_
We are going to re-work the MTD interface and change 'mtd->write()' to
'mtd_write()', 'mtd->read()' to 'mtd_read()' and so forth for all functions
in the 'struct mtd_info' structure.

However, logfs has its own 'mtd_read()', 'mtd_write()', etc functions
which collide with our changes. This patch renames these logfs functions
to 'logfs_mtd_read()', 'logfs_mtd_write()', etc.

Additionally, to make the 'fs/logfs/dev_mtd.c' file look consistent, rename
similarly all the other functions starting with 'mtd_'.

Cc: Jörn Engel <joern@logfs.org>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09 18:24:54 +00:00
Eric Sandeen
5346671020 jffs2: fix up error handling for insert_inode_locked
after 250df6ed27
(fs: protect inode->i_state with inode->i_lock), insert_inode_locked()
no longer returns the inode with I_NEW set on failure.  However,
the error handler still calls unlock_new_inode() on failure,
which does a WARN_ON if I_NEW is not set, so any failure spews
a lot of warnings.

We can just drop the unlock_new_inode() if insert_inode_locked()
fails here.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-01-09 18:18:03 +00:00
Al Viro
8fdd8c49fe isofs: inode leak on mount failure
d_alloc_root() failure leaves root inode leaked...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-09 10:48:11 -05:00
Paul Gortmaker
302bf2f325 ext2/3/4: delete unneeded includes of module.h
Delete any instances of include module.h that were not strictly
required.  In the case of ext2, the declaration of MODULE_LICENSE
etc. were in inode.c but the module_init/exit were in super.c, so
relocate the MODULE_LICENCE/AUTHOR block to super.c which makes it
consistent with ext3 and ext4 at the same time.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-01-09 13:52:10 +01:00
Djalal Harouni
6c2155b9cc ext{3,4}: Fix potential race when setversion ioctl updates inode
The EXT{3,4}_IOC_SETVERSION ioctl() updates i_ctime and i_generation
without i_mutex. This can lead to a race with the other operations that
update i_ctime. This is not a big issue but let's make the ioctl consistent
with how we handle e.g. other timestamp updates and use i_mutex to protect
inode changes.

Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-01-09 13:52:10 +01:00
Jan Kara
853a0c25ba udf: Mark LVID buffer as uptodate before marking it dirty
When we hit EIO while writing LVID, the buffer uptodate bit is cleared.
This then results in an anoying warning from mark_buffer_dirty() when we
write the buffer again. So just set uptodate flag unconditionally.

Reviewed-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-01-09 13:52:10 +01:00
Jan Kara
33c104d415 ext3: Don't warn from writepage when readonly inode is spotted after error
WARN_ON_ONCE(IS_RDONLY(inode)) tends to trip when filesystem hits error and is
remounted read-only. This unnecessarily scares users (well, they should be
scared because of filesystem error, but the stack trace distracts them from the
right source of their fear ;-). We could as well just remove the WARN_ON but
it's not hard to fix it to not trip on filesystem with errors and not use more
cycles in the common case so that's what we do.

CC: stable@kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
2012-01-09 13:52:09 +01:00
Jan Kara
0048278552 jbd: Remove j_barrier mutex
j_barrier mutex is used for serializing different journal lock operations.  The
problem with it is that e.g. FIFREEZE ioctl results in process leaving kernel
with j_barrier mutex held which makes lockdep freak out. Also hibernation code
wants to freeze filesystem but it cannot do so because it then cannot hibernate
the system because of mutex being locked.

So we remove j_barrier mutex and use direct wait on j_barrier_count instead.
Since locking journal is a rare operation we don't have to care about fairness
or such things.

CC: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-01-09 13:52:09 +01:00
Jeff Mahoney
a9e36da655 reiserfs: Force inode evictions before umount to avoid crash
This patch fixes a crash in reiserfs_delete_xattrs during umount.

When shrink_dcache_for_umount clears the dcache from
generic_shutdown_super, delayed evictions are forced to disk. If an
evicted inode has extended attributes associated with it, it will
need to walk the xattr tree to locate and remove them.

But since shrink_dcache_for_umount will BUG if it encounters active
dentries, the xattr tree must be released before it's called or it will
crash during every umount.

This patch forces the evictions to occur before generic_shutdown_super
by calling shrink_dcache_sb first. The additional evictions caused
by the removal of each associated xattr file and dir will be automatically
handled as they're added to the LRU list.

CC: reiserfs-devel@vger.kernel.org
CC: stable@kernel.org
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-01-09 13:52:09 +01:00
Jan Kara
a06d789b42 reiserfs: Fix quota mount option parsing
When jqfmt mount option is not specified on remount, we mistakenly clear
s_jquota_fmt value stored in superblock. Fix the problem.

CC: stable@kernel.org
CC: reiserfs-devel@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
2012-01-09 13:52:09 +01:00
Jan Kara
fef2e9f330 udf: Treat symlink component of type 2 as /
Currently, we ignore symlink component of type 2. But mkisofs and other OS'
seem to treat it as / so do the same for compatibility.

Reported-by: "Gábor S." <otnaccess@hotmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-01-09 13:52:08 +01:00
Jan Kara
d2eb8c3593 udf: Fix deadlock when converting file from in-ICB one to normal one
During BKL removal in 2.6.38, conversion of files from in-ICB format to normal
format got broken. We call ->writepage with i_data_sem held but udf_get_block()
also acquires i_data_sem thus creating A-A deadlock.

We fix the problem by dropping i_data_sem before calling ->writepage() which is
safe since i_mutex still protects us against any changes in the file. Also fix
pagelock - i_data_sem lock inversion in udf_expand_file_adinicb() by dropping
i_data_sem before calling find_or_create_page().

CC: stable@kernel.org
Reported-by: Matthias Matiak <netzpython@mail-on.us>
Tested-by: Matthias Matiak <netzpython@mail-on.us>
Reviewed-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-01-09 13:52:08 +01:00
Jan Kara
7b0b0933a3 udf: Cleanup calling convention of inode_getblk()
inode_getblk() always returned NULL and passed results in its parameters.
Make the function return something useful - found block number.

Signed-off-by: Jan Kara <jack@suse.cz>
2012-01-09 13:52:08 +01:00
Jan Kara
ef6919c283 ext2: Fix error handling on inode bitmap corruption
When insert_inode_locked() fails in ext2_new_inode() it most likely means inode
bitmap got corrupted and we allocated again inode which is already in use. Also
doing unlock_new_inode() during error recovery is wrong since the inode does
not have I_NEW set. Fix the problem by informing about filesystem error and
jumping to fail: (instead of fail_drop:) which doesn't call unlock_new_inode().

Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-01-09 13:52:07 +01:00
Jan Kara
1415dd8705 ext3: Fix error handling on inode bitmap corruption
When insert_inode_locked() fails in ext3_new_inode() it most likely
means inode bitmap got corrupted and we allocated again inode which
is already in use. Also doing unlock_new_inode() during error recovery
is wrong since inode does not have I_NEW set. Fix the problem by jumping
to fail: (instead of fail_drop:) which declares filesystem error and
does not call unlock_new_inode().

Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-01-09 13:52:07 +01:00
Zheng Liu
d03e1292c4 ext3: replace ll_rw_block with other functions
ll_rw_block() is deprecated. Thus we replace it with other functions.

CC: Jan Kara <jack@suse.cz>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-01-09 13:52:07 +01:00
Al Viro
0ce8c0109f ext[34]: avoid i_nlink warnings triggered by drop_nlink/inc_nlink kludge in symlink()
Both ext3 and ext4 put the half-created symlink inode into the orphan list
for a while (see the comment in ext[34]_symlink() for gory details).  Then,
if everything went fine, they pull it out of the orphan list and bump the
link count back to 1.  The thing is, inc_nlink() is going to complain about
seeing somebody changing i_nlink from 0 to 1.  With a good reason, since
normally something like that is a bug.  Explicit set_nlink(inode, 1) does
the same thing as inc_nlink() here, but it does *not* complain - exactly
because it should be usable in strange situations like this one.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-08 20:19:30 -05:00
Al Viro
da01636a65 exofs: oops after late failure in mount
We have already set ->s_root, so ->put_super() is going to be called.
Freeing ->s_fs_info is a bloody bad idea when it's going to be
dereferenced very shortly...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-08 20:19:12 -05:00
Al Viro
3850aba748 devpts: fix double-free on mount failure
devpts_kill_sb() is called even if devpts_fill_super() fails;
we should not do that kfree() in the latter, especially not
with ->s_fs_info left pointing to freed object.  Double kfree()
is a Bad Thing(tm)...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-08 20:19:03 -05:00
Al Viro
f84a8bd60e btrfs: take allocation of ->tree_root into open_ctree()
now that we don't need it for sget() anymore...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-08 19:37:02 -05:00
Al Viro
815745cf3e btrfs: let ->s_fs_info point to fs_info, not root...
the latter can be obtained from the former (by looking as ->tree_root)
just as cheaply as we currently are doing the other way round.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-08 19:35:37 -05:00
Al Viro
59553edf11 btrfs: consolidate failure exits in btrfs_mount() a bit
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-08 19:34:41 -05:00
Al Viro
d22ca7de77 btrfs: make free_fs_info() call ->kill_sb() unconditional
... and don't bother with it after btrfs_fill_super() failure -
->kill_sb() (unlike ->put_super()) will be called even if we
have not got non-NULL ->s_root.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-08 19:34:41 -05:00
Al Viro
be7e0950de btrfs: merge free_fs_info() calls on fill_super failures
... all the way up into btrfs_mount().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-08 19:34:40 -05:00
Al Viro
29db78aa0a btrfs: kill pointless reassignment of ->s_fs_info in btrfs_fill_super()
We do not (fortunately) modify ->s_fs_info of superblock on the fly in
btrfs_fill_super(); apparent assignment is a no-op.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-08 19:34:40 -05:00
Al Viro
ad2b2c802b btrfs: make open_ctree() return int
It returns either ERR_PTR(-ve) or sb->s_fs_info.  The latter can
be found by caller just as well, TYVM, no need to return it.  Just
return -ve or 0...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-08 19:34:39 -05:00
Al Viro
e3029d9fd4 btrfs: sanitizing ->fs_info, part 5
close_ctree() uses a weird mix of accesses to root->fs_info and
its value at the beginning of function stored in local variable.
Since ->fs_info *never* changes, let's just use the local variable
to avoid confusion.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-08 19:34:38 -05:00
Al Viro
6f07e42ee6 btrfs: sanitizing ->fs_info, part 4
A new helper: btrfs_alloc_root(fs_info); allocates btrfs_root
and sets ->fs_info.  All places allocating the suckers converted
to it.  At that point we *never* reassign ->fs_info of btrfs_root;
it's set before anyone sees the address of newly allocated
struct btrfs_root and never assigned anywhere else.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-08 19:34:38 -05:00
Al Viro
38a77db49a btrfs: sanitizing ->fs_info, part 3
move assignments to ->fs_info in open_ctree() up, to the place
just after the original allocations.  Assignment for tree_root
becomes a no-op - we'd obtained fs_info from tree_root->fs_info
in the first place.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-08 19:34:37 -05:00
Al Viro
1233f546ec btrfs: sanitizing ->fs_info, part 2
lift assignment to callers of find_and_setup_root()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-08 19:34:37 -05:00
Al Viro
2eb34cd368 btrfs: sanitizing ->fs_info, part 1
take assignment of ->fs_info to callers of __setup_root()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-08 19:34:36 -05:00
Al Viro
10f6327b5d btrfs: fix a deadlock in btrfs_scan_one_device()
pathname resolution under a global mutex, taken on some paths in ->mount()
is a Bad Idea(tm) - think what happens if said pathname resolution triggers
automount of some btrfs instance and walks into attempt to grab the same
mutex.  Deadlock - we are waiting for daemon to finish walking the path,
daemon is waiting for us to release the mutex...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-08 19:33:24 -05:00
Al Viro
6de1d09d96 btrfs: fix mount/umount race
Do *NOT* skip doomed superblocks in btrfs_test_super(); we want
sget() to wait for their shutdown to complete.  Since we don't
mutilate ->s_fs_info in ->put_super() anymore (or free what it
used to point to until the superblock is past being findable
by sget()), we can just DTRT there and report a match.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-08 19:33:23 -05:00
Al Viro
aea52e19dd btrfs: get ->kill_sb() of its own
... and move free_fs_info() to that, out of ->put_super().  Do NOT
set ->s_fs_info to NULL in the latter; we need it for sget() to
be able to see and wait for fs in the middle of umount if we get a
mount/umount race.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-08 19:33:23 -05:00
Al Viro
98c7089c76 btrfs: preparation to fixing mount/umount race
We need fs_info and root to live until the moment when the victim
superblock leaves the list, so we need to postpone free_fs_info()
until after ->put_super().  The call is buried in close_ctree(),
though, so we need to lift it into the callers (including
btrfs_put_super()) first.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-08 19:33:22 -05:00
Linus Torvalds
98793265b4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (53 commits)
  Kconfig: acpi: Fix typo in comment.
  misc latin1 to utf8 conversions
  devres: Fix a typo in devm_kfree comment
  btrfs: free-space-cache.c: remove extra semicolon.
  fat: Spelling s/obsolate/obsolete/g
  SCSI, pmcraid: Fix spelling error in a pmcraid_err() call
  tools/power turbostat: update fields in manpage
  mac80211: drop spelling fix
  types.h: fix comment spelling for 'architectures'
  typo fixes: aera -> area, exntension -> extension
  devices.txt: Fix typo of 'VMware'.
  sis900: Fix enum typo 'sis900_rx_bufer_status'
  decompress_bunzip2: remove invalid vi modeline
  treewide: Fix comment and string typo 'bufer'
  hyper-v: Update MAINTAINERS
  treewide: Fix typos in various parts of the kernel, and fix some comments.
  clockevents: drop unknown Kconfig symbol GENERIC_CLOCKEVENTS_MIGR
  gpio: Kconfig: drop unknown symbol 'CS5535_GPIO'
  leds: Kconfig: Fix typo 'D2NET_V2'
  sound: Kconfig: drop unknown symbol ARCH_CLPS7500
  ...

Fix up trivial conflicts in arch/powerpc/platforms/40x/Kconfig (some new
kconfig additions, close to removed commented-out old ones)
2012-01-08 13:21:22 -08:00
Linus Torvalds
eb59c505f8 Merge branch 'pm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
* 'pm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (76 commits)
  PM / Hibernate: Implement compat_ioctl for /dev/snapshot
  PM / Freezer: fix return value of freezable_schedule_timeout_killable()
  PM / shmobile: Allow the A4R domain to be turned off at run time
  PM / input / touchscreen: Make st1232 use device PM QoS constraints
  PM / QoS: Introduce dev_pm_qos_add_ancestor_request()
  PM / shmobile: Remove the stay_on flag from SH7372's PM domains
  PM / shmobile: Don't include SH7372's INTCS in syscore suspend/resume
  PM / shmobile: Add support for the sh7372 A4S power domain / sleep mode
  PM: Drop generic_subsys_pm_ops
  PM / Sleep: Remove forward-only callbacks from AMBA bus type
  PM / Sleep: Remove forward-only callbacks from platform bus type
  PM: Run the driver callback directly if the subsystem one is not there
  PM / Sleep: Make pm_op() and pm_noirq_op() return callback pointers
  PM/Devfreq: Add Exynos4-bus device DVFS driver for Exynos4210/4212/4412.
  PM / Sleep: Merge internal functions in generic_ops.c
  PM / Sleep: Simplify generic system suspend callbacks
  PM / Hibernate: Remove deprecated hibernation snapshot ioctls
  PM / Sleep: Fix freezer failures due to racy usermodehelper_is_disabled()
  ARM: S3C64XX: Implement basic power domain support
  PM / shmobile: Use common always on power domain governor
  ...

Fix up trivial conflict in fs/xfs/xfs_buf.c due to removal of unused
XBT_FORCE_SLEEP bit
2012-01-08 13:10:57 -08:00
Linus Torvalds
1619ed8f60 Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw:
  GFS2: local functions should be static
  GFS2: We only need one ACL getting function
  GFS2: Fix multi-block allocation
  GFS2: decouple quota allocations from block allocations
  GFS2: split function rgblk_search
  GFS2: Fix up "off by one" in the previous patch
  GFS2: move toward a generic multi-block allocator
  GFS2: O_(D)SYNC support for fallocate
  GFS2: remove vestigial al_alloced
  GFS2: combine gfs2_alloc_block and gfs2_alloc_di
  GFS2: Add non-try locks back to get_local_rgrp
  GFS2: f_ra is always valid in dir readahead function
  GFS2: Fix very unlikley memory leak in ACL xattr code
  GFS2: More automated code analysis fixes
  GFS2: Add readahead to sequential directory traversal
  GFS2: Fix up REQ flags
2012-01-08 13:07:54 -08:00
Linus Torvalds
29ad0de279 Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs: (22 commits)
  xfs: mark the xfssyncd workqueue as non-reentrant
  xfs: simplify xfs_qm_detach_gdquots
  xfs: fix acl count validation in xfs_acl_from_disk()
  xfs: remove unused XBT_FORCE_SLEEP bit
  xfs: remove XFS_QMOPT_DQSUSER
  xfs: kill xfs_qm_idtodq
  xfs: merge xfs_qm_dqinit_core into the only caller
  xfs: add a xfs_dqhold helper
  xfs: simplify xfs_qm_dqattach_grouphint
  xfs: nest qm_dqfrlist_lock inside the dquot qlock
  xfs: flatten the dquot lock ordering
  xfs: implement lazy removal for the dquot freelist
  xfs: remove XFS_DQ_INACTIVE
  xfs: cleanup xfs_qm_dqlookup
  xfs: cleanup dquot locking helpers
  xfs: remove the sync_mode argument to xfs_qm_dqflush_all
  xfs: remove xfs_qm_sync
  xfs: make sure to really flush all dquots in xfs_qm_quotacheck
  xfs: untangle SYNC_WAIT and SYNC_TRYLOCK meanings for xfs_qm_dqflush
  xfs: remove the lid_size field in struct log_item_desc
  ...

Fix up trivial conflict in fs/xfs/xfs_sync.c
2012-01-08 13:05:29 -08:00
Linus Torvalds
972b2c7199 Merge branch 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
* 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (165 commits)
  reiserfs: Properly display mount options in /proc/mounts
  vfs: prevent remount read-only if pending removes
  vfs: count unlinked inodes
  vfs: protect remounting superblock read-only
  vfs: keep list of mounts for each superblock
  vfs: switch ->show_options() to struct dentry *
  vfs: switch ->show_path() to struct dentry *
  vfs: switch ->show_devname() to struct dentry *
  vfs: switch ->show_stats to struct dentry *
  switch security_path_chmod() to struct path *
  vfs: prefer ->dentry->d_sb to ->mnt->mnt_sb
  vfs: trim includes a bit
  switch mnt_namespace ->root to struct mount
  vfs: take /proc/*/mounts and friends to fs/proc_namespace.c
  vfs: opencode mntget() mnt_set_mountpoint()
  vfs: spread struct mount - remaining argument of next_mnt()
  vfs: move fsnotify junk to struct mount
  vfs: move mnt_devname
  vfs: move mnt_list to struct mount
  vfs: switch pnode.h macros to struct mount *
  ...
2012-01-08 12:19:57 -08:00
Boaz Harrosh
724577ca35 ore: Must support none-PAGE-aligned IO
NFS might send us offsets that are not PAGE aligned. So
we must read in the reminder of the first/last pages, in cases
we need it for Parity calculations.

We only add an sg segments to read the partial page. But
we don't mark it as read=true because it is a lock-for-write
page.

TODO: In some cases (IO spans a single unit) we can just
adjust the raid_unit offset/length, but this is left for
later Kernels.

[Bug in 3.2.0 Kernel]
CC: Stable Tree <stable@kernel.org>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2012-01-08 10:43:13 +02:00
Wu Fengguang
bc31b86a59 writeback: move MIN_WRITEBACK_PAGES to fs-writeback.c
Fix compile error

 fs/fs-writeback.c:515:33: error: ‘PAGE_CACHE_SHIFT’ undeclared (first use in this function)

Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
2012-01-08 10:35:19 +08:00
Alexandre Oliva
1bb91902dc Btrfs: revamp clustered allocation logic
Parameterize clusters on minimum total size, minimum chunk size and
minimum contiguous size for at least one chunk, without limits on
cluster, window or gap sizes.  Don't tolerate any fragmentation for
SSD_SPREAD; accept it for metadata, but try to keep data dense.

Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-01-07 19:15:15 -05:00
Alexandre Oliva
fc7c1077ce Btrfs: don't set up allocation result twice
We store the allocation start and length twice in ins, once right
after the other, but with intervening calls that may prevent the
duplicate from being optimized out by the compiler.  Remove one of the
assignments.

Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-01-07 19:15:14 -05:00
Linus Torvalds
7affca3537 Merge branch 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
* 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (73 commits)
  arm: fix up some samsung merge sysdev conversion problems
  firmware: Fix an oops on reading fw_priv->fw in sysfs loading file
  Drivers:hv: Fix a bug in vmbus_driver_unregister()
  driver core: remove __must_check from device_create_file
  debugfs: add missing #ifdef HAS_IOMEM
  arm: time.h: remove device.h #include
  driver-core: remove sysdev.h usage.
  clockevents: remove sysdev.h
  arm: convert sysdev_class to a regular subsystem
  arm: leds: convert sysdev_class to a regular subsystem
  kobject: remove kset_find_obj_hinted()
  m86k: gpio - convert sysdev_class to a regular subsystem
  mips: txx9_sram - convert sysdev_class to a regular subsystem
  mips: 7segled - convert sysdev_class to a regular subsystem
  sh: dma - convert sysdev_class to a regular subsystem
  sh: intc - convert sysdev_class to a regular subsystem
  power: suspend - convert sysdev_class to a regular subsystem
  power: qe_ic - convert sysdev_class to a regular subsystem
  power: cmm - convert sysdev_class to a regular subsystem
  s390: time - convert sysdev_class to a regular subsystem
  ...

Fix up conflicts with 'struct sysdev' removal from various platform
drivers that got changed:
 - arch/arm/mach-exynos/cpu.c
 - arch/arm/mach-exynos/irq-eint.c
 - arch/arm/mach-s3c64xx/common.c
 - arch/arm/mach-s3c64xx/cpu.c
 - arch/arm/mach-s5p64x0/cpu.c
 - arch/arm/mach-s5pv210/common.c
 - arch/arm/plat-samsung/include/plat/cpu.h
 - arch/powerpc/kernel/sysfs.c
and fix up cpu_is_hotpluggable() as per Greg in include/linux/cpu.h
2012-01-07 12:03:30 -08:00
Trond Myklebust
6926afd192 NFSv4: Save the owner/group name string when doing open
...so that we can do the uid/gid mapping outside the asynchronous RPC
context.
This fixes a bug in the current NFSv4 atomic open code where the client
isn't able to determine what the true uid/gid fields of the file are,
(because the asynchronous nature of the OPEN call denies it the ability
to do an upcall) and so fills them with default values, marking the
inode as needing revalidation.
Unfortunately, in some cases, the VFS will do some additional sanity
checks on the file, and may override the server's decision to allow
the open because it sees the wrong owner/group fields.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-07 13:22:46 -05:00
Jan Kara
c3aa077648 reiserfs: Properly display mount options in /proc/mounts
Make reiserfs properly display mount options in /proc/mounts.

CC: reiserfs-devel@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-06 23:20:13 -05:00
Miklos Szeredi
8e8b87964b vfs: prevent remount read-only if pending removes
If there are any inodes on the super block that have been unlinked
(i_nlink == 0) but have not yet been deleted then prevent the
remounting the super block read-only.

Reported-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-06 23:20:13 -05:00
Miklos Szeredi
7ada4db886 vfs: count unlinked inodes
Add a new counter to the superblock that keeps track of unlinked but
not yet deleted inodes.

Do not WARN_ON if set_nlink is called with zero count, just do a
ratelimited printk.  This happens on xfs and probably other
filesystems after an unclean shutdown when the filesystem reads inodes
which already have zero i_nlink.  Reported by Christoph Hellwig.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-06 23:20:12 -05:00
Miklos Szeredi
4ed5e82fe7 vfs: protect remounting superblock read-only
Currently remouting superblock read-only is racy in a major way.

With the per mount read-only infrastructure it is now possible to
prevent most races, which this patch attempts.

Before starting the remount read-only, iterate through all mounts
belonging to the superblock and if none of them have any pending
writes, set sb->s_readonly_remount.  This indicates that remount is in
progress and no further write requests are allowed.  If the remount
succeeds set MS_RDONLY and reset s_readonly_remount.

If the remounting is unsuccessful just reset s_readonly_remount.
This can result in transient EROFS errors, despite the fact the
remount failed.  Unfortunately hodling off writes is difficult as
remount itself may touch the filesystem (e.g. through load_nls())
which would deadlock.

A later patch deals with delayed writes due to nlink going to zero.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-06 23:20:12 -05:00
Miklos Szeredi
39f7c4db1d vfs: keep list of mounts for each superblock
Keep track of vfsmounts belonging to a superblock.  List is protected
by vfsmount_lock.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-06 23:20:12 -05:00
Al Viro
34c80b1d93 vfs: switch ->show_options() to struct dentry *
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-06 23:19:54 -05:00
Al Viro
a6322de67b vfs: switch ->show_path() to struct dentry *
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-06 23:16:55 -05:00
Al Viro
d861c630e9 vfs: switch ->show_devname() to struct dentry *
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-06 23:16:54 -05:00
Al Viro
64132379d5 vfs: switch ->show_stats to struct dentry *
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-06 23:16:54 -05:00
Al Viro
cdcf116d44 switch security_path_chmod() to struct path *
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-06 23:16:53 -05:00
Al Viro
d8c9584ea2 vfs: prefer ->dentry->d_sb to ->mnt->mnt_sb
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-06 23:16:53 -05:00
Al Viro
ece2ccb668 Merge branches 'vfsmount-guts', 'umode_t' and 'partitions' into Z 2012-01-06 23:15:54 -05:00
Linus Torvalds
6ed23fd6c0 Merge branch 'pstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux
* 'pstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
  pstore: gracefully handle NULL pstore_info functions
  pstore: pass reason to backend write callback
2012-01-06 18:03:02 -08:00
Linus Torvalds
9753dfe19a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1958 commits)
  net: pack skb_shared_info more efficiently
  net_sched: red: split red_parms into parms and vars
  net_sched: sfq: extend limits
  cnic: Improve error recovery on bnx2x devices
  cnic: Re-init dev->stats_addr after chip reset
  net_sched: Bug in netem reordering
  bna: fix sparse warnings/errors
  bna: make ethtool_ops and strings const
  xgmac: cleanups
  net: make ethtool_ops const
  vmxnet3" make ethtool ops const
  xen-netback: make ops structs const
  virtio_net: Pass gfp flags when allocating rx buffers.
  ixgbe: FCoE: Add support for ndo_get_fcoe_hbainfo() call
  netdev: FCoE: Add new ndo_get_fcoe_hbainfo() call
  igb: reset PHY after recovering from PHY power down
  igb: add basic runtime PM support
  igb: Add support for byte queue limits.
  e1000: cleanup CE4100 MDIO registers access
  e1000: unmap ce4100_gbe_mdio_base_virt in e1000_remove
  ...
2012-01-06 17:22:09 -08:00
Alexandre Oliva
a5f6f719a5 Btrfs: test free space only for unclustered allocation
Since the clustered allocation may be taking extents from a different
block group, there's no point in spin-locking and testing the current
block group free space before attempting to allocate space from a
cluster, even more so when we might refrain from even trying the
cluster in the current block group because, after the cluster was set
up, not enough free space remained.  Furthermore, cluster creation
attempts fail fast when the block group doesn't have enough free
space, so the test was completely superfluous.

I've move the free space test past the cluster allocation attempt,
where it is more useful, and arranged for a cluster in the current
block group to be released before trying an unclustered allocation,
when we reach the LOOP_NO_EMPTY_SIZE stage, so that the free space in
the cluster stands a chance of being combined with additional free
space in the block group so as to succeed in the allocation attempt.

Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-01-06 15:48:21 -05:00
Chris Mason
1100373f8a Btrfs: use bigger metadata chunks on bigger filesystems
The 256MB chunk is a little small on a huge FS.  This scales up the
chunk size.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-01-06 15:47:38 -05:00
Chris Mason
cf1d72c9ce Btrfs: lower the bar for chunk allocation
The chunk allocation code has tried to keep a pretty tight lid on creating new
metadata chunks.  This is partially because in the past the reservation
code didn't give us an accurate idea of how much space was being used.

The new code is much more accurate, so we're able to get rid of some of these
checks.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-01-06 15:41:34 -05:00
Chris Mason
203bf287cb Btrfs: run chunk allocations while we do delayed refs
Btrfs tries to batch extent allocation tree changes to improve performance
and reduce metadata trashing.  But it doesn't allocate new metadata chunks
while it is doing allocations for the extent allocation tree.

This commit changes the delayed refence code to do chunk allocations if we're
getting low on room.  It prevents crashes and improves performance.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-01-06 15:23:57 -05:00
Greg Kroah-Hartman
ff4b8a57f0 Merge branch 'driver-core-next' into Linux 3.2
This resolves the conflict in the arch/arm/mach-s3c64xx/s3c6400.c file,
and it fixes the build error in the arch/x86/kernel/microcode_core.c
file, that the merge did not catch.

The microcode_core.c patch was provided by Stephen Rothwell
<sfr@canb.auug.org.au> who was invaluable in the merge issues involved
with the large sysdev removal process in the driver-core tree.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-06 11:42:52 -08:00
Linus Torvalds
0db49b72bc Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
  sched/tracing: Add a new tracepoint for sleeptime
  sched: Disable scheduler warnings during oopses
  sched: Fix cgroup movement of waking process
  sched: Fix cgroup movement of newly created process
  sched: Fix cgroup movement of forking process
  sched: Remove cfs bandwidth period check in tg_set_cfs_period()
  sched: Fix load-balance lock-breaking
  sched: Replace all_pinned with a generic flags field
  sched: Only queue remote wakeups when crossing cache boundaries
  sched: Add missing rcu_dereference() around ->real_parent usage
  [S390] fix cputime overflow in uptime_proc_show
  [S390] cputime: add sparse checking and cleanup
  sched: Mark parent and real_parent as __rcu
  sched, nohz: Fix missing RCU read lock
  sched, nohz: Set the NOHZ_BALANCE_KICK flag for idle load balancer
  sched, nohz: Fix the idle cpu check in nohz_idle_balance
  sched: Use jump_labels for sched_feat
  sched/accounting: Fix parameter passing in task_group_account_field
  sched/accounting: Fix user/system tick double accounting
  sched/accounting: Re-use scheduler statistics for the root cgroup
  ...

Fix up conflicts in
 - arch/ia64/include/asm/cputime.h, include/asm-generic/cputime.h
	usecs_to_cputime64() vs the sparse cleanups
 - kernel/sched/fair.c, kernel/time/tick-sched.c
	scheduler changes in multiple branches
2012-01-06 08:44:54 -08:00
Aneesh Kumar K.V
f766619db2 fs/9p: iattr_valid flags are kernel internal flags map them to 9p values.
Kernel internal values can change, add protocol values for these constant
and use them.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2012-01-06 10:26:07 -06:00
Boaz Harrosh
361aba569f ore: fix BUG_ON, too few sgs when reading
When reading RAID5 files, in rare cases, we calculated too
few sg segments. There should be two extra for the beginning
and end partial units.

Also "too few sg segments" should not be a BUG_ON there is
all the mechanics in place to handle it, as a short read.
So just return -ENOMEM and the rest of the code will gracefully
split the IO.

[Bug in 3.2.0 Kernel]
CC: Stable Tree <stable@kernel.org>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2012-01-06 16:49:07 +02:00
Boaz Harrosh
ffefb8eaa3 ore: Fix crash in case of an IO error.
The users of ore_check_io() expect the reported device
(In case of error) to be indexed relative to the passed-in
ore_components table, and not the logical dev index.

This causes a crash inside objlayoutdriver in case of
an IO error.

[Bug in 3.2.0 Kernel]
CC: Stable Tree <stable@kernel.org>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2012-01-06 16:49:06 +02:00
Boaz Harrosh
831c2dc5f4 ore: FIX breakage when MISC_FILESYSTEMS is not set
As Reported by Randy Dunlap

When MISC_FILESYSTEMS is not enabled and NFS4.1 is:

fs/built-in.o: In function `objio_alloc_io_state':
objio_osd.c:(.text+0xcb525): undefined reference to `ore_get_rw_state'
fs/built-in.o: In function `_write_done':
objio_osd.c:(.text+0xcb58d): undefined reference to `ore_check_io'
fs/built-in.o: In function `_read_done':
...

When MISC_FILESYSTEMS, which is more of a GUI thing then anything else,
is not selected. exofs/Kconfig is never examined during Kconfig,
and it can not do it's magic stuff to automatically select everything
needed.

We must split exofs/Kconfig in two. The ore one is always included.
And the exofs one is left in it's old place in the menu.

[Needed for the 3.2.0 Kernel]
CC: Stable Tree <stable@kernel.org>
Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2012-01-06 16:48:14 +02:00
Trond Myklebust
e2fecb215b NFS: Remove pNFS bloat from the generic write path
We have no business doing any this in the standard write release path.
Get rid of it, and put it in the pNFS layer.

Also, while we're at it, get rid of the completely bogus unlock/relock
semantics that were present in nfs_writeback_release_full(). It is
not only unnecessary, but actually dangerous to release the write lock
just in order to take it again in nfs_page_async_flush(). Better just
to open code the pgio operations in a pnfs helper.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-06 08:57:46 -05:00
Boaz Harrosh
fe0fe83585 pnfs-obj: Must return layout on IO error
As mandated by the standard. In case of an IO error, a pNFS
objects layout driver must return it's layout. This is because
all device errors are reported to the server as part of the
layout return buffer.

This is implemented the same way PNFS_LAYOUTRET_ON_SETATTR
is done, through a bit flag on the pnfs_layoutdriver_type->flags
member. The flag is set by the layout driver that wants a
layout_return preformed at pnfs_ld_{write,read}_done in case
of an error.
(Though I have not defined a wrapper like pnfs_ld_layoutret_on_setattr
 because this code is never called outside of pnfs.c and pnfs IO
 paths)

Without this patch 3.[0-2] Kernels leak memory and have an annoying
WARN_ON after every IO error utilizing the pnfs-obj driver.

[This patch is for 3.2 Kernel. 3.1/0 Kernels need a different patch]
CC: Stable Tree <stable@kernel.org>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-06 08:55:33 -05:00
Boaz Harrosh
5c0b4129c0 pnfs-obj: pNFS errors are communicated on iodata->pnfs_error
Some time along the way pNFS IO errors were switched to
communicate with a special iodata->pnfs_error member instead
of the regular RPC members. But objlayout was not switched
over.

Fix that!
Without this fix any IO error is hanged, because IO is not
switched to MDS and pages are never cleared or read.

[Applies to 3.2.0. Same bug different patch for 3.1/0 Kernels]
CC: Stable Tree <stable@kernel.org>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-06 08:55:23 -05:00
Mauro Carvalho Chehab
b35009a9a7 Merge tag 'v3.2' into staging/for_v3.3
* tag 'v3.2': (83 commits)
  Linux 3.2
  minixfs: misplaced checks lead to dentry leak
  ptrace: ensure JOBCTL_STOP_SIGMASK is not zero after detach
  ptrace: partially fix the do_wait(WEXITED) vs EXIT_DEAD->EXIT_ZOMBIE race
  Revert "rtc: Expire alarms after the time is set."
  [CIFS] default ntlmv2 for cifs mount delayed to 3.3
  cifs: fix bad buffer length check in coalesce_t2
  Revert "rtc: Disable the alarm in the hardware"
  hung_task: fix false positive during vfork
  security: Fix security_old_inode_init_security() when CONFIG_SECURITY is not set
  fix CAN MAINTAINERS SCM tree type
  mwifiex: fix crash during simultaneous scan and connect
  b43: fix regression in PIO case
  ath9k: Fix kernel panic in AR2427 in AP mode
  CAN MAINTAINERS update
  net: fsl: fec: fix build for mx23-only kernel
  sch_qfq: fix overflow in qfq_update_start()
  drm/radeon/kms/atom: fix possible segfault in pm setup
  gspca: Fix falling back to lower isoc alt settings
  futex: Fix uninterruptible loop due to gate_area
  ...
2012-01-06 10:18:43 -02:00
Eric Paris
69f594a389 ptrace: do not audit capability check when outputing /proc/pid/stat
Reading /proc/pid/stat of another process checks if one has ptrace permissions
on that process.  If one does have permissions it outputs some data about the
process which might have security and attack implications.  If the current
task does not have ptrace permissions the read still works, but those fields
are filled with inocuous (0) values.  Since this check and a subsequent denial
is not a violation of the security policy we should not audit such denials.

This can be quite useful to removing ptrace broadly across a system without
flooding the logs when ps is run or something which harmlessly walks proc.

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Serge E. Hallyn <serge.hallyn@canonical.com>
2012-01-05 18:53:00 -05:00
Linus Torvalds
07d106d0a3 vfs: fix up ENOIOCTLCMD error handling
We're doing some odd things there, which already messes up various users
(see the net/socket.c code that this removes), and it was going to add
yet more crud to the block layer because of the incorrect error code
translation.

ENOIOCTLCMD is not an error return that should be returned to user mode
from the "ioctl()" system call, but it should *not* be translated as
EINVAL ("Invalid argument").  It should be translated as ENOTTY
("Inappropriate ioctl for device").

That EINVAL confusion has apparently so permeated some code that the
block layer actually checks for it, which is sad.  We continue to do so
for now, but add a big comment about how wrong that is, and we should
remove it entirely eventually.  In the meantime, this tries to keep the
changes localized to just the EINVAL -> ENOTTY fix, and removing code
that makes it harder to do the right thing.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-05 15:40:12 -08:00
J. Bruce Fields
7a6ef8c723 nfsd4: nfsd4_create_clid_dir return value is unused
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-01-05 15:38:41 -05:00
Chuck Lever
9b4146e855 NFSD: Change name of extended attribute containing junction
As of fedfs-utils-0.8.0, user space stores all NFS junction
information in a single extended attribute: "trusted.junction.nfs".

Both FedFS and NFS basic junctions are stored in this one attribute,
and the intention is that all future forms of NFS junction metadata
will be stored in this attribute.  Other protocols may use a different
extended attribute.

Thus NFSD needs to look only for that one extended attribute.  The
"trusted.junction.type" xattr is deprecated.  fedfs-utils-0.8.0 will
continue to attach a "trusted.junction.type" xattr to junctions, but
future fedfs-utils releases may no longer do that.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-01-05 15:35:57 -05:00
J. Bruce Fields
b8548894bd nfsd4: be forgiving in the absence of the recovery directory
If the recovery directory doesn't exist, then behavior after a reboot
will be suboptimal.  But it's unnecessarily harsh to then prevent the
nfsv4 server from working at all.  Instead just print a warning
(already done in nfsd4_init_recdir()) and soldier on.

Tested-by: Lior <lior@tonian.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-01-05 15:23:19 -05:00
Chuck Lever
0aaaf5c424 NFS: Cache state owners after files are closed
Servers have a finite amount of memory to store NFSv4 open and lock
owners.  Moreover, servers may have a difficult time determining when
they can reap their state owner table, thanks to gray areas in the
NFSv4 protocol specification.  Thus clients should be careful to reuse
state owners when possible.

Currently Linux is not too careful.  When a user has closed all her
files on one mount point, the state owner's reference count goes to
zero, and it is released.  The next OPEN allocates a new one.  A
workload that serially opens and closes files can run through a large
number of open owners this way.

When a state owner's reference count goes to zero, slap it onto a free
list for that nfs_server, with an expiry time.  Garbage collect before
looking for a state owner.  This makes state owners for active users
available for re-use.

Now that there can be unused state owners remaining at umount time,
purge the state owner free list when a server is destroyed.  Also be
sure not to reclaim unused state owners during state recovery.

This change has benefits for the client as well.  For some workloads,
this approach drops the number of OPEN_CONFIRM calls from the same as
the number of OPEN calls, down to just one.  This reduces wire traffic
and thus open(2) latency.  Before this patch, untarring a kernel
source tarball shows the OPEN_CONFIRM call counter steadily increasing
through the test.  With the patch, the OPEN_CONFIRM count remains at 1
throughout the entire untar.

As long as the expiry time is kept short, I don't think garbage
collection should be terribly expensive, although it does bounce the
clp->cl_lock around a bit.

[ At some point we should rationalize the use of the nfs_server
->destroy method. ]

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
[Trond: Fixed a garbage collection race and a few efficiency issues]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-05 11:59:18 -05:00
Aneesh Kumar K.V
b605479306 fs/9p: We should not allocate a new inode when creating hardlines.
Don't do new_inode_from fid in case of hardlink creation. This ensures
that link count for hardlink files get updated properly. Earlier link count
was not updated on removing a hardlink with cache mode enabled.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2012-01-05 10:51:44 -06:00
Aneesh Kumar K.V
df345c674b fs/9p: v9fs_stat2inode should update suid/sgid bits.
Create a new helper that update the permission bits and use
that, instead of opencoding the logic.

Reported and bisected by:  M. Mohan Kumar <mohan@in.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2012-01-05 10:51:44 -06:00
Joe Perches
5d3851530d 9p: Reduce object size with CONFIG_NET_9P_DEBUG
Reduce object size by deduplicating formats.

Use vsprintf extension %pV.
Rename P9_DPRINTK uses to p9_debug, align arguments.
Add function for _p9_debug and macro to add __func__.
Add missing "\n"s to p9_debug uses.
Remove embedded function names as p9_debug adds it.
Remove P9_EPRINTK macro and convert use to pr_<level>.
Add and use pr_fmt and pr_<level>.

$ size fs/9p/built-in.o*
   text	   data	    bss	    dec	    hex	filename
  62133	    984	  16000	  79117	  1350d	fs/9p/built-in.o.new
  67342	    984	  16928	  85254	  14d06	fs/9p/built-in.o.old
$ size net/9p/built-in.o*
   text	   data	    bss	    dec	    hex	filename
  88792	   4148	  22024	 114964	  1c114	net/9p/built-in.o.new
  94072	   4148	  23232	 121452	  1da6c	net/9p/built-in.o.old

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2012-01-05 10:51:44 -06:00
Jim Garlick
a0ea787b02 fs/9p: check schedule_timeout_interruptible return value
In v9fs_file_do_lock() we need to check return value of
schedule_timeout_interruptible() and exit the loop when it
returns nonzero, otherwise the loop is not really interruptible
and after the signal, the loop is no longer throttled by
P9_LOCK_TIMEOUT.

Signed-off-by: Jim Garlick <garlick.jim@gmail.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2012-01-05 10:51:44 -06:00
Chuck Lever
414adf14cd NFS: Clean up nfs4_find_state_owners_locked()
There's no longer a need to check the so_server field in the state
owner, because nowadays the RB tree we search for state owners
contains owners for that only server.

Make nfs4_find_state_owners_locked() use the same tree searching logic
as nfs4_insert_state_owner_locked().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-05 10:42:42 -05:00
Andy Adamson
bf118a342f NFSv4: include bitmap in nfsv4 get acl data
The NFSv4 bitmap size is unbounded: a server can return an arbitrary
sized bitmap in an FATTR4_WORD0_ACL request.  Replace using the
nfs4_fattr_bitmap_maxsz as a guess to the maximum bitmask returned by a server
with the inclusion of the bitmap (xdr length plus bitmasks) and the acl data
xdr length to the (cached) acl page data.

This is a general solution to commit e5012d1f "NFSv4.1: update
nfs4_fattr_bitmap_maxsz" and fixes hitting a BUG_ON in xdr_shrink_bufhead
when getting ACLs.

Fix a bug in decode_getacl that returned -EINVAL on ACLs > page when getxattr
was called with a NULL buffer, preventing ACL > PAGE_SIZE from being retrieved.

Cc: stable@kernel.org
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-05 10:42:42 -05:00
Chris Metcalf
3476f114ad nfs: fix a minor do_div portability issue
This change modifies filelayout_get_dense_offset() to use the functions
in math64.h and thus avoid a 32-bit platform compile error trying to
use do_div() on an s64 type.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Reviewed-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-05 10:42:42 -05:00
Andy Adamson
0b1c8fc43c NFSv4.1: cleanup comment and debug printk
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-05 10:42:41 -05:00
Andy Adamson
aabd0b40b3 NFSv4.1: change nfs4_free_slot parameters for dynamic slots
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-05 10:42:41 -05:00
Andy Adamson
aacd553727 NFSv4.1: cleanup init and reset of session slot tables
We are either initializing or resetting a session. Initialize or reset
the session slot tables accordingly.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-05 10:42:40 -05:00
Andy Adamson
61f2e51065 NFSv4.1: fix backchannel slotid off-by-one bug
Cc:stable@kernel.org
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-05 10:42:40 -05:00
Jeff Layton
8a0d551a59 nfs: fix regression in handling of context= option in NFSv4
Setting the security context of a NFSv4 mount via the context= mount
option is currently broken. The NFSv4 codepath allocates a parsed
options struct, and then parses the mount options to fill it. It
eventually calls nfs4_remote_mount which calls security_init_mnt_opts.
That clobbers the lsm_opts struct that was populated earlier. This bug
also looks like it causes a small memory leak on each v4 mount where
context= is used.

Fix this by moving the initialization of the lsm_opts into
nfs_alloc_parsed_mount_data. Also, add a destructor for
nfs_parsed_mount_data to make it easier to free all of the allocations
hanging off of it, and to ensure that the security_free_mnt_opts is
called whenever security_init_mnt_opts is.

I believe this regression was introduced quite some time ago, probably
by commit c02d7adf.

Cc: stable@vger.kernel.org
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-05 10:42:40 -05:00
NeilBrown
2edb6bc385 NFS - fix recent breakage to NFS error handling.
From c6d615d2b97fe305cbf123a8751ced859dca1d5e Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.de>
Date: Wed, 16 Nov 2011 09:39:05 +1100
Subject: [PATCH] NFS - fix recent breakage to NFS error handling.

commit 02c24a8218 made a small and
presumably unintended change to write error handling in NFS.

Previously an error from filemap_write_and_wait_range would only be of
interest if nfs_file_fsync did not return an error.  After this commit,
an error from filemap_write_and_wait_range would mean that (the rest of)
nfs_file_fsync would not even be called.

This means that:
 1/ you are more likely to see EIO than e.g. EDQUOT or ENOSPC.
 2/ NFS_CONTEXT_ERROR_WRITE remains set for longer so more writes are
    synchronous.

This patch restores previous behaviour.

Cc: stable@kernel.org
Cc: Josef Bacik <josef@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-05 10:42:39 -05:00
Trond Myklebust
68c97153fb SUNRPC: Clean up the RPCSEC_GSS service ticket requests
Instead of hacking specific service names into gss_encode_v1_msg, we should
just allow the caller to specify the service name explicitly.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: J. Bruce Fields <bfields@redhat.com>
2012-01-05 10:42:38 -05:00
Jan Schmidt
6bf7e080d5 Btrfs: make sure we're not using obsolete code in btrfs_get_extent
There's code in btrfs_get_extent that should never be used. This patch turns
a WARN_ON(1) into a BUG(), hoping we can remove the transaction code from
btrfs_get_extent soon.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-01-05 15:11:55 +01:00
Jan Schmidt
4692cf58aa Btrfs: new backref walking code
The old backref iteration code could only safely be used on commit roots.
Besides this limitation, it had bugs in finding the roots for these
references. This commit replaces large parts of it by btrfs_find_all_roots()
which a) really finds all roots and the correct roots, b) works correctly
under heavy file system load, c) considers delayed refs.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-01-05 10:49:43 +01:00
Eric Sandeen
5f163cc759 ext4: make more symbols static
A couple more functions can reasonably be made static if desired.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-01-04 22:33:28 -05:00
Djalal Harouni
176576dbc8 ext4: make local symbol ext4_initxattrs static
The ext4_initxattrs symbol is used only in this file, so it should be
declared static.

Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-01-04 22:32:12 -05:00
Jan Kara
9837d8e982 jbd2: fix hung processes in jbd2_journal_lock_updates()
Toshiyuki Okajima found out that when running

for ((i=0; i < 100000; i++)); do
        if ((i%2 == 0)); then
                chattr +j /mnt/file
        else
                chattr -j /mnt/file
        fi
        echo "0" >> /mnt/file
done

process sometimes hangs indefinitely in jbd2_journal_lock_updates().

Toshiyuki identified that the following race happens:

jbd2_journal_lock_updates()            |jbd2_journal_stop()
---------------------------------------+---------------------------------------
 write_lock(&journal->j_state_lock)    |    .
 ++journal->j_barrier_count            |    .
 spin_lock(&tran->t_handle_lock)       |    .
 atomic_read(&tran->t_updates) //not 0 |
                                       | atomic_dec_and_test(&tran->t_updates)
                                       |    // t_updates = 0
                                       | wake_up(&journal->j_wait_updates)
 prepare_to_wait()                     |    // no process is woken up.
 spin_unlock(&tran->t_handle_lock)     |
 write_unlock(&journal->j_state_lock)  |
 schedule() // never return            |

We fix the problem by first calling prepare_to_wait() and only after that
checking t_updates in jbd2_journal_lock_updates().

Reported-and-analyzed-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-01-04 22:03:11 -05:00
Theodore Ts'o
9b90e5e028 ext4: reserve new feature flag codepoints
Reserve the ext4 features flags EXT4_FEATURE_RO_COMPAT_METADATA_CSUM,
EXT4_FEATURE_INCOMPAT_INLINEDATA, and EXT4_FEATURE_INCOMPAT_LARGEDIR.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-01-04 22:01:53 -05:00
David S. Miller
117ff42fd4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-01-04 21:35:43 -05:00
Ben Hutchings
1d526fc91b ext4: Report max_batch_time option correctly
Currently the value reported for max_batch_time is really the
value of min_batch_time.

Reported-by: Russell Coker <russell@coker.com.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-01-04 21:22:51 -05:00
Al Viro
d6042eac44 minixfs: misplaced checks lead to dentry leak
bitmap size sanity checks should be done *before* allocating ->s_root;
there their cleanup on failure would be correct.  As it is, we do iput()
on root inode, but leak the root dentry...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Josh Boyer <jwboyer@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-04 15:03:06 -08:00
Djalal Harouni
014a177037 ext4: add missing ext4_resize_end on error paths
Online resize ioctls 'EXT4_IOC_GROUP_EXTEND' and 'EXT4_IOC_GROUP_ADD'
call ext4_resize_begin() to check permissions and to set the
EXT4_RESIZING bit lock, they do their work and they must finish with
ext4_resize_end() which calls clear_bit_unlock() to unlock and to
avoid -EBUSY errors for the next resize operations.

This patch adds the missing ext4_resize_end() calls on error paths.

Patch tested.

Cc: stable@vger.kernel.org
Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-01-04 17:09:52 -05:00
Yongqiang Yang
61f296cc49 ext4: let ext4_group_add() use common code
This patch lets ext4_group_add() call ext4_flex_group_add().

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-01-04 17:09:50 -05:00
Yongqiang Yang
d89651c8e2 ext4: let ext4_group_extend() use common code
ext4_group_extend_no_check() is moved out from ext4_group_extend(),
this patch lets ext4_group_extend() call ext4_group_extentd_no_check()
instead.

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-01-04 17:09:48 -05:00
Yongqiang Yang
19c5246d25 ext4: add new online resize interface
This patch adds new online resize interface, whose input argument is a
64-bit integer indicating how many blocks there are in the resized fs.

In new resize impelmentation, all work like allocating group tables
are done by kernel side, so the new resize interface can support
flex_bg feature and prepares ground for suppoting resize with features
like bigalloc and exclude bitmap. Besides these, user-space tools just
passes in the new number of blocks.

We delay initializing the bitmaps and inode tables of added groups if
possible and add multi groups (a flex groups) each time, so new resize
is very fast like mkfs.

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-01-04 17:09:44 -05:00
Jan Schmidt
8da6d5815c Btrfs: added btrfs_find_all_roots()
This function gets a byte number (a data extent), collects all the leafs
pointing to it and walks up the trees to find all fs roots pointing to those
leafs. It also returns the list of all leafs pointing to that extent.

It does proper locking for the involved trees, can be used on busy file
systems and honors delayed refs.

Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-01-04 16:26:38 +01:00
Jan Schmidt
a168650c08 Btrfs: add waitqueue instead of doing busy waiting for more delayed refs
Now that we may be holding back delayed refs for a limited period, we
might end up having no runnable delayed refs. Without this commit, we'd
do busy waiting in that thread until another (runnable) ref arives.
Instead, we're detecting this situation and use a waitqueue, such that
we only try to run more refs after
	a) another runnable ref was added  or
	b) delayed refs are no longer held back

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-01-04 16:12:48 +01:00
Arne Jansen
d1270cd91f Btrfs: put back delayed refs that are too new
When processing a delayed ref, first check if there are still old refs in
the process of being added. If so, put this ref back to the tree. To avoid
looping on this ref, choose a newer one in the next loop.
btrfs_find_ref_cluster has to take care of that.

Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-01-04 16:12:45 +01:00
Arne Jansen
00f04b8879 Btrfs: add sequence numbers to delayed refs
Sequence numbers are needed to reconstruct the backrefs of a given extent to
a certain point in time. The total set of backrefs consist of the set of
backrefs recorded on disk plus the enqueued delayed refs for it that existed
at that moment.

This patch also adds a list that records all delayed refs which are
currently in the process of being added.

When walking all refs of an extent in btrfs_find_all_roots(), we freeze the
current state of delayed refs, honor anythinh up to this point and prevent
processing newer delayed refs to assert consistency.

Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-01-04 16:12:42 +01:00
Arne Jansen
5b25f70f42 Btrfs: add nested locking mode for paths
This patch adds the possibilty to read-lock an extent even if it is already
write-locked from the same thread. btrfs_find_all_roots() needs this
capability.

Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-01-04 16:12:29 +01:00
David Teigland
60f98d1839 dlm: add recovery callbacks
These new callbacks notify the dlm user about lock recovery.
GFS2, and possibly others, need to be aware of when the dlm
will be doing lock recovery for a failed lockspace member.

In the past, this coordination has been done between dlm and
file system daemons in userspace, which then direct their
kernel counterparts.  These callbacks allow the same
coordination directly, and more simply.

Signed-off-by: David Teigland <teigland@redhat.com>
2012-01-04 08:56:31 -06:00
David Teigland
757a427196 dlm: add node slots and generation
Slot numbers are assigned to nodes when they join the lockspace.
The slot number chosen is the minimum unused value starting at 1.
Once a node is assigned a slot, that slot number will not change
while the node remains a lockspace member.  If the node leaves
and rejoins it can be assigned a new slot number.

A new generation number is also added to a lockspace.  It is
set and incremented during each recovery along with the slot
collection/assignment.

The slot numbers will be passed to gfs2 which will use them as
journal id's.

Signed-off-by: David Teigland <teigland@redhat.com>
2012-01-04 08:55:57 -06:00
David Teigland
f95a34c665 dlm: move recovery barrier calls
Put all the calls to recovery barriers in the same function
to clarify where they each happen.  Should not change any behavior.
Also modify some recovery debug lines to make them consistent.

Signed-off-by: David Teigland <teigland@redhat.com>
2012-01-04 08:53:27 -06:00
Steve French
225de11e31 [CIFS] default ntlmv2 for cifs mount delayed to 3.3
Turned out the ntlmv2 (default security authentication)
upgrade was harder to test than expected, and we ran
out of time to test against Apple and a few other servers
that we wanted to.  Delay upgrade of default security
from ntlm to ntlmv2 (on mount) to 3.3.  Still works
fine to specify it explicitly via "sec=ntlmv2" so this
should be fine.

Acked-by: Jeff Layton <jlayton@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-01-04 07:54:40 -06:00
Yongqiang Yang
4bac1f8cef ext4: add a new function which adds a flex group to a fs
This patch adds a new function named ext4_flex_group_add() which adds a
flex group to a fs.  The function is used by 64bit-resize interface.

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-01-03 23:44:38 -05:00
Yongqiang Yang
3fbea4b368 ext4: add a new function which allocates bitmaps and inode tables
This patch adds a new function named ext4_allocates_group_table()
which allocates block bitmaps, inode bitmaps and inode tables for a
flex groups and is used by resize code.

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-01-03 23:44:38 -05:00
Yongqiang Yang
c72df9f928 ext4: pass verify_reserved_gdb() the number of group decriptors
The 64bit resizer adds a flex group each time, so verify_reserved_gdb
can not use s_groups_count directly, it should use the number of group
decriptors before the added group.

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-01-03 23:43:39 -05:00
Yongqiang Yang
2e10e2f2e5 ext4: add a function which updates the super block during online resizing
This patch adds a function named ext4_update_super() which updates
super block so the newly created block groups are visible to the file
system.  This code is copied from ext4_group_add().

The function will be used by new resize implementation.

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-01-03 23:41:39 -05:00
Yongqiang Yang
083f5b24cc ext4: add a function which sets up a block group descriptors of a flex bg
This patch adds a function named ext4_setup_new_descs which sets up the
block group descriptors of a flex bg.

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-01-03 23:37:31 -05:00
Yongqiang Yang
33afdcc540 ext4: add a function which sets up group blocks of a flex bg
This patch adds a function named setup_new_flex_group_blocks() which
sets up group blocks of a flex bg.

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-01-03 23:32:52 -05:00
Yongqiang Yang
28c7bac009 ext4: add a structure which will be used by 64bit-resize interface
This patch adds a structure which will be used by 64bit-resize interface.
Two functions which allocate and destroy the structure respectively are
added.

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-01-03 23:22:50 -05:00
Yongqiang Yang
bb08c1e7d8 ext4: add a function which adds a new group descriptors to a fs
This patch adds a function named ext4_add_new_descs() which adds one
or more new group descriptors to a fs and whose code is copied from
ext4_group_add().

The function will be used by new resize implementation.

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-01-03 23:20:50 -05:00
Yongqiang Yang
18e3143848 ext4: add a function which extends a group without checking parameters
This patch added a function named ext4_group_extend_no_check() whose code
is copied from ext4_group_extend().  ext4_group_extend_no_check() assumes
the parameter is valid and has been checked by caller.

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-01-03 23:18:50 -05:00
Al Viro
d10577a8d8 vfs: trim includes a bit
[folded fix for missing magic.h from Tetsuo Handa]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:57:13 -05:00
Al Viro
be08d6d260 switch mnt_namespace ->root to struct mount
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:57:13 -05:00
Al Viro
0226f4923f vfs: take /proc/*/mounts and friends to fs/proc_namespace.c
rationale: that stuff is far tighter bound to fs/namespace.c than to
the guts of procfs proper.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:57:13 -05:00
Al Viro
3a2393d71d vfs: opencode mntget() mnt_set_mountpoint()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:57:12 -05:00
Al Viro
909b0a88ef vfs: spread struct mount - remaining argument of next_mnt()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:57:12 -05:00
Al Viro
c63181e6b6 vfs: move fsnotify junk to struct mount
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:57:12 -05:00
Al Viro
52ba1621de vfs: move mnt_devname
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:57:11 -05:00
Al Viro
1a4eeaf2a8 vfs: move mnt_list to struct mount
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:57:11 -05:00
Al Viro
fc7be130c7 vfs: switch pnode.h macros to struct mount *
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:57:11 -05:00
Al Viro
863d684f94 vfs: move the rest of int fields to struct mount
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:57:10 -05:00
Al Viro
15169fe784 vfs: mnt_id/mnt_group_id moved
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:57:10 -05:00
Al Viro
143c8c91ce vfs: mnt_ns moved to struct mount
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:57:09 -05:00
Al Viro
900148dcac vfs: spread struct mount - mntput_no_expire
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:57:09 -05:00
Al Viro
95bc5f25c1 vfs: spread struct mount - do_add_mount and graft_tree
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:57:09 -05:00
Al Viro
6776db3d32 vfs: take mnt_share/mnt_slave/mnt_slave_list and mnt_expire to struct mount
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:57:08 -05:00
Al Viro
32301920f4 vfs: and now we can make ->mnt_master point to struct mount
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:57:08 -05:00
Al Viro
d10e8def07 vfs: take mnt_master to struct mount
make IS_MNT_SLAVE take struct mount * at the same time

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:57:08 -05:00
Al Viro
14cf1fa8f5 vfs: spread struct mount - remaining argument of mnt_set_mountpoint()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:57:07 -05:00
Al Viro
a8d56d8e4f vfs: spread struct mount - propagate_mnt()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:57:07 -05:00
Al Viro
c937135d98 vfs: spread struct mount - shared subtree iterators
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:57:07 -05:00
Al Viro
6fc7871fed vfs: spread struct mount - get_dominating_id / do_make_slave
next pile of horrors, similar to mnt_parent one; this time it's
mnt_master.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:57:06 -05:00
Al Viro
6b41d536f7 vfs: take mnt_child/mnt_mounts to struct mount
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:57:06 -05:00
Al Viro
68e8a9feab vfs: all counters taken to struct mount
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:57:06 -05:00
Al Viro
83adc75322 vfs: spread struct mount - work with counters
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:57:05 -05:00
Al Viro
a73324da7a vfs: move mnt_mountpoint to struct mount
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:57:05 -05:00
Al Viro
0714a53380 vfs: now it can be done - make mnt_parent point to struct mount
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:57:05 -05:00
Al Viro
3376f34fff vfs: mnt_parent moved to struct mount
the second victim...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:57:04 -05:00
Al Viro
643822b41e vfs: spread struct mount - is_path_reachable
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:57:04 -05:00
Al Viro
676da58df7 vfs: spread struct mount - mnt_has_parent
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:57:04 -05:00
Al Viro
1ab5973862 vfs: spread struct mount - do_umount/propagate_mount_busy
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:57:03 -05:00
Al Viro
44d964d609 vfs: spread struct mount mnt_set_mountpoint child argument
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:57:03 -05:00
Al Viro
87129cc0e3 vfs: spread struct mount - clone_mnt/copy_tree argument
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:57:03 -05:00
Al Viro
692afc312b vfs: spread struct mount - shrink_submounts/select_submounts
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:57:02 -05:00
Al Viro
761d5c38eb vfs: spread struct mount - umount_tree argument
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:57:02 -05:00
Al Viro
1b8e5564b9 vfs: the first spoils - mnt_hash moved
taken out of struct vfsmount into struct mount

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:57:02 -05:00
Al Viro
d5e50f74dd vfs: spread struct mount to remaining users of ->mnt_hash
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:57:01 -05:00
Al Viro
cb338d06e9 vfs: spread struct mount - clone_mnt/copy_tree result
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:57:01 -05:00
Al Viro
0f0afb1dcf vfs: spread struct mount - change_mnt_propagation/set_mnt_shared
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:57:01 -05:00
Al Viro
b105e270b4 vfs: spread struct mount - alloc_vfsmnt/free_vfsmnt/mnt_alloc_id/mnt_free_id
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:57:00 -05:00
Al Viro
cbbe362cd6 vfs: spread struct mount - tree_contains_unbindable
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:57:00 -05:00
Al Viro
0fb54e5056 vfs: spread struct mount - attach_recursive_mnt
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:57:00 -05:00
Al Viro
4b8b21f4fe vfs: spread struct mount - mount group id handling
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:56:59 -05:00
Al Viro
4b2619a571 vfs: spread struct mount - commit_tree
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:56:59 -05:00
Al Viro
419148da6e vfs: spread struct mount - attach_mnt/detach_mnt
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:56:59 -05:00
Al Viro
315fc83e56 vfs: spread struct mount - namespace.c internal iterators
next_mnt() return value, first argument
skip_mnt_tree() return value and argument

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:56:58 -05:00
Al Viro
61ef47b1e4 vfs: spread struct mount - __propagate_umount() argument
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:56:58 -05:00
Al Viro
c71053659e vfs: spread struct mount - __lookup_mnt() result
switch __lookup_mnt() to returning struct mount *; callers adjusted.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:56:58 -05:00
Al Viro
7d6fec45a5 vfs: start hiding vfsmount guts series
Almost all fields of struct vfsmount are used only by core VFS (and
a fairly small part of it, at that).  The plan: embed struct vfsmount
into struct mount, making the latter visible only to core parts of VFS.
Then move fields from vfsmount to mount, eventually leaving only
mnt_root/mnt_sb/mnt_flags in struct vfsmount.  Filesystem code still
gets pointers to struct vfsmount and remains unchanged; all such
pointers go to struct vfsmount embedded into the instances of struct
mount allocated by fs/namespace.c.  When fs/namespace.c et.al. get
a pointer to vfsmount, they turn it into pointer to mount (using
container_of) and work with that.

This is the first part of series; struct mount is introduced,
allocation switched to using it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:56:57 -05:00
Al Viro
a218d0fdc5 switch open and mkdir syscalls to umode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:55:19 -05:00
Al Viro
5706b27dea ceph: propagate umode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:55:16 -05:00
Al Viro
138d570de2 switch hostfs_iattr to explicit unsigned short
It's shared between kernel-compiled hostfs_kern and userland-compiled
hostfs_user (it's uml stuff).  Use explicit type instead of playing
silly buggers with mode_t.  It's not a userland API per se; it interacts
between code compiled with types same as for host kernel and, directly
linked to it, code talking to libc.  Both sides come from the same
kernel source...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:55:15 -05:00
Al Viro
f69aac0006 switch may_mknod() to umode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:55:14 -05:00
Al Viro
49f0a07672 switch sys_chmod()/sys_fchmod()/sys_fchmodat() to umode_t
SYSCALLx magic should take care of things, according to Linus...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:55:12 -05:00
Al Viro
dba19c6064 get rid of open-coded S_ISREG(), etc.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:55:12 -05:00
Al Viro
8d334acdd2 switch is_sxid() to umode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:55:11 -05:00
Al Viro
62bb109170 switch inode_init_owner() to umode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:55:11 -05:00
Al Viro
175a4eb7ea fs: propagate umode_t, misc bits
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:55:10 -05:00
Al Viro
030a8ba48f autofs4: propagate umode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:55:10 -05:00
Al Viro
c47da79851 hfsplus: propagate umode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:55:10 -05:00
Al Viro
e021d7b7fd hfs: propagate umode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:55:09 -05:00
Al Viro
5206efd62c cifs: propagate umode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:55:09 -05:00
Al Viro
dacd0e7b39 fat: propagate umode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:55:09 -05:00
Al Viro
d0c00d0671 ntfs: propagate umode_t
same story as with isofs and udf...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:55:08 -05:00
Al Viro
7328bdd6cf isofs: propagate umode_t
situation with mount options is the same as for udf

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:55:08 -05:00
Al Viro
faa17292fd udf: propagate umode_t
note re mount options: fmask and dmask are explicitly truncated to 12bit,
UDF_INVALID_MODE just needs to be guaranteed to differ from any such value.
And umask is used only in &= with umode_t, so we ignore other bits anyway.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:55:08 -05:00
Al Viro
541af6a074 fuse: propagate umode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:55:07 -05:00
Al Viro
632861f05a pohmelfs: propagate umode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:55:07 -05:00
Al Viro
8817644611 logfs: propagate umode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:55:06 -05:00
Al Viro
ad44be5c78 ubifs: propagate umode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:55:06 -05:00
Al Viro
5eee25cacd ncpfs: propagate umode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:55:05 -05:00
Al Viro
18df225242 hugetlbfs: propagate umode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:55:05 -05:00
Al Viro
bef41c267e exofs: propagate umode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:55:05 -05:00
Al Viro
c6e49e3f26 nilfs: propagate umode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:55:04 -05:00
Al Viro
a760b03dc0 affs: propagate umode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:55:04 -05:00
Al Viro
faef2b6c99 sysfs: propagate umode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:55:03 -05:00
Al Viro
67697cbdcc ocfs2: propagate umode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:55:02 -05:00
Al Viro
2b15ad0684 dlmfs: use inode_init_owner()
don't open-code it...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:55:01 -05:00
Al Viro
3eda0de677 9p: propagate umode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:55:01 -05:00
Al Viro
587228be4a omfs: propagate umode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:55:01 -05:00
Al Viro
8e0718924e reiserfs: propagate umode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:55:00 -05:00
Al Viro
576b1d67ce xfs: propagate umode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:55:00 -05:00
Al Viro
dd716e64d6 sysv: propagate umode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:55:00 -05:00
Al Viro
6a9a06d9ca ufs: propagate umode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:54:59 -05:00
Al Viro
4f45ba3d10 minix: propagate umode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:54:59 -05:00
Al Viro
dcca3fec9f ext4: propagate umode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:54:59 -05:00
Al Viro
69b34f3ab3 ext3: propagate umode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:54:58 -05:00
Al Viro
3ea40bc947 ext2: propagate umode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:54:58 -05:00
Al Viro
c2837de73e 9p: don't bother with unixmode2p9mode() for link() and symlink()
Pass perm to v9fs_vfs_mkspecial() instead of passing mode;
calculate in caller when done for mknod(), use known value for link()
and symlink().  As the result, we avoid a bit of work *and* stop
mixing mode_t with P9_DMLINK.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:54:58 -05:00
Al Viro
18cb1b08d2 kill ecryptfs_create_underlying_file()
it's a just a wrapper for vfs_create()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:54:57 -05:00
Al Viro
439475140b configfs: convert to umode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:54:57 -05:00
Al Viro
f4ae40a6a5 switch debugfs to umode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:54:56 -05:00
Al Viro
48176a973d switch sysfs_chmod_file() to umode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:54:56 -05:00
Al Viro
d161a13f97 switch procfs to umode_t use
both proc_dir_entry ->mode and populating functions

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:54:56 -05:00
Al Viro
587a1f1659 switch ->is_visible() to returning umode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:54:55 -05:00
Al Viro
7d54fa6472 hugetlbfs: switch to inode_init_owner()
... rather than open-coding it

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:54:54 -05:00
Al Viro
1a67aafb5f switch ->mknod() to umode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:54:54 -05:00
Al Viro
4acdaf27eb switch ->create() to umode_t
vfs_create() ignores everything outside of 16bit subset of its
mode argument; switching it to umode_t is obviously equivalent
and it's the only caller of the method

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:54:53 -05:00
Al Viro
18bb1db3e7 switch vfs_mkdir() and ->mkdir() to umode_t
vfs_mkdir() gets int, but immediately drops everything that might not
fit into umode_t and that's the only caller of ->mkdir()...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:54:53 -05:00
Al Viro
8208a22bb8 switch sys_mknodat(2) to umode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:54:52 -05:00
Al Viro
ff01bb4832 fs: move code out of buffer.c
Move invalidate_bdev, block_sync_page into fs/block_dev.c.  Export
kill_bdev as well, so brd doesn't have to open code it.  Reduce
buffer_head.h requirement accordingly.

Removed a rather large comment from invalidate_bdev, as it looked a bit
obsolete to bother moving.  The small comment replacing it says enough.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:54:07 -05:00
Al Viro
9be96f3fd1 move fs/partitions to block/
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:54:06 -05:00
Al Viro
dabe0dc194 vfs: fix the rest of sget() races
unfortunately, just checking MS_BORN after having grabbed ->s_umount in
sget() is not enough; places that pick superblock from a list and
grab s_umount shared need the same check in addition to checking for
->s_root; otherwise three-way race between failing mount, sget() and
such list-walker can leave us with list-walker coming *second*, when
temporary active ref grabbed by sget() (to be dropped when sget()
notices that original mount has failed by checking MS_BORN) has
lead to deactivate_locked_super() from failing ->mount() *not* doing
->kill_sb() and just releasing ->s_umount.  Once sget() gets through
and notices that MS_BORN had never been set it will drop the active
ref and fs will be shut down and kicked out of all lists, but it's
too late for something like sync_supers().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:53:10 -05:00
Al Viro
cf31e70d6c vfs: new helper - vfs_ustat()
... and bury user_get_super()/statfs_by_dentry() - they are
purely internal now.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:53:07 -05:00
Al Viro
c972b4bc83 vfs: live vfsmounts never have NULL ->mnt_sb
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:52:42 -05:00
Al Viro
4c1d5a64f1 vfs: for usbfs, etc. internal vfsmounts ->mnt_sb->s_root == ->mnt_root
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:52:41 -05:00
Al Viro
84b92d39f9 vfs: pipe.c is really non-modular
... so no exitcalls there.  Not much would work if pipe(2) would stop
working, after all...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:52:41 -05:00
Al Viro
6b520e0565 vfs: fix the stupidity with i_dentry in inode destructors
Seeing that just about every destructor got that INIT_LIST_HEAD() copied into
it, there is no point whatsoever keeping this INIT_LIST_HEAD in inode_init_once();
the cost of taking it into inode_init_always() will be negligible for pipes
and sockets and negative for everything else.  Not to mention the removal of
boilerplate code from ->destroy_inode() instances...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:52:40 -05:00
Al Viro
2a79f17e4a vfs: mnt_drop_write_file()
new helper (wrapper around mnt_drop_write()) to be used in pair with
mnt_want_write_file().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:52:40 -05:00
Al Viro
8c9379e972 constify seq_file stuff
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:52:40 -05:00
Al Viro
79e801a906 vfs: make do_kern_mount() static
the only user outside of fs/namespace.c has died

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:52:39 -05:00
Al Viro
a5166169f9 vfs: convert fs_supers to hlist
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:52:39 -05:00
Al Viro
5352d3b65a make nfs_follow_remote_path() handle ERR_PTR() passed as root_mnt
... rather than duplicating that in callers

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:52:39 -05:00
Al Viro
5ffc2836a2 vfs: kill ->mnt_devname use in afs printks
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:52:38 -05:00
Al Viro
e407699ef5 btrfs, nfs, apparmor: don't pull mnt_namespace.h for no reason...
it's not needed anymore; we used to, back when we had to do
mount_subtree() by hand, complete with put_mnt_ns() in it.
No more...  Apparmor didn't need it since the __d_path() fix.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:52:38 -05:00
Al Viro
aa0a4cf0ab vfs: dentry_reset_mounted() doesn't use vfsmount argument
lose it

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:52:37 -05:00
Al Viro
6c449c8dfe unexport put_mnt_ns(), make create_mnt_ns() static outright
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:52:37 -05:00
Al Viro
aafd08dad0 vfs: add missing parens in pnode.h macros
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:52:37 -05:00
Al Viro
afac7cba7e vfs: more mnt_parent cleanups
a) mount --move is checking that ->mnt_parent is non-NULL before
looking if that parent happens to be shared; ->mnt_parent is never
NULL and it's not even an misspelled !mnt_has_parent()

b) pivot_root open-codes is_path_reachable(), poorly.

c) so does path_is_under(), while we are at it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:52:36 -05:00
Al Viro
b2dba1af3c vfs: new internal helper: mnt_has_parent(mnt)
vfsmounts have ->mnt_parent pointing either to a different vfsmount
or to itself; it's never NULL and termination condition in loops
traversing the tree towards root is mnt == mnt->mnt_parent.  At least
one place (see the next patch) is confused about what's going on;
let's add an explicit helper checking it right way and use it in
all places where we need it.  Not that there had been too many,
but...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:52:36 -05:00
Al Viro
aa9c0e07bb vfs: kill pointless helpers in namespace.c
mnt_{inc,dec}_count() is not cleaner than doing the corresponding
mnt_add_count() directly and mnt_set_count() is not used at all.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:52:36 -05:00
Al Viro
bad0dcffc2 new helpers: fh_{want,drop}_write()
A bunch of places in nfsd does mnt_{want,drop}_write on vfsmount of
export of given fhandle.  Switched to obvious inlined helpers...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:52:35 -05:00
Al Viro
a561be7100 switch a bunch of places to mnt_want_write_file()
it's both faster (in case when file has been opened for write) and cleaner.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:52:35 -05:00
Al Viro
f47ec3f283 trim fs/internal.h
some stuff in there can actually become static; some belongs to pnode.h
as it's a private interface between namespace.c and pnode.c...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:52:35 -05:00
Al Viro
5ede7b1cfa pull manipulations of rpc_cred inside alloc_nfs_open_context()
No need to duplicate them in both callers; make it return
ERR_PTR(-ENOMEM) on allocation failure instead of NULL and
it'll be able to report rpc_lookup_cred() failures just
fine.  Callers are much happier that way...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:52:34 -05:00
Jeff Layton
497728e11a cifs: fix bad buffer length check in coalesce_t2
The current check looks to see if the RFC1002 length is larger than
CIFSMaxBufSize, and fails if it is. The buffer is actually larger than
that by MAX_CIFS_HDR_SIZE.

This bug has been around for a long time, but the fact that we used to
cap the clients MaxBufferSize at the same level as the server tended
to paper over it. Commit c974befa changed that however and caused this
bug to bite in more cases.

Reported-and-Tested-by: Konstantinos Skarlatos <k.skarlatos@gmail.com>
Tested-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-01-03 20:34:17 -06:00
Heiko Carstens
3b85e4ab2e debugfs: add missing #ifdef HAS_IOMEM
"debugfs: add tools to printk 32-bit registers" adds new functions which rely
on IOMEM functionality which is not present on all architectures and therefore
result in compile errors:

fs/debugfs/file.c: In function 'debugfs_print_regs32':
fs/debugfs/file.c:561:7: error: implicit declaration of function 'readl' [-Werror=implicit-function-declaration]

Add an #ifdef CONFIG_HAS_IOMEM to fix this

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Alessandro Rubini <rubini@gnudd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-03 16:45:16 -08:00
Dave Chinner
b1c770c273 xfs: fix endian conversion issue in discard code
When finding the longest extent in an AG, we read the value directly
out of the AGF buffer without endian conversion. This will give an
incorrect length, resulting in FITRIM operations potentially not
trimming everything that it should.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-01-03 11:39:55 -06:00
Phillip Lougher
3d4a1c80c4 Squashfs: fix i_blocks calculation with extended regular files
The le64_to_cpu() forces the calculation to be unsigned, with
the effect that it can underflow leading to an incorrect large
value.

This bug only triggers in rare(ish) circumstances, an empty file
encoded as an extended regular file or a completely sparse file.
Normally empty files are encoded as a regular file rather than as
an extended regular file (and the regular file i_blocks calculation
doesn't have this bug).  To save space regular file inodes are
optimised to encode the most commonly occurring files.  Less
common regular files are encoded using extended regular file inodes
which contain extra information.

Empty files with nlinks greater than 1, and or empty files
with extended attributes are encoded using extended regular file
inodes and they will hit this bug.

Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
2012-01-03 04:00:43 +00:00
David S. Miller
455ffa607f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-01-02 18:56:49 -05:00
J. Bruce Fields
aec39680b0 nfsd4: fix spurious 4.1 post-reboot failures
In the NFSv4.1 case, this could cause a spurious "NFSD: failed to write
recovery record (err -17); please check that /var/lib/nfs/v4recovery
exists and is writable.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Reported-by: Steve Dickson <SteveD@redhat.com>
2012-01-02 17:32:59 -05:00
Phillip Lougher
cc37f75a9f Squashfs: fix mount time sanity check for corrupted superblock
A Squashfs filesystem containing nothing but an empty directory,
although unusual and ultimately pointless, is still valid.

The directory_table >= next_table sanity check rejects these
filesystems as invalid because the directory_table is empty and
equal to next_table.

Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
2012-01-02 17:47:14 +00:00
Mauro Carvalho Chehab
e97a5d893f [media] fs/compat_ioctl: it needs to see the DVBv3 compat stuff
Only the ioctl core should see the DVBv3 compat stuff, as its
contents are not available anymore to the drivers.

As fs/compat_ioctl also handles DVBv3 ioctl's, it needs those
definitions:

    fs/compat_ioctl.c:1345: error: invalid application of ‘sizeof’ to incomplete type ‘struct dvb_frontend_parameters’
    fs/compat_ioctl.c:1345: error: array type has incomplete element type
    fs/compat_ioctl.c:1345: error: invalid application of ‘sizeof’ to incomplete type ‘struct dvb_frontend_parameters’
    fs/compat_ioctl.c:1345: error: invalid application of ‘sizeof’ to incomplete type ‘struct dvb_frontend_parameters’
    fs/compat_ioctl.c:1345: error: invalid application of ‘sizeof’ to incomplete type ‘struct dvb_frontend_parameters’
    fs/compat_ioctl.c:1345: error: array type has incomplete element type
    fs/compat_ioctl.c:1345: error: invalid application of ‘sizeof’ to incomplete type ‘struct dvb_frontend_parameters’
    fs/compat_ioctl.c:1345: error: invalid application of ‘sizeof’ to incomplete type ‘struct dvb_frontend_parameters’
    fs/compat_ioctl.c:1345: error: invalid application of ‘sizeof’ to incomplete type ‘struct dvb_frontend_parameters’
    fs/compat_ioctl.c:1345: error: array type has incomplete element type
    fs/compat_ioctl.c:1345: error: invalid application of ‘sizeof’ to incomplete type ‘struct dvb_frontend_parameters’
    fs/compat_ioctl.c:1345: error: invalid application of ‘sizeof’ to incomplete type ‘struct dvb_frontend_parameters’
    fs/compat_ioctl.c:1345: error: initializer element is not constant
    fs/compat_ioctl.c:1345: error: (near initialization for ‘ioctl_pointer[462]’)
    fs/compat_ioctl.c:1346: error: invalid application of ‘sizeof’ to incomplete type ‘struct dvb_frontend_parameters’
    fs/compat_ioctl.c:1346: error: array type has incomplete element type
    fs/compat_ioctl.c:1346: error: invalid application of ‘sizeof’ to incomplete type ‘struct dvb_frontend_parameters’
    fs/compat_ioctl.c:1346: error: invalid application of ‘sizeof’ to incomplete type ‘struct dvb_frontend_parameters’
    fs/compat_ioctl.c:1346: error: invalid application of ‘sizeof’ to incomplete type ‘struct dvb_frontend_parameters’
    fs/compat_ioctl.c:1346: error: array type has incomplete element type
    fs/compat_ioctl.c:1346: error: invalid application of ‘sizeof’ to incomplete type ‘struct dvb_frontend_parameters’
    fs/compat_ioctl.c:1346: error: invalid application of ‘sizeof’ to incomplete type ‘struct dvb_frontend_parameters’
    fs/compat_ioctl.c:1346: error: invalid application of ‘sizeof’ to incomplete type ‘struct dvb_frontend_parameters’
    fs/compat_ioctl.c:1346: error: array type has incomplete element type
    fs/compat_ioctl.c:1346: error: invalid application of ‘sizeof’ to incomplete type ‘struct dvb_frontend_parameters’
    fs/compat_ioctl.c:1346: error: invalid application of ‘sizeof’ to incomplete type ‘struct dvb_frontend_parameters’
    fs/compat_ioctl.c:1346: error: initializer element is not constant
    fs/compat_ioctl.c:1346: error: (near initialization for ‘ioctl_pointer[463]’)
    fs/compat_ioctl.c:1347: error: invalid application of ‘sizeof’ to incomplete type ‘struct dvb_frontend_event’
    fs/compat_ioctl.c:1347: error: array type has incomplete element type
    fs/compat_ioctl.c:1347: error: invalid application of ‘sizeof’ to incomplete type ‘struct dvb_frontend_event’
    fs/compat_ioctl.c:1347: error: invalid application of ‘sizeof’ to incomplete type ‘struct dvb_frontend_event’
    fs/compat_ioctl.c:1347: error: invalid application of ‘sizeof’ to incomplete type ‘struct dvb_frontend_event’
    fs/compat_ioctl.c:1347: error: array type has incomplete element type
    fs/compat_ioctl.c:1347: error: invalid application of ‘sizeof’ to incomplete type ‘struct dvb_frontend_event’
    fs/compat_ioctl.c:1347: error: invalid application of ‘sizeof’ to incomplete type ‘struct dvb_frontend_event’
    fs/compat_ioctl.c:1347: error: invalid application of ‘sizeof’ to incomplete type ‘struct dvb_frontend_event’
    fs/compat_ioctl.c:1347: error: array type has incomplete element type
    fs/compat_ioctl.c:1347: error: invalid application of ‘sizeof’ to incomplete type ‘struct dvb_frontend_event’
    fs/compat_ioctl.c:1347: error: invalid application of ‘sizeof’ to incomplete type ‘struct dvb_frontend_event’
    fs/compat_ioctl.c:1347: error: initializer element is not constant
    fs/compat_ioctl.c:1347: error: (near initialization for ‘ioctl_pointer[464]’)

Reported-by: Michael Krufky <mkrufky@linuxtv.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-12-31 16:57:27 -02:00
Linus Torvalds
d65616a92c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  ceph: disable use of dcache for readdir etc.
2011-12-30 13:34:22 -08:00
David S. Miller
7f8e3234c5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2011-12-30 13:04:14 -05:00
Ajeet Yadav
d7fbd89338 Squashfs: optimise squashfs_cache_get entry search
squashfs_cache_get() iterates over all entries to search for
 block its looking for. Often get() / put() are called for
 same block.

If we cache the current entry index, then we can optimise the
subsequent *_get() calls.

Signed-off-by: Ajeet Yadav <ajeet.yadav.77@gmail.com>
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
2011-12-30 01:24:13 +00:00
Phillip Lougher
e552a59668 Squashfs: add missing block release on error condition
squashfs_read_metadata forgets to release the cache block if
an error has occurred.

Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
2011-12-30 01:20:14 +00:00
Linus Torvalds
d2bac6ab93 Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
  xfs: log all dirty inodes in xfs_fs_sync_fs
  xfs: log the inode in ->write_inode calls for kupdate
2011-12-29 17:05:45 -08:00
Andreas Schwab
34845636a1 procfs: do not confuse jiffies with cputime64_t
Commit 2a95ea6c0d ("procfs: do not overflow get_{idle,iowait}_time
for nohz") did not take into account that one some architectures jiffies
and cputime use different units.

This causes get_idle_time() to return numbers in the wrong units, making
the idle time fields in /proc/stat wrong.

Instead of converting the usec value returned by
get_cpu_{idle,iowait}_time_us to units of jiffies, use the new function
usecs_to_cputime64 to convert it to the correct unit of cputime64_t.

Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Artem S. Tashkinov" <t.artem@mailcity.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-29 16:31:57 -08:00
Sage Weil
a4d46363ce ceph: disable use of dcache for readdir etc.
Ceph attempts to use the dcache to satisfy negative lookups and readdir
when the entire directory contents are in cache.  Disable this behavior
until lingering bugs in this code are shaken out; we'll re-enable these
hooks once things are fully stable.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-12-29 08:05:14 -08:00
Akinobu Mita
597d508c17 ext4: use proper little-endian bitops
ext4_{set,clear}_bit() is defined as __test_and_{set,clear}_bit_le() for
ext4.  Only two ext4_{set,clear}_bit() calls check the return value.  The
rest of calls ignore the return value and they can be replaced with
__{set,clear}_bit_le().

This changes ext4_{set,clear}_bit() from __test_and_{set,clear}_bit_le()
to __{set,clear}_bit_le() and introduces ext4_test_and_{set,clear}_bit()
for the two places where old bit needs to be returned.

This ext4_{set,clear}_bit() change is considered safe, because if someone
uses these macros without noticing the change, new ext4_{set,clear}_bit
don't have return value and causes compiler errors where the return value
is used.

This also removes unused ext4_find_first_zero_bit().

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-12-28 20:32:07 -05:00
Zheng Liu
ccb4d7af91 ext4: remove no longer used functions in inode.c
The functions ext4_block_truncate_page() and ext4_block_zero_page_range()
are no longer used, so remove them.

Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-12-28 20:25:40 -05:00
Theodore Ts'o
14c83c9fdd ext4: avoid counting the number of free inodes twice in find_group_orlov()
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-12-28 20:25:13 -05:00
Zheng Liu
88635ca277 ext4: add missing spaces to debugging printk's
Fix ext4_debug format in ext4_ext_handle_uninitialized_extents() and
ext4_end_io_dio().

Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-12-28 19:00:25 -05:00
Yongqiang Yang
1ba37268cd jbd2: clear revoked flag on buffers before a new transaction started
Currently, we clear revoked flag only when a block is reused.  However,
this can tigger a false journal error.  Consider a situation when a block
is used as a meta block and is deleted(revoked) in ordered mode, then the
block is allocated as a data block to a file.  At this moment, user changes
the file's journal mode from ordered to journaled and truncates the file.
The block will be considered re-revoked by journal because it has revoked
flag still pending from the last transaction and an assertion triggers.

We fix the problem by keeping the revoked status more uptodate - we clear
revoked flag when switching revoke tables to reflect there is no revoked
buffers in current transaction any more.

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-12-28 17:46:46 -05:00
Yongqiang Yang
5872ddaaf0 ext4: flush journal when switching from data=journal mode
It's necessary to flush the journal when switching away from
data=journal mode.  This is because there are no revoke records when
data blocks are journalled, but revoke records are required in the
other journal modes.

However, it is not necessary to flush the journal when switching into
data=journal mode, and flushing the journal is expensive.  So let's
avoid it in that case.

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-12-28 13:55:51 -05:00
Yongqiang Yang
2aff57b0c0 ext4: allocate delalloc blocks before changing journal mode
delalloc blocks should be allocated before changing journal mode,
otherwise they can not be allocated and even more truncate on
delalloc blocks could triggre BUG by flushing delalloc buffers.

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-12-28 12:02:13 -05:00
Linus Torvalds
6d4b9e38d3 vfs: fix handling of lock allocation failure in lease-break case
Bruce Fields notes that commit 778fc546f7 ("locks: fix tracking of
inprogress lease breaks") introduced a possible error pointer
dereference on failure to allocate memory.  locks_conflict() will
dereference the passed-in new lease lock structure that may be an error pointer.

This means an open (without O_NONBLOCK set) on a file with a lease
applied (generally only done when Samba or nfsd (with v4) is running)
could crash if a kmalloc() fails.

So instead of playing games with IS_ERROR() all over the place, just
check the allocation failure early.  That makes the code more
straightforward, and avoids this possible bad pointer dereference.

Based-on-patch-by: J. Bruce Fields <bfields@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-26 10:25:26 -08:00
Rafael J. Wysocki
b7ba68c4a0 Merge branch 'pm-sleep' into pm-for-linus
* pm-sleep: (51 commits)
  PM: Drop generic_subsys_pm_ops
  PM / Sleep: Remove forward-only callbacks from AMBA bus type
  PM / Sleep: Remove forward-only callbacks from platform bus type
  PM: Run the driver callback directly if the subsystem one is not there
  PM / Sleep: Make pm_op() and pm_noirq_op() return callback pointers
  PM / Sleep: Merge internal functions in generic_ops.c
  PM / Sleep: Simplify generic system suspend callbacks
  PM / Hibernate: Remove deprecated hibernation snapshot ioctls
  PM / Sleep: Fix freezer failures due to racy usermodehelper_is_disabled()
  PM / Sleep: Recommend [un]lock_system_sleep() over using pm_mutex directly
  PM / Sleep: Replace mutex_[un]lock(&pm_mutex) with [un]lock_system_sleep()
  PM / Sleep: Make [un]lock_system_sleep() generic
  PM / Sleep: Use the freezer_count() functions in [un]lock_system_sleep() APIs
  PM / Freezer: Remove the "userspace only" constraint from freezer[_do_not]_count()
  PM / Hibernate: Replace unintuitive 'if' condition in kernel/power/user.c with 'else'
  Freezer / sunrpc / NFS: don't allow TASK_KILLABLE sleeps to block the freezer
  PM / Sleep: Unify diagnostic messages from device suspend/resume
  ACPI / PM: Do not save/restore NVS on Asus K54C/K54HR
  PM / Hibernate: Remove deprecated hibernation test modes
  PM / Hibernate: Thaw processes in SNAPSHOT_CREATE_IMAGE ioctl test path
  ...

Conflicts:
	kernel/kmod.c
2011-12-25 23:42:20 +01:00
Linus Torvalds
6d451c578c for linus: writeback reason binary tracing format fix
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJO9EbVAAoJECvKgwp+S8JaUG0P/RDICTvG5b6/YD1wwh4cHBTF
 xu4av5o+Okablr282vLt1d9N4nLP6A4Jp2XOxNoLdyUVMtwRNCMjO62vcBetKmqU
 9GJTKh3H72/amqNrfvf9E0Fl3rOv2U71x7k4KTwKVdUvITXEL/U0Vsl8a9WVNUZ0
 mZERzf0vrOCSN6gEzh4iNzMuZpKRSnNNP4iUilkwcD9cXPk85hFCNZx/nyMhKtcF
 9XzhSJgg1wJAwmBc9bdhkEm7jKYvxmslb4nMdQHoQNDGpEjwRbS7jQ/iHuD2AhPH
 DFTQ8LOhxxaTOiDjHJav0z/FRw+q6ZYbrkbLVt2qTOxfMxvHJdlfu7vTglq4PK9n
 Bo02K9zZisCM76uCUTHcp1aMjzU9tsx9tYipBz8YXNPoEuhYn/1F3tbt7FkCGBck
 wwTCe/J0+IKHWiXSAkZMj5PiSeMwliMpF7INdkLExkinwNu719dS6pTZDs/o8CMD
 M/0/M8jYnWOmylYDAbhKyEzAAHbAm0YGuUG7IVGP0H5YJucfmRGJzQMNaBTUUsP7
 pXdFA02rUTodCrSHqXscmA0Lb9ypsFnmAYMbb+YF5UNOW9zcQ9b2J23wmna7prIv
 FNKVAgDEjWk/SpN0mG3zZk7ixUagkbo9DfalZCBZsveZPktq1KZor1KaOIFzkUuB
 DUdtr4+GjhfDqFWywZ9+
 =dOhj
 -----END PGP SIGNATURE-----

Merge tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux

for linus: writeback reason binary tracing format fix

* tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
  writeback: show writeback reason with __print_symbolic
2011-12-23 20:25:36 -08:00
Linus Torvalds
827fa4c762 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: call d_instantiate after all ops are setup
  Btrfs: fix worker lock misuse in find_worker
2011-12-23 14:58:39 -08:00
Christoph Hellwig
be4f1ac828 xfs: log all dirty inodes in xfs_fs_sync_fs
Since Linux 2.6.36 the writeback code has introduces various measures for
live lock prevention during sync().  Unfortunately some of these are
actively harmful for the XFS model, where the inode gets marked dirty for
metadata from the data I/O handler.

The older_than_this checks that are now more strictly enforced since

    writeback: avoid livelocking WB_SYNC_ALL writeback

by only calling into __writeback_inodes_sb and thus only sampling the
current cut off time once.  But on a slow enough devices the previous
asynchronous sync pass might not have fully completed yet, and thus XFS
might mark metadata dirty only after that sampling of the cut off time for
the blocking pass already happened.  I have not myself reproduced this
myself on a real system, but by introducing artificial delay into the
XFS I/O completion workqueues it can be reproduced easily.

Fix this by iterating over all XFS inodes in ->sync_fs and log all that
are dirty.  This might log inode that only got redirtied after the
previous pass, but given how cheap delayed logging of inodes is it
isn't a major concern for performance.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Tested-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2011-12-23 16:41:47 -06:00
Christoph Hellwig
0b8fd3033c xfs: log the inode in ->write_inode calls for kupdate
If the writeback code writes back an inode because it has expired we currently
use the non-blockin ->write_inode path.  This means any inode that is pinned
is skipped.  With delayed logging and a workload that has very little log
traffic otherwise it is very likely that an inode that gets constantly
written to is always pinned, and thus we keep refusing to write it.  The VM
writeback code at that point redirties it and doesn't try to write it again
for another 30 seconds.  This means under certain scenarious time based
metadata writeback never happens.

Fix this by calling into xfs_log_inode for kupdate in addition to data
integrity syncs, and thus transfer the inode to the log ASAP.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Tested-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2011-12-23 16:41:47 -06:00
David S. Miller
abb434cb05 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	net/bluetooth/l2cap_core.c

Just two overlapping changes, one added an initialization of
a local variable, and another change added a new local variable.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-23 17:13:56 -05:00
Chris Mason
c1111b1fce Merge branch 'integration' into for-linus 2011-12-23 11:51:00 -05:00
Al Viro
08c422c27f Btrfs: call d_instantiate after all ops are setup
This closes races where btrfs is calling d_instantiate too soon during
inode creation.  All of the callers of btrfs_add_nondir are updated to
instantiate after the inode is fully setup in memory.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-12-23 08:02:26 -05:00
Chris Mason
8d532b2afb Btrfs: fix worker lock misuse in find_worker
Dan Carpenter noticed that we were doing a double unlock on the worker
lock, and sometimes picking a worker thread without the lock held.

This fixes both errors.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
2011-12-23 07:53:00 -05:00
Arne Jansen
eebe063b7f Btrfs: always save ref_root in delayed refs
For consistent backref walking and (later) qgroup calculation the
information to which root a delayed ref belongs is useful even for shared
refs.

Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2011-12-22 16:22:28 +01:00
Arne Jansen
66d7e7f09f Btrfs: mark delayed refs as for cow
Add a for_cow parameter to add_delayed_*_ref and pass the appropriate value
from every call site. The for_cow parameter will later on be used to
determine if a ref will change anything with respect to qgroups.

Delayed refs coming from relocation are always counted as for_cow, as they
don't change subvol quota.

Also pass in the fs_info for later use.

btrfs_find_all_roots() will use this as an optimization, as changes that are
for_cow will not change anything with respect to which root points to a
certain leaf. Thus, we don't need to add the current sequence number to
those delayed refs.

Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2011-12-22 16:22:27 +01:00
Jan Schmidt
c7d22a3c3c Btrfs: added helper btrfs_next_item()
btrfs_next_item() makes the btrfs path point to the next item, crossing leaf
boundaries if needed.

Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2011-12-22 16:22:26 +01:00
Arne Jansen
da5c813564 Btrfs: generic data structure to build unique lists
ulist is a generic data structures to hold a collection of unique u64
values. The only operations it supports is adding to the list and
enumerating it.

It is possible to store an auxiliary value along with the key. The
implementation is preliminary and can probably be sped up significantly.

It is used by btrfs_find_all_roots() quota to translate recursions into
iterative loops.

Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2011-12-22 16:22:24 +01:00
Rafael J. Wysocki
b00f4dc5ff Merge branch 'master' into pm-sleep
* master: (848 commits)
  SELinux: Fix RCU deref check warning in sel_netport_insert()
  binary_sysctl(): fix memory leak
  mm/vmalloc.c: remove static declaration of va from __get_vm_area_node
  ipmi_watchdog: restore settings when BMC reset
  oom: fix integer overflow of points in oom_badness
  memcg: keep root group unchanged if creation fails
  nilfs2: potential integer overflow in nilfs_ioctl_clean_segments()
  nilfs2: unbreak compat ioctl
  cpusets: stall when updating mems_allowed for mempolicy or disjoint nodemask
  evm: prevent racing during tfm allocation
  evm: key must be set once during initialization
  mmc: vub300: fix type of firmware_rom_wait_states module parameter
  Revert "mmc: enable runtime PM by default"
  mmc: sdhci: remove "state" argument from sdhci_suspend_host
  x86, dumpstack: Fix code bytes breakage due to missing KERN_CONT
  IB/qib: Correct sense on freectxts increment and decrement
  RDMA/cma: Verify private data length
  cgroups: fix a css_set not found bug in cgroup_attach_proc
  oprofile: Fix uninitialized memory access when writing to writing to oprofilefs
  Revert "xen/pv-on-hvm kexec: add xs_reset_watches to shutdown watches from old kernel"
  ...

Conflicts:
	kernel/cgroup_freezer.c
2011-12-21 21:59:45 +01:00
Theodore Ts'o
22cdfca564 ext4: remove unneeded file_remove_suid() from ext4_ioctl()
In the code to support EXT4_IOC_MOVE_EXT, ext4_ioctl calls
file_remove_suid() after the call to ext4_move_extents() if any
extents has been moved.  There are at least three things wrong with
this.  First, file_remove_suid() should be called with i_mutex down,
which is not here.  Second, it should be called before the donor file
has been modified, to avoid a potential race condition.  Third, and
most importantly, it's pointless, because ext4_file_extents() already
checks if the donor file has the setuid or setgid bit set, and will
return an error in that case.  So the first two objections don't
really matter, since file_remove_suid() will never need to modify the
inode in any case.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-12-21 14:14:31 -05:00
Stefan Behrens
21adbd5cbb Btrfs: integrate integrity check module into btrfs
This is the last part of the patch series. It modifies the btrfs
code to use the integrity check module if configured to do so
with the define BTRFS_FS_CHECK_INTEGRITY. If this define is not set,
the only effective change is that code is added that handles the
mount option to activate the integrity check. If the mount option is
set and the define BTRFS_FS_CHECK_INTEGRITY is not set, that code
complains in the log and the mount fails with EINVAL.

Add the mount option to activate the usage of the integrity check
code.
Add invocation of btrfs integrity check code init and cleanup
function on mount and umount, respectively.
Add hook to call btrfs integrity check code version of
submit_bh/submit_bio.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
2011-12-21 19:14:17 +01:00
Stefan Behrens
f11e4d7f53 Btrfs: Makefile changes to optionally include btrfs integrity check
If the btrfs integrity check is enabled, the files required to
implement the checks are included in the build.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
2011-12-21 19:14:16 +01:00
Stefan Behrens
c975dd469d Btrfs: add config option to enable btrfs integrity check
Added the BTRFS_FS_CHECK_INTEGRITY option to Kconfig. It depends on
BTRFS_FS.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
2011-12-21 19:14:16 +01:00
Stefan Behrens
5db0276014 Btrfs: add optional integrity check code
The two files added in this patch contain all the code that is
required to implement the integrity checks.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
2011-12-21 19:14:09 +01:00
Linus Torvalds
822a5d3131 Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFS: Fix a regression in nfs_file_llseek()
  NFSv4: Do not accept delegated opens when a delegation recall is in effect
  NFSv4: Ensure correct locking when accessing the 'lock_states' list
  NFSv4.1: Ensure that we handle _all_ SEQUENCE status bits.
  NFSv4: Don't error if we handled it in nfs4_recovery_handle_error
  SUNRPC: Ensure we always bump the backlog queue in xprt_free_slot
  SUNRPC: Fix the execution time statistics in the face of RPC restarts
2011-12-20 11:31:56 -08:00
Haogang Chen
481fe17e97 nilfs2: potential integer overflow in nilfs_ioctl_clean_segments()
There is a potential integer overflow in nilfs_ioctl_clean_segments().
When a large argv[n].v_nmembs is passed from the userspace, the subsequent
call to vmalloc() will allocate a buffer smaller than expected, which
leads to out-of-bound access in nilfs_ioctl_move_blocks() and
lfs_clean_segments().

The following check does not prevent the overflow because nsegs is also
controlled by the userspace and could be very large.

		if (argv[n].v_nmembs > nsegs * nilfs->ns_blocks_per_segment)
			goto out_free;

This patch clamps argv[n].v_nmembs to UINT_MAX / argv[n].v_size, and
returns -EINVAL when overflow.

Signed-off-by: Haogang Chen <haogangchen@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-20 10:25:04 -08:00
Thomas Meyer
695c60f21c nilfs2: unbreak compat ioctl
commit 828b1c50ae ("nilfs2: add compat ioctl") incidentally broke all
other NILFS compat ioctls.  Make them work again.

Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: <stable@vger.kernel.org> [3.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-20 10:25:04 -08:00
Christoph Hellwig
40d344ec5e xfs: mark the xfssyncd workqueue as non-reentrant
On a system with lots of memory pressure that is stuck on synchronous inode
reclaim the workqueue code will run one instance of the inode reclaim work
item on every CPU. which is not what we want.  Make sure to mark the
xfssyncd workqueue as non-reentrant to make sure there only is one instace
of each running globally.  Also stop using special paramater for the
workqueue; now that we guarantee each fs has only running one of each works
at a time there is no need to artificially lower max_active and compensate
for that by setting the WQ_CPU_INTENSIVE flag.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2011-12-19 22:17:01 -06:00
Martin Schwidefsky
612ef28a04 Merge branch 'sched/core' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into cputime-tip
Conflicts:
	drivers/cpufreq/cpufreq_conservative.c
	drivers/cpufreq/cpufreq_ondemand.c
	drivers/macintosh/rack-meter.c
	fs/proc/stat.c
	fs/proc/uptime.c
	kernel/sched/core.c
2011-12-19 19:23:15 +01:00
Robin Dong
8c48f7e88e ext4: optimize ext4_find_delalloc_range() in nodelalloc mode
We found performance regression when using bigalloc with "nodelalloc"
(1MB cluster size):

1. mke2fs -C 1048576 -O ^has_journal,bigalloc /dev/sda
2. mount -o nodelalloc /dev/sda /test/
3. time dd if=/dev/zero of=/test/io bs=1048576 count=1024

The "dd" will cost about 2 seconds to finish, but if we mke2fs without
"bigalloc", "dd" will only cost less than 1 second.

The reason is: when using ext4 with "nodelalloc", it will call
ext4_find_delalloc_cluster() nearly everytime it call
ext4_ext_map_blocks(), and ext4_find_delalloc_range() will also scan
all pages in cluster because no buffer is "delayed".  A cluster has
256 pages (1MB cluster), so it will scan 256 * 256k pags when creating
a 1G file. That severely hurts the performance.

Therefore, we return immediately from ext4_find_delalloc_range() in
nodelalloc mode, since by definition there can't be any delalloc
pages.

Signed-off-by: Robin Dong <sanbai@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-12-18 23:05:43 -05:00
Curt Wohlgemuth
14d7f3efe9 ext4: remove unused local variable
In get_implied_cluster_alloc(), rr_cluster_end was being
defined and set, but was never used.  Removed this.

Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-12-18 17:39:02 -05:00
Jan Kara
acd6ad8351 ext4: fix error handling on inode bitmap corruption
When insert_inode_locked() fails in ext4_new_inode() it most likely means inode
bitmap got corrupted and we allocated again inode which is already in use. Also
doing unlock_new_inode() during error recovery is wrong since the inode does
not have I_NEW set. Fix the problem by jumping to fail: (instead of fail_drop:)
which declares filesystem error and does not call unlock_new_inode().

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-12-18 17:37:02 -05:00
Zheng Liu
5635a62b83 ext4: add missing space to ext4_msg output in ext4_fill_super()
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-12-18 16:13:58 -05:00
Yongqiang Yang
60e07cf515 ext4: do not reference pa_inode from group_pa
pa_inode in group_pa is set NULL in ext4_mb_new_group_pa, so
pa_inode should be not referenced.

Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-12-18 15:49:54 -05:00
Wu Fengguang
32c7f202a4 btrfs: fix dirtied pages accounting on sub-page writes
When doing 1KB sequential writes to the same page,
balance_dirty_pages_ratelimited_nr() should be called once instead of 4
times, the latter makes the dirtier tasks be throttled much too heavy.

Fix it with proper de-accounting on clear_page_dirty_for_io().

CC: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
2011-12-18 14:20:25 +08:00
Jan Kara
1bc36b6426 writeback: Include all dirty inodes in background writeback
Current livelock avoidance code makes background work to include only inodes
that were dirtied before background writeback has started. However background
writeback can be running for a long time and thus excluding newly dirtied
inodes can eventually exclude significant portion of dirty inodes making
background writeback inefficient. Since background writeback avoids livelocking
the flusher thread by yielding to any other work, there is no real reason why
background work should not include all dirty inodes so change the logic in
wb_writeback().

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
2011-12-18 14:20:18 +08:00
Wu Fengguang
b3bba872dd writeback: show writeback reason with __print_symbolic
This makes the binary trace understandable by trace-cmd.

CC: Dave Chinner <david@fromorbit.com>
CC: Curt Wohlgemuth <curtw@google.com>
CC: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
2011-12-18 14:20:17 +08:00
Christoph Hellwig
28fb588c9b xfs: simplify xfs_qm_detach_gdquots
There is no reason to drop qi_dqlist_lock around calls to xfs_qm_dqrele
because the free list lock now nests inside qi_dqlist_lock and the
dquot lock.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2011-12-16 15:33:30 -06:00
Xi Wang
093019cf1b xfs: fix acl count validation in xfs_acl_from_disk()
Commit fa8b18ed didn't prevent the integer overflow and possible
memory corruption.  "count" can go negative and bypass the check.

Signed-off-by: Xi Wang <xi.wang@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2011-12-16 15:17:42 -06:00
Eric Sandeen
687d1c5e8e xfs: remove unused XBT_FORCE_SLEEP bit
XBT_FORCE_SLEEP is no longer ever tested; it is only set
and cleared.  Remove it.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2011-12-16 15:12:33 -06:00
Linus Torvalds
c9a7fe9672 Merge branches 'for-linus' and 'for-linus-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: unplug every once and a while
  Btrfs: deal with NULL srv_rsv in the delalloc inode reservation code
  Btrfs: only set cache_generation if we setup the block group
  Btrfs: don't panic if orphan item already exists
  Btrfs: fix leaked space in truncate
  Btrfs: fix how we do delalloc reservations and how we free reservations on error
  Btrfs: deal with enospc from dirtying inodes properly
  Btrfs: fix num_workers_starting bug and other bugs in async thread
  BTRFS: Establish i_ops before calling d_instantiate
  Btrfs: add a cond_resched() into the worker loop
  Btrfs: fix ctime update of on-disk inode
  btrfs: keep orphans for subvolume deletion
  Btrfs: fix inaccurate available space on raid0 profile
  Btrfs: fix wrong disk space information of the files
  Btrfs: fix wrong i_size when truncating a file to a larger size
  Btrfs: fix btrfs_end_bio to deal with write errors to a single mirror

* 'for-linus-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  btrfs: lower the dirty balance poll interval
2011-12-16 12:15:50 -08:00
Wu Fengguang
142349f541 btrfs: lower the dirty balance poll interval
Tests show that the original large intervals can easily make the dirty
limit exceeded on 100 concurrent dd's. So adapt to as large as the
next check point selected by the dirty throttling algorithm.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-12-16 12:32:57 -05:00
Trond Myklebust
6c52961743 NFS: Fix a regression in nfs_file_llseek()
After commit 06222e491e (fs: handle
SEEK_HOLE/SEEK_DATA properly in all fs's that define their own llseek)
the behaviour of llseek() was changed so that it always revalidates
the file size. The bug appears to be due to a logic error in the
afore-mentioned commit, which always evaluates to 'true'.

Reported-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org [>=3.1]
2011-12-15 18:44:36 -05:00
Chris Mason
d85c8a6f1b Btrfs: unplug every once and a while
The btrfs io submission threads can build up massive plug lists.  This
keeps things more reasonable so we don't hand over huge dumps of IO at
once.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-12-15 15:38:41 -05:00
Christoph Hellwig
7ae4440723 xfs: remove XFS_QMOPT_DQSUSER
Just read the id 0 dquot from disk directly in xfs_qm_init_quotainfo instead
of going through dqget and requiring a special flag to not add the dquot to
any lists.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2011-12-15 14:38:30 -06:00
Christoph Hellwig
97e7ade506 xfs: kill xfs_qm_idtodq
This function doesn't help the code flow, so merge the dquot allocation and
transaction handling into xfs_qm_dqread.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2011-12-15 14:37:32 -06:00
Christoph Hellwig
49d35a5cf1 xfs: merge xfs_qm_dqinit_core into the only caller
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2011-12-15 14:37:32 -06:00
Christoph Hellwig
78e55892d6 xfs: add a xfs_dqhold helper
Factor the common pattern of:

	xfs_dqlock(dqp);
	XFS_DQHOLD(dqp);
	xfs_dqunlock(dqp);

into a new helper, and remove XFS_DQHOLD now that only one other caller
is left.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2011-12-15 14:37:32 -06:00
Chris Mason
567a45e917 Merge branch 'for-chris' of http://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-work into integration
Conflicts:
	fs/btrfs/inode.c

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-12-15 13:43:49 -05:00
Chris Mason
e755d9ab38 Btrfs: deal with NULL srv_rsv in the delalloc inode reservation code
btrfs_update_inode is sometimes called with a null reservation.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-12-15 13:36:29 -05:00
Christoph Hellwig
ab680bb739 xfs: simplify xfs_qm_dqattach_grouphint
No need to play games with the qlock now that the freelist lock nests inside
it.  Also clean up various outdated comments.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2011-12-15 10:04:31 -06:00
Josef Bacik
e65cbb94e0 Btrfs: only set cache_generation if we setup the block group
A user reported a problem booting into a new kernel with the old format inodes.
He was panicing in cow_file_range while writing out the inode cache.  This is
because if the block group is not cached we'll just skip writing out the cache,
however if it gets dirtied again in the same transaction and it finished caching
we'd go ahead and write it out, but since we set cache_generation to the transid
we think we've already truncated it and will just carry on, running into
cow_file_range and blowing up.  We need to make sure we only set
cache_generation if we've done the truncate.  The user tested this patch and
verified that the panic no longer occured.  Thanks,

Reported-and-Tested-by: Klaus Bitto <klaus.bitto@gmail.com>
Signed-off-by: Josef Bacik <josef@redhat.com>
2011-12-15 11:04:24 -05:00
Josef Bacik
ee4d89f0c4 Btrfs: don't panic if orphan item already exists
I've been hitting this BUG_ON() in btrfs_orphan_add when running xfstest 269 in
a loop.  This is because we will add an orphan item, do the truncate, the
truncate will fail for whatever reason (*cough*ENOSPC*cough*) and then we're
left with an orphan item still in the fs.  Then we come back later to do another
truncate and it blows up because we already have an orphan item.  This is ok so
just fix the BUG_ON() to only BUG() if ret is not EEXIST.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2011-12-15 11:04:24 -05:00
Josef Bacik
7041ee9728 Btrfs: fix leaked space in truncate
We were occasionaly leaking space when running xfstest 269.  This is because if
we failed to start the transaction in the truncate loop we'd just goto out, but
we need to break so that the inode is removed from the orphan list and the space
is properly freed.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2011-12-15 11:04:23 -05:00
Josef Bacik
660d3f6cde Btrfs: fix how we do delalloc reservations and how we free reservations on error
Running xfstests 269 with some tracing my scripts kept spitting out errors about
releasing bytes that we didn't actually have reserved.  This took me down a huge
rabbit hole and it turns out the way we deal with reserved_extents is wrong,
we need to only be setting it if the reservation succeeds, otherwise the free()
method will come in and unreserve space that isn't actually reserved yet, which
can lead to other warnings and such.  The math was all working out right in the
end, but it caused all sorts of other issues in addition to making my scripts
yell and scream and generally make it impossible for me to track down the
original issue I was looking for.  The other problem is with our error handling
in the reservation code.  There are two cases that we need to deal with

1) We raced with free.  In this case free won't free anything because csum_bytes
is modified before we dro the lock in our reservation path, so free rightly
doesn't release any space because the reservation code may be depending on that
reservation.  However if we fail, we need the reservation side to do the free at
that point since that space is no longer in use.  So as it stands the code was
doing this fine and it worked out, except in case #2

2) We don't race with free.  Nobody comes in and changes anything, and our
reservation fails.  In this case we didn't reserve anything anyway and we just
need to clean up csum_bytes but not free anything.  So we keep track of
csum_bytes before we drop the lock and if it hasn't changed we know we can just
decrement csum_bytes and carry on.

Because of the case where we can race with free()'s since we have to drop our
spin_lock to do the reservation, I'm going to serialize all reservations with
the i_mutex.  We already get this for free in the heavy use paths, truncate and
file write all hold the i_mutex, just needed to add it to page_mkwrite and
various ioctl/balance things.  With this patch my space leak scripts no longer
scream bloody murder.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2011-12-15 11:04:22 -05:00
Josef Bacik
22c44fe65a Btrfs: deal with enospc from dirtying inodes properly
Now that we're properly keeping track of delayed inode space we've been getting
a lot of warnings out of btrfs_dirty_inode() when running xfstest 83.  This is
because a bunch of people call mark_inode_dirty, which is void so we can't
return ENOSPC.  This needs to be fixed in a few areas

1) file_update_time - this updates the mtime and such when writing to a file,
which will call mark_inode_dirty.  So copy file_update_time into btrfs so we can
call btrfs_dirty_inode directly and return an error if we get one appropriately.

2) fix symlinks to use btrfs_setattr for ->setattr.  For some reason we weren't
setting ->setattr for symlinks, even though we should have been.  This catches
one of the cases where we were getting errors in mark_inode_dirty.

3) Fix btrfs_setattr and btrfs_setsize to call btrfs_dirty_inode directly
instead of mark_inode_dirty.  This lets us return errors properly for truncate
and chown/anything related to setattr.

4) Add a new btrfs_fs_dirty_inode which will just call btrfs_dirty_inode and
print an error if we have one.  The only remaining user we can't control for
this is touch_atime(), but we don't really want to keep people from walking
down the tree if we don't have space to save the atime update, so just complain
but don't worry about it.

With this patch xfstests 83 complains a handful of times instead of hundreds of
times.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2011-12-15 11:04:21 -05:00
Josef Bacik
0dc3b84a73 Btrfs: fix num_workers_starting bug and other bugs in async thread
Al pointed out we have some random problems with the way we account for
num_workers_starting in the async thread stuff.  First of all we need to make
sure to decrement num_workers_starting if we fail to start the worker, so make
__btrfs_start_workers do this.  Also fix __btrfs_start_workers so that it
doesn't call btrfs_stop_workers(), there is no point in stopping everybody if we
failed to create a worker.  Also check_pending_worker_creates needs to call
__btrfs_start_work in it's work function since it already increments
num_workers_starting.

People only start one worker at a time, so get rid of the num_workers argument
everywhere, and make btrfs_queue_worker a void since it will always succeed.
Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2011-12-15 11:04:21 -05:00
Casey Schaufler
ad19db71f4 BTRFS: Establish i_ops before calling d_instantiate
The Smack LSM hook for security_d_instantiate checks
the inode's i_op->getxattr value to determine if the
containing filesystem supports extended attributes.
The BTRFS filesystem sets the inode's i_op value only
after it has instantiated the inode. This results in
Smack incorrectly giving new BTRFS inodes attributes
from the filesystem defaults on the assumption that
values can't be stored on the filesystem. This patch
moves the assignment of inode operation vectors ahead
of the calls to d_instantiate, letting Smack know that
the filesystem supports extended attributes. There
should be no impact on the performance or behavior of
BTRFS.

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-12-15 10:50:38 -05:00
Chris Mason
8f3b65a3d6 Btrfs: add a cond_resched() into the worker loop
If we have a constant stream of end_io completions or crc work,
we can hit softlockup messages from the async helper threads.  This
adds a cond_resched() into the loop to avoid them.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-12-15 10:50:38 -05:00
Li Zefan
306424cc88 Btrfs: fix ctime update of on-disk inode
To reproduce the bug:

    # touch /mnt/tmp
    # stat /mnt/tmp | grep Change
    Change: 2011-12-09 09:32:23.412105981 +0800
    # chattr +i /mnt/tmp
    # stat /mnt/tmp | grep Change
    Change: 2011-12-09 09:32:43.198105295 +0800
    # umount /mnt
    # mount /dev/loop1 /mnt
    # stat /mnt/tmp | grep Change
    Change: 2011-12-09 09:32:23.412105981 +0800

We should update ctime of in-memory inode before calling
btrfs_update_inode().

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-12-15 10:50:37 -05:00
Arne Jansen
f8e9e0b07b btrfs: keep orphans for subvolume deletion
Since we have the free space caches, btrfs_orphan_cleanup also runs for
the tree_root. Unfortunately this also cleans up the orphans used to mark
subvol deletions in progress.

Currently if a subvol deletion gets interrupted twice by umount/mount, the
deletion will not be continued and the space permanently lost, though it
would be possible to write a tool to recover those lost subvol deletions.
This patch checks if the orphan belongs to a subvol (dead root) and skips
the deletion.

Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-12-15 10:50:37 -05:00
Miao Xie
39fb26c398 Btrfs: fix inaccurate available space on raid0 profile
When we use raid0 as the data profile, df command may show us a very
inaccurate value of the available space, which may be much less than the
real one. It may make the users puzzled. Fix it by changing the calculation
of the available space, and making it be more similar to a fake chunk
allocation.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-12-15 10:50:36 -05:00
Miao Xie
3642320e07 Btrfs: fix wrong disk space information of the files
Btrfsck report errors after the 83th case of xfstests was run, The error
number is 400, it means the used disk space of the file is wrong.

The reason of this bug is that:
The file truncation may fail when the space of the file system is not enough,
and leave some file extents, whose offset are beyond the end of the files.
When we want to expand those files, we will drop those file extents, and
put in dummy file extents, and then we should update the i-node. But btrfs
forgets to do it.

This patch adds the forgotten i-node update.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-12-15 10:50:36 -05:00
Miao Xie
f4a2f4c548 Btrfs: fix wrong i_size when truncating a file to a larger size
Btrfsck report error 100 after the 83th case of xfstests was run, it means
the i_size of the file is wrong.

The reason of this bug is that:
Btrfs increased i_size of the file at the beginning, but it failed to expand
the file, and failed to update the i_size to the old size because there is no
enough space in the file system, so we found a wrong i_size.

This patch fixes this bug by updating the i_size just when we pass the file
expanding and get enough space to update i-node.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-12-15 10:50:35 -05:00
Justin P. Mattock
cb54f2571f btrfs: free-space-cache.c: remove extra semicolon.
The patch below removes an extra semicolon.

Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
CC: Chris Mason <chris.mason@oracle.com>
CC: linux-btrfs@vger.kernel.org
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-12-15 16:42:25 +01:00
Geert Uytterhoeven
d7a83c0f7f fat: Spelling s/obsolate/obsolete/g
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-12-15 16:36:19 +01:00
Martin Schwidefsky
c3e0ef9a29 [S390] fix cputime overflow in uptime_proc_show
For 32-bit architectures using standard jiffies the idletime calculation
in uptime_proc_show will quickly overflow. It takes (2^32 / HZ) seconds
of idle-time, or e.g. 12.45 days with no load on a quad-core with HZ=1000.
Switch to 64-bit calculations.

Cc: stable@vger.kernel.org
Cc: Michael Abbott <michael.abbott@diamond.ac.uk>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-12-15 14:56:19 +01:00
Martin Schwidefsky
648616343c [S390] cputime: add sparse checking and cleanup
Make cputime_t and cputime64_t nocast to enable sparse checking to
detect incorrect use of cputime. Drop the cputime macros for simple
scalar operations. The conversion macros are still needed.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-12-15 14:56:19 +01:00
Ingo Molnar
6a54aebf69 Merge commit 'v3.2-rc5' into sched/core
Merge reason: Pick up the latest fixes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-15 08:21:30 +01:00
Christoph Hellwig
bf72de3194 xfs: nest qm_dqfrlist_lock inside the dquot qlock
Allow xfs_qm_dqput to work without trylock loops by nesting the freelist lock
inside the dquot qlock.  In turn that requires trylocks in the reclaim path
instead, but given it's a classic tradeoff between fast and slow path, and
we follow the model of the inode and dentry caches.

Document our new lock order now that it has settled.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2011-12-14 21:15:42 -06:00
Linus Torvalds
2240a7bb47 tytso-for-linus-20111214
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABCAAGBQJO6PXBAAoJENNvdpvBGATwpuEP/2RCxmdWYZ8/6Z6pmTh3hHN5
 fx6HckTdvLQOvbQs72wzVW0JKyc25QmW2mQc5z3MjSymjf/RbEKihPUITRNbHrTD
 T2sP/lWu09AKLioEg4ucAKn/A7Do3UDIkXTszvVVP/t2psVPzLeJ1njQKra14Nyz
 o0+gSlnwuGx9WaxfR+7MYNs2ikdSkXIeYsiFAOY4YOxwwC99J/lZ0YaNkbI7UBtC
 yu2XLIvPboa5JZXANq2G3VhVIETMmOyRTCC76OAXjqkdp9nLFWDG0ydqQh0vVZwL
 xQGOmAj+l3BNTE0QmMni1w7A0SBU3N6xBA5HN6Y49RlbsMYG27aN54Fy5K2R41I3
 QXVhBL53VD6b0KaITcoz7jIGIy6qk9Wx+2WcCYtQBSIjL2YwlaJq0PL07+vRamex
 sqHGDejcNY87i6AV0DP6SNuCFCi9xFYoAoMi9Wu5E9+T+Vck0okFzW/luk/FvsSP
 YA5Dh+vISyBeCnWQvcnBmsUQyf8d9MaNnejZ48ath+GiiMfY8USAZ29RAG4VuRtS
 9DAyTTIBA73dKpnvEV9u4i8Lwd8hRVMOnPyOO785NwEXk3Ng08pPSSbMklW6UfCY
 4nr5UNB13ZPbXx4uoAvATMpCpYxMaLEdxmeMvgXpkekl0hHBzpVDey1Vu9fb/a5n
 dQpo6WWG9HIJ23hOGAGR
 =n3Lm
 -----END PGP SIGNATURE-----

Merge tag 'tytso-for-linus-20111214' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

* tag 'tytso-for-linus-20111214' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: handle EOF correctly in ext4_bio_write_page()
  ext4: remove a wrong BUG_ON in ext4_ext_convert_to_initialized
  ext4: correctly handle pages w/o buffers in ext4_discard_partial_buffers()
  ext4: avoid potential hang in mpage_submit_io() when blocksize < pagesize
  ext4: avoid hangs in ext4_da_should_update_i_disksize()
  ext4: display the correct mount option in /proc/mounts for [no]init_itable
  ext4: Fix crash due to getting bogus eh_depth value on big-endian systems
  ext4: fix ext4_end_io_dio() racing against fsync()

.. using the new signed tag merge of git that now verifies the gpg
signature automatically.  Yay.  The branchname was just 'dev', which is
prettier.  I'll tell Ted to use nicer tag names for future cases.
2011-12-14 18:25:58 -08:00
Linus Torvalds
30aaca4582 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: llseek fix race
  fuse: fix llseek bug
  fuse: fix fuse_retrieve
2011-12-14 18:23:35 -08:00
Linus Torvalds
ddb360778a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fs/ncpfs: fix error paths and goto statements in ncp_fill_super()
  configfs: register_filesystem() called too early
  fuse: register_filesystem() called too early
  ubifs: too early register_filesystem()
  ... and the same kind of leak for mqueue
  procfs: fix a vfsmount longterm reference leak
2011-12-14 18:22:55 -08:00
Bryan Schumaker
2d3475c0ad NFSD: forget_delegations should use list_for_each_entry_safe
Otherwise the for loop could try to use a file recently removed from the
file_hashtbl list and oops.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Tested-by: Casey Bodley <cbodley@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-12-14 17:38:00 -05:00
Christoph Hellwig
92678554ab xfs: flatten the dquot lock ordering
Introduce a new XFS_DQ_FREEING flag that tells lookup and mplist walks
to skip a dquot that is beeing freed, and use this avoid the trylock
on the hash and mplist locks in xfs_qm_dqreclaim_one.  Also simplify
xfs_dqpurge by moving the inodes to a dispose list after marking them
XFS_DQ_FREEING and avoid the locker ordering constraints.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2011-12-14 16:32:21 -06:00
Djalal Harouni
759c361eb9 fs/ncpfs: fix error paths and goto statements in ncp_fill_super()
The label 'out_bdi' should be followed by bdi_destroy() instead of
fput() which should be after the 'out_fput' label.

If bdi_setup_and_register() fails then jump to the 'out_fput' label
instead of the 'out_bdi' one.

If fget(data.info_fd) fails then jump to the previously fixed 'out_bdi'
label to call bdi_destroy() otherwise the bdi object will not be
destroyed.

Compile tested only.

Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-12-14 00:45:33 -05:00
Yongqiang Yang
5a0dc7365c ext4: handle EOF correctly in ext4_bio_write_page()
We need to zero out part of a page which beyond EOF before setting uptodate,
otherwise, mapread or write will see non-zero data beyond EOF.

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
2011-12-13 22:29:12 -05:00
Yongqiang Yang
5b5ffa49d4 ext4: remove a wrong BUG_ON in ext4_ext_convert_to_initialized
If a file is fallocated on a hole, map->m_lblk + map->m_len may be greater
than ee_block + ee_len.

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
2011-12-13 22:13:42 -05:00
Yongqiang Yang
093e6e3666 ext4: correctly handle pages w/o buffers in ext4_discard_partial_buffers()
If a page has been read into memory and never been written, it has no
buffers, but we should handle the page in truncate or punch hole.

VFS code of writing operations has handled holes correctly, so this
patch removes the code handling holes in writing operations.

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
2011-12-13 22:05:05 -05:00
Yongqiang Yang
13a79a4741 ext4: avoid potential hang in mpage_submit_io() when blocksize < pagesize
If there is an unwritten but clean buffer in a page and there is a
dirty buffer after the buffer, then mpage_submit_io does not write the
dirty buffer out.  As a result, da_writepages loops forever.

This patch fixes the problem by checking dirty flag.

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
2011-12-13 21:51:55 -05:00
Andrea Arcangeli
ea51d132db ext4: avoid hangs in ext4_da_should_update_i_disksize()
If the pte mapping in generic_perform_write() is unmapped between
iov_iter_fault_in_readable() and iov_iter_copy_from_user_atomic(), the
"copied" parameter to ->end_write can be zero. ext4 couldn't cope with
it with delayed allocations enabled. This skips the i_disksize
enlargement logic if copied is zero and no new data was appeneded to
the inode.

 gdb> bt
 #0  0xffffffff811afe80 in ext4_da_should_update_i_disksize (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x1\
 08000, len=0x1000, copied=0x0, page=0xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2467
 #1  ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\
 xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512
 #2  0xffffffff810d97f1 in generic_perform_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value o\
 ptimized out>, pos=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2440
 #3  generic_file_buffered_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value optimized out>, p\
 os=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2482
 #4  0xffffffff810db5d1 in __generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, ppos=0\
 xffff88001e26be40) at mm/filemap.c:2600
 #5  0xffffffff810db853 in generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=<value optimi\
 zed out>, pos=<value optimized out>) at mm/filemap.c:2632
 #6  0xffffffff811a71aa in ext4_file_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, pos=0x108000) a\
 t fs/ext4/file.c:136
 #7  0xffffffff811375aa in do_sync_write (filp=0xffff88003f606a80, buf=<value optimized out>, len=<value optimized out>, \
 ppos=0xffff88001e26bf48) at fs/read_write.c:406
 #8  0xffffffff81137e56 in vfs_write (file=0xffff88003f606a80, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x4\
 000, pos=0xffff88001e26bf48) at fs/read_write.c:435
 #9  0xffffffff8113816c in sys_write (fd=<value optimized out>, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x\
 4000) at fs/read_write.c:487
 #10 <signal handler called>
 #11 0x00007f120077a390 in __brk_reservation_fn_dmi_alloc__ ()
 #12 0x0000000000000000 in ?? ()
 gdb> print offset
 $22 = 0xffffffffffffffff
 gdb> print idx
 $23 = 0xffffffff
 gdb> print inode->i_blkbits
 $24 = 0xc
 gdb> up
 #1  ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\
 xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512
 2512                    if (ext4_da_should_update_i_disksize(page, end)) {
 gdb> print start
 $25 = 0x0
 gdb> print end
 $26 = 0xffffffffffffffff
 gdb> print pos
 $27 = 0x108000
 gdb> print new_i_size
 $28 = 0x108000
 gdb> print ((struct ext4_inode_info *)((char *)inode-((int)(&((struct ext4_inode_info *)0)->vfs_inode))))->i_disksize
 $29 = 0xd9000
 gdb> down
 2467            for (i = 0; i < idx; i++)
 gdb> print i
 $30 = 0xd44acbee

This is 100% reproducible with some autonuma development code tuned in
a very aggressive manner (not normal way even for knumad) which does
"exotic" changes to the ptes. It wouldn't normally trigger but I don't
see why it can't happen normally if the page is added to swap cache in
between the two faults leading to "copied" being zero (which then
hangs in ext4). So it should be fixed. Especially possible with lumpy
reclaim (albeit disabled if compaction is enabled) as that would
ignore the young bits in the ptes.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
2011-12-13 21:41:15 -05:00
Tejun Heo
b2efa05265 block, cfq: unlink cfq_io_context's immediately
cic is association between io_context and request_queue.  A cic is
linked from both ioc and q and should be destroyed when either one
goes away.  As ioc and q both have their own locks, locking becomes a
bit complex - both orders work for removal from one but not from the
other.

Currently, cfq tries to circumvent this locking order issue with RCU.
ioc->lock nests inside queue_lock but the radix tree and cic's are
also protected by RCU allowing either side to walk their lists without
grabbing lock.

This rather unconventional use of RCU quickly devolves into extremely
fragile convolution.  e.g. The following is from cfqd going away too
soon after ioc and q exits raced.

 general protection fault: 0000 [#1] PREEMPT SMP
 CPU 2
 Modules linked in:
 [   88.503444]
 Pid: 599, comm: hexdump Not tainted 3.1.0-rc10-work+ #158 Bochs Bochs
 RIP: 0010:[<ffffffff81397628>]  [<ffffffff81397628>] cfq_exit_single_io_context+0x58/0xf0
 ...
 Call Trace:
  [<ffffffff81395a4a>] call_for_each_cic+0x5a/0x90
  [<ffffffff81395ab5>] cfq_exit_io_context+0x15/0x20
  [<ffffffff81389130>] exit_io_context+0x100/0x140
  [<ffffffff81098a29>] do_exit+0x579/0x850
  [<ffffffff81098d5b>] do_group_exit+0x5b/0xd0
  [<ffffffff81098de7>] sys_exit_group+0x17/0x20
  [<ffffffff81b02f2b>] system_call_fastpath+0x16/0x1b

The only real hot path here is cic lookup during request
initialization and avoiding extra locking requires very confined use
of RCU.  This patch makes cic removal from both ioc and request_queue
perform double-locking and unlink immediately.

* From q side, the change is almost trivial as ioc->lock nests inside
  queue_lock.  It just needs to grab each ioc->lock as it walks
  cic_list and unlink it.

* From ioc side, it's a bit more difficult because of inversed lock
  order.  ioc needs its lock to walk its cic_list but can't grab the
  matching queue_lock and needs to perform unlock-relock dancing.

  Unlinking is now wholly done from put_io_context() and fast path is
  optimized by using the queue_lock the caller already holds, which is
  by far the most common case.  If the ioc accessed multiple devices,
  it tries with trylock.  In unlikely cases of fast path failure, it
  falls back to full double-locking dance from workqueue.

Double-locking isn't the prettiest thing in the world but it's *far*
simpler and more understandable than RCU trick without adding any
meaningful overhead.

This still leaves a lot of now unnecessary RCU logics.  Future patches
will trim them.

-v2: Vivek pointed out that cic->q was being dereferenced after
     cic->release() was called.  Updated to use local variable @this_q
     instead.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-12-14 00:33:39 +01:00
Tejun Heo
dc86900e0a block, cfq: move ioc ioprio/cgroup changed handling to cic
ioprio/cgroup change was handled by marking the changed state in ioc
and, on the following access to the ioc, performing RCU-protected
iteration through all cic's grabbing the matching queue_lock.

This patch moves the changed state to each cic.  When ioprio or cgroup
changes, the respective bit is set on all cic's of the ioc and when
each of those cic (not ioc) is accessed, change is applied for that
specific ioc-queue pair.

This also fixes the following two race conditions between setting and
clearing of changed states.

* Missing barrier between assign/load of ioprio and ioprio_changed
  allowed applying old ioprio.

* Change requests could happen between application of change and
  clearing of changed variables.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-12-14 00:33:38 +01:00
Tejun Heo
6e736be7f2 block: make ioc get/put interface more conventional and fix race on alloction
Ignoring copy_io() during fork, io_context can be allocated from two
places - current_io_context() and set_task_ioprio().  The former is
always called from local task while the latter can be called from
different task.  The synchornization between them are peculiar and
dubious.

* current_io_context() doesn't grab task_lock() and assumes that if it
  saw %NULL ->io_context, it would stay that way until allocation and
  assignment is complete.  It has smp_wmb() between alloc/init and
  assignment.

* set_task_ioprio() grabs task_lock() for assignment and does
  smp_read_barrier_depends() between "ioc = task->io_context" and "if
  (ioc)".  Unfortunately, this doesn't achieve anything - the latter
  is not a dependent load of the former.  ie, if ioc itself were being
  dereferenced "ioc->xxx", it would mean something (not sure what tho)
  but as the code currently stands, the dependent read barrier is
  noop.

As only one of the the two test-assignment sequences is task_lock()
protected, the task_lock() can't do much about race between the two.
Nothing prevents current_io_context() and set_task_ioprio() allocating
its own ioc for the same task and overwriting the other's.

Also, set_task_ioprio() can race with exiting task and create a new
ioc after exit_io_context() is finished.

ioc get/put doesn't have any reason to be complex.  The only hot path
is accessing the existing ioc of %current, which is simple to achieve
given that ->io_context is never destroyed as long as the task is
alive.  All other paths can happily go through task_lock() like all
other task sub structures without impacting anything.

This patch updates ioc get/put so that it becomes more conventional.

* alloc_io_context() is replaced with get_task_io_context().  This is
  the only interface which can acquire access to ioc of another task.
  On return, the caller has an explicit reference to the object which
  should be put using put_io_context() afterwards.

* The functionality of current_io_context() remains the same but when
  creating a new ioc, it shares the code path with
  get_task_io_context() and always goes through task_lock().

* get_io_context() now means incrementing ref on an ioc which the
  caller already has access to (be that an explicit refcnt or implicit
  %current one).

* PF_EXITING inhibits creation of new io_context and once
  exit_io_context() is finished, it's guaranteed that both ioc
  acquisition functions return %NULL.

* All users are updated.  Most are trivial but
  smp_read_barrier_depends() removal from cfq_get_io_context() needs a
  bit of explanation.  I suppose the original intention was to ensure
  ioc->ioprio is visible when set_task_ioprio() allocates new
  io_context and installs it; however, this wouldn't have worked
  because set_task_ioprio() doesn't have wmb between init and install.
  There are other problems with this which will be fixed in another
  patch.

* While at it, use NUMA_NO_NODE instead of -1 for wildcard node
  specification.

-v2: Vivek spotted contamination from debug patch.  Removed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-12-14 00:33:38 +01:00
Linus Torvalds
653f42f6b6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  ceph: add missing spin_unlock at ceph_mdsc_build_path()
  ceph: fix SEEK_CUR, SEEK_SET regression
  crush: fix mapping calculation when force argument doesn't exist
  ceph: use i_ceph_lock instead of i_lock
  rbd: remove buggy rollback functionality
  rbd: return an error when an invalid header is read
  ceph: fix rasize reporting by ceph_show_options
2011-12-13 14:59:42 -08:00
Linus Torvalds
4dde6dedad Merge branch 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux
* 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
  writeback: set max_pause to lowest value on zero bdi_dirty
  writeback: permit through good bdi even when global dirty exceeded
  writeback: comment on the bdi dirty threshold
  fs: Make write(2) interruptible by a fatal signal
  writeback: Fix issue on make htmldocs
2011-12-13 14:58:56 -08:00
Christoph Hellwig
be7ffc38a8 xfs: implement lazy removal for the dquot freelist
Do not remove dquots from the freelist when we grab a reference to them in
xfs_qm_dqlookup, but leave them on the freelist util scanning notices that
they have a reference.  This speeds up the lookup fastpath, and greatly
simplifies the lock ordering constraints.  Note that the same scheme is
used by the VFS inode and dentry caches.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2011-12-13 16:46:28 -06:00
Bryan Schumaker
39c4cc0fcc NFSD: Only reinitilize the recall_lru list under the recall lock
unhash_delegation() will grab the recall lock before calling
list_del_init() in each of these places.  This patch removes the
redundant calls.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-12-13 17:11:45 -05:00
Christoph Hellwig
80a376bfb7 xfs: remove XFS_DQ_INACTIVE
Free dquots when purging them during umount instead of keeping them around
on the freelist in a degraded state.  The out of order locking in
xfs_qm_dqpurge will be removed again later in this series.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2011-12-13 14:55:54 -06:00
Yehuda Sadeh
9d5a09e659 ceph: add missing spin_unlock at ceph_mdsc_build_path()
one of the paths was missing spin_unlock

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2011-12-13 11:59:53 -08:00
Greg Kroah-Hartman
1ff97647f0 char_dev.c: fix up some whitespace errors
Remove some minor whitespace errors (2 trailing spaces, and one space
needed for a comma) to make the file checkpatch.pl clean with the
exception of the exports, which is fine for now.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-13 11:18:17 -08:00
Christoph Hellwig
497507b9ee xfs: cleanup xfs_qm_dqlookup
Rearrange the code to avoid the conditional locking around the flist_locked
variable.  This means we lose a (rather pointless) assert, and hold the
freelist lock a bit longer for one corner case.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2011-12-13 11:43:35 -06:00
Al Viro
7c6455e368 configfs: register_filesystem() called too early
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-12-13 12:35:15 -05:00
Al Viro
988f032567 fuse: register_filesystem() called too early
same story as with ubifs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-12-13 12:35:14 -05:00
Al Viro
5cc361e3b8 ubifs: too early register_filesystem()
doing that before you are ready to handle mount() is a Bad Idea(tm)...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-12-13 12:35:13 -05:00
Sage Weil
6a82c47aa8 ceph: fix SEEK_CUR, SEEK_SET regression
Commit 06222e491e got the if wrong so that
it always evaluates as true.  This is semantically harmless, but makes
SEEK_CUR and SEEK_SET needlessly query the server.

Rewrite the if to explicitly enumerate the cases we DO need a valid i_size
to make this code less fragile.

Reported-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-12-13 09:19:26 -08:00
John Muir
451d0f5999 FUSE: Notifying the kernel of deletion.
Allows a FUSE file-system to tell the kernel when a file or directory is
deleted. If the specified dentry has the specified inode number, the kernel will
unhash it.

The current 'fuse_notify_inval_entry' does not cause the kernel to clean up
directories that are in use properly, and as a result the users of those
directories see incorrect semantics from the file-system. The error condition
seen when 'fuse_notify_inval_entry' is used to notify of a deleted directory is
avoided when 'fuse_notify_delete' is used instead.

The following scenario demonstrates the difference:
1. User A chdirs into 'testdir' and starts reading 'testfile'.
2. User B rm -rf 'testdir'.
3. User B creates 'testdir'.
4. User C chdirs into 'testdir'.

If you run the above within the same machine on any file-system (including fuse
file-systems), there is no problem: user C is able to chdir into the new
testdir. The old testdir is removed from the dentry tree, but still open by user
A.

If operations 2 and 3 are performed via the network such that the fuse
file-system uses one of the notify functions to tell the kernel that the nodes
are gone, then the following error occurs for user C while user A holds the
original directory open:

muirj@empacher:~> ls /test/testdir
ls: cannot access /test/testdir: No such file or directory

The issue here is that the kernel still has a dentry for testdir, and so it is
requesting the attributes for the old directory, while the file-system is
responding that the directory no longer exists.

If on the other hand, if the file-system can notify the kernel that the
directory is deleted using the new 'fuse_notify_delete' function, then the above
ls will find the new directory as expected.

Signed-off-by: John Muir <john@jmuir.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2011-12-13 11:58:49 +01:00
Miklos Szeredi
b18da0c56e fuse: support ioctl on directories
Multiplexing filesystems may want to support ioctls on the underlying
files and directores (e.g. FS_IOC_{GET,SET}FLAGS).

Ioctl support on directories was missing so add it now.

Reported-by: Antonio SJ Musumeci <bile@landofbile.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2011-12-13 11:58:49 +01:00
Thomas Meyer
c411cc88d8 fuse: Use kcalloc instead of kzalloc to allocate array
The advantage of kcalloc is, that will prevent integer overflows which could
result from the multiplication of number of elements and size and it is also
a bit nicer to read.

The semantic patch that makes this change is available
in https://lkml.org/lkml/2011/11/25/107

Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2011-12-13 11:58:49 +01:00
Miklos Szeredi
c07c3d1934 fuse: llseek optimize SEEK_CUR and SEEK_SET
Use generic_file_llseek() instead of open coding the seek function.

i_mutex protection is only necessary for SEEK_END (and SEEK_HOLE, SEEK_DATA), so
move SEEK_CUR and SEEK_SET out from under i_mutex.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2011-12-13 11:58:48 +01:00
Miklos Szeredi
73104b6e37 fuse: llseek fix race
Fix race between lseek(fd, 0, SEEK_CUR) and read/write.  This was fixed in
generic code by commit 5b6f1eb97d (vfs: lseek(fd, 0, SEEK_CUR) race condition).

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2011-12-13 11:40:59 +01:00
Roel Kluin
b48c6af208 fuse: fix llseek bug
The test in fuse_file_llseek() "not SEEK_CUR or not SEEK_SET" always evaluates
to true.

This was introduced in 3.1 by commit 06222e49 (fs: handle SEEK_HOLE/SEEK_DATA
properly in all fs's that define their own llseek) and changed the behavior of
SEEK_CUR and SEEK_SET to always retrieve the file attributes.  This is a
performance regression.

Fix the test so that it makes sense.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: stable@vger.kernel.org
CC: Josef Bacik <josef@redhat.com>
CC: Al Viro <viro@zeniv.linux.org.uk>
2011-12-13 10:37:00 +01:00
Miklos Szeredi
48706d0a91 fuse: fix fuse_retrieve
Fix two bugs in fuse_retrieve():

 - retrieving more than one page would yield repeated instances of the
   first page

 - if more than FUSE_MAX_PAGES_PER_REQ pages were requested than the
   request page array would overflow

fuse_retrieve() was added in 2.6.36 and these bugs had been there since the
beginning.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: stable@vger.kernel.org
2011-12-13 10:36:59 +01:00
Theodore Ts'o
fc6cb1cda5 ext4: display the correct mount option in /proc/mounts for [no]init_itable
/proc/mounts was showing the mount option [no]init_inode_table when
the correct mount option that will be accepted by parse_options() is
[no]init_itable.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
2011-12-12 22:06:18 -05:00
Christoph Hellwig
800b484ec0 xfs: cleanup dquot locking helpers
Mark the trivial lock wrappers as inline, and make the naming consistent
for all of them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2011-12-12 17:28:20 -06:00
Greg Kroah-Hartman
007d00d4c1 Merge branch 'for-next/dwc3' of git://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb into usb-next
* 'for-next/dwc3' of git://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb: (392 commits)
  usb: dwc3: ep0: fix for possible early delayed_status
  usb: dwc3: gadget: fix stream enable bit
  usb: dwc3: ep0: fix GetStatus handling (again)
  usb: dwc3: ep0: use dwc3_request for ep0 requsts instead of usb_request
  usb: dwc3: use correct hwparam register for power mgm check
  usb: dwc3: omap: move to module_platform_driver
  usb: dwc3: workaround: missing disconnect event
  usb: dwc3: workaround: missing USB3 Reset event
  usb: dwc3: workaround: U1/U2 -> U0 transiton
  usb: dwc3: gadget: return early in dwc3_cleanup_done_reqs()
  usb: dwc3: ep0: handle delayed_status again
  usb: dwc3: ep0: push ep0state into xfernotready processing
  usb: dwc3: fix sparse errors
  usb: dwc3: fix few coding style problems
  usb: dwc3: move generic dwc3 code from gadget into core
  usb: dwc3: use a helper function for operation mode setting
  usb: dwc3: ep0: don't use ep0in for transfers
  usb: dwc3: ep0: use proper endianess in SetFeature for wIndex
  usb: dwc3: core: drop DWC3_EVENT_BUFFERS_MAX
  usb: dwc3: omap: add multiple instances support to OMAP
  ...
2011-12-12 15:19:53 -08:00
Christoph Hellwig
a7ef9bd79f xfs: remove the sync_mode argument to xfs_qm_dqflush_all
It always is zero, and removing it will make future changes easier.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2011-12-12 16:46:20 -06:00
Christoph Hellwig
34625c661b xfs: remove xfs_qm_sync
Now that we can't have any dirty dquots around that aren't in the AIL we
can get rid of the explicit dquot syncing from xfssyncd and xfs_fs_sync_fs
and instead rely on AIL pushing to write out any quota updates.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2011-12-12 16:41:44 -06:00
Christoph Hellwig
f2fba558d3 xfs: make sure to really flush all dquots in xfs_qm_quotacheck
Make sure we do not skip any dquots when flushing them out after a
quotacheck to make sure that we will never have any dirty dquots on a
live filesystem.  At this point no dquot should be pinnable, but lets
be pedantic about it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2011-12-12 16:39:22 -06:00
Christoph Hellwig
fdedf28b94 xfs: untangle SYNC_WAIT and SYNC_TRYLOCK meanings for xfs_qm_dqflush
Only skip pinned dquots if SYNC_TRYLOCK is specified, and adjust the callers
to keep the behaviour unchanged.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2011-12-12 16:31:01 -06:00
J. Bruce Fields
f32f3c2d3f nfsd4: initialize special stateid's at compile time
Stateid's with "other" ("opaque") field all zeros or all ones are
reserved.  We define all_ones separately on the off chance there will be
more such some day, though currently all the other special stateid's
have zero other field.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-12-12 15:27:00 -05:00
Paul Mackerras
b4611abfa9 ext4: Fix crash due to getting bogus eh_depth value on big-endian systems
Commit 1939dd84b3 ("ext4: cleanup ext4_ext_grow_indepth code") added a
reference to ext4_extent_header.eh_depth, but forget to pass the value
read through le16_to_cpu.  The result is a crash on big-endian
machines, such as this crash on a POWER7 server:

attempt to access beyond end of device
sda8: rw=0, want=776392648163376, limit=168558560
Unable to handle kernel paging request for data at address 0x6b6b6b6b6b6b6bcb
Faulting instruction address: 0xc0000000001f5f38
cpu 0x14: Vector: 300 (Data Access) at [c000001bd1aaecf0]
    pc: c0000000001f5f38: .__brelse+0x18/0x60
    lr: c0000000002e07a4: .ext4_ext_drop_refs+0x44/0x80
    sp: c000001bd1aaef70
   msr: 9000000000009032
   dar: 6b6b6b6b6b6b6bcb
 dsisr: 40000000
  current = 0xc000001bd15b8010
  paca    = 0xc00000000ffe4600
    pid   = 19911, comm = flush-8:0
enter ? for help
[c000001bd1aaeff0] c0000000002e07a4 .ext4_ext_drop_refs+0x44/0x80
[c000001bd1aaf090] c0000000002e0c58 .ext4_ext_find_extent+0x408/0x4c0
[c000001bd1aaf180] c0000000002e145c .ext4_ext_insert_extent+0x2bc/0x14c0
[c000001bd1aaf2c0] c0000000002e3fb8 .ext4_ext_map_blocks+0x628/0x1710
[c000001bd1aaf420] c0000000002b2974 .ext4_map_blocks+0x224/0x310
[c000001bd1aaf4d0] c0000000002b7f2c .mpage_da_map_and_submit+0xbc/0x490
[c000001bd1aaf5a0] c0000000002b8688 .write_cache_pages_da+0x2c8/0x430
[c000001bd1aaf720] c0000000002b8b28 .ext4_da_writepages+0x338/0x670
[c000001bd1aaf8d0] c000000000157280 .do_writepages+0x40/0x90
[c000001bd1aaf940] c0000000001ea830 .writeback_single_inode+0xe0/0x530
[c000001bd1aafa00] c0000000001eb680 .writeback_sb_inodes+0x210/0x300
[c000001bd1aafb20] c0000000001ebc84 .__writeback_inodes_wb+0xd4/0x140
[c000001bd1aafbe0] c0000000001ebfec .wb_writeback+0x2fc/0x3e0
[c000001bd1aafce0] c0000000001ed770 .wb_do_writeback+0x2f0/0x300
[c000001bd1aafdf0] c0000000001ed848 .bdi_writeback_thread+0xc8/0x340
[c000001bd1aafed0] c0000000000c5494 .kthread+0xb4/0xc0
[c000001bd1aaff90] c000000000021f48 .kernel_thread+0x54/0x70

This is due to getting ext_depth(inode) == 0x101 and therefore running
off the end of the path array in ext4_ext_drop_refs into following
unallocated structures.

This fixes it by adding the necessary le16_to_cpu.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-12-12 11:00:56 -05:00
Theodore Ts'o
b5a7e97039 ext4: fix ext4_end_io_dio() racing against fsync()
We need to make sure iocb->private is cleared *before* we put the
io_end structure on i_completed_io_list.  Otherwise fsync() could
potentially run on another CPU and free the iocb structure out from
under us.

Reported-by: Kent Overstreet <koverstreet@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
2011-12-12 10:53:02 -05:00
Greg Kroah-Hartman
a36ae95c4e Merge v3.2-rc4 into usb-next
This lets us handle the PS3 merge easier, as well as syncing up with
other USB fixes already in the -rc4 tree.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-09 16:07:48 -08:00
Trond Myklebust
652f89f64f NFSv4: Do not accept delegated opens when a delegation recall is in effect
...and report the servers that try to return a delegation when the client
is using the CLAIM_DELEG_CUR open mode. That behaviour is explicitly
forbidden in RFC3530.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-12-09 19:05:58 -05:00
Linus Torvalds
8def5f51b0 Merge git://git.samba.org/sfrench/cifs-2.6
* git://git.samba.org/sfrench/cifs-2.6:
  cifs: check for NULL last_entry before calling cifs_save_resume_key
  cifs: attempt to freeze while looping on a receive attempt
  cifs: Fix sparse warning when calling cifs_strtoUCS
  CIFS: Add descriptions to the brlock cache functions
2011-12-09 14:45:44 -08:00
Trond Myklebust
4b44b40e04 NFSv4: Ensure correct locking when accessing the 'lock_states' list
There are currently 2 places in the state recovery code, where we do not
take sufficient precautions before accessing the state->lock_states. In
both cases, we should be holding the state->state_lock.

Reported-by: Pascal Bouchareine <pascal@gandi.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-12-09 16:31:52 -05:00
Chris Mason
5dbc8fca8e Btrfs: fix btrfs_end_bio to deal with write errors to a single mirror
btrfs_end_bio checks the number of errors on a bio against the max
number of errors allowed before sending any EIOs up to the higher
levels.

If we got enough copies of the bio done for a given raid level, it is
supposed to clear the bio error flag and return success.

We have pointers to the original bio sent down by the higher layers and
pointers to any cloned bios we made for raid purposes.  If the original
bio happens to be the one that got an io error, but not the last one to
finish, it might not have the BIO_UPTODATE bit set.

Then, when the last bio does finish, we'll call bio_end_io on the
original bio.  It won't have the uptodate bit set and we'll end up
sending EIO to the higher layers.

We already had a check for this, it just was conditional on getting the
IO error on the very last bio.  Make the check unconditional so we eat
the EIOs properly.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-12-09 11:07:37 -05:00
Michal Hocko
2a95ea6c0d procfs: do not overflow get_{idle,iowait}_time for nohz
Since commit a25cac5198 ("proc: Consider NO_HZ when printing idle and
iowait times") we are reporting idle/io_wait time also while a CPU is
tickless.  We rely on get_{idle,iowait}_time functions to retrieve
proper data.

These functions, however, use usecs_to_cputime to translate micro
seconds time to cputime64_t.  This is just an alias to usecs_to_jiffies
which reduces the data type from u64 to unsigned int and also checks
whether the given parameter overflows jiffies_to_usecs(MAX_JIFFY_OFFSET)
and returns MAX_JIFFY_OFFSET in that case.

When we overflow depends on CONFIG_HZ but especially for CONFIG_HZ_300
it is quite low (1431649781) so we are getting MAX_JIFFY_OFFSET for
>3000s! until we overflow unsigned int.  Just for reference
CONFIG_HZ_100 has an overflow window around 20s, CONFIG_HZ_250 ~8s and
CONFIG_HZ_1000 ~2s.

This results in a bug when people saw [h]top going mad reporting 100%
CPU usage even though there was basically no CPU load.  The reason was
simply that /proc/stat stopped reporting idle/io_wait changes (and
reported MAX_JIFFY_OFFSET) and so the only change happening was for user
system time.

Let's use nsecs_to_jiffies64 instead which doesn't reduce the precision
to 32b type and it is much more appropriate for cumulative time values
(unlike usecs_to_jiffies which intended for timeout calculations).

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Tested-by: Artem S. Tashkinov <t.artem@mailcity.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-09 07:50:29 -08:00
Claudio Scordino
b53fc7c297 fs/proc/meminfo.c: fix compilation error
Fix the error message "directives may not be used inside a macro argument"
which appears when the kernel is compiled for the cris architecture.

Signed-off-by: Claudio Scordino <claudio@evidence.eu.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-09 07:50:28 -08:00
Al Viro
905ad269c5 procfs: fix a vfsmount longterm reference leak
kern_mount() doesn't pair with plain mntput()...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-12-09 00:40:19 -05:00
Jeff Layton
7023676f9e cifs: check for NULL last_entry before calling cifs_save_resume_key
Prior to commit eaf35b1, cifs_save_resume_key had some NULL pointer
checks at the top. It turns out that at least one of those NULL
pointer checks is needed after all.

When the LastNameOffset in a FIND reply appears to be beyond the end of
the buffer, CIFSFindFirst and CIFSFindNext will set srch_inf.last_entry
to NULL. Since eaf35b1, the code will now oops in this situation.

Fix this by having the callers check for a NULL last entry pointer
before calling cifs_save_resume_key. No change is needed for the
call site in cifs_readdir as it's not reachable with a NULL
current_entry pointer.

This should fix:

    https://bugzilla.redhat.com/show_bug.cgi?id=750247

Cc: stable@vger.kernel.org
Cc: Christoph Hellwig <hch@infradead.org>
Reported-by: Adam G. Metzler <adamgmetzler@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2011-12-08 22:04:47 -06:00
Jeff Layton
95edcff497 cifs: attempt to freeze while looping on a receive attempt
In the recent overhaul of the demultiplex thread receive path, I
neglected to ensure that we attempt to freeze on each pass through the
receive loop.

Reported-and-Tested-by: Woody Suwalski <terraluna977@gmail.com>
Reported-and-Tested-by: Adam Williamson <awilliam@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2011-12-08 22:04:47 -06:00
Steve French
59edb63ad0 cifs: Fix sparse warning when calling cifs_strtoUCS
Fix sparse endian check warning while calling cifs_strtoUCS

CHECK   fs/cifs/smbencrypt.c
fs/cifs/smbencrypt.c:216:37: warning: incorrect type in argument 1
(different base types)
fs/cifs/smbencrypt.c:216:37:    expected restricted __le16 [usertype] *<noident>
fs/cifs/smbencrypt.c:216:37:    got unsigned short *<noident>

Signed-off-by: Steve French <smfrench@gmail.com>
Acked-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com
2011-12-08 22:04:47 -06:00
Pavel Shilovsky
9a5101c896 CIFS: Add descriptions to the brlock cache functions
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-12-08 22:04:47 -06:00
Linus Torvalds
fb38f9b8fe Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: drop spin lock when memory alloc fails
  Btrfs: check if the to-be-added device is writable
  Btrfs: try cluster but don't advance in search list
  Btrfs: try to allocate from cluster even at LOOP_NO_EMPTY_SIZE
2011-12-08 13:18:59 -08:00
Christoph Hellwig
b39342134a xfs: remove the lid_size field in struct log_item_desc
Outside the now removed nodelaylog code this field is only used for
asserts and can be safely removed now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2011-12-08 13:53:30 -06:00
Christoph Hellwig
0244b9603d xfs: cleanup the transaction commit path a bit
Now that the nodelaylog mode is gone we can simplify the transaction commit
path a bit by removing the xfs_trans_commit_cil routine.  Restoring the
process flags is merged into xfs_trans_commit which already does it for
the error path, and allocating the log vectors is merged into
xlog_cil_format_items, which already fills them with data, thus avoiding
one loop over all log items.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2011-12-08 13:53:30 -06:00
Christoph Hellwig
93b8a5854f xfs: remove the deprecated nodelaylog option
The delaylog mode has been the default for a long time, and the nodelaylog
option has been scheduled for removal in Linux 3.3.  Remove it and code
only used by it now that we have opened the 3.3 window.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2011-12-08 12:30:32 -06:00
Liu Bo
1cf4ffdb32 Btrfs: drop spin lock when memory alloc fails
Drop spin lock in convert_extent_bit() when memory alloc fails,
otherwise, it will be a deadlock.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-12-08 08:55:47 -05:00
Li Zefan
a5d1633361 Btrfs: check if the to-be-added device is writable
If we call ioctl(BTRFS_IOC_ADD_DEV) directly, we'll succeed in adding
a readonly device to a btrfs filesystem, and btrfs will write to
that device, emitting kernel errors:

[ 3109.833692] lost page write due to I/O error on loop2
[ 3109.833720] lost page write due to I/O error on loop2
...

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-12-08 08:55:46 -05:00
Alexandre Oliva
274bd4fb3e Btrfs: try cluster but don't advance in search list
When we find an existing cluster, we switch to its block group as the
current block group, possibly skipping multiple blocks in the process.
Furthermore, under heavy contention, multiple threads may fail to
allocate from a cluster and then release just-created clusters just to
proceed to create new ones in a different block group.

This patch tries to allocate from an existing cluster regardless of its
block group, and doesn't switch to that group, instead proceeding to
try to allocate a cluster from the group it was iterating before the
attempt.

Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-12-08 08:55:40 -05:00
Alexandre Oliva
062c05c46b Btrfs: try to allocate from cluster even at LOOP_NO_EMPTY_SIZE
If we reach LOOP_NO_EMPTY_SIZE, we won't even try to use a cluster that
others might have set up.  Odds are that there won't be one, but if
someone else succeeded in setting it up, we might as well use it, even
if we don't try to set up a cluster again.

Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-12-07 19:50:42 -05:00
Linus Torvalds
a694ad94bc Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
  xfs: fix the logspace waiting algorithm
  xfs: fix nfs export of 64-bit inodes numbers on 32-bit kernels
  xfs: fix allocation length overflow in xfs_bmapi_write()
2011-12-07 16:13:54 -08:00
Stanislav Kinsbursky
f5c8593b94 NFSd: use network-namespace-aware cache registering routines
v2: cache_register_net() and cache_unregister_net() GPL exports added

This is a cleanup patch. Hope, some day generic cache_register() and
cache_unregister() will be removed.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-12-07 15:27:46 -05:00
Sage Weil
be655596b3 ceph: use i_ceph_lock instead of i_lock
We have been using i_lock to protect all kinds of data structures in the
ceph_inode_info struct, including lists of inodes that we need to iterate
over while avoiding races with inode destruction.  That requires grabbing
a reference to the inode with the list lock protected, but igrab() now
takes i_lock to check the inode flags.

Changing the list lock ordering would be a painful process.

However, using a ceph-specific i_ceph_lock in the ceph inode instead of
i_lock is a simple mechanical change and avoids the ordering constraints
imposed by igrab().

Reported-by: Amon Ott <a.ott@m-privacy.de>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-12-07 10:46:44 -08:00
Linus Torvalds
3172f8fe1c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fix apparmor dereferencing potentially freed dentry, sanitize __d_path() API
2011-12-07 08:14:42 -08:00
Al Viro
02125a8264 fix apparmor dereferencing potentially freed dentry, sanitize __d_path() API
__d_path() API is asking for trouble and in case of apparmor d_namespace_path()
getting just that.  The root cause is that when __d_path() misses the root
it had been told to look for, it stores the location of the most remote ancestor
in *root.  Without grabbing references.  Sure, at the moment of call it had
been pinned down by what we have in *path.  And if we raced with umount -l, we
could have very well stopped at vfsmount/dentry that got freed as soon as
prepend_path() dropped vfsmount_lock.

It is safe to compare these pointers with pre-existing (and known to be still
alive) vfsmount and dentry, as long as all we are asking is "is it the same
address?".  Dereferencing is not safe and apparmor ended up stepping into
that.  d_namespace_path() really wants to examine the place where we stopped,
even if it's not connected to our namespace.  As the result, it looked
at ->d_sb->s_magic of a dentry that might've been already freed by that point.
All other callers had been careful enough to avoid that, but it's really
a bad interface - it invites that kind of trouble.

The fix is fairly straightforward, even though it's bigger than I'd like:
	* prepend_path() root argument becomes const.
	* __d_path() is never called with NULL/NULL root.  It was a kludge
to start with.  Instead, we have an explicit function - d_absolute_root().
Same as __d_path(), except that it doesn't get root passed and stops where
it stops.  apparmor and tomoyo are using it.
	* __d_path() returns NULL on path outside of root.  The main
caller is show_mountinfo() and that's precisely what we pass root for - to
skip those outside chroot jail.  Those who don't want that can (and do)
use d_path().
	* __d_path() root argument becomes const.  Everyone agrees, I hope.
	* apparmor does *NOT* try to use __d_path() or any of its variants
when it sees that path->mnt is an internal vfsmount.  In that case it's
definitely not mounted anywhere and dentry_path() is exactly what we want
there.  Handling of sysctl()-triggered weirdness is moved to that place.
	* if apparmor is asked to do pathname relative to chroot jail
and __d_path() tells it we it's not in that jail, the sucker just calls
d_absolute_path() instead.  That's the other remaining caller of __d_path(),
BTW.
        * seq_path_root() does _NOT_ return -ENAMETOOLONG (it's stupid anyway -
the normal seq_file logics will take care of growing the buffer and redoing
the call of ->show() just fine).  However, if it gets path not reachable
from root, it returns SEQ_SKIP.  The only caller adjusted (i.e. stopped
ignoring the return value as it used to do).

Reviewed-by: John Johansen <john.johansen@canonical.com>
ACKed-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@vger.kernel.org
2011-12-06 23:57:18 -05:00
David S. Miller
959327c784 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2011-12-06 21:10:05 -05:00
Mi Jinlong
0cf99b91c6 nfsd41: allow non-reclaim open-by-fh's in 4.1
With NFSv4.0 it was safe to assume that open-by-filehandles were always
reclaims.

With NFSv4.1 there are non-reclaim open-by-filehandle operations, so we
should ensure we're only insisting on reclaims in the
OPEN_CLAIM_PREVIOUS case.

Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-12-06 16:19:04 -05:00
Sasha Levin
b2ea70afad nfsd: Fix oops when parsing a 0 length export
expkey_parse() oopses when handling a 0 length export. This is easily
triggerable from usermode by writing 0 bytes into
'/proc/[proc id]/net/rpc/nfsd.fh/channel'.

Below is the log:

[ 1402.286893] BUG: unable to handle kernel paging request at ffff880077c49fff
[ 1402.287632] IP: [<ffffffff812b4b99>] expkey_parse+0x28/0x2e1
[ 1402.287632] PGD 2206063 PUD 1fdfd067 PMD 1ffbc067 PTE 8000000077c49160
[ 1402.287632] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 1402.287632] CPU 1
[ 1402.287632] Pid: 20198, comm: trinity Not tainted 3.2.0-rc2-sasha-00058-gc65cd37 #6
[ 1402.287632] RIP: 0010:[<ffffffff812b4b99>]  [<ffffffff812b4b99>] expkey_parse+0x28/0x2e1
[ 1402.287632] RSP: 0018:ffff880077f0fd68  EFLAGS: 00010292
[ 1402.287632] RAX: ffff880077c49fff RBX: 00000000ffffffea RCX: 0000000001043400
[ 1402.287632] RDX: 0000000000000000 RSI: ffff880077c4a000 RDI: ffffffff82283de0
[ 1402.287632] RBP: ffff880077f0fe18 R08: 0000000000000001 R09: ffff880000000000
[ 1402.287632] R10: 0000000000000000 R11: 0000000000000001 R12: ffff880077c4a000
[ 1402.287632] R13: ffffffff82283de0 R14: 0000000001043400 R15: ffffffff82283de0
[ 1402.287632] FS:  00007f25fec3f700(0000) GS:ffff88007d400000(0000) knlGS:0000000000000000
[ 1402.287632] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1402.287632] CR2: ffff880077c49fff CR3: 0000000077e1d000 CR4: 00000000000406e0
[ 1402.287632] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1402.287632] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1402.287632] Process trinity (pid: 20198, threadinfo ffff880077f0e000, task ffff880077db17b0)
[ 1402.287632] Stack:
[ 1402.287632]  ffff880077db17b0 ffff880077c4a000 ffff880077f0fdb8 ffffffff810b411e
[ 1402.287632]  ffff880000000000 ffff880077db17b0 ffff880077c4a000 ffffffff82283de0
[ 1402.287632]  0000000001043400 ffffffff82283de0 ffff880077f0fde8 ffffffff81111f63
[ 1402.287632] Call Trace:
[ 1402.287632]  [<ffffffff810b411e>] ? lock_release+0x1af/0x1bc
[ 1402.287632]  [<ffffffff81111f63>] ? might_fault+0x97/0x9e
[ 1402.287632]  [<ffffffff81111f1a>] ? might_fault+0x4e/0x9e
[ 1402.287632]  [<ffffffff81a8bcf2>] cache_do_downcall+0x3e/0x4f
[ 1402.287632]  [<ffffffff81a8c950>] cache_write.clone.16+0xbb/0x130
[ 1402.287632]  [<ffffffff81a8c9df>] ? cache_write_pipefs+0x1a/0x1a
[ 1402.287632]  [<ffffffff81a8c9f8>] cache_write_procfs+0x19/0x1b
[ 1402.287632]  [<ffffffff8118dc54>] proc_reg_write+0x8e/0xad
[ 1402.287632]  [<ffffffff8113fe81>] vfs_write+0xaa/0xfd
[ 1402.287632]  [<ffffffff8114142d>] ? fget_light+0x35/0x9e
[ 1402.287632]  [<ffffffff8113ff8b>] sys_write+0x48/0x6f
[ 1402.287632]  [<ffffffff81bbdb92>] system_call_fastpath+0x16/0x1b
[ 1402.287632] Code: c0 c9 c3 55 48 63 d2 48 89 e5 48 8d 44 32 ff 41 57 41 56 41 55 41 54 53 bb ea ff ff ff 48 81 ec 88 00 00 00 48 89 b5 58 ff ff ff
[ 1402.287632]  38 0a 0f 85 89 02 00 00 c6 00 00 48 8b 3d 44 4a e5 01 48 85
[ 1402.287632] RIP  [<ffffffff812b4b99>] expkey_parse+0x28/0x2e1
[ 1402.287632]  RSP <ffff880077f0fd68>
[ 1402.287632] CR2: ffff880077c49fff
[ 1402.287632] ---[ end trace 368ef53ff773a5e3 ]---

Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: linux-nfs@vger.kernel.org
Cc: stable@kernel.org
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-12-06 16:18:37 -05:00
Jeff Layton
d310310cbf Freezer / sunrpc / NFS: don't allow TASK_KILLABLE sleeps to block the freezer
Allow the freezer to skip wait_on_bit_killable sleeps in the sunrpc
layer. This should allow suspend and hibernate events to proceed, even
when there are RPC's pending on the wire.

Also, wrap the TASK_KILLABLE sleeps in NFS layer in freezer_do_not_count
and freezer_count calls. This allows the freezer to skip tasks that are
sleeping while looping on EJUKEBOX or NFS4ERR_DELAY sorts of errors.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-12-06 22:12:27 +01:00
Christoph Hellwig
9f9c19ec1a xfs: fix the logspace waiting algorithm
Apply the scheme used in log_regrant_write_log_space to wake up any other
threads waiting for log space before the newly added one to
log_regrant_write_log_space as well, and factor the code into readable
helpers.  For each of the queues we have add two helpers:

 - one to try to wake up all waiting threads.  This helper will also be
   usable by xfs_log_move_tail once we remove the current opportunistic
   wakeups in it.
 - one to sleep on t_wait until enough log space is available, loosely
   modelled after Linux waitqueues.
 
And use them to reimplement the guts of log_regrant_write_log_space and
log_regrant_write_log_space.  These two function now use one and the same
algorithm for waiting on log space instead of subtly different ones before,
with an option to completely unify them in the near future.

Also move the filesystem shutdown handling to the common caller given
that we had to touch it anyway.

Based on hard debugging and an earlier patch from
Chandra Seetharaman <sekharan@us.ibm.com>.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chandra Seetharaman <sekharan@us.ibm.com>
Tested-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2011-12-06 14:19:47 -06:00
Christoph Hellwig
c29f7d457a xfs: fix nfs export of 64-bit inodes numbers on 32-bit kernels
The i_ino field in the VFS inode is of type unsigned long and thus can't
hold the full 64-bit inode number on 32-bit kernels.  We have the full
inode number in the XFS inode, so use that one for nfs exports.  Note
that I've also switched the 32-bit file handles types to it, just to make
the code more consistent and copy & paste errors less likely to happen.

Reported-by: Guoquan Yang <ygq51@hotmail.com>
Reported-by: Hank Peng <pengxihan@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2011-12-06 10:46:23 -06:00
H Hartley Sweeten
46cc1e5fce GFS2: local functions should be static
Quiets the sparse noise:

warning: symbol 'gfs2_initxattrs' was not declared. Should it be static?

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-12-06 09:46:41 +00:00
Paul Bolle
90802ed9c3 treewide: Fix comment and string typo 'bufer'
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-12-06 09:53:40 +01:00
Glauber Costa
3292beb340 sched/accounting: Change cpustat fields to an array
This patch changes fields in cpustat from a structure, to an
u64 array. Math gets easier, and the code is more flexible.

Signed-off-by: Glauber Costa <glommer@parallels.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Tuner <pjt@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1322498719-2255-2-git-send-email-glommer@parallels.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06 09:06:38 +01:00
Dave Chinner
a99ebf43f4 xfs: fix allocation length overflow in xfs_bmapi_write()
When testing the new xfstests --large-fs option that does very large
file preallocations, this assert was tripped deep in
xfs_alloc_vextent():

XFS: Assertion failed: args->minlen <= args->maxlen, file: fs/xfs/xfs_alloc.c, line: 2239

The allocation was trying to allocate a zero length extent because
the lower 32 bits of the allocation length was zero. The remaining
length of the allocation to be done was an exact multiple of 2^32 -
the first case I saw was at 496TB remaining to be allocated.

This turns out to be an overflow when converting the allocation
length (a 64 bit quantity) into the extent length to allocate (a 32
bit quantity), and it requires the length to be allocated an exact
multiple of 2^32 blocks to trip the assert.

Fix it by limiting the extent lenth to allocate to MAXEXTLEN.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2011-12-02 16:24:02 -06:00
David S. Miller
b3613118eb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2011-12-02 13:49:21 -05:00
Linus Torvalds
ffb8fb5469 Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
  xfs: fix attr2 vs large data fork assert
  xfs: force buffer writeback before blocking on the ilock in inode reclaim
  xfs: validate acl count
2011-12-02 10:38:20 -08:00
Sage Weil
2151937d7c ceph: fix rasize reporting by ceph_show_options
Fix typo.

Reported-by: mowang da <whooya.xxl@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-12-02 09:27:54 -08:00
Justin P. Mattock
42b2aa86c6 treewide: Fix typos in various parts of the kernel, and fix some comments.
The below patch fixes some typos in various parts of the kernel, as well as fixes some comments.
Please let me know if I missed anything, and I will try to get it changed and resent.

Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-12-02 14:57:31 +01:00
Linus Torvalds
0a4ebed781 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: (31 commits)
  ocfs2: avoid unaligned access to dqc_bitmap
  ocfs2: Use filemap_write_and_wait() instead of write_inode_now()
  ocfs2: honor O_(D)SYNC flag in fallocate
  ocfs2: Add a missing journal credit in ocfs2_link_credits() -v2
  ocfs2: send correct UUID to cleancache initialization
  ocfs2: Commit transactions in error cases -v2
  ocfs2: make direntry invalid when deleting it
  fs/ocfs2/dlm/dlmlock.c: free kmem_cache_zalloc'd data using kmem_cache_free
  ocfs2: Avoid livelock in ocfs2_readpage()
  ocfs2: serialize unaligned aio
  ocfs2: Implement llseek()
  ocfs2: Fix ocfs2_page_mkwrite()
  ocfs2: Add comment about orphan scanning
  ocfs2: Clean up messages in the fs
  ocfs2/cluster: Cluster up now includes network connections too
  ocfs2/cluster: Add new function o2net_fill_node_map()
  ocfs2/cluster: Fix output in file elapsed_time_in_ms
  ocfs2/dlm: dlmlock_remote() needs to account for remastery
  ocfs2/dlm: Take inflight reference count for remotely mastered resources too
  ocfs2/dlm: Cleanup dlm_wait_for_node_death() and dlm_wait_for_node_recovery()
  ...
2011-12-01 14:55:34 -08:00
Akinobu Mita
939255798a ocfs2: avoid unaligned access to dqc_bitmap
The dqc_bitmap field of struct ocfs2_local_disk_chunk is 32-bit aligned,
but not 64-bit aligned.  The dqc_bitmap is accessed by ocfs2_set_bit(),
ocfs2_clear_bit(), ocfs2_test_bit(), or ocfs2_find_next_zero_bit().  These
are wrapper macros for ext2_*_bit() which need to take an unsigned long
aligned address (though some architectures are able to handle unaligned
address correctly)

So some 64bit architectures may not be able to access the dqc_bitmap
correctly.

This avoids such unaligned access by using another wrapper functions for
ext2_*_bit().  The code is taken from fs/ext4/mballoc.c which also need to
handle unaligned bitmap access.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Joel Becker <jlbec@evilplan.org>
2011-12-01 14:39:32 -08:00
Trond Myklebust
111d489f0f NFSv4.1: Ensure that we handle _all_ SEQUENCE status bits.
Currently, the code assumes that the SEQUENCE status bits are mutually
exclusive. They are not...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org [>= 2.6.34]
2011-12-01 16:37:42 -05:00
Trond Myklebust
4f38e4aadc NFSv4: Don't error if we handled it in nfs4_recovery_handle_error
If we handled an error condition, then nfs4_recovery_handle_error should
return '0' so that the state recovery thread can continue.
Also ensure that nfs4_check_lease() continues to abort if we haven't got
any credentials by having it return ENOKEY (which is not handled).

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-12-01 16:31:34 -05:00
Linus Torvalds
b930c26416 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: fix meta data raid-repair merge problem
  Btrfs: skip allocation attempt from empty cluster
  Btrfs: skip block groups without enough space for a cluster
  Btrfs: start search for new cluster at the beginning
  Btrfs: reset cluster's max_size when creating bitmap
  Btrfs: initialize new bitmaps' list
  Btrfs: fix oops when calling statfs on readonly device
  Btrfs: Don't error on resizing FS to same size
  Btrfs: fix deadlock on metadata reservation when evicting a inode
  Fix URL of btrfs-progs git repository in docs
  btrfs scrub: handle -ENOMEM from init_ipath()
2011-12-01 08:28:53 -08:00
Jan Schmidt
f4a8e6563e Btrfs: fix meta data raid-repair merge problem
Commit 4a54c8c16 introduced raid-repair, killing the individual
readpage_io_failed_hook entries from inode.c and disk-io.c. Commit
4bb31e92 introduced new readahead code, adding a readpage_io_failed_hook to
disk-io.c.

The raid-repair commit had logic to disable raid-repair, if
readpage_io_failed_hook is set. Thus, the readahead commit effectively
disabled raid-repair for meta data.

This commit changes the logic to always attempt raid-repair when needed and
call the readpage_io_failed_hook in case raid-repair fails. This is much
more straight forward and should have been like that from the beginning.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Reported-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-12-01 09:30:36 -05:00
Alexandre Oliva
be064d1139 Btrfs: skip allocation attempt from empty cluster
If we don't have a cluster, don't bother trying to allocate from it,
jumping right away to the attempt to allocate a new cluster.

Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-30 13:43:00 -05:00
Alexandre Oliva
425d83156c Btrfs: skip block groups without enough space for a cluster
We test whether a block group has enough free space to hold the
requested block, but when we're doing clustered allocation, we can
save some cycles by testing whether it has enough room for the cluster
upfront, otherwise we end up attempting to set up a cluster and
failing.  Only in the NO_EMPTY_SIZE loop do we attempt an unclustered
allocation, and by then we'll have zeroed the cluster size, so this
patch won't stop us from using the block group as a last resort.

Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-30 13:43:00 -05:00
Alexandre Oliva
1b22bad779 Btrfs: start search for new cluster at the beginning
Instead of starting at zero (offset is always zero), request a cluster
starting at search_start, that denotes the beginning of the current
block group.

Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-30 13:43:00 -05:00
Alexandre Oliva
b78d09bceb Btrfs: reset cluster's max_size when creating bitmap
The field that indicates the size of the largest contiguous chunk of
free space in the cluster is not initialized when setting up bitmaps,
it's only increased when we find a larger contiguous chunk.  We end up
retaining a larger value than appropriate for highly-fragmented
clusters, which may cause pointless searches for large contiguous
groups, and even cause clusters that do not meet the density
requirements to be set up.

Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-30 13:43:00 -05:00
Alexandre Oliva
f2d0f6765d Btrfs: initialize new bitmaps' list
We're failing to create clusters with bitmaps because
setup_cluster_no_bitmap checks that the list is empty before inserting
the bitmap entry in the list for setup_cluster_bitmap, but the list
field is only initialized when it is restored from the on-disk free
space cache, or when it is written out to disk.

Besides a potential race condition due to the multiple use of the list
field, filesystem performance severely degrades over time: as we use
up all non-bitmap free extents, the try-to-set-up-cluster dance is
done at every metadata block allocation.  For every block group, we
fail to set up a cluster, and after failing on them all up to twice,
we fall back to the much slower unclustered allocation.

To make matters worse, before the unclustered allocation, we try to
create new block groups until we reach the 1% threshold, which
introduces additional bitmaps and thus block groups that we'll iterate
over at each metadata block request.
2011-11-30 18:46:06 +01:00
Li Zefan
b772a86ea6 Btrfs: fix oops when calling statfs on readonly device
To reproduce this bug:

  # dd if=/dev/zero of=img bs=1M count=256
  # mkfs.btrfs img
  # losetup -r /dev/loop1 img
  # mount /dev/loop1 /mnt
  OOPS!!

It triggered BUG_ON(!nr_devices) in btrfs_calc_avail_data_space().

To fix this, instead of checking write-only devices, we check all open
deivces:

  # df -h /dev/loop1
  Filesystem            Size  Used Avail Use% Mounted on
  /dev/loop1            250M   28K  238M   1% /mnt

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
2011-11-30 18:46:05 +01:00
Mike Fleetwood
ece7d20e8b Btrfs: Don't error on resizing FS to same size
It seems overly harsh to fail a resize of a btrfs file system to the
same size when a shrink or grow would succeed.  User app GParted trips
over this error.  Allow it by bypassing the shrink or grow operation.

Signed-off-by: Mike Fleetwood <mike.fleetwood@googlemail.com>
2011-11-30 18:46:04 +01:00
Miao Xie
aa38a711a8 Btrfs: fix deadlock on metadata reservation when evicting a inode
When I ran the xfstests, I found the test tasks was blocked on meta-data
reservation.

By debugging, I found the reason of this bug:
   start transaction
        |
	v
   reserve meta-data space
	|
	v
   flush delay allocation -> iput inode -> evict inode
	^					|
	|					v
   wait for delay allocation flush <- reserve meta-data space

And besides that, the flush on evicting inode will block the thread, which
is reclaiming the memory, and make oom happen easily.

Fix this bug by skipping the flush step when evicting inode.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
2011-11-30 18:46:03 +01:00
Dan Carpenter
26bdef541d btrfs scrub: handle -ENOMEM from init_ipath()
init_ipath() can return an ERR_PTR(-ENOMEM).

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
2011-11-30 18:46:01 +01:00
Christoph Hellwig
4c393a6059 xfs: fix attr2 vs large data fork assert
With Dmitry fsstress updates I've seen very reproducible crashes in
xfs_attr_shortform_remove because xfs_attr_shortform_bytesfit claims that
the attributes would not fit inline into the inode after removing an
attribute.  It turns out that we were operating on an inode with lots
of delalloc extents, and thus an if_bytes values for the data fork that
is larger than biggest possible on-disk storage for it which utterly
confuses the code near the end of xfs_attr_shortform_bytesfit.

Fix this by always allowing the current attribute fork, like we already
do for the attr1 format, given that delalloc conversion will take care
for moving either the data or attribute area out of line if it doesn't
fit at that point - or making the point moot by merging extents at this
point.

Also document the function better, and clean up some loose bits.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2011-11-29 13:03:12 -06:00
Christoph Hellwig
4dd2cb4a28 xfs: force buffer writeback before blocking on the ilock in inode reclaim
If we are doing synchronous inode reclaim we block the VM from making
progress in memory reclaim.  So if we encouter a flush locked inode
promote it in the delwri list and wake up xfsbufd to write it out now.
Without this we can get hangs of up to 30 seconds during workloads hitting
synchronous inode reclaim.

The scheme is copied from what we do for dquot reclaims.

Reported-by: Simon Kirby <sim@hostway.ca>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Simon Kirby <sim@hostway.ca>
Signed-off-by: Ben Myers <bpm@sgi.com>
2011-11-29 12:06:14 -06:00
Linus Torvalds
883381d9f1 Merge branch 'dev' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'dev' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix racy use-after-free in ext4_end_io_dio()
2011-11-29 08:59:12 -08:00
Marcos Paulo de Souza
786228ab30 writeback: Fix issue on make htmldocs
Document the @reason parameter to make "make htmldocs" happy.

Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Marcos Paulo de Souza <marcos.mage@gmail.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
2011-11-29 15:50:28 +08:00
Christoph Hellwig
fa8b18edd7 xfs: validate acl count
This prevents in-memory corruption and possible panics if the on-disk
ACL is badly corrupted.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2011-11-28 22:14:24 -06:00
Linus Torvalds
cb3599926e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
  pstore: pass allocated memory region back to caller
2011-11-28 11:27:57 -08:00
Dan Carpenter
5858441c95 debugfs: remove unneeded cast in debugfs_print_regs32()
The cast here causes a Sparse warning:
fs/debugfs/file.c:561:42: warning: cast removes address space of expression
fs/debugfs/file.c:561:42: warning: incorrect type in argument 1 (different address spaces)
fs/debugfs/file.c:561:42:    expected void const volatile [noderef] <asn:2>*addr
fs/debugfs/file.c:561:42:    got void *<noident>

It's redundant to cast it to a (void *) anyway when it is already a
(void __iomem *).

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-26 20:12:47 -08:00
Alan Stern
045ddc8991 NLS: raname "maxlen" to "maxout" in UTF conversion routines
As requested by NamJae Jeon, this patch (as1503) changes the name of
the "maxlen" parameters to "maxout" in the various UTF conversion
routines.  This should make the role of that parameter more clear.

The patch also renames the "len" parameters to "inlen", for the same
reason.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reviewed-by: NamJae Jeon <linkinjeon@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-26 19:58:47 -08:00
Greg Kroah-Hartman
47b649590d Merge 3.2-rc3 into usb-linus
This pulls in the latest USB bugfixes and helps a few of the drivers
merge nicer in the future due to changes in both branches.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-26 19:46:48 -08:00
Thomas Meyer
67114fe610 nfsd4: Use kmemdup rather than duplicating its implementation
The semantic patch that makes this change is available
in scripts/coccinelle/api/memdup.cocci.

Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-11-25 18:44:22 -05:00
Tejun Heo
4c81f045c0 ext4: fix racy use-after-free in ext4_end_io_dio()
ext4_end_io_dio() queues io_end->work and then clears iocb->private;
however, io_end->work calls aio_complete() which frees the iocb
object.  If that slab object gets reallocated, then ext4_end_io_dio()
can end up clearing someone else's iocb->private, this use-after-free
can cause a leak of a struct ext4_io_end_t structure.

Detected and tested with slab poisoning.

[ Note: Can also reproduce using 12 fio's against 12 file systems with the
  following configuration file:

  [global]
  direct=1
  ioengine=libaio
  iodepth=1
  bs=4k
  ba=4k
  size=128m

  [create]
  filename=${TESTDIR}
  rw=write

  -- tytso ]

Google-Bug-Id: 5354697
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reported-by: Kent Overstreet <koverstreet@google.com>
Tested-by: Kent Overstreet <koverstreet@google.com>
Cc: stable@kernel.org
2011-11-24 19:22:24 -05:00
Linus Torvalds
de7badf1ad Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
  eCryptfs: Extend array bounds for all filename chars
  eCryptfs: Flush file in vma close
  eCryptfs: Prevent file create race condition
2011-11-23 14:28:13 -08:00
Tyler Hicks
0f751e641a eCryptfs: Extend array bounds for all filename chars
From mhalcrow's original commit message:

    Characters with ASCII values greater than the size of
    filename_rev_map[] are valid filename characters.
    ecryptfs_decode_from_filename() will access kernel memory beyond
    that array, and ecryptfs_parse_tag_70_packet() will then decrypt
    those characters. The attacker, using the FNEK of the crafted file,
    can then re-encrypt the characters to reveal the kernel memory past
    the end of the filename_rev_map[] array. I expect low security
    impact since this array is statically allocated in the text area,
    and the amount of memory past the array that is accessible is
    limited by the largest possible ASCII filename character.

This patch solves the issue reported by mhalcrow but with an
implementation suggested by Linus to simply extend the length of
filename_rev_map[] to 256. Characters greater than 0x7A are mapped to
0x00, which is how invalid characters less than 0x7A were previously
being handled.

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Michael Halcrow <mhalcrow@google.com>
Cc: stable@kernel.org
2011-11-23 15:43:53 -06:00
Tyler Hicks
32001d6fe9 eCryptfs: Flush file in vma close
Dirty pages weren't being written back when an mmap'ed eCryptfs file was
closed before the mapping was unmapped. Since f_ops->flush() is not
called by the munmap() path, the lower file was simply being released.
This patch flushes the eCryptfs file in the vm_ops->close() path.

https://launchpad.net/bugs/870326

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Cc: stable@kernel.org [2.6.39+]
2011-11-23 15:40:09 -06:00
Tyler Hicks
b59db43ad4 eCryptfs: Prevent file create race condition
The file creation path prematurely called d_instantiate() and
unlock_new_inode() before the eCryptfs inode info was fully
allocated and initialized and before the eCryptfs metadata was written
to the lower file.

This could result in race conditions in subsequent file and inode
operations leading to unexpected error conditions or a null pointer
dereference while attempting to use the unallocated memory.

https://launchpad.net/bugs/813146

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Cc: stable@kernel.org
2011-11-23 15:39:38 -06:00
Rafael J. Wysocki
986b11c3ee Merge branch 'pm-freezer' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc into pm-freezer
* 'pm-freezer' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc: (24 commits)
  freezer: fix wait_event_freezable/__thaw_task races
  freezer: kill unused set_freezable_with_signal()
  dmatest: don't use set_freezable_with_signal()
  usb_storage: don't use set_freezable_with_signal()
  freezer: remove unused @sig_only from freeze_task()
  freezer: use lock_task_sighand() in fake_signal_wake_up()
  freezer: restructure __refrigerator()
  freezer: fix set_freezable[_with_signal]() race
  freezer: remove should_send_signal() and update frozen()
  freezer: remove now unused TIF_FREEZE
  freezer: make freezing() test freeze conditions in effect instead of TIF_FREEZE
  cgroup_freezer: prepare for removal of TIF_FREEZE
  freezer: clean up freeze_processes() failure path
  freezer: kill PF_FREEZING
  freezer: test freezable conditions while holding freezer_lock
  freezer: make freezing indicate freeze condition in effect
  freezer: use dedicated lock instead of task_lock() + memory barrier
  freezer: don't distinguish nosig tasks on thaw
  freezer: remove racy clear_freeze_flag() and set PF_NOFREEZE on dead tasks
  freezer: rename thaw_process() to __thaw_task() and simplify the implementation
  ...
2011-11-23 21:09:02 +01:00
Steven Whitehouse
018a01cd27 GFS2: We only need one ACL getting function
There is no need to have two versions of this function with
slightly different arguments.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-11-23 13:31:51 +00:00
Alexey Dobriyan
4e3fd7a06d net: remove ipv6_addr_copy()
C assignment can handle struct in6_addr copying.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-22 16:43:32 -05:00
Linus Torvalds
2db1125d51 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  mount_subtree() pointless use-after-free
  iio: fix a leak due to improper use of anon_inode_getfd()
  microblaze: bury asm/namei.h
2011-11-22 13:19:21 -08:00
Alessandro Rubini
03e099fbb0 debugfs: bugfix: include <linux/io.h> in file.c
The regs32 machinery uses readl. I forgot the mandatory include
and the code was not compiling on all archs.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Alessandro Rubini <rubini@gnudd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-22 10:20:05 -08:00
Al Viro
d31da0f0ba mount_subtree() pointless use-after-free
d'oh... we'd carefully pinned mnt->mnt_sb down, dropped mnt and attempt
to grab s_umount on mnt->mnt_sb.  The trouble is, *mnt might've been
overwritten by now...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-11-22 12:31:21 -05:00
Linus Torvalds
e25ba0ce03 Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFS: Revert pnfs ugliness from the generic NFS read code path
  SUNRPC: destroy freshly allocated transport in case of sockaddr init error
  NFS: Fix a regression in the referral code
  nfs: move nfs_file_operations declaration to bottom of file.c (try #2)
  nfs: when attempting to open a directory, fall back on normal lookup (try #5)
2011-11-22 08:54:15 -08:00
Linus Torvalds
af36d15f58 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: remove free-space-cache.c WARN during log replay
  Btrfs: sectorsize align offsets in fiemap
  Btrfs: clear pages dirty for io and set them extent mapped
  Btrfs: wait on caching if we're loading the free space cache
  Btrfs: prefix resize related printks with btrfs:
  btrfs: fix stat blocks accounting
  Btrfs: avoid unnecessary bitmap search for cluster setup
  Btrfs: fix to search one more bitmap for cluster setup
  btrfs: mirror_num should be int, not u64
  btrfs: Fix up 32/64-bit compatibility for new ioctls
  Btrfs: fix barrier flushes
  Btrfs: fix tree corruption after multi-thread snapshots and inode_cache flush
2011-11-22 08:53:40 -08:00
Dan Carpenter
bcdd0c1600 ext3: NULL dereference in ext3_evict_inode()
This is an fsfuzzer bug.  ->s_journal is set at the end of
ext3_load_journal() but we try to use it in the error handling from
ext3_get_journal() while it's still NULL.

[  337.039041] BUG: unable to handle kernel NULL pointer dereference at 0000000000000024
[  337.040380] IP: [<ffffffff816e6539>] _raw_spin_lock+0x9/0x30
[  337.041687] PGD 0
[  337.043118] Oops: 0002 [#1] SMP
[  337.044483] CPU 3
[  337.044495] Modules linked in: ecb md4 cifs fuse kvm_intel kvm brcmsmac brcmutil crc8 cordic r8169 [last unloaded: scsi_wait_scan]
[  337.047633]
[  337.049259] Pid: 8308, comm: mount Not tainted 3.2.0-rc2-next-20111121+ #24 SAMSUNG ELECTRONICS CO., LTD. RV411/RV511/E3511/S3511    /RV411/RV511/E3511/S3511
[  337.051064] RIP: 0010:[<ffffffff816e6539>]  [<ffffffff816e6539>] _raw_spin_lock+0x9/0x30
[  337.052879] RSP: 0018:ffff8800b1d11ae8  EFLAGS: 00010282
[  337.054668] RAX: 0000000000000100 RBX: 0000000000000000 RCX: ffff8800b77c2000
[  337.056400] RDX: ffff8800a97b5c00 RSI: 0000000000000000 RDI: 0000000000000024
[  337.058099] RBP: ffff8800b1d11ae8 R08: 6000000000000000 R09: e018000000000000
[  337.059841] R10: ff67366cc2607c03 R11: 00000000110688e6 R12: 0000000000000000
[  337.061607] R13: 0000000000000000 R14: 0000000000000000 R15: ffff8800a78f06e8
[  337.063385] FS:  00007f9d95652800(0000) GS:ffff8800b7180000(0000) knlGS:0000000000000000
[  337.065110] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  337.066801] CR2: 0000000000000024 CR3: 00000000aef2c000 CR4: 00000000000006e0
[  337.068581] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  337.070321] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  337.072105] Process mount (pid: 8308, threadinfo ffff8800b1d10000, task ffff8800b1d02be0)
[  337.073800] Stack:
[  337.075487]  ffff8800b1d11b08 ffffffff811f48cf ffff88007ac9b158 0000000000000000
[  337.077255]  ffff8800b1d11b38 ffffffff8119405d ffff88007ac9b158 ffff88007ac9b250
[  337.078851]  ffffffff8181bda0 ffffffff8181bda0 ffff8800b1d11b68 ffffffff81131e31
[  337.080284] Call Trace:
[  337.081706]  [<ffffffff811f48cf>] log_start_commit+0x1f/0x40
[  337.083107]  [<ffffffff8119405d>] ext3_evict_inode+0x1fd/0x2a0
[  337.084490]  [<ffffffff81131e31>] evict+0xa1/0x1a0
[  337.085857]  [<ffffffff81132031>] iput+0x101/0x210
[  337.087220]  [<ffffffff811339d1>] iget_failed+0x21/0x30
[  337.088581]  [<ffffffff811905fc>] ext3_iget+0x15c/0x450
[  337.089936]  [<ffffffff8118b0c1>] ? ext3_rsv_window_add+0x81/0x100
[  337.091284]  [<ffffffff816df9a4>] ext3_get_journal+0x15/0xde
[  337.092641]  [<ffffffff811a2e9b>] ext3_fill_super+0xf2b/0x1c30
[  337.093991]  [<ffffffff810ddf7d>] ? register_shrinker+0x4d/0x60
[  337.095332]  [<ffffffff8111c112>] mount_bdev+0x1a2/0x1e0
[  337.096680]  [<ffffffff811a1f70>] ? ext3_setup_super+0x210/0x210
[  337.098026]  [<ffffffff8119a770>] ext3_mount+0x10/0x20
[  337.099362]  [<ffffffff8111cbee>] mount_fs+0x3e/0x1b0
[  337.100759]  [<ffffffff810eda1b>] ? __alloc_percpu+0xb/0x10
[  337.102330]  [<ffffffff81135385>] vfs_kern_mount+0x65/0xc0
[  337.103889]  [<ffffffff8113611f>] do_kern_mount+0x4f/0x100
[  337.105442]  [<ffffffff811378fc>] do_mount+0x19c/0x890
[  337.106989]  [<ffffffff810e8456>] ? memdup_user+0x46/0x90
[  337.108572]  [<ffffffff810e84f3>] ? strndup_user+0x53/0x70
[  337.110114]  [<ffffffff811383fb>] sys_mount+0x8b/0xe0
[  337.111617]  [<ffffffff816ed93b>] system_call_fastpath+0x16/0x1b
[  337.113133] Code: 38 c2 74 0f 66 0f 1f 44 00 00 f3 90 0f b6 03 38 c2 75 f7 48 83 c4 08 5b 5d c3 0f 1f 84 00 00 00 00 00 55 b8 00 01 00 00 48 89 e5 <f0> 66 0f c1 07 0f b6 d4 38 c2 74 0c 0f 1f 00 f3 90 0f b6 07 38
[  337.116588] RIP  [<ffffffff816e6539>] _raw_spin_lock+0x9/0x30
[  337.118260]  RSP <ffff8800b1d11ae8>
[  337.119998] CR2: 0000000000000024
[  337.188701] ---[ end trace c36d790becac1615 ]---

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2011-11-22 13:36:15 +01:00
Steven Whitehouse
6a8099ed56 GFS2: Fix multi-block allocation
Clean up gfs2_alloc_blocks so that it takes the full extent length
rather than just the number of non-inode blocks as an argument. That
will only make a difference in the inode allocation case for now.

Also, this fixes the extent length handling around gfs2_alloc_extent() so
that multi block allocations will work again.

The rd_last_alloc block is set to the final block in the allocated
extent (as per the update to i_goal, but referenced to a different
start point).

This also removes the dinode argument to rgblk_search() which is no
longer used.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-11-22 12:18:51 +00:00
Bob Peterson
564e12b115 GFS2: decouple quota allocations from block allocations
This patch separates the code pertaining to allocations into two
parts: quota-related information and block reservations.
This patch also moves all the block reservation structure allocations to
function gfs2_inplace_reserve to simplify the code, and moves
the frees to function gfs2_inplace_release.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-11-22 10:25:21 +00:00
Thomas Meyer
eaecf43a69 UBIFS: Use kmemdup rather than duplicating its implementation
The semantic patch that makes this change is available
in scripts/coccinelle/api/memdup.cocci.

Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2011-11-22 10:58:48 +02:00
Yongqiang Yang
8c111b3f56 jbd: clear revoked flag on buffers before a new transaction started
Currently, we clear revoked flag only when a block is reused.  However,
this can tigger a false journal error.  Consider a situation when a block
is used as a meta block and is deleted(revoked) in ordered mode, then the
block is allocated as a data block to a file.  At this moment, user changes
the file's journal mode from ordered to journaled and truncates the file.
The block will be considered re-revoked by journal because it has revoked
flag still pending from the last transaction and an assertion triggers.

We fix the problem by keeping the revoked status more uptodate - we clear
revoked flag when switching revoke tables to reflect there is no revoked
buffers in current transaction any more.

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2011-11-22 01:20:53 +01:00
Tejun Heo
8a32c441c1 freezer: implement and use kthread_freezable_should_stop()
Writeback and thinkpad_acpi have been using thaw_process() to prevent
deadlock between the freezer and kthread_stop(); unfortunately, this
is inherently racy - nothing prevents freezing from happening between
thaw_process() and kthread_stop().

This patch implements kthread_freezable_should_stop() which enters
refrigerator if necessary but is guaranteed to return if
kthread_stop() is invoked.  Both thaw_process() users are converted to
use the new function.

Note that this deadlock condition exists for many of freezable
kthreads.  They need to be converted to use the new should_stop or
freezable workqueue.

Tested with synthetic test case.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Henrique de Moraes Holschuh <ibm-acpi@hmh.eng.br>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Oleg Nesterov <oleg@redhat.com>
2011-11-21 12:32:23 -08:00
Tejun Heo
a0acae0e88 freezer: unexport refrigerator() and update try_to_freeze() slightly
There is no reason to export two functions for entering the
refrigerator.  Calling refrigerator() instead of try_to_freeze()
doesn't save anything noticeable or removes any race condition.

* Rename refrigerator() to __refrigerator() and make it return bool
  indicating whether it scheduled out for freezing.

* Update try_to_freeze() to return bool and relay the return value of
  __refrigerator() if freezing().

* Convert all refrigerator() users to try_to_freeze().

* Update documentation accordingly.

* While at it, add might_sleep() to try_to_freeze().

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Samuel Ortiz <samuel@sortiz.org>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jan Kara <jack@suse.cz>
Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
Cc: Christoph Hellwig <hch@infradead.org>
2011-11-21 12:32:22 -08:00
Linus Torvalds
f8f5ed7c99 Merge branch 'dev' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'dev' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix up a undefined error in ext4_free_blocks in debugging code
  ext4: add blk_finish_plug in error case of writepages.
  ext4: Remove kernel_lock annotations
  ext4: ignore journalled data options on remount if fs has no journal
2011-11-21 12:11:37 -08:00
Linus Torvalds
c292fe4aae Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  libceph: Allocate larger oid buffer in request msgs
  ceph: initialize root dentry
  ceph: fix iput race when queueing inode work
2011-11-21 12:11:13 -08:00
Chris Mason
24a7031396 Btrfs: remove free-space-cache.c WARN during log replay
The log replay code only partially loads block groups, since
the block group caching code is able to detect and deal with
extents the logging code has pinned down.

While the logging code is pinning down block groups, there is
a bogus WARN_ON we're hitting if the code wasn't able to find
an extent in the cache.  This commit removes the warning because
it can happen any time there isn't a valid free space cache
for that block group.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-21 14:57:33 -05:00
Yongqiang Yang
6e58ad69ef ext4: fix up a undefined error in ext4_free_blocks in debugging code
sbi is not defined, so let ext4_free_blocks use EXT4_SB(sb) instead
when EXT4FS_DEBUG is defined.

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
2011-11-21 12:09:19 -05:00
Bob Peterson
b3e47ca0c2 GFS2: split function rgblk_search
This patch splits function rgblk_search into a function that finds
blocks to allocate (rgblk_search) and a function that assigns those
blocks (gfs2_alloc_extent).

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@rehat.com>
2011-11-21 16:48:02 +00:00
Steven Whitehouse
465f0a760d GFS2: Fix up "off by one" in the previous patch
The trace point should take extlen and not *ndata as the
extent length.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-11-21 10:05:55 +00:00
Bob Peterson
6e87ed0fc9 GFS2: move toward a generic multi-block allocator
This patch is a revision of the one I previously posted.
I tried to integrate all the suggestions Steve gave.
The purpose of the patch is to change function gfs2_alloc_block
(allocate either a dinode block or an extent of data blocks)
to a more generic gfs2_alloc_blocks function that can
allocate both a dinode _and_ an extent of data blocks in the
same call. This will ultimately help us create a multi-block
reservation scheme to reduce file fragmentation.

This patch moves more toward a generic multi-block allocator that
takes a pointer to the number of data blocks to allocate, plus whether
or not to allocate a dinode. In theory, it could be called to allocate
(1) a single dinode block, (2) a group of one or more data blocks, or
(3) a dinode plus several data blocks.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-11-21 10:04:09 +00:00
Steven Whitehouse
4442f2e03e GFS2: O_(D)SYNC support for fallocate
Add sync of metadata after fallocate for O_SYNC files to ensure that we
meet expectations for everything being on disk in this case.
Unfortunately, the offset and len parameters are modified during the
course of the fallocate function, so I've had to add a couple of new
variables to call generic_write_sync() at the end.

I know that potentially this will sync data as well within the range,
but I think that is a fairly harmless side-effect overall, since we
would not normally expect there to be any dirty data within the range in
question.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Benjamin Marzinski <bmarzins@redhat.com>
2011-11-21 10:01:25 +00:00
David Howells
dd179946db VFS: Log the fact that we've given ELOOP rather than creating a loop
To prevent an NFS server from being used to create a directory loop in an NFS
superblock on the client, the following patch was committed:

	commit 1836750115
	Author: Al Viro <viro@zeniv.linux.org.uk>
	Date:   Tue Jul 12 21:42:24 2011 -0400
	Subject: fix loop checks in d_materialise_unique()

This causes ELOOP to be reported to anyone trying to access the dentry that
would otherwise cause the kernel to complete the loop.

However, no indication is given to the caller as to why an operation that ought
to work doesn't.  The fault is with the kernel, which doesn't want to try and
solve the problem as it gets horrendously messy if there's another mountpoint
somewhere in the trees being spliced that can't be moved[*].

[*] The real problem is that we don't handle the excision of a subtree that
gets moved _out_ of what we can see.  This can happen on the server where a
directory is merely moved between two other dirs on the same filesystem, but
where destination dir is not accessible by the client.

So, given the choice to return ELOOP rather than trying to reconfigure the
dentry tree, we should give the caller some indication of why they aren't being
allowed to make what should be a legitimate request and log a message.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-11-20 23:04:27 -05:00
Thomas Meyer
67c50a7ed5 qnx4fs: Use kmemdup rather than duplicating its implementation
The semantic patch that makes this change is available
in scripts/coccinelle/api/memdup.cocci.

Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Anders Larsen <al@alarsen.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-11-20 20:32:28 +01:00
Josef Bacik
4d479cf010 Btrfs: sectorsize align offsets in fiemap
We've been hitting BUG()'s in btrfs_cont_expand and btrfs_fallocate and anywhere
else that calls btrfs_get_extent while running xfstests 13 in a loop.  This is
because fiemap is calling btrfs_get_extent with non-sectorsize aligned offsets,
which will end up adding mappings that are not sectorsize aligned, which will
cause problems in some cases for subsequent calls to btrfs_get_extent for
similar areas that are sectorsize aligned.  With this patch I ran xfstests 13 in
a loop for a couple of hours and didn't hit the problem that I could previously
hit in at most 20 minutes.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2011-11-20 07:42:17 -05:00
Josef Bacik
f7d61dcd68 Btrfs: clear pages dirty for io and set them extent mapped
When doing the io_ctl helpers to clean up the free space cache stuff I stopped
using our normal prepare_pages stuff, which means I of course forgot to do
things like set the pages extent mapped, which will cause us all sorts of
wonderful propblems.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2011-11-20 07:42:17 -05:00
Josef Bacik
291c7d2f57 Btrfs: wait on caching if we're loading the free space cache
We've been hitting panics when running xfstest 13 in a loop for long periods of
time.  And actually this problem has always existed so we've been hitting these
things randomly for a while.  Basically what happens is we get a thread coming
into the allocator and reading the space cache off of disk and adding the
entries to the free space cache as we go.  Then we get another thread that comes
in and tries to allocate from that block group.  Since block_group->cached !=
BTRFS_CACHE_NO it goes ahead and tries to do the allocation.  We do this because
if we're doing the old slow way of caching we don't want to hold people up and
wait for everything to finish.  The problem with this is we could end up
discarding the space cache at some arbitrary point in the future, which means we
could very well end up allocating space that is either bad, or when the real
caching happens it could end up thinking the space isn't in use when it really
is and cause all sorts of other problems.

The solution is to add a new flag to indicate we are loading the free space
cache from disk, and always try to cache the block group if cache->cached !=
BTRFS_CACHE_FINISHED.  That way if we are loading the space cache anybody else
who tries to allocate from the block group will have to wait until it's finished
to make sure it completes successfully.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2011-11-20 07:42:16 -05:00
Arnd Hannemann
5bb1468238 Btrfs: prefix resize related printks with btrfs:
For the user it is confusing to find something like:
[10197.627710] new size for /dev/mapper/vg0-usr_share is 3221225472
in kernel log, because it doesn't point directly to btrfs.

This patch prefixes those messages with "btrfs:" like other btrfs
related printks.

Signed-off-by: Arnd Hannemann <arnd@arndnet.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-20 07:42:16 -05:00
David Sterba
fadc0d8be4 btrfs: fix stat blocks accounting
Round inode bytes and delalloc bytes up to real blocksize before
converting to sector size. Otherwise eg. files smaller than 512
are reported with zero blocks due to incorrect rounding.

Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-20 07:42:15 -05:00
Li Zefan
52621cb6ed Btrfs: avoid unnecessary bitmap search for cluster setup
setup_cluster_no_bitmap() searches all the extents and bitmaps starting
from offset. Therefore if it returns -ENOSPC, all the bitmaps starting
from offset are in the bitmaps list, so it's sufficient to search from
this list in setup_cluser_bitmap().

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-20 07:42:15 -05:00
Li Zefan
0f0fbf1d0e Btrfs: fix to search one more bitmap for cluster setup
Suppose there are two bitmaps [0, 256], [256, 512] and one extent
[100, 120] in the free space cache, and we want to setup a cluster
with offset=100, bytes=50.

In this case, there will be only one bitmap [256, 512] in the temporary
bitmaps list, and then setup_cluster_bitmap() won't search bitmap [0, 256].

The cause is, the list is constructed in setup_cluster_no_bitmap(),
and only bitmaps with bitmap_entry->offset >= offset will be added
into the list, and the very bitmap that convers offset has
bitmap_entry->offset <= offset.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-20 07:42:14 -05:00
Jan Schmidt
32240a913d btrfs: mirror_num should be int, not u64
My previous patch introduced some u64 for failed_mirror variables, this one
makes it consistent again.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-20 07:42:14 -05:00
Jeff Mahoney
745c4d8e16 btrfs: Fix up 32/64-bit compatibility for new ioctls
This patch casts to unsigned long before casting to a pointer and fixes
 the following warnings:
fs/btrfs/extent_io.c:2289:20: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
fs/btrfs/ioctl.c:2933:37: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
fs/btrfs/ioctl.c:2937:21: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
fs/btrfs/ioctl.c:3020:21: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
fs/btrfs/scrub.c:275:4: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
fs/btrfs/backref.c:686:27: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-20 07:42:13 -05:00
Chris Mason
387125fc72 Btrfs: fix barrier flushes
When btrfs is writing the super blocks, it send barrier flushes to make
sure writeback caching drives get all the metadata on disk in the
right order.

But, we have two bugs in the way these are sent down.  When doing
full commits (not via the tree log), we are sending the barrier down
before the last super when it should be going down before the first.

In multi-device setups, we should be waiting for the barriers to
complete on all devices before writing any of the supers.

Both of these bugs can cause corruptions on power failures.  We fix it
with some new code to send down empty barriers to all devices before
writing the first super.

Alexandre Oliva found the multi-device bug.  Arne Jansen did the async
barrier loop.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Reported-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br>
2011-11-20 07:21:14 -05:00
Al Viro
f1fd306a91 minixfs: kill manual hweight(), simplify
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-11-19 11:13:28 -05:00
Josh Boyer
016e8d44bc fs/minix: Verify bitmap block counts before mounting
Newer versions of MINIX can create filesystems that allocate an extra
bitmap block.  Mounting of this succeeds, but doing a statfs call will
result in an oops in count_free because of a negative number being used
for the bh index.

Avoid this by verifying the number of allocated blocks at mount time,
erroring out if there are not enough and make statfs ignore the extras
if there are too many.

This fixes https://bugzilla.kernel.org/show_bug.cgi?id=18792

Signed-off-by: Josh Boyer <jwboyer@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-11-19 11:13:26 -05:00
Linus Torvalds
208f6f6068 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  new helper: mount_subtree()
  switch create_mnt_ns() to saner calling conventions, fix double mntput() in nfs
  btrfs: fix double mntput() in mount_subvol()
2011-11-19 06:06:39 -05:00
Linus Torvalds
ab5c5f639b Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
  MAINTAINERS: update XFS maintainer entry
  xfs: use doalloc flag in xfs_qm_dqattach_one()
2011-11-19 06:05:17 -05:00
Alessandro Rubini
8ee4dd9f06 debugfs: print_regs32: make regs array a const pointer
Signed-off-by: Alessandro Rubini <rubini@gnudd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-18 15:19:21 -08:00
Kees Cook
2174f6df78 pstore: gracefully handle NULL pstore_info functions
If a pstore backend doesn't want to support various portions of the
pstore interface, it can just leave those functions NULL instead of
creating no-op stubs.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2011-11-18 13:49:00 -08:00
Alan Stern
0720a06a75 NLS: improve UTF8 -> UTF16 string conversion routine
The utf8s_to_utf16s conversion routine needs to be improved.  Unlike
its utf16s_to_utf8s sibling, it doesn't accept arguments specifying
the maximum length of the output buffer or the endianness of its
16-bit output.

This patch (as1501) adds the two missing arguments, and adjusts the
only two places in the kernel where the function is called.  A
follow-on patch will add a third caller that does utilize the new
capabilities.

The two conversion routines are still annoyingly inconsistent in the
way they handle invalid byte combinations.  But that's a subject for a
different patch.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
CC: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-18 10:51:01 -08:00
Alessandro Rubini
1a087c6ad9 debugfs: add tools to printk 32-bit registers
Some debugfs file I deal with are mostly blocks of registers,
i.e. lines of the form "<name> = 0x<value>". Some files are only
registers, some include registers blocks among other material.  This
patch introduces data structures and functions to deal with both
cases.  I expect more users of this over time.

Signed-off-by: Alessandro Rubini <rubini@gnudd.com>
Acked-by: Giancarlo Asnaghi <giancarlo.asnaghi@st.com>
Cc: Felipe Balbi <balbi@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-18 10:31:22 -08:00
Bob Peterson
9beb3bf5a9 dlm: convert rsb list to rb_tree
Change the linked lists to rb_tree's in the rsb
hash table to speed up searches.  Slow rsb searches
were having a large impact on gfs2 performance due
to the large number of dlm locks gfs2 uses.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2011-11-18 10:20:15 -06:00
Linus Torvalds
15bd1cfb30 Merge branch 'for-linus' of git://git.kernel.dk/linux-block
* 'for-linus' of git://git.kernel.dk/linux-block:
  block: add missed trace_block_plug
  paride: fix potential information leak in pg_read()
  bio: change some signed vars to unsigned
  block: avoid unnecessary plug list flush
  cciss: auto engage SCSI mid layer at driver load time
  loop: cleanup set_status interface
  include/linux/bio.h: use a static inline function for bio_integrity_clone()
  loop: prevent information leak after failed read
  block: Always check length of all iov entries in blk_rq_map_user_iov()
  The Windows driver .inf disables ASPM on all cciss devices. Do the same.
  backing-dev: ensure wakeup_timer is deleted
  block: Revert "[SCSI] genhd: add a new attribute "alias" in gendisk"
2011-11-18 09:34:35 -02:00
Bob Peterson
b9f417f311 GFS2: remove vestigial al_alloced
This patch removes the vestigial variable al_alloced from
the gfs2_alloc structure. This is another baby step toward
multi-block reservations.

My next planned step is to decouple the quota variables
from the gfs2_alloc structure so we can use a different
method for allocations.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-11-18 09:49:51 +00:00
Kees Cook
3d6d8d20ec pstore: pass reason to backend write callback
This allows a backend to filter on the dmesg reason as well as the pstore
reason. When ramoops is switched to pstore, this is needed since it has
no interest in storing non-crash dmesg details.

Drop pstore_write() as it has no users, and handling the "reason" here
has no obviously correct value.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2011-11-17 13:13:29 -08:00
Kees Cook
f6f8285132 pstore: pass allocated memory region back to caller
The buf_lock cannot be held while populating the inodes, so make the backend
pass forward an allocated and filled buffer instead. This solves the following
backtrace. The effect is that "buf" is only ever used to notify the backends
that something was written to it, and shouldn't be used in the read path.

To replace the buf_lock during the read path, isolate the open/read/close
loop with a separate mutex to maintain serialized access to the backend.

Note that is is up to the pstore backend to cope if the (*write)() path is
called in the middle of the read path.

[   59.691019] BUG: sleeping function called from invalid context at .../mm/slub.c:847
[   59.691019] in_atomic(): 0, irqs_disabled(): 1, pid: 1819, name: mount
[   59.691019] Pid: 1819, comm: mount Not tainted 3.0.8 #1
[   59.691019] Call Trace:
[   59.691019]  [<810252d5>] __might_sleep+0xc3/0xca
[   59.691019]  [<810a26e6>] kmem_cache_alloc+0x32/0xf3
[   59.691019]  [<810b53ac>] ? __d_lookup_rcu+0x6f/0xf4
[   59.691019]  [<810b68b1>] alloc_inode+0x2a/0x64
[   59.691019]  [<810b6903>] new_inode+0x18/0x43
[   59.691019]  [<81142447>] pstore_get_inode.isra.1+0x11/0x98
[   59.691019]  [<81142623>] pstore_mkfile+0xae/0x26f
[   59.691019]  [<810a2a66>] ? kmem_cache_free+0x19/0xb1
[   59.691019]  [<8116c821>] ? ida_get_new_above+0x140/0x158
[   59.691019]  [<811708ea>] ? __init_rwsem+0x1e/0x2c
[   59.691019]  [<810b67e8>] ? inode_init_always+0x111/0x1b0
[   59.691019]  [<8102127e>] ? should_resched+0xd/0x27
[   59.691019]  [<8137977f>] ? _cond_resched+0xd/0x21
[   59.691019]  [<81142abf>] pstore_get_records+0x52/0xa7
[   59.691019]  [<8114254b>] pstore_fill_super+0x7d/0x91
[   59.691019]  [<810a7ff5>] mount_single+0x46/0x82
[   59.691019]  [<8114231a>] pstore_mount+0x15/0x17
[   59.691019]  [<811424ce>] ? pstore_get_inode.isra.1+0x98/0x98
[   59.691019]  [<810a8199>] mount_fs+0x5a/0x12d
[   59.691019]  [<810b9174>] ? alloc_vfsmnt+0xa4/0x14a
[   59.691019]  [<810b9474>] vfs_kern_mount+0x4f/0x7d
[   59.691019]  [<810b9d7e>] do_kern_mount+0x34/0xb2
[   59.691019]  [<810bb15f>] do_mount+0x5fc/0x64a
[   59.691019]  [<810912fb>] ? strndup_user+0x2e/0x3f
[   59.691019]  [<810bb3cb>] sys_mount+0x66/0x99
[   59.691019]  [<8137b537>] sysenter_do_call+0x12/0x26

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2011-11-17 12:58:07 -08:00
Jan Kara
249ec93c01 ocfs2: Use filemap_write_and_wait() instead of write_inode_now()
Since ocfs2 has no ->write_inode method, there's no point in calling
write_inode_now() from ocfs2_cleanup_delete_inode().  Use
filemap_write_and_wait() instead. This helps us to cleanup inode writing
interfaces...

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Joel Becker <jlbec@evilplan.org>
2011-11-17 02:18:57 -08:00
Mark Fasheh
df295d4a4b ocfs2: honor O_(D)SYNC flag in fallocate
We need to sync the transaction which updates i_size if the file is marked
as needing sync semantics.

Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Signed-off-by: Joel Becker <jlbec@evilplan.org>
2011-11-17 02:15:58 -08:00
Xiaowei.Hu
0393afea31 ocfs2: Add a missing journal credit in ocfs2_link_credits() -v2
With indexed_dir enabled, ocfs2 maintains a list of dirblocks having
space.

The credit calculation in ocfs2_link_credits() did not correctly account
for adding an entry that exactly fills a dirblock that triggers removing
that dirblock by changing the pointer in the previous block in the list.
The credit calculation did not account for that previous block.

To expose, do:

mkfs.ocfs2 -b 512 -M local /dev/sdX
mount /dev/sdX /ocfs2
mkdir /ocfs2/linkdir
touch /ocfs2/linkdir/file1
for i in `seq 1 29` ; do link /ocfs2/linkdir/file1
/ocfs2/linkdir/linklinklinklinklinklink$i; done
rm -f /ocfs2/linkdir/linklinklinklinklinklink10
sleep 8
link /ocfs2/linkdir/file1
/ocfs2/linkdir/linklinklinklinklinklinkaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

Note:
The link names have been crafted for a 512 byte blocksize. Reproducing
with a larger blocksize will require longer (or more) links. The sleep
is important. We want jbd2 to commit the transaction so that the missing
block does not piggy back on account of the previous transaction.

Signed-off-by: XiaoweiHu <xiaowei.hu at oracle.com>
Reviewed-by: WengangWang <wen.gang.wang at oracle.com>
Reviewed-by: Sunil.Mushran <sunil.mushran at oracle.com>
Signed-off-by: Joel Becker <jlbec@evilplan.org>
2011-11-17 01:46:48 -08:00