Commit Graph

23316 Commits

Author SHA1 Message Date
Linus Torvalds
1acc9309eb Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
  btrfs: fix oops when doing space balance
  Btrfs: don't panic if we get an error while balancing V2
  btrfs: add missing options displayed in mount output
2011-07-08 23:25:45 -07:00
Linus Torvalds
54af2bd25c Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
  xfs: unpin stale inodes directly in IOP_COMMITTED
2011-07-08 09:00:51 -07:00
Jeff Layton
04db79b015 cifs: factor smb_vol allocation out of cifs_setup_volume_info
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Pavel Shilovsky <piastryyy@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-07-08 03:51:23 +00:00
David Howells
c902ce1bfb FS-Cache: Add a helper to bulk uncache pages on an inode
Add an FS-Cache helper to bulk uncache pages on an inode.  This will
only work for the circumstance where the pages in the cache correspond
1:1 with the pages attached to an inode's page cache.

This is required for CIFS and NFS: When disabling inode cookie, we were
returning the cookie and setting cifsi->fscache to NULL but failed to
invalidate any previously mapped pages.  This resulted in "Bad page
state" errors and manifested in other kind of errors when running
fsstress.  Fix it by uncaching mapped pages when we disable the inode
cookie.

This patch should fix the following oops and "Bad page state" errors
seen during fsstress testing.

  ------------[ cut here ]------------
  kernel BUG at fs/cachefiles/namei.c:201!
  invalid opcode: 0000 [#1] SMP
  Pid: 5, comm: kworker/u:0 Not tainted 2.6.38.7-30.fc15.x86_64 #1 Bochs Bochs
  RIP: 0010: cachefiles_walk_to_object+0x436/0x745 [cachefiles]
  RSP: 0018:ffff88002ce6dd00  EFLAGS: 00010282
  RAX: ffff88002ef165f0 RBX: ffff88001811f500 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: 0000000000000100 RDI: 0000000000000282
  RBP: ffff88002ce6dda0 R08: 0000000000000100 R09: ffffffff81b3a300
  R10: 0000ffff00066c0a R11: 0000000000000003 R12: ffff88002ae54840
  R13: ffff88002ae54840 R14: ffff880029c29c00 R15: ffff88001811f4b0
  FS:  00007f394dd32720(0000) GS:ffff88002ef00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 00007fffcb62ddf8 CR3: 000000001825f000 CR4: 00000000000006e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Process kworker/u:0 (pid: 5, threadinfo ffff88002ce6c000, task ffff88002ce55cc0)
  Stack:
   0000000000000246 ffff88002ce55cc0 ffff88002ce6dd58 ffff88001815dc00
   ffff8800185246c0 ffff88001811f618 ffff880029c29d18 ffff88001811f380
   ffff88002ce6dd50 ffffffff814757e4 ffff88002ce6dda0 ffffffff8106ac56
  Call Trace:
   cachefiles_lookup_object+0x78/0xd4 [cachefiles]
   fscache_lookup_object+0x131/0x16d [fscache]
   fscache_object_work_func+0x1bc/0x669 [fscache]
   process_one_work+0x186/0x298
   worker_thread+0xda/0x15d
   kthread+0x84/0x8c
   kernel_thread_helper+0x4/0x10
  RIP  cachefiles_walk_to_object+0x436/0x745 [cachefiles]
  ---[ end trace 1d481c9af1804caa ]---

I tested the uncaching by the following means:

 (1) Create a big file on my NFS server (104857600 bytes).

 (2) Read the file into the cache with md5sum on the NFS client.  Look in
     /proc/fs/fscache/stats:

	Pages  : mrk=25601 unc=0

 (3) Open the file for read/write ("bash 5<>/warthog/bigfile").  Look in proc
     again:

	Pages  : mrk=25601 unc=25601

Reported-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-and-Tested-by: Suresh Jayaraman <sjayaraman@suse.de>
cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-07 13:21:56 -07:00
Miao Xie
149e2d76b4 btrfs: fix oops when doing space balance
We need to make sure the data relocation inode doesn't go through
the delayed metadata updates, otherwise we get an oops during balance:

kernel BUG at fs/btrfs/relocation.c:4303!
[SNIP]
Call Trace:
 [<ffffffffa03143fd>] ? update_ref_for_cow+0x22d/0x330 [btrfs]
 [<ffffffffa0314951>] __btrfs_cow_block+0x451/0x5e0 [btrfs]
 [<ffffffffa031355d>] ? read_block_for_search+0x14d/0x4d0 [btrfs]
 [<ffffffffa0314beb>] btrfs_cow_block+0x10b/0x240 [btrfs]
 [<ffffffffa031acae>] btrfs_search_slot+0x49e/0x7a0 [btrfs]
 [<ffffffffa032d8af>] btrfs_lookup_inode+0x2f/0xa0 [btrfs]
 [<ffffffff8147bf0e>] ? mutex_lock+0x1e/0x50
 [<ffffffffa0380cf1>] btrfs_update_delayed_inode+0x71/0x160 [btrfs]
 [<ffffffffa037ff27>] ? __btrfs_release_delayed_node+0x67/0x190 [btrfs]
 [<ffffffffa0381cf8>] btrfs_run_delayed_items+0xe8/0x120 [btrfs]
 [<ffffffffa03365e0>] btrfs_commit_transaction+0x250/0x850 [btrfs]
 [<ffffffff810f91d9>] ? find_get_pages+0x39/0x130
 [<ffffffffa0336cd5>] ? join_transaction+0x25/0x250 [btrfs]
 [<ffffffff81081de0>] ? wake_up_bit+0x40/0x40
 [<ffffffffa03785fa>] prepare_to_relocate+0xda/0xf0 [btrfs]
 [<ffffffffa037f2bb>] relocate_block_group+0x4b/0x620 [btrfs]
 [<ffffffffa0334cf5>] ? btrfs_clean_old_snapshots+0x35/0x150 [btrfs]
 [<ffffffffa037fa43>] btrfs_relocate_block_group+0x1b3/0x2e0 [btrfs]
 [<ffffffffa0368ec0>] ? btrfs_tree_unlock+0x50/0x50 [btrfs]
 [<ffffffffa035e39b>] btrfs_relocate_chunk+0x8b/0x670 [btrfs]
 [<ffffffffa031303d>] ? btrfs_set_path_blocking+0x3d/0x50 [btrfs]
 [<ffffffffa03577d8>] ? read_extent_buffer+0xd8/0x1d0 [btrfs]
 [<ffffffffa031bea1>] ? btrfs_previous_item+0xb1/0x150 [btrfs]
 [<ffffffffa03577d8>] ? read_extent_buffer+0xd8/0x1d0 [btrfs]
 [<ffffffffa035f5aa>] btrfs_balance+0x21a/0x2b0 [btrfs]
 [<ffffffffa0368898>] btrfs_ioctl+0x798/0xd20 [btrfs]
 [<ffffffff8111e358>] ? handle_mm_fault+0x148/0x270
 [<ffffffff814809e8>] ? do_page_fault+0x1d8/0x4b0
 [<ffffffff81160d6a>] do_vfs_ioctl+0x9a/0x540
 [<ffffffff811612b1>] sys_ioctl+0xa1/0xb0
 [<ffffffff81484ec2>] system_call_fastpath+0x16/0x1b
[SNIP]
RIP  [<ffffffffa037c1cc>] btrfs_reloc_cow_block+0x22c/0x270 [btrfs]

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-07-06 18:51:53 -04:00
Josef Bacik
508794eb5e Btrfs: don't panic if we get an error while balancing V2
A user reported an error where if we try to balance an fs after a device has
been removed it will blow up.  This is because we get an EIO back and this is
where BUG_ON(ret) bites us in the ass.  To fix we just exit.  Thanks,

Reported-by: Anand Jain <Anand.Jain@oracle.com>
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-07-06 18:46:43 -04:00
David Sterba
0942caa373 btrfs: add missing options displayed in mount output
There are three missed mount options settable by user which are not
currently displayed in mount output.

Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-07-06 18:46:43 -04:00
Dave Chinner
1316d4da3f xfs: unpin stale inodes directly in IOP_COMMITTED
When inodes are marked stale in a transaction, they are treated
specially when the inode log item is being inserted into the AIL.
It tries to avoid moving the log item forward in the AIL due to a
race condition with the writing the underlying buffer back to disk.
The was "fixed" in commit de25c18 ("xfs: avoid moving stale inodes
in the AIL").

To avoid moving the item forward, we return a LSN smaller than the
commit_lsn of the completing transaction, thereby trying to trick
the commit code into not moving the inode forward at all. I'm not
sure this ever worked as intended - it assumes the inode is already
in the AIL, but I don't think the returned LSN would have been small
enough to prevent moving the inode. It appears that the reason it
worked is that the lower LSN of the inodes meant they were inserted
into the AIL and flushed before the inode buffer (which was moved to
the commit_lsn of the transaction).

The big problem is that with delayed logging, the returning of the
different LSN means insertion takes the slow, non-bulk path.  Worse
yet is that insertion is to a position -before- the commit_lsn so it
is doing a AIL traversal on every insertion, and has to walk over
all the items that have already been inserted into the AIL. It's
expensive.

To compound the matter further, with delayed logging inodes are
likely to go from clean to stale in a single checkpoint, which means
they aren't even in the AIL at all when we come across them at AIL
insertion time. Hence these were all getting inserted into the AIL
when they simply do not need to be as inodes marked XFS_ISTALE are
never written back.

Transactional/recovery integrity is maintained in this case by the
other items in the unlink transaction that were modified (e.g. the
AGI btree blocks) and committed in the same checkpoint.

So to fix this, simply unpin the stale inodes directly in
xfs_inode_item_committed() and return -1 to indicate that the AIL
insertion code does not need to do any further processing of these
inodes.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-07-06 15:44:40 -05:00
Jeff Layton
f9e59bcba2 cifs: have cifs_cleanup_volume_info not take a double pointer
...as that makes for a cumbersome interface. Make it take a regular
smb_vol pointer and rely on the caller to zero it out if needed.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Pavel Shilovsky <piastryyy@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-07-06 20:03:05 +00:00
Jeff Layton
b2a0fa1520 cifs: fix build_unc_path_to_root to account for a prefixpath
Regression introduced by commit f87d39d951.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Pavel Shilovsky <piastryyy@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-07-06 20:03:05 +00:00
Jeff Layton
677d8537d8 cifs: remove bogus call to cifs_cleanup_volume_info
This call to cifs_cleanup_volume_info is clearly wrong. As soon as it's
called the following call to cifs_get_tcp_session will oops as the
volume_info pointer will then be NULL.

The caller of cifs_mount should clean up this data since it passed it
in. There's no need for us to call this here.

Regression introduced by commit 724d9f1cfb.

Reported-by: Adam Williamson <awilliam@redhat.com>
Cc: Pavel Shilovsky <piastryyy@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-07-06 20:03:04 +00:00
Davidlohr Bueso
bcb65a797e FDPIC: Fix memory leak
The shdr4extnum variable isn't being freed in the cleanup process of
elf_fdpic_core_dump().

Signed-off-by: Davidlohr Bueso <dave@gnu.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-06 12:15:16 -07:00
Miklos Szeredi
a51cb91d81 fs: fix lock initialization
locks_alloc_lock() assumed that the allocated struct file_lock is
already initialized to zero members.  This is only true for the first
allocation of the structure, after reuse some of the members will have
random values.

This will for example result in passing random fl_start values to
userspace in fuse for FL_FLOCK locks, which is an information leak at
best.

Fix by reinitializing those members which may be non-zero after freeing.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-06 10:41:13 -07:00
Linus Torvalds
121782a248 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  ceph: fix sync and dio writes across stripe boundaries
  libceph: fix page calculation for non-page-aligned io
  ceph: fix page alignment corrections
2011-07-05 13:15:57 -07:00
Linus Torvalds
a8728d3554 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/hfsplus
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/hfsplus:
  hfsplus: Fix double iput of the same inode in hfsplus_fill_super()
  hfsplus: add missing call to bio_put()
2011-07-05 10:04:27 -07:00
Jeff Layton
ee1b3ea9e6 cifs: set socket send and receive timeouts before attempting connect
Benjamin S. reported that he was unable to suspend his machine while
it had a cifs share mounted. The freezer caused this to spew when he
tried it:

-----------------------[snip]------------------
PM: Syncing filesystems ... done.
Freezing user space processes ... (elapsed 0.01 seconds) done.
Freezing remaining freezable tasks ...
Freezing of tasks failed after 20.01 seconds (1 tasks refusing to freeze, wq_busy=0):
cifsd         S ffff880127f7b1b0     0  1821      2 0x00800000
 ffff880127f7b1b0 0000000000000046 ffff88005fe008a8 ffff8800ffffffff
 ffff880127cee6b0 0000000000011100 ffff880127737fd8 0000000000004000
 ffff880127737fd8 0000000000011100 ffff880127f7b1b0 ffff880127736010
Call Trace:
 [<ffffffff811e85dd>] ? sk_reset_timer+0xf/0x19
 [<ffffffff8122cf3f>] ? tcp_connect+0x43c/0x445
 [<ffffffff8123374e>] ? tcp_v4_connect+0x40d/0x47f
 [<ffffffff8126ce41>] ? schedule_timeout+0x21/0x1ad
 [<ffffffff8126e358>] ? _raw_spin_lock_bh+0x9/0x1f
 [<ffffffff811e81c7>] ? release_sock+0x19/0xef
 [<ffffffff8123e8be>] ? inet_stream_connect+0x14c/0x24a
 [<ffffffff8104485b>] ? autoremove_wake_function+0x0/0x2a
 [<ffffffffa02ccfe2>] ? ipv4_connect+0x39c/0x3b5 [cifs]
 [<ffffffffa02cd7b7>] ? cifs_reconnect+0x1fc/0x28a [cifs]
 [<ffffffffa02cdbdc>] ? cifs_demultiplex_thread+0x397/0xb9f [cifs]
 [<ffffffff81076afc>] ? perf_event_exit_task+0xb9/0x1bf
 [<ffffffffa02cd845>] ? cifs_demultiplex_thread+0x0/0xb9f [cifs]
 [<ffffffffa02cd845>] ? cifs_demultiplex_thread+0x0/0xb9f [cifs]
 [<ffffffff810444a1>] ? kthread+0x7a/0x82
 [<ffffffff81002d14>] ? kernel_thread_helper+0x4/0x10
 [<ffffffff81044427>] ? kthread+0x0/0x82
 [<ffffffff81002d10>] ? kernel_thread_helper+0x0/0x10

Restarting tasks ... done.
-----------------------[snip]------------------

We do attempt to perform a try_to_freeze in cifs_reconnect, but the
connection attempt itself seems to be taking longer than 20s to time
out. The connect timeout is governed by the socket send and receive
timeouts, so we can shorten that period by setting those timeouts
before attempting the connect instead of after.

Adam Williamson tested the patch and said that it seems to have fixed
suspending on his laptop when a cifs share is mounted.

Reported-by: Benjamin S <da_joind@gmx.net>
Tested-by: Adam Williamson <awilliam@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-07-01 16:15:30 +00:00
Alexey Khoroshilov
032016a56a hfsplus: Fix double iput of the same inode in hfsplus_fill_super()
There is a misprint in resource deallocation code on error path in
hfsplus_fill_super(): the sbi->alloc_file inode is iput twice,
while the root inode in not iput at all.

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-06-30 13:38:39 +02:00
Seth Forshee
50176ddefa hfsplus: add missing call to bio_put()
hfsplus leaks bio objects by failing to call bio_put() on the bios
it allocates. Add the missing call to fix the leak.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Cc: <stable@kernel.org> # .38.x, .39.x
Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-06-30 13:28:32 +02:00
Boaz Harrosh
2bea038c52 pnfs: write: Set mds_offset in the generic layer - it is needed by all LDs
In current pnfs tree, all the layouts set mds_offset in their
.write_pagelist member.
mds_offset is only used by generic layer and should be handled by it.

This patch is for upstream. It is needed in this -rc series to fix a
bug in objects layout_commit.

I'll send patches for objects and blocks to be
squashed into current pnfs tree.

TODO: It looks like the read path needs the same patch.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-28 14:12:11 -04:00
Vasiliy Kulikov
1d1221f375 proc: restrict access to /proc/PID/io
/proc/PID/io may be used for gathering private information.  E.g.  for
openssh and vsftpd daemons wchars/rchars may be used to learn the
precise password length.  Restrict it to processes being able to ptrace
the target process.

ptrace_may_access() is needed to prevent keeping open file descriptor of
"io" file, executing setuid binary and gathering io information of the
setuid'ed process.

Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-28 09:39:11 -07:00
Jan Kara
08142579b6 mm: fix assertion mapping->nrpages == 0 in end_writeback()
Under heavy memory and filesystem load, users observe the assertion
mapping->nrpages == 0 in end_writeback() trigger.  This can be caused by
page reclaim reclaiming the last page from a mapping in the following
race:

	CPU0				CPU1
  ...
  shrink_page_list()
    __remove_mapping()
      __delete_from_page_cache()
        radix_tree_delete()
					evict_inode()
					  truncate_inode_pages()
					    truncate_inode_pages_range()
					      pagevec_lookup() - finds nothing
					  end_writeback()
					    mapping->nrpages != 0 -> BUG
        page->mapping = NULL
        mapping->nrpages--

Fix the problem by doing a reliable check of mapping->nrpages under
mapping->tree_lock in end_writeback().

Analyzed by Jay <jinshan.xiong@whamcloud.com>, lost in LKML, and dug out
by Miklos Szeredi <mszeredi@suse.de>.

Cc: Jay <jinshan.xiong@whamcloud.com>
Cc: Miklos Szeredi <mszeredi@suse.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-27 18:00:13 -07:00
Bob Liu
2b4b2482e7 romfs: fix romfs_get_unmapped_area() argument check
romfs_get_unmapped_area() checks argument `len' without considering
PAGE_ALIGN which will cause do_mmap_pgoff() return -EINVAL error after
commit f67d9b1576 ("nommu: add page_align to mmap").

Fix the check by changing it in same way ramfs_nommu_get_unmapped_area()
was changed in ramfs/file-nommu.c.

Signed-off-by: Bob Liu <lliubbo@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Acked-by: Greg Ungerer <gerg@snapgear.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-27 18:00:12 -07:00
Linus Torvalds
af4087e0e6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
  btrfs: fix inconsonant inode information
  Btrfs: make sure to update total_bitmaps when freeing cache V3
  Btrfs: fix type mismatch in find_free_extent()
  Btrfs: make sure to record the transid in new inodes
2011-06-27 13:32:14 -07:00
Linus Torvalds
4699d4423c Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
  xfs: prevent bogus assert when trying to remove non-existent attribute
  xfs: clear XFS_IDIRTY_RELEASE on truncate down
  xfs: reset inode per-lifetime state when recycling it
2011-06-27 09:01:29 -07:00
Miao Xie
2f7e33d432 btrfs: fix inconsonant inode information
When iputting the inode, We may leave the delayed nodes if they have some
delayed items that have not been dealt with. So when the inode is read again,
we must look up the relative delayed node, and use the information in it to
initialize the inode. Or we will get inconsonant inode information, it may
cause that the same directory index number is allocated again, and hit the
following oops:

[ 5447.554187] err add delayed dir index item(name: pglog_0.965_0) into the
insertion tree of the delayed node(root id: 262, inode id: 258, errno: -17)
[ 5447.569766] ------------[ cut here ]------------
[ 5447.575361] kernel BUG at fs/btrfs/delayed-inode.c:1301!
[SNIP]
[ 5447.790721] Call Trace:
[ 5447.793191]  [<ffffffffa0641c4e>] btrfs_insert_dir_item+0x189/0x1bb [btrfs]
[ 5447.800156]  [<ffffffffa0651a45>] btrfs_add_link+0x12b/0x191 [btrfs]
[ 5447.806517]  [<ffffffffa0651adc>] btrfs_add_nondir+0x31/0x58 [btrfs]
[ 5447.812876]  [<ffffffffa0651d6a>] btrfs_create+0xf9/0x197 [btrfs]
[ 5447.818961]  [<ffffffff8111f840>] vfs_create+0x72/0x92
[ 5447.824090]  [<ffffffff8111fa8c>] do_last+0x22c/0x40b
[ 5447.829133]  [<ffffffff8112076a>] path_openat+0xc0/0x2ef
[ 5447.834438]  [<ffffffff810c58e2>] ? __perf_event_task_sched_out+0x24/0x44
[ 5447.841216]  [<ffffffff8103ecdd>] ? perf_event_task_sched_out+0x59/0x67
[ 5447.847846]  [<ffffffff81121a79>] do_filp_open+0x3d/0x87
[ 5447.853156]  [<ffffffff811e126c>] ? strncpy_from_user+0x43/0x4d
[ 5447.859072]  [<ffffffff8111f1f5>] ? getname_flags+0x2e/0x80
[ 5447.864636]  [<ffffffff8111f179>] ? do_getname+0x14b/0x173
[ 5447.870112]  [<ffffffff8111f1b7>] ? audit_getname+0x16/0x26
[ 5447.875682]  [<ffffffff8112b1ab>] ? spin_lock+0xe/0x10
[ 5447.880882]  [<ffffffff81112d39>] do_sys_open+0x69/0xae
[ 5447.886153]  [<ffffffff81112db1>] sys_open+0x20/0x22
[ 5447.891114]  [<ffffffff813b9aab>] system_call_fastpath+0x16/0x1b

Fix it by reusing the old delayed node.

Reported-by: Jim Schutt <jaschut@sandia.gov>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Tested-by: Jim Schutt <jaschut@sandia.gov>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-27 11:34:27 -04:00
Linus Torvalds
258e43fdb0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
  cifs: mark CONFIG_CIFS_NFSD_EXPORT as BROKEN
  cifs: free blkcipher in smbhash
2011-06-26 19:40:31 -07:00
Linus Torvalds
804a007f54 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  cifs: propagate errors from cifs_get_root() to mount(2)
  cifs: tidy cifs_do_mount() up a bit
  cifs: more breakage on mount failures
  cifs: close sget() races
  cifs: pull freeing mountdata/dropping nls/freeing cifs_sb into cifs_umount()
  cifs: move cifs_umount() call into ->kill_sb()
  cifs: pull cifs_mount() call up
  sanitize cifs_umount() prototype
  cifs: initialize ->tlink_tree in cifs_setup_cifs_sb()
  cifs: allocate mountdata earlier
  cifs: leak on mount if we share superblock
  cifs: don't pass superblock to cifs_mount()
  cifs: don't leak nls on mount failure
  cifs: double free on mount failure
  take bdi setup/destruction into cifs_mount/cifs_umount

Acked-by: Steve French <smfrench@gmail.com>
2011-06-26 19:39:22 -07:00
Josef Bacik
9b90f51353 Btrfs: make sure to update total_bitmaps when freeing cache V3
A user reported this bug again where we have more bitmaps than we are supposed
to.  This is because we failed to load the free space cache, but don't update
the ctl->total_bitmaps counter when we remove entries from the tree.  This patch
fixes this problem and we should be good to go again.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-25 09:31:06 -04:00
Ilya Dryomov
e0f5406727 Btrfs: fix type mismatch in find_free_extent()
data parameter should be u64 because a full-sized chunk flags field is
passed instead of 0/1 for distinguishing data from metadata.  All
underlying functions expect u64.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-25 09:31:06 -04:00
Al Viro
9403c9c598 cifs: propagate errors from cifs_get_root() to mount(2)
... instead of just failing with -EINVAL

Acked-by: Pavel Shilovsky <piastryyy@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-24 18:39:43 -04:00
Al Viro
5c4f1ad7c6 cifs: tidy cifs_do_mount() up a bit
Acked-by: Pavel Shilovsky <piastryyy@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-24 18:39:42 -04:00
Al Viro
fa18f1bdce cifs: more breakage on mount failures
if cifs_get_root() fails, we end up with ->mount() returning NULL,
which is not what callers expect.  Moreover, in case of superblock
reuse we end up leaking a superblock reference...

Acked-by: Pavel Shilovsky <piastryyy@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-24 18:39:42 -04:00
Al Viro
ee01a14d9d cifs: close sget() races
have ->s_fs_info set by the set() callback passed to sget()

Acked-by: Pavel Shilovsky <piastryyy@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-24 18:39:42 -04:00
Al Viro
d757d71bfc cifs: pull freeing mountdata/dropping nls/freeing cifs_sb into cifs_umount()
all callers of cifs_umount() proceed to do the same thing; pull it into
cifs_umount() itself.

Acked-by: Pavel Shilovsky <piastryyy@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-24 18:39:42 -04:00
Al Viro
98ab494dd1 cifs: move cifs_umount() call into ->kill_sb()
instead of calling it manually in case if cifs_read_super() fails
to set ->s_root, just call it from ->kill_sb().  cifs_put_super()
is gone now *and* we have cifs_sb shutdown and destruction done
after the superblock is gone from ->s_instances.

Acked-by: Pavel Shilovsky <piastryyy@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-24 18:39:42 -04:00
Al Viro
97d1152ace cifs: pull cifs_mount() call up
... to the point prior to sget().  Now we have cifs_sb set up early
enough.

Acked-by: Pavel Shilovsky <piastryyy@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-24 18:39:42 -04:00
Al Viro
2a9b99516c sanitize cifs_umount() prototype
a) superblock argument is unused
b) it always returns 0

Acked-by: Pavel Shilovsky <piastryyy@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-24 18:39:42 -04:00
Al Viro
2ced6f6935 cifs: initialize ->tlink_tree in cifs_setup_cifs_sb()
no need to wait until cifs_read_super() and we need it done
by the time cifs_mount() will be called.

Acked-by: Pavel Shilovsky <piastryyy@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-24 18:39:42 -04:00
Al Viro
5d3bc605ca cifs: allocate mountdata earlier
pull mountdata allocation up, so that it won't stand in the way when
we lift cifs_mount() to location before sget().

Acked-by: Pavel Shilovsky <piastryyy@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-24 18:39:41 -04:00
Al Viro
d687ca380f cifs: leak on mount if we share superblock
cifs_sb and nls end up leaked...

Acked-by: Pavel Shilovsky <piastryyy@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-24 18:39:41 -04:00
Al Viro
2c6292ae4b cifs: don't pass superblock to cifs_mount()
To close sget() races we'll need to be able to set cifs_sb up before
we get the superblock, so we'll want to be able to do cifs_mount()
earlier.  Fortunately, it's easy to do - setting ->s_maxbytes can
be done in cifs_read_super(), ditto for ->s_time_gran and as for
putting MS_POSIXACL into ->s_flags, we can mirror it in ->mnt_cifs_flags
until cifs_read_super() is called.  Kill unused 'devname' argument,
while we are at it...

Acked-by: Pavel Shilovsky <piastryyy@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-24 18:39:41 -04:00
Al Viro
ca171baaad cifs: don't leak nls on mount failure
if cifs_sb allocation fails, we still need to drop nls we'd stashed
into volume_info - the one we would've copied to cifs_sb if we could
allocate the latter.

Acked-by: Pavel Shilovsky <piastryyy@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-24 18:39:41 -04:00
Al Viro
6d6861757d cifs: double free on mount failure
if we get to out_super with ->s_root already set (e.g. with
cifs_get_root() failure), we'll end up with cifs_put_super()
called and ->mountdata freed twice.  We'll also get cifs_sb
freed twice and cifs_sb->local_nls dropped twice.  The problem
is, we can get to out_super both with and without ->s_root,
which makes ->put_super() a bad place for such work.

Switch to ->kill_sb(), have all that work done there after
kill_anon_super().  Unlike ->put_super(), ->kill_sb() is
called by deactivate_locked_super() whether we have ->s_root
or not.

Acked-by: Pavel Shilovsky <piastryyy@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-24 18:39:41 -04:00
Al Viro
dd85446619 take bdi setup/destruction into cifs_mount/cifs_umount
Acked-by: Pavel Shilovsky <piastryyy@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-24 18:39:41 -04:00
Jeff Layton
9b8e072a31 cifs: mark CONFIG_CIFS_NFSD_EXPORT as BROKEN
This does not work properly with CIFS as current servers do not
enable support for the FILE_OPEN_BY_FILE_ID on SMB NTCreateX
and not all NFS clients handle ESTALE.

For now, it just plain doesn't work. Mark it BROKEN to discourage
distros from enabling it.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-06-24 17:33:30 +00:00
Chris Mason
1973f0faeb Btrfs: make sure to record the transid in new inodes
When we create a new inode, we aren't filling in the
field that records the transaction that last changed this
inode.

If we then go to fsync that inode, it will be skipped because the field
isn't filled in.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-24 13:13:29 -04:00
Jeff Layton
e4fb0edb7c cifs: free blkcipher in smbhash
This is currently leaked in the rc == 0 case.

Reported-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-06-24 17:03:55 +00:00
Linus Torvalds
5220cc9382 Merge branch 'for-linus' of git://git.kernel.dk/linux-block
* 'for-linus' of git://git.kernel.dk/linux-block:
  block: add REQ_SECURE to REQ_COMMON_MASK
  block: use the passed in @bdev when claiming if partno is zero
  block: Add __attribute__((format(printf...) and fix fallout
  block: make disk_block_events() properly wait for work cancellation
  block: remove non-syncing __disk_block_events() and fold it into disk_block_events()
  block: don't use non-syncing event blocking in disk_check_events()
  cfq-iosched: fix locking around ioc->ioc_data assignment
2011-06-24 08:42:35 -07:00
Linus Torvalds
143e859d05 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
  cifs: fix wsize negotiation to respect max buffer size and active signing (try #4)
  CIFS: Fix problem with 3.0-rc1 null user mount failure
2011-06-24 08:35:04 -07:00
Jesper Juhl
46e4edbf7e Remove unneeded version.h includes from fs/
It was pointed out by 'make versioncheck' that some includes of
linux/version.h were not needed in fs/ (fs/btrfs/ctree.h and
fs/omfs/file.c).

This patch removes them.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Acked-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-24 08:34:22 -07:00
Dave Chinner
4a33821236 xfs: prevent bogus assert when trying to remove non-existent attribute
If the attribute fork on an inode is in btree format and has
multiple levels (i.e node format rather than leaf format), then a
lookup failure will trigger an assert failure in xfs_da_path_shift
if the flag XFS_DA_OP_OKNOENT is not set. This flag is used to
indicate to the directory btree code that not finding an entry is
not a fatal error. In the case of doing a lookup for a directory
name removal, this is valid as a user cannot insert an arbitrary
name to remove from the directory btree.

However, in the case of the attribute tree, a user has direct
control over the attribute name and can ask for any random name to
be removed without any validation. In this case, fsstress is asking
for a non-existent user.selinux attribute to be removed, and that is
causing xfs_da_path_shift() to fall off the bottom of the tree where
it asserts that a lookup failure is allowed. Because the flag is not
set, we die a horrible death on a debug enable kernel.

Prevent this assert from firing on attribute removes by adding the
op_flag XFS_DA_OP_OKNOENT to atribute removal operations.

Discovered when testing on a SELinux enabled system by fsstress in
test 070 by trying to remove a non-existent user.selinux attribute.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-06-23 22:13:51 -05:00
Dave Chinner
df4368a146 xfs: clear XFS_IDIRTY_RELEASE on truncate down
When an inode is truncated down, speculative preallocation is
removed from the inode. This should also reset the state bits for
controlling whether preallocation is subsequently removed when the
file is next closed. The flag is not being cleared, so repeated
operations on a file that first involve a truncate (e.g. multiple
repeated dd invocations on a file) give different file layouts for
the second and subsequent invocations.

Fix this by clearing the XFS_IDIRTY_RELEASE state bit when the
XFS_ITRUNCATED bit is detected in xfs_release() and hence ensure
that speculative delalloc is removed on files that have been
truncated down.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-06-23 22:13:46 -05:00
Dave Chinner
778e24bb6d xfs: reset inode per-lifetime state when recycling it
XFS inodes has several per-lifetime state fields that determine the
behaviour of the inode. These state fields are not all reset when an
inode is reused from the reclaimable state.

This can lead to unexpected behaviour of the new inode such as
speculative preallocation not being truncated away in the expected
manner for local files until the inode is subsequently truncated,
freed or cycles out of the cache. It can also lead to an inode being
considered to be a filestream inode or having been truncated when
that is not the case.

Rework the reinitialisation of the inode when it is recycled to
ensure that it is pristine before it is reused. While there, also
fix the resetting of state flags in the recycling error paths so the
inode does not become unreclaimable.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-06-23 22:13:31 -05:00
Jeff Layton
1190f6a067 cifs: fix wsize negotiation to respect max buffer size and active signing (try #4)
Hopefully last version. Base signing check on CAP_UNIX instead of
tcon->unix_ext, also clean up the comments a bit more.

According to Hongwei Sun's blog posting here:

    http://blogs.msdn.com/b/openspecification/archive/2009/04/10/smb-maximum-transmit-buffer-size-and-performance-tuning.aspx

CAP_LARGE_WRITEX is ignored when signing is active. Also, the maximum
size for a write without CAP_LARGE_WRITEX should be the maxBuf that
the server sent in the NEGOTIATE request.

Fix the wsize negotiation to take this into account. While we're at it,
alter the other wsize definitions to use sizeof(WRITE_REQ) to allow for
slightly larger amounts of data to potentially be written per request.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-06-23 17:54:39 +00:00
Linus Torvalds
bccaeafd7c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6:
  jfs: agstart field must be 64 bits
  JFS: Don't save agno in the inode
  jfs: Update agstart when resizing volume
  jfs: old_agsize should be 64 bits in jfs_extendfs
2011-06-22 21:49:07 -07:00
Pavel Shilovsky
446b23a758 CIFS: Fix problem with 3.0-rc1 null user mount failure
Figured it out: it was broken by b946845a9d commit - "cifs: cifs_parse_mount_options: do not tokenize mount options in-place". So, as a quick fix I suggest to apply this patch.

[PATCH] CIFS: Fix kfree() with constant string in a null user case

Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-06-22 21:43:56 +00:00
Linus Torvalds
2992c4bd57 Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  NFS: Fix decode_secinfo_maxsz
  NFSv4.1: Fix an off-by-one error in pnfs_generic_pg_test
  NFSv4.1: Fix some issues with pnfs_generic_pg_test
  NFSv4.1: file layout must consider pg_bsize for coalescing
  pnfs-obj: No longer needed to take an extra ref at add_device
  SUNRPC: Ensure the RPC client only quits on fatal signals
  NFSv4: Fix a readdir regression
  nfs4.1: mark layout as bad on error path in _pnfs_return_layout
  nfs4.1: prevent race that allowed use of freed layout in _pnfs_return_layout
  NFSv4.1: need to put_layout_hdr on _pnfs_return_layout error path
  NFS: (d)printks should use %zd for ssize_t arguments
  NFSv4.1: fix break condition in pnfs_find_lseg
  nfs4.1: fix several problems with _pnfs_return_layout
  NFSv4.1: allow zero fh array in filelayout decode layout
  NFSv4.1: allow nfs_fhget to succeed with mounted on fileid
  NFSv4.1: Fix a refcounting issue in the pNFS device id cache
  NFSv4.1: deprecate headerpadsz in CREATE_SESSION
  NFS41: do not update isize if inode needs layoutcommit
  NLM: Don't hang forever on NLM unlock requests
  NFS: fix umount of pnfs filesystems
2011-06-21 18:20:55 -07:00
Linus Torvalds
890879cfa0 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  jbd2: Fix oops in jbd2_journal_remove_journal_head()
  jbd2: Remove obsolete parameters in the comments for some jbd2 functions
  ext4: fixed tracepoints cleanup
  ext4: use FIEMAP_EXTENT_LAST flag for last extent in fiemap
  ext4: Fix max file size and logical block counting of extent format file
  ext4: correct comments for ext4_free_blocks()
2011-06-21 10:22:35 -07:00
Bryan Schumaker
1650add235 NFS: Fix decode_secinfo_maxsz
I initially did the calculation in bytes, and not words

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-21 11:54:07 -04:00
Trond Myklebust
19982ba856 NFSv4.1: Fix an off-by-one error in pnfs_generic_pg_test
And document what is going on there...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-21 11:54:06 -04:00
Trond Myklebust
8f7d5efbef NFSv4.1: Fix some issues with pnfs_generic_pg_test
1. If the intention is to coalesce requests 'prev' and 'req' then we
   have to ensure at least that we have a layout starting at
   req_offset(prev).

2. If we're only requesting a minimal layout of length desc->pg_count,
   we need to test the length actually returned by the server before
   we allow the coalescing to occur.

3. We need to deal correctly with (pgio->lseg == NULL)

4. Fixup the test guarding the pnfs_update_layout.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-21 11:54:05 -04:00
Linus Torvalds
eda0841094 Merge branch 'for-2.6.40' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.40' of git://linux-nfs.org/~bfields/linux:
  nfsd4: fix break_lease flags on nfsd open
  nfsd: link returns nfserr_delay when breaking lease
  nfsd: v4 support requires CRYPTO
  nfsd: fix dependency of nfsd on auth_rpcgss
2011-06-20 20:10:52 -07:00
Linus Torvalds
3669820650 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  devcgroup_inode_permission: take "is it a device node" checks to inlined wrapper
  fix comment in generic_permission()
  kill obsolete comment for follow_down()
  proc_sys_permission() is OK in RCU mode
  reiserfs_permission() doesn't need to bail out in RCU mode
  proc_fd_permission() is doesn't need to bail out in RCU mode
  nilfs2_permission() doesn't need to bail out in RCU mode
  logfs doesn't need ->permission() at all
  coda_ioctl_permission() is safe in RCU mode
  cifs_permission() doesn't need to bail out in RCU mode
  bad_inode_permission() is safe from RCU mode
  ubifs: dereferencing an ERR_PTR in ubifs_mount()
2011-06-20 20:09:15 -07:00
Dave Kleikamp
ecc90462b4 jfs: agstart field must be 64 bits
The previous patch added the agstart field to jfs_ip, but declared
it a long.  We need to make sure its 64 bits on every platform.

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2011-06-20 17:53:24 -05:00
Benny Halevy
19345cb299 NFSv4.1: file layout must consider pg_bsize for coalescing
Otherwise we end up overflowing the rpc buffer size on the receive end.

Signed-off-by: Benny Halevy <benny@tonian.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-20 16:12:26 -04:00
Linus Torvalds
90a800de0a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
  Btrfs: avoid delayed metadata items during commits
  btrfs: fix uninitialized return value
  btrfs: fix wrong reservation when doing delayed inode operations
  btrfs: Remove unused sysfs code
  btrfs: fix dereference of ERR_PTR value
  Btrfs: fix relocation races
  Btrfs: set no_trans_join after trying to expand the transaction
  Btrfs: protect the pending_snapshots list with trans_lock
  Btrfs: fix path leakage on subvol deletion
  Btrfs: drop the delalloc_bytes check in shrink_delalloc
  Btrfs: check the return value from set_anon_super
2011-06-20 08:58:53 -07:00
Dave Kleikamp
d31b53e3cd JFS: Don't save agno in the inode
Resizing the file system can result in an in-memory inode being remapped
to a different aggregate group (AG). A cached AG number can cause
problems when trying to free or allocate inodes. Instead, save the IAG's
agstart address and calculate the agno when we need it.

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2011-06-20 10:53:46 -05:00
Dave Kleikamp
28e0fa894c jfs: Update agstart when resizing volume
A comment indicates that the IAG's agstart does not need to be updated
since it will always point to a block in the same aggregate group, but
jfs_fsck isn't so forgiving and reports it as an error.

I'm fixing this in jfsutils as well, so either a new kernel or new
utilities will be sufficient to fix the problem.

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2011-06-20 10:32:46 -05:00
Dave Kleikamp
206b6310fd jfs: old_agsize should be 64 bits in jfs_extendfs
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2011-06-20 10:30:04 -05:00
Al Viro
8e833fd2e1 fix comment in generic_permission()
CAP_DAC_OVERRIDE is enough for MAY_EXEC on directory, even if
no exec bits are set.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-20 10:45:56 -04:00
Al Viro
6291176bcd kill obsolete comment for follow_down()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-20 10:45:49 -04:00
Al Viro
1aec7036d0 proc_sys_permission() is OK in RCU mode
nothing blocking there, since all instances of sysctl
->permissions() method are non-blocking - both of them,
that is.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-20 10:45:25 -04:00
Al Viro
1d29b5a2ed reiserfs_permission() doesn't need to bail out in RCU mode
nothing blocking other than generic_permission() (and
check_acl callback does bail out in RCU mode).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-20 10:45:21 -04:00
Al Viro
cf12791116 proc_fd_permission() is doesn't need to bail out in RCU mode
nothing blocking except generic_permission()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-20 10:44:50 -04:00
Al Viro
730e908f35 nilfs2_permission() doesn't need to bail out in RCU mode
Nothing blocking except for generic_permission().  Which will DTRT.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-20 10:44:33 -04:00
Al Viro
a63ab94d67 logfs doesn't need ->permission() at all
... and never did, what with its ->permission() being what we do by default
when ->permission is NULL...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-20 10:44:26 -04:00
Al Viro
6b419951f1 coda_ioctl_permission() is safe in RCU mode
return (mask & MAY_EXEC) ? -EACCES : 0; is non-blocking...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-20 10:44:19 -04:00
Al Viro
ec12781f19 cifs_permission() doesn't need to bail out in RCU mode
nothing potentially blocking except generic_permission(), which
will DTRT

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-20 10:44:07 -04:00
Al Viro
1712c20dae bad_inode_permission() is safe from RCU mode
return -EIO; is *not* a blocking operation, thank you very much.
Nick, what the hell have you been smoking?

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-20 10:44:00 -04:00
Dan Carpenter
185bf87393 ubifs: dereferencing an ERR_PTR in ubifs_mount()
d251ed271d "ubifs: fix sget races" left out the goto from this
error path so the static checkers complain that we're dereferencing
"sb" when it's an ERR_PTR.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-20 10:42:34 -04:00
J. Bruce Fields
105f462210 nfsd4: fix break_lease flags on nfsd open
Thanks to Casey Bodley for pointing out that on a read open we pass 0,
instead of O_RDONLY, to break_lease, with the result that a read open is
treated like a write open for the purposes of lease breaking!

Reported-by: Casey Bodley <cbodley@citi.umich.edu>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-06-20 10:38:01 -04:00
Boaz Harrosh
df18d127f4 pnfs-obj: No longer needed to take an extra ref at add_device
Andy's last device_cache patches, already take an extra
reference on the newly inserted device_id. So we can remove it
from obj-io.

Without this patch the device_ids are leaked.

Andy's patches are not in Linus tree yet. So I'm not sure if they are
scheduled for this Kernel or the next. This patch should be added as
part of these.

CC: Andy Adamson <andros@netapp.com>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-19 14:49:51 -04:00
Linus Torvalds
8816ead9d8 Merge branches 'perf-urgent-for-linus', 'sched-urgent-for-linus', 'timers-urgent-for-linus' and 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  tools/perf: Fix static build of perf tool
  tracing: Fix regression in printk_formats file

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  generic-ipi: Fix kexec boot crash by initializing call_single_queue before enabling interrupts

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  clocksource: Make watchdog robust vs. interruption
  timerfd: Fix wakeup of processes when timer is cancelled on clock change

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, MAINTAINERS: Add x86 MCE people
  x86, efi: Do not reserve boot services regions within reserved areas
2011-06-19 09:00:18 -07:00
Linus Torvalds
c11760c6d8 isofs: fix bh leak in isofs_fill_super() error case
In isofs_fill_super(), when an iso_primary_descriptor is found, it is
kept in pri_bh.  The error cases don't properly release it.  Fix it.

Reported-and-tested-by: 김원석 <stanley.will.kim@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-18 07:25:42 -07:00
Chris Mason
e999376f09 Btrfs: avoid delayed metadata items during commits
Snapshot creation has two phases.  One is the initial snapshot setup,
and the second is done during commit, while nobody is allowed to modify
the root we are snapshotting.

The delayed metadata insertion code can break that rule, it does a
delayed inode update on the inode of the parent of the snapshot,
and delayed directory item insertion.

This makes sure to run the pending delayed operations before we
record the snapshot root, which avoids corruptions.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-17 16:38:47 -04:00
David Sterba
35a30d7ce5 btrfs: fix uninitialized return value
When allocation fails in btrfs_read_fs_root_no_name, ret is not set
although it is returned, holding a garbage value.

Signed-off-by: David Sterba <dsterba@suse.cz>
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-17 14:54:18 -04:00
Miao Xie
19fd294957 btrfs: fix wrong reservation when doing delayed inode operations
We have migrated the space for the delayed inode items from
trans_block_rsv to global_block_rsv, but we forgot to set trans->block_rsv to
global_block_rsv when we doing delayed inode operations, and the following Oops
happened:

[ 9792.654889] ------------[ cut here ]------------
[ 9792.654898] WARNING: at fs/btrfs/extent-tree.c:5681
btrfs_alloc_free_block+0xca/0x27c [btrfs]()
[ 9792.654899] Hardware name: To Be Filled By O.E.M.
[ 9792.654900] Modules linked in: btrfs zlib_deflate libcrc32c
ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables
arc4 rt61pci rt2x00pci rt2x00lib snd_hda_codec_hdmi mac80211
snd_hda_codec_realtek cfg80211 snd_hda_intel edac_core snd_seq rfkill
pcspkr serio_raw snd_hda_codec eeprom_93cx6 edac_mce_amd sp5100_tco
i2c_piix4 k10temp snd_hwdep snd_seq_device snd_pcm floppy r8169 xhci_hcd
mii snd_timer snd soundcore snd_page_alloc ipv6 firewire_ohci pata_acpi
ata_generic firewire_core pata_via crc_itu_t radeon ttm drm_kms_helper
drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan]
[ 9792.654919] Pid: 2762, comm: rm Tainted: G        W   2.6.39+ #1
[ 9792.654920] Call Trace:
[ 9792.654922]  [<ffffffff81053c4a>] warn_slowpath_common+0x83/0x9b
[ 9792.654925]  [<ffffffff81053c7c>] warn_slowpath_null+0x1a/0x1c
[ 9792.654933]  [<ffffffffa038e747>] btrfs_alloc_free_block+0xca/0x27c [btrfs]
[ 9792.654945]  [<ffffffffa03b8562>] ? map_extent_buffer+0x6e/0xa8 [btrfs]
[ 9792.654953]  [<ffffffffa038189b>] __btrfs_cow_block+0xfc/0x30c [btrfs]
[ 9792.654963]  [<ffffffffa0396aa6>] ? btrfs_buffer_uptodate+0x47/0x58 [btrfs]
[ 9792.654970]  [<ffffffffa0382e48>] ? read_block_for_search+0x94/0x368 [btrfs]
[ 9792.654978]  [<ffffffffa0381ba9>] btrfs_cow_block+0xfe/0x146 [btrfs]
[ 9792.654986]  [<ffffffffa03848b0>] btrfs_search_slot+0x14d/0x4b6 [btrfs]
[ 9792.654997]  [<ffffffffa03b8562>] ? map_extent_buffer+0x6e/0xa8 [btrfs]
[ 9792.655022]  [<ffffffffa03938e8>] btrfs_lookup_inode+0x2f/0x8f [btrfs]
[ 9792.655025]  [<ffffffff8147afac>] ? _cond_resched+0xe/0x22
[ 9792.655027]  [<ffffffff8147b892>] ? mutex_lock+0x29/0x50
[ 9792.655039]  [<ffffffffa03d41b1>] btrfs_update_delayed_inode+0x72/0x137 [btrfs]
[ 9792.655051]  [<ffffffffa03d4ea2>] btrfs_run_delayed_items+0x90/0xdb [btrfs]
[ 9792.655062]  [<ffffffffa039a69b>] btrfs_commit_transaction+0x228/0x654 [btrfs]
[ 9792.655064]  [<ffffffff8106e8da>] ? remove_wait_queue+0x3a/0x3a
[ 9792.655075]  [<ffffffffa03a2fa5>] btrfs_evict_inode+0x14d/0x202 [btrfs]
[ 9792.655077]  [<ffffffff81132bd6>] evict+0x71/0x111
[ 9792.655079]  [<ffffffff81132de0>] iput+0x12a/0x132
[ 9792.655081]  [<ffffffff8112aa3a>] do_unlinkat+0x106/0x155
[ 9792.655083]  [<ffffffff81127b83>] ? path_put+0x1f/0x23
[ 9792.655085]  [<ffffffff8109c53c>] ? audit_syscall_entry+0x145/0x171
[ 9792.655087]  [<ffffffff81128410>] ? putname+0x34/0x36
[ 9792.655090]  [<ffffffff8112b441>] sys_unlinkat+0x29/0x2b
[ 9792.655092]  [<ffffffff81482c42>] system_call_fastpath+0x16/0x1b
[ 9792.655093] ---[ end trace 02b696eb02b3f768 ]---

This patch fix it by setting the reservation of the transaction handle to the
correct one.

Reported-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-17 14:54:18 -04:00
Maarten Lankhorst
9fe6a50fb7 btrfs: Remove unused sysfs code
Removes code no longer used. The sysfs file itself is kept, because the
btrfs developers expressed interest in putting new entries to sysfs.

Signed-off-by: Maarten Lankhorst <m.b.lankhorst@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-17 14:54:18 -04:00
David Sterba
3ed4498caf btrfs: fix dereference of ERR_PTR value
smatch reports:

btrfs_recover_log_trees error: 'wc.replay_dest' dereferencing
possible ERR_PTR()

Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-17 14:54:17 -04:00
Chris Mason
e038dca803 Merge branch 'for-chris' of git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-work into for-linus
Conflicts:
	fs/btrfs/transaction.c

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-17 14:16:13 -04:00
Linus Torvalds
01eff85b09 Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
  xfs: make log devices with write back caches work
  xfs: fix ->mknod() return value on xfs_get_acl() failure
2011-06-17 10:37:41 -07:00
Chris Mason
7585717f30 Btrfs: fix relocation races
The recent commit to get rid of our trans_mutex introduced
some races with block group relocation.  The problem is that relocation
needs to do some record keeping about each root, and it was relying
on the transaction mutex to coordinate things in subtle ways.

This fix adds a mutex just for the relocation code and makes sure
it doesn't have a big impact on normal operations.  The race is
really fixed in btrfs_record_root_in_trans, which is where we
step back and wait for the relocation code to finish accounting
setup.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-06-17 13:36:58 -04:00
David Howells
879669961b KEYS/DNS: Fix ____call_usermodehelper() to not lose the session keyring
____call_usermodehelper() now erases any credentials set by the
subprocess_inf::init() function.  The problem is that commit
17f60a7da1 ("capabilites: allow the application of capability limits
to usermode helpers") creates and commits new credentials with
prepare_kernel_cred() after the call to the init() function.  This wipes
all keyrings after umh_keys_init() is called.

The best way to deal with this is to put the init() call just prior to
the commit_creds() call, and pass the cred pointer to init().  That
means that umh_keys_init() and suchlike can modify the credentials
_before_ they are published and potentially in use by the rest of the
system.

This prevents request_key() from working as it is prevented from passing
the session keyring it set up with the authorisation token to
/sbin/request-key, and so the latter can't assume the authority to
instantiate the key.  This causes the in-kernel DNS resolver to fail
with ENOKEY unconditionally.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Eric Paris <eparis@redhat.com>
Tested-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-17 09:40:48 -07:00
Linus Torvalds
8b97b21e0f Merge git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/linux-2.6-nsfd
* git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/linux-2.6-nsfd:
  proc: Fix Oops on stat of /proc/<zombie pid>/ns/net
2011-06-16 15:02:20 -07:00
Trond Myklebust
ee7b75fc4f NFSv4: Fix a readdir regression
Commit 7ebb9315 (NFS: use secinfo when crossing mountpoints) introduces
a regression when decoding an NFSv4 readdir entry that sets the
rdattr_error field.
By treating the resulting value as if it is a decoding error, the current
code may cause us to skip valid readdir entries.

Reported-by: Andy Adamson <andros@netapp.com>
Cc: stable@kernel.org [2.6.39]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-16 13:24:47 -04:00
Linus Torvalds
8dac6bee32 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  AFS: Use i_generation not i_version for the vnode uniquifier
  AFS: Set s_id in the superblock to the volume name
  vfs: Fix data corruption after failed write in __block_write_begin()
  afs: afs_fill_page reads too much, or wrong data
  VFS: Fix vfsmount overput on simultaneous automount
  fix wrong iput on d_inode introduced by e6bc45d65d
  Delay struct net freeing while there's a sysfs instance refering to it
  afs: fix sget() races, close leak on umount
  ubifs: fix sget races
  ubifs: split allocation of ubifs_info into a separate function
  fix leak in proc_set_super()
2011-06-16 10:21:59 -07:00
Christoph Hellwig
a27a263bae xfs: make log devices with write back caches work
There's no reason not to support cache flushing on external log devices.
The only thing this really requires is flushing the data device first
both in fsync and log commits.  A side effect is that we also have to
remove the barrier write test during mount, which has been superflous
since the new FLUSH+FUA code anyway.  Also use the chance to flush the
RT subvolume write cache before the fsync commit, which is required
for correct semantics.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-06-16 10:52:39 -05:00
David Howells
d6e43f751f AFS: Use i_generation not i_version for the vnode uniquifier
Store the AFS vnode uniquifier in the i_generation field, not the i_version
field of the inode struct.  i_version can then be given the AFS data version
number.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-16 11:44:48 -04:00
David Howells
2e41ae225f AFS: Set s_id in the superblock to the volume name
Set s_id in the superblock to the name of the AFS volume that this superblock
corresponds to.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-16 11:44:47 -04:00
Jan Kara
f9f07b6c13 vfs: Fix data corruption after failed write in __block_write_begin()
I've got a report of a file corruption from fsxlinux on ext3. The important
operations to the page were:
mapwrite to a hole
partial write to the page
read - found the page zeroed from the end of the normal write

The culprit seems to be that if get_block() fails in __block_write_begin()
(e.g. transient ENOSPC in ext3), the function does ClearPageUptodate(page).
Thus when we retry the write, the logic in __block_write_begin() thinks zeroing
of the page is needed and overwrites old data.  In fact, I don't see why we
should ever need to zero the uptodate bit here - either the page was uptodate
when we entered __block_write_begin() and it should stay so when we leave it,
or it was not uptodate and noone had right to set it uptodate during
__block_write_begin() so it remains !uptodate when we leave as well. So just
remove clearing of the bit.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-16 11:44:46 -04:00