Commit Graph

20418 Commits

Author SHA1 Message Date
Eric Paris
7d13162332 fanotify: fill in the metadata_len field on struct fanotify_event_metadata
The fanotify_event_metadata now has a field which is supposed to
indicate the length of the metadata portion of the event.  Fill in that
field as well.

Based-in-part-on-patch-by: Alexey Zaytsev <alexey.zaytsev@gmail.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2010-12-15 13:58:18 -05:00
Lino Sanfilippo
fdbf3ceeb6 fanotify: Dont try to open a file descriptor for the overflow event
We should not try to open a file descriptor for the overflow event since this
will always fail.

Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Signed-off-by: Eric Paris <eparis@redhat.com>
2010-12-07 16:14:24 -05:00
Eric Paris
2637919893 fanotify: do not leak user reference on allocation failure
If fanotify_init is unable to allocate a new fsnotify group it will
return but will not drop its reference on the associated user struct.
Drop that reference on error.

Reported-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2010-12-07 16:14:23 -05:00
Eric Paris
a2ae4cc9a1 inotify: stop kernel memory leak on file creation failure
If inotify_init is unable to allocate a new file for the new inotify
group we leak the new group.  This patch drops the reference on the
group on file allocation failure.

Reported-by: Vegard Nossum <vegard.nossum@gmail.com>
cc: stable@kernel.org
Signed-off-by: Eric Paris <eparis@redhat.com>
2010-12-07 16:14:22 -05:00
Lino Sanfilippo
09e5f14e57 fanotify: on group destroy allow all waiters to bypass permission check
When fanotify_release() is called, there may still be processes waiting for
access permission. Currently only processes for which an event has already been
queued into the groups access list will be woken up.  Processes for which no
event has been queued will continue to sleep and thus cause a deadlock when
fsnotify_put_group() is called.
Furthermore there is a race allowing further processes to be waiting on the
access wait queue after wake_up (if they arrive before clear_marks_by_group()
is called).
This patch corrects this by setting a flag to inform processes that the group
is about to be destroyed and thus not to wait for access permission.

[additional changelog from eparis]
Lets think about the 4 relevant code paths from the PoV of the
'operator' 'listener' 'responder' and 'closer'.  Where operator is the
process doing an action (like open/read) which could require permission.
Listener is the task (or in this case thread) slated with reading from
the fanotify file descriptor.  The 'responder' is the thread responsible
for responding to access requests.  'Closer' is the thread attempting to
close the fanotify file descriptor.

The 'operator' is going to end up in:
fanotify_handle_event()
  get_response_from_access()
    (THIS BLOCKS WAITING ON USERSPACE)

The 'listener' interesting code path
fanotify_read()
  copy_event_to_user()
    prepare_for_access_response()
      (THIS CREATES AN fanotify_response_event)

The 'responder' code path:
fanotify_write()
  process_access_response()
    (REMOVE A fanotify_response_event, SET RESPONSE, WAKE UP 'operator')

The 'closer':
fanotify_release()
  (SUPPOSED TO CLEAN UP THE REST OF THIS MESS)

What we have today is that in the closer we remove all of the
fanotify_response_events and set a bit so no more response events are
ever created in prepare_for_access_response().

The bug is that we never wake all of the operators up and tell them to
move along.  You fix that in fanotify_get_response_from_access().  You
also fix other operators which haven't gotten there yet.  So I agree
that's a good fix.
[/additional changelog from eparis]

[remove additional changes to minimize patch size]
[move initialization so it was inside CONFIG_FANOTIFY_PERMISSION]

Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Signed-off-by: Eric Paris <eparis@redhat.com>
2010-12-07 16:14:22 -05:00
Lino Sanfilippo
1734dee4e3 fanotify: Dont allow a mask of 0 if setting or removing a mark
In mark_remove_from_mask() we destroy marks that have their event mask cleared.
Thus we should not allow the creation of those marks in the first place.
With this patch we check if the mask given from user is 0 in case of FAN_MARK_ADD.
If so we return an error. Same for FAN_MARK_REMOVE since this does not have any
effect.

Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Signed-off-by: Eric Paris <eparis@redhat.com>
2010-12-07 16:14:21 -05:00
Lino Sanfilippo
fa218ab98c fanotify: correct broken ref counting in case adding a mark failed
If adding a mount or inode mark failed fanotify_free_mark() is called explicitly.
But at this time the mark has already been put into the destroy list of the
fsnotify_mark kernel thread. If the thread is too slow it will try to decrease
the reference of a mark, that has already been freed by fanotify_free_mark().
(If its fast enough it will only decrease the marks ref counter from 2 to 1 - note
that the counter has been increased to 2 in add_mark() - which has practically no
effect.)

This patch fixes the ref counting by not calling free_mark() explicitly, but
decreasing the ref counter and rely on the fsnotify_mark thread to cleanup in
case adding the mark has failed.

Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Signed-off-by: Eric Paris <eparis@redhat.com>
2010-12-07 16:14:21 -05:00
Lino Sanfilippo
b1085ba80c fanotify: if set by user unset FMODE_NONOTIFY before fsnotify_perm() is called
Unsetting FMODE_NONOTIFY in fsnotify_open() is too late, since fsnotify_perm()
is called before. If FMODE_NONOTIFY is set fsnotify_perm() will skip permission
checks, so a user can still disable permission checks by setting this flag
in an open() call.
This patch corrects this by unsetting the flag before fsnotify_perm is called.

Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Signed-off-by: Eric Paris <eparis@redhat.com>
2010-12-07 16:14:21 -05:00
Eric Paris
ecf6f5e7d6 fanotify: deny permissions when no event was sent
If no event was sent to userspace we cannot expect userspace to respond to
permissions requests.  Today such requests just hang forever. This patch will
deny any permissions event which was unable to be sent to userspace.

Reported-by: Tvrtko Ursulin <tvrtko.ursulin@sophos.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2010-12-07 16:14:17 -05:00
Linus Torvalds
aa3fc52546 Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (24 commits)
  Btrfs: don't use migrate page without CONFIG_MIGRATION
  Btrfs: deal with DIO bios that span more than one ordered extent
  Btrfs: setup blank root and fs_info for mount time
  Btrfs: fix fiemap
  Btrfs - fix race between btrfs_get_sb() and umount
  Btrfs: update inode ctime when using links
  Btrfs: make sure new inode size is ok in fallocate
  Btrfs: fix typo in fallocate to make it honor actual size
  Btrfs: avoid NULL pointer deref in try_release_extent_buffer
  Btrfs: make btrfs_add_nondir take parent inode as an argument
  Btrfs: hold i_mutex when calling btrfs_log_dentry_safe
  Btrfs: use dget_parent where we can UPDATED
  Btrfs: fix more ESTALE problems with NFS
  Btrfs: handle NFS lookups properly
  btrfs: make 1-bit signed fileds unsigned
  btrfs: Show device attr correctly for symlinks
  btrfs: Set file size correctly in file clone
  btrfs: Check if dest_offset is block-size aligned before cloning file
  Btrfs: handle the space_cache option properly
  btrfs: Fix early enospc because 'unused' calculated with wrong sign.
  ...
2010-11-29 14:11:08 -08:00
Linus Torvalds
1bfe4eefe5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes:
  GFS2: Userland expects quota limit/warn/usage in 512b blocks
2010-11-29 14:10:22 -08:00
Chris Mason
5a92bc88ce Btrfs: don't use migrate page without CONFIG_MIGRATION
Fixes compile error

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-11-29 09:49:11 -05:00
Chris Mason
163cf09c2a Btrfs: deal with DIO bios that span more than one ordered extent
The new DIO bio splitting code has problems when the bio
spans more than one ordered extent.  This will happen as the
generic DIO code merges our get_blocks calls together into
a bigger single bio.

This fixes things by walking forward in the ordered extent
code finding all the overlapping ordered extents and completing them
all at once.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-11-28 19:56:33 -05:00
Linus Torvalds
7208364652 Un-inline get_pipe_info() helper function
This avoids some include-file hell, and the function isn't really
important enough to be inlined anyway.

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-11-28 16:27:19 -08:00
Linus Torvalds
c66fb34794 Export 'get_pipe_info()' to other users
And in particular, use it in 'pipe_fcntl()'.

The other pipe functions do not need to use the 'careful' version, since
they are only ever called for things that are already known to be pipes.

The normal read/write/ioctl functions are called through the file
operations structures, so if a file isn't a pipe, they'd never get
called.  But pipe_fcntl() is special, and called directly from the
generic fcntl code, and needs to use the same careful function that the
splice code is using.

Cc: Jens Axboe <jaxboe@fusionio.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dave Jones <davej@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-11-28 14:09:57 -08:00
Linus Torvalds
71993e62a4 Rename 'pipe_info()' to 'get_pipe_info()'
.. and change it to take the 'file' pointer instead of an inode, since
that's what all users want anyway.

The renaming is preparatory to exporting it to other users.  The old
'pipe_info()' name was too generic and is already used elsewhere, so
before making the function public we need to use a more specific name.

Cc: Jens Axboe <jaxboe@fusionio.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dave Jones <davej@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-11-28 13:56:09 -08:00
Josef Bacik
450ba0ea06 Btrfs: setup blank root and fs_info for mount time
There is a problem with how we use sget, it searches through the list of supers
attached to the fs_type looking for a super with the same fs_devices as what
we're trying to mount.  This depends on sb->s_fs_info being filled, but we don't
fill that in until we get to btrfs_fill_super, so we could hit supers on the
fs_type super list that have a null s_fs_info.  In order to fix that we need to
go ahead and setup a blank root with a blank fs_info to hold fs_devices, that
way our test will work out right and then we can set s_fs_info in
btrfs_set_super, and then open_ctree will simply use our pre-allocated root and
fs_info when setting everything up.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-11-27 13:37:51 -05:00
Josef Bacik
975f84fee2 Btrfs: fix fiemap
There are two big problems currently with FIEMAP

1) We return extents for holes.  This isn't supposed to happen, we just don't
return extents for holes and then userspace interprets the lack of an extent as
a hole.

2) We sometimes don't set FIEMAP_EXTENT_LAST properly.  This is because we wait
to see a EXTENT_FLAG_VACANCY flag on the em, but this won't happen if say we ask
fiemap to map up to the last extent in a file, and there is nothing but holes up
to the i_size.  To fix this we need to lookup the last extent in this file and
save the logical offset, so if we happen to try and map that extent we can be
sure to set FIEMAP_EXTENT_LAST.

With this patch we now pass xfstest 225, which we never have before.

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-11-27 13:37:50 -05:00
Ian Kent
619c8c7639 Btrfs - fix race between btrfs_get_sb() and umount
When mounting a btrfs file system btrfs_test_super() may attempt to
use sb->s_fs_info, the btrfs root, of a super block that is going away
and that has had the btrfs root set to NULL in its ->put_super(). But
if the super block is going away it cannot be an existing super block
so we can return false in this case.

Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-11-27 13:37:44 -05:00
Josef Bacik
bc1cbf1f86 Btrfs: update inode ctime when using links
Currently we fail xfstest 236 because we're not updating the inode ctime on
link.  This is a simple fix, and makes it so we pass 236 now.

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-11-27 13:00:07 -05:00
Josef Bacik
0ed42a63f3 Btrfs: make sure new inode size is ok in fallocate
We have been failing xfstest 228 forever, because we don't check to make sure
the new inode size is acceptable as far as RLIMIT is concerned.  Just check to
make sure it's ok to create a inode with this new size and error out if not.
With this patch we now pass 228.

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-11-27 13:00:07 -05:00
Josef Bacik
55a61d1d06 Btrfs: fix typo in fallocate to make it honor actual size
There is a typo in __btrfs_prealloc_file_range() where we set the i_size to
actual_len/cur_offset, and then just set it to cur_offset again, and do the same
with btrfs_ordered_update_i_size().  This fixes it back to keeping i_size in a
local variable and then updating i_size properly.  Tested this with

xfs_io -F -f -c "falloc 0 1" -c "pwrite 0 1" foo

stat'ing foo gives us a size of 1 instead of 4096 like it was.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-11-27 12:59:16 -05:00
Linus Torvalds
19650e8580 Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  NFS: Ensure we return the dirent->d_type when it is known
  NFS: Correct the array bound calculation in nfs_readdir_add_to_array
  NFS: Don't ignore errors from nfs_do_filldir()
  NFS: Fix the error handling in "uncached_readdir()"
  NFS: Fix a page leak in uncached_readdir()
  NFS: Fix a page leak in nfs_do_filldir()
  NFS: Assume eof if the server returns no readdir records
  NFS: Buffer overflow in ->decode_dirent() should not be fatal
  Pure nfs client performance using odirect.
  SUNRPC: Fix an infinite loop in call_refresh/call_refreshresult
2010-11-27 07:30:30 +09:00
Linus Torvalds
78daa87b1d Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  cciss: fix build for PROC_FS disabled
  block: fix amiga and atari floppy driver compile warning
  blk-throttle: Fix calculation of max number of WRITES to be dispatched
  ioprio: grab rcu_read_lock in sys_ioprio_{set,get}()
  xen/blkfront: cope with backend that fail empty BLKIF_OP_WRITE_BARRIER requests
  xen/blkfront: Implement FUA with BLKIF_OP_WRITE_BARRIER
  xen/blkfront: change blk_shadow.request to proper pointer
  xen/blkfront: map REQ_FLUSH into a full barrier
2010-11-27 07:17:50 +09:00
Linus Torvalds
0a66a59649 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2:
  nilfs2: fix typo in comment of nilfs_dat_move function
  nilfs2: nilfs_iget_for_gc() returns ERR_PTR
2010-11-27 07:14:00 +09:00
Frederic Weisbecker
da905873ef reiserfs: fix inode mutex - reiserfs lock misordering
reiserfs_unpack() locks the inode mutex with reiserfs_mutex_lock_safe()
to protect against reiserfs lock dependency.  However this protection
requires to have the reiserfs lock to be locked.

This is the case if reiserfs_unpack() is called by reiserfs_ioctl but
not from reiserfs_quota_on() when it tries to unpack tails of quota
files.

Fix the ordering of the two locks in reiserfs_unpack() to fix this
issue.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reported-by: Markus Gapp <markus.gapp@gmx.net>
Reported-by: Jan Kara <jack@suse.cz>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: <stable@kernel.org>		[2.6.36.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-11-25 06:50:48 +09:00
Naoya Horiguchi
ea251c1d5c pagemap: set pagemap walk limit to PMD boundary
Currently one pagemap_read() call walks in PAGEMAP_WALK_SIZE bytes (== 512
pages.) But there is a corner case where walk_pmd_range() accidentally
runs over a VMA associated with a hugetlbfs file.

For example, when a process has mappings to VMAs as shown below:

  # cat /proc/<pid>/maps
  ...
  3a58f6d000-3a58f72000 rw-p 00000000 00:00 0
  7fbd51853000-7fbd51855000 rw-p 00000000 00:00 0
  7fbd5186c000-7fbd5186e000 rw-p 00000000 00:00 0
  7fbd51a00000-7fbd51c00000 rw-s 00000000 00:12 8614   /hugepages/test

then pagemap_read() goes into walk_pmd_range() path and walks in the range
0x7fbd51853000-0x7fbd51a53000, but the hugetlbfs VMA should be handled by
walk_hugetlb_range().  Otherwise PMD for the hugepage is considered bad
and cleared, which causes undesirable results.

This patch fixes it by separating pagemap walk range into one PMD.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-11-25 06:50:46 +09:00
Ken Sumrall
a0822c5577 fuse: fix attributes after open(O_TRUNC)
The attribute cache for a file was not being cleared when a file is opened
with O_TRUNC.

If the filesystem's open operation truncates the file ("atomic_o_trunc"
feature flag is set) then the kernel should invalidate the cached st_mtime
and st_ctime attributes.

Also i_size should be explicitly be set to zero as it is used sometimes
without refreshing the cache.

Signed-off-by: Ken Sumrall <ksumrall@android.com>
Cc: Anfei <anfei.zhou@gmail.com>
Cc: "Anand V. Avati" <avati@gluster.com>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-11-25 06:50:41 +09:00
Ryusuke Konishi
f6c26ec508 nilfs2: fix typo in comment of nilfs_dat_move function
Fixes a typo: "uncommited" -> "uncommitted".

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-11-24 12:51:48 +09:00
Dan Carpenter
103cfcf522 nilfs2: nilfs_iget_for_gc() returns ERR_PTR
nilfs_iget_for_gc() returns an ERR_PTR() on failure and doesn't return
NULL.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-11-23 16:32:19 +09:00
Trond Myklebust
0b26a0bf6f NFS: Ensure we return the dirent->d_type when it is known
Store the dirent->d_type in the struct nfs_cache_array_entry so that we
can use it in getdents() calls.

This fixes a regression with the new readdir code.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-11-22 13:24:48 -05:00
Trond Myklebust
3020093f57 NFS: Correct the array bound calculation in nfs_readdir_add_to_array
It looks as if the array size calculation in MAX_READDIR_ARRAY does not
take the alignment of struct nfs_cache_array_entry into account.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-11-22 13:24:47 -05:00
Trond Myklebust
ece0b4233b NFS: Don't ignore errors from nfs_do_filldir()
We should ignore the errors from the filldir callback, and just interpret
them as meaning we should exit, however we should definitely pass back
ENOMEM errors.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-11-22 13:24:47 -05:00
Trond Myklebust
85f8607e16 NFS: Fix the error handling in "uncached_readdir()"
Currently, uncached_readdir() is broken because if fails to handle
the results from nfs_readdir_xdr_to_array() correctly.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-11-22 13:24:46 -05:00
Trond Myklebust
7a8e1dc34f NFS: Fix a page leak in uncached_readdir()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-11-22 13:24:45 -05:00
Trond Myklebust
e7c58e974a NFS: Fix a page leak in nfs_do_filldir()
nfs_do_filldir() must always free desc->page when it is done, otherwise
we end up leaking the page.

Also remove unused variable 'dentry'.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-11-22 13:24:44 -05:00
Trond Myklebust
5c346854d8 NFS: Assume eof if the server returns no readdir records
Some servers are known to be buggy w.r.t. this. Deal with them...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-11-22 13:24:44 -05:00
Trond Myklebust
463a376eae NFS: Buffer overflow in ->decode_dirent() should not be fatal
Overflowing the buffer in the readdir ->decode_dirent() should not lead to
a fatal error, but rather to an attempt to reread the record in question.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-11-22 13:24:43 -05:00
Arun Bharadwaj
b47d19de2c Pure nfs client performance using odirect.
When an application opens a file with O_DIRECT flag, if the size of
the data that is written is equal to wsize, the client sends a
WRITE RPC with stable flag set to UNSTABLE followed by a single
COMMIT RPC rather than sending a single WRITE RPC with the stable
flag set to FILE_SYNC. This a bug.

Patch to fix this.

Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-11-22 13:24:42 -05:00
Chris Mason
45f49bce99 Btrfs: avoid NULL pointer deref in try_release_extent_buffer
If we fail to find a pointer in the radix tree, don't try
to deref the NULL one we do have.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-11-21 22:27:44 -05:00
Josef Bacik
a1b075d28d Btrfs: make btrfs_add_nondir take parent inode as an argument
Everybody who calls btrfs_add_nondir just passes in the dentry of the new file
and then dereference dentry->d_parent->d_inode, but everybody who calls
btrfs_add_nondir() are already passed the parent's inode.  So instead of
dereferencing dentry->d_parent, just make btrfs_add_nondir take the dir inode as
an argument and pass that along so we don't have to worry about d_parent.
Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-11-21 22:26:10 -05:00
Josef Bacik
495e86779f Btrfs: hold i_mutex when calling btrfs_log_dentry_safe
Since we walk up the path logging all of the parts of the inode's path, we need
to hold i_mutex to make sure that the inode is not renamed while we're logging
everything.  btrfs_log_dentry_safe does dget_parent and all of that jazz, but we
may get unexpected results if the rename changes the inode's location while
we're higher up the path logging those dentries, so do this for safety reasons.
Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-11-21 22:26:09 -05:00
Josef Bacik
6a91221304 Btrfs: use dget_parent where we can UPDATED
There are lots of places where we do dentry->d_parent->d_inode without holding
the dentry->d_lock.  This could cause problems with rename.  So instead we need
to use dget_parent() and hold the reference to the parent as long as we are
going to use it's inode and then dput it at the end.

Signed-off-by: Josef Bacik <josef@redhat.com>
Cc: raven@themaw.net
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-11-21 22:26:09 -05:00
Josef Bacik
7619585390 Btrfs: fix more ESTALE problems with NFS
When creating new inodes we don't setup inode->i_generation.  So if we generate
an fh with a newly created inode we save the generation of 0, but if we flush
the inode to disk and have to read it back when getting the inode on the server
we'll have the right i_generation, so gens wont match and we get ESTALE.  This
patch properly sets inode->i_generation when we create the new inode and now I'm
no longer getting ESTALE.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-11-21 22:26:08 -05:00
Josef Bacik
2ede0daf01 Btrfs: handle NFS lookups properly
People kept reporting NFS issues, specifically getting ESTALE alot.  I figured
out how to reproduce the problem

SERVER
mkfs.btrfs /dev/sda1
mount /dev/sda1 /mnt/btrfs-test
<add /mnt/btrfs-test to /etc/exports>
btrfs subvol create /mnt/btrfs-test/foo
service nfs start

CLIENT
mount server:/mnt/btrfs /mnt/test
cd /mnt/test/foo
ls

SERVER
echo 3 > /proc/sys/vm/drop_caches

CLIENT
ls			<-- get an ESTALE here

This is because the standard way to lookup a name in nfsd is to use readdir, and
what it does is do a readdir on the parent directory looking for the inode of
the child.  So in this case the parent being / and the child being foo.  Well
subvols all have the same inode number, so doing a readdir of / looking for
inode 256 will return '.', which obviously doesn't match foo.  So instead we
need to have our own .get_name so that we can find the right name.

Our .get_name will either lookup the inode backref or the root backref,
whichever we're looking for, and return the name we find.  Running the above
reproducer with this patch results in everything acting the way its supposed to.
Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-11-21 22:26:08 -05:00
Mariusz Kozlowski
0410c94aff btrfs: make 1-bit signed fileds unsigned
Fixes these sparse warnings:
fs/btrfs/ctree.h:811:17: error: dubious one-bit signed bitfield
fs/btrfs/ctree.h:812:20: error: dubious one-bit signed bitfield
fs/btrfs/ctree.h:813:19: error: dubious one-bit signed bitfield

Signed-off-by: Mariusz Kozlowski <mk@lab.zgora.pl>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-11-21 22:26:07 -05:00
Li Zefan
f209561ad8 btrfs: Show device attr correctly for symlinks
Symlinks and files of other types show different device numbers, though
they are on the same partition:

 $ touch tmp; ln -s tmp tmp2; stat tmp tmp2
   File: `tmp'
   Size: 0         	Blocks: 0          IO Block: 4096   regular empty file
 Device: 15h/21d	Inode: 984027      Links: 1
 --- snip ---
   File: `tmp2' -> `tmp'
   Size: 3         	Blocks: 0          IO Block: 4096   symbolic link
 Device: 13h/19d	Inode: 984028      Links: 1

Reported-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-11-21 22:26:07 -05:00
Li Zefan
5f3888ff6f btrfs: Set file size correctly in file clone
Set src_offset = 0, src_length = 20K, dest_offset = 20K. And the
original filesize of the dest file 'file2' is 30K:

  # ls -l /mnt/file2
  -rw-r--r-- 1 root root 30720 Nov 18 16:42 /mnt/file2

Now clone file1 to file2, the dest file should be 40K, but it
still shows 30K:

  # ls -l /mnt/file2
  -rw-r--r-- 1 root root 30720 Nov 18 16:42 /mnt/file2

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-11-21 22:26:06 -05:00
Li Zefan
2a6b8daeda btrfs: Check if dest_offset is block-size aligned before cloning file
We've done the check for src_offset and src_length, and We should
also check dest_offset, otherwise we'll corrupt the destination
file:

  (After cloning file1 to file2 with unaligned dest_offset)
  # cat /mnt/file2
  cat: /mnt/file2: Input/output error

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-11-21 22:26:05 -05:00
Josef Bacik
0de90876c6 Btrfs: handle the space_cache option properly
When I added the clear_cache option I screwed up and took the break out of
the space_cache case statement, so whenever you mount with space_cache you also
get clear_cache, which does you no good if you say set space_cache in fstab so
it always gets set.  This patch adds the break back in properly.

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-11-21 22:26:05 -05:00