Commit Graph

47592 Commits

Author SHA1 Message Date
Nikolay Borisov
4176bdbf2d btrfs: Make btrfs_record_unlink_dir take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-14 15:50:53 +01:00
Nikolay Borisov
ab1717b2ab btrfs: Make btrfs_must_commit_transaction take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-14 15:50:53 +01:00
Nikolay Borisov
f5cc7b80a6 btrfs: Make btrfs_inode_delayed_dir_index_count take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-14 15:50:53 +01:00
Nikolay Borisov
5f4b32e94a btrfs: Make btrfs_commit_inode_delayed_items take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-14 15:50:53 +01:00
Nikolay Borisov
aa79021fde btrfs: Make btrfs_commit_inode_delayed_inode take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-14 15:50:53 +01:00
Nikolay Borisov
f48d1cf59c btrfs: Make btrfs_remove_delayed_node take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-14 15:50:53 +01:00
Nikolay Borisov
4ccb5c7231 btrfs: Make btrfs_kill_delayed_inode_items take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-14 15:50:52 +01:00
Nikolay Borisov
e07222c7d2 btrfs: Make btrfs_delayed_delete_inode_ref take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-14 15:50:52 +01:00
Nikolay Borisov
e67bbbb9d0 btrfs: Make btrfs_delete_delayed_dir_index take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-14 15:50:52 +01:00
Nikolay Borisov
6f45d18568 btrfs: Make btrfs_insert_delayed_dir_index take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-14 15:50:52 +01:00
Nikolay Borisov
fcabdd1ca5 btrfs: Make btrfs_delayed_inode_reserve_metadata take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-14 15:50:52 +01:00
Nikolay Borisov
e5517a7bff btrfs: Make btrfs_get_or_create_delayed_node take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-14 15:50:51 +01:00
Nikolay Borisov
340c6ca9fd btrfs: Make btrfs_get_delayed_node take btrfs_inode
This function is internal to btrfs and doesn't really deal with any
VFS members, as such it needn't take a struct inode refrence but
btrfs_inode.

Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-14 15:50:51 +01:00
Nikolay Borisov
4a0cc7ca6c btrfs: Make btrfs_ino take a struct btrfs_inode
Currently btrfs_ino takes a struct inode and this causes a lot of
internal btrfs functions which consume this ino to take a VFS inode,
rather than btrfs' own struct btrfs_inode. In order to fix this "leak"
of VFS structs into the internals of btrfs first it's necessary to
eliminate all uses of struct inode for the purpose of inode. This patch
does that by using BTRFS_I to convert an inode to btrfs_inode. With
this problem eliminated subsequent patches will start eliminating the
passing of struct inode altogether, eventually resulting in a lot cleaner
code.

Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
[ fix btrfs_get_extent tracepoint prototype ]
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-14 15:50:51 +01:00
David Sterba
823bb20ab4 btrfs: add wrapper for counting BTRFS_MAX_EXTENT_SIZE
The expression is open-coded in several places, this asks for a wrapper.
As we know the MAX_EXTENT fits to u32, we can use the appropirate
division helper. This cascades to the result type updates.

Compiler is clever enough to use shift instead of integer division, so
there's no change in the generated assembly.

Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-14 15:50:51 +01:00
David Sterba
95995dbbe6 btrfs: remove unused logic of limiting async delalloc pages
A proposed patch in https://marc.info/?l=linux-btrfs&m=147859791003837
pointed out bad limit threshold in cow_file_range_async, but it turned
out that the whole logic is not necessary and is done by writeback. We
agreed to remove it.

Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-14 15:50:51 +01:00
Anand Jain
26d30f8529 btrfs: consolidate auto defrag kick off policies
As of now writes smaller than 64k for non compressed extents and 16k
for compressed extents inside eof are considered as candidate
for auto defrag, put them together at a place.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-14 15:50:50 +01:00
Anand Jain
8c3e6b1f0c btrfs: btrfs_defrag_root() doesn't defrag extent root tree
Since btrfs_defrag_leaves() does not support extent_root, remove its
corresponding call. The user can use the file based defrag to defrag
extents as of now.

No change in behaviour as extent_root is explicitly skipped in
btrfs_defrag_leaves and this has never worked as expected.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ ehnance changelong ]
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-14 15:50:50 +01:00
Jeff Mahoney
fef394f75b btrfs: drop unused extent_op arg from btrfs_add_delayed_data_ref
btrfs_add_delayed_data_ref is always called with a NULL extent_op,
so let's drop the argument.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-14 15:50:50 +01:00
Colin Ian King
694a0dee9c btrfs: remove redundant inode null check
The check for a null inode is redundant since the function
is a callback for exportfs, which will itself crash if
dentry->d_inode or parent->d_inode is NULL.  Removing the
null check makes this consistent with other file systems.

Also remove the redundant null dir check too.

Found with static analysis by CoverityScan, CID 1389472

Kudos to Jeff Mahoney for reviewing and explaining the error in
my original patch (most of this explanation went into the above
commit message) and David Sterba for pointing out that the dir
check is also redundant.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-14 15:50:50 +01:00
Seraphime Kirkovski
20c7bcec6f Btrfs: ACCESS_ONCE cleanup
This replaces ACCESS_ONCE macro with the corresponding
READ|WRITE macros

Signed-off-by: Seraphime Kirkovski <kirkseraph@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-14 15:50:50 +01:00
Seraphime Kirkovski
50d0446e68 Btrfs: code cleanup min/max -> min_t/max_t
This cleans up the cases where the min/max macros were used with a cast
rather than using directly min_t/max_t.

Signed-off-by: Seraphime Kirkovski <kirkseraph@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-14 15:50:50 +01:00
Geliang Tang
6b4df8b6c5 btrfs: use rb_entry() instead of container_of
To make the code clearer, use rb_entry() instead of container_of() to
deal with rbtree.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-14 15:50:49 +01:00
Anand Jain
f74670f713 btrfs: use BTRFS_COMPRESS_NONE to specify no compression
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-14 15:50:49 +01:00
Michal Hocko
1aceabf362 btrfs: drop gfp mask tweaking in try_release_extent_state
try_release_extent_state reduces the gfp mask to GFP_NOFS if it is
compatible. This is true for GFP_KERNEL as well. There is no real
reason to do that though. There is no new lock taken down the
the only consumer of the gfp mask which is
try_release_extent_state
  clear_extent_bit
    __clear_extent_bit
      alloc_extent_state

So this seems just unnecessary and confusing.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-14 15:50:49 +01:00
Michal Hocko
3ba7ab220e btrfs: fix up misleading GFP_NOFS usage in btrfs_releasepage
b335b0034e ("Btrfs: Avoid using __GFP_HIGHMEM with slab allocator")
has reduced the allocation mask in btrfs_releasepage to GFP_NOFS just
to prevent from giving an unappropriate gfp mask to the slab allocator
deeper down the callchain (in alloc_extent_state). This is wrong for
two reasons a) GFP_NOFS might be just too restrictive for the calling
context b) it is better to tweak the gfp mask down when it needs that.

So just remove the mask tweaking from btrfs_releasepage and move it
down to alloc_extent_state where it is needed.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-14 15:50:49 +01:00
Qu Wenruo
18dc22c19b btrfs: Add WARN_ON for qgroup reserved underflow
Goldwyn Rodrigues has exposed and fixed a bug which underflows btrfs
qgroup reserved space, and leads to non-writable fs.

This reminds us that we don't have enough underflow check for qgroup
reserved space.

For underflow case, we should not really underflow the numbers but warn
and keeps qgroup still work.

So add more check on qgroup reserved space and add WARN_ON() and
btrfs_warn() for any underflow case.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-14 15:50:49 +01:00
Linus Torvalds
2b95550a43 Merge branch 'for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "This has two last minute fixes. The highest priority here is a
  regression fix for the decompression code, but we also fixed up a
  problem with the 32-bit compat ioctls.

  The decompression bug could hand back the wrong data on big reads when
  zlib was used. I have a larger cleanup to make the math here less
  error prone, but at this stage in the release Omar's patch is the best
  choice"

* 'for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  btrfs: fix btrfs_decompress_buf2page()
  btrfs: fix btrfs_compat_ioctl failures on non-compat ioctls
2017-02-11 09:15:58 -08:00
Omar Sandoval
6e78b3f7a1 Btrfs: fix btrfs_decompress_buf2page()
If btrfs_decompress_buf2page() is handed a bio with its page in the
middle of the working buffer, then we adjust the offset into the working
buffer. After we copy into the bio, we advance the iterator by the
number of bytes we copied. Then, we have some logic to handle the case
of discontiguous pages and adjust the offset into the working buffer
again. However, if we didn't advance the bio to a new page, we may enter
this case in error, essentially repeating the adjustment that we already
made when we entered the function. The end result is bogus data in the
bio.

Previously, we only checked for this case when we advanced to a new
page, but the conversion to bio iterators changed that. This restores
the old, correct behavior.

A case I saw when testing with zlib was:

    buf_start = 42769
    total_out = 46865
    working_bytes = total_out - buf_start = 4096
    start_byte = 45056

The condition (total_out > start_byte && buf_start < start_byte) is
true, so we adjust the offset:

    buf_offset = start_byte - buf_start = 2287
    working_bytes -= buf_offset = 1809
    current_buf_start = buf_start = 42769

Then, we copy

    bytes = min(bvec.bv_len, PAGE_SIZE - buf_offset, working_bytes) = 1809
    buf_offset += bytes = 4096
    working_bytes -= bytes = 0
    current_buf_start += bytes = 44578

After bio_advance(), we are still in the same page, so start_byte is the
same. Then, we check (total_out > start_byte && current_buf_start < start_byte),
which is true! So, we adjust the values again:

    buf_offset = start_byte - buf_start = 2287
    working_bytes = total_out - start_byte = 1809
    current_buf_start = buf_start + buf_offset = 45056

But note that working_bytes was already zero before this, so we should
have stopped copying.

Fixes: 974b1adc3b ("btrfs: use bio iterators for the decompression handlers")
Reported-by: Pat Erley <pat-lkml@erley.org>
Reviewed-by: Chris Mason <clm@fb.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Tested-by: Liu Bo <bo.li.liu@oracle.com>
2017-02-10 19:11:03 -08:00
Linus Torvalds
7fe654dca2 This patch turned out to have a couple problems. The problems are
fixable, but at least one of the fixes is a little ugly.  The original
 bug has always been there, so we can wait another week or two to get
 this right.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJYncSKAAoJECebzXlCjuG+txsP/iqR4qsw9BOpLDb0Jf3sRpcj
 FzNaHC3a8yIsQsgEBqaoslg1ZUIWzbcudxR4G1X0ieeE00uFLNs4/nYDBlQlYAKi
 bXtUcHyIqpU9eCDfe+BdrNDyfoMizQW+OoYVoLzXWR8jEzEe3HWGOtSv8Y1McIwo
 N75lTYfzYtM3xrUjCaQMY5BgwmOhXZMUh0bkyTGKNv0V/HY2zRrDWbNShsZnh3M+
 ywhoDGHKCo1fMM8Gs4fQU4V5z6RXVqi7kNjoAdAjyxsPsBc0U7ggjNfq0mlal0FT
 tgGoa+KHuCyMAfbZ5gQVuqetuZXiBnc5CDyxg7idClY/aDzmwPO5dIHXoWUwCrHR
 Y/zJRQ0h0jfZK8Mis/T2c9ueYPkSERLHY9Pol8g35FOlkJhR2kc30wQM33Ya0hsk
 Hp3qPHWrVoL3AbxuPHkY3aF/YrCIOsHiqdTvxctzrPI9V+rI0Zqx0uro/yOZ/17C
 T38O9usblHa5KGT0rFVQDBIuZaF6lNWTV+gCQNFFu+Xxl01i82CQohmckYvV/iC7
 zWnMBh7lTXwCmzw6/lWI7d277QdkfJkMYlmsOFPrkV1qxZ92KFJyqas1NFAt0Gg2
 V8RmBABd6JSqmA0RWQxNh9iy9Ot53HkugTDMZKgBsBIIiI88DcY2hkhanpRhUXVn
 /3uBqGHbWw4wwQaeZiRe
 =1kRc
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-4.10-3' of git://linux-nfs.org/~bfields/linux

Pull nfsd revert from Bruce Fields:
 "This patch turned out to have a couple problems. The problems are
  fixable, but at least one of the fixes is a little ugly. The original
  bug has always been there, so we can wait another week or two to get
  this right"

* tag 'nfsd-4.10-3' of git://linux-nfs.org/~bfields/linux:
  nfsd: Revert "nfsd: special case truncates some more"
2017-02-10 14:23:45 -08:00
J. Bruce Fields
0839ffb83e nfsd: Revert "nfsd: special case truncates some more"
This patch incorrectly attempted nested mnt_want_write, and incorrectly
disabled nfsd's owner override for truncate.  We'll fix those problems
and make another attempt soon, for the moment I think the safest is to
revert.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-02-09 20:49:20 -05:00
Brian Norris
8672aed7bd pstore: don't OOPS when there are no ftrace zones
We'll OOPS in ramoops_get_next_prz() if the platform didn't ask for any
ftrace zones (i.e., cxt->fprzs will be NULL). Let's just skip this
entire FTRACE section if there's no 'fprzs'.

Regression seen on a coreboot/depthcharge-based Chromebook.

Fixes: 2fbea82bbb ("pstore: Merge per-CPU ftrace records into one")
Cc: Joel Fernandes <joelaf@google.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
2017-02-09 11:49:49 -08:00
Jeff Mahoney
2a36224918 btrfs: fix btrfs_compat_ioctl failures on non-compat ioctls
Commit 4c63c2454e incorrectly assumed that returning -ENOIOCTLCMD would
cause the native ioctl to be called.  The ->compat_ioctl callback is
expected to handle all ioctls, not just compat variants.  As a result,
when using 32-bit userspace on 64-bit kernels, everything except those
three ioctls would return -ENOTTY.

Fixes: 4c63c2454e ("btrfs: bugfix: handle FS_IOC32_{GETFLAGS,SETFLAGS,GETVERSION} in btrfs_ioctl")
Cc: stable@vger.kernel.org
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-08 17:47:30 +01:00
Hugh Dickins
b6789123bc mm: fix KPF_SWAPCACHE in /proc/kpageflags
Commit 6326fec112 ("mm: Use owner_priv bit for PageSwapCache, valid
when PageSwapBacked") aliased PG_swapcache to PG_owner_priv_1 (and
depending on PageSwapBacked being true).

As a result, the KPF_SWAPCACHE bit in '/proc/kpageflags' should now be
synthesized, instead of being shown on unrelated pages which just happen
to have PG_owner_priv_1 set.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-07 12:08:32 -08:00
Michal Hocko
d1908f5255 fs: break out of iomap_file_buffered_write on fatal signals
Tetsuo has noticed that an OOM stress test which performs large write
requests can cause the full memory reserves depletion.  He has tracked
this down to the following path

	__alloc_pages_nodemask+0x436/0x4d0
	alloc_pages_current+0x97/0x1b0
	__page_cache_alloc+0x15d/0x1a0          mm/filemap.c:728
	pagecache_get_page+0x5a/0x2b0           mm/filemap.c:1331
	grab_cache_page_write_begin+0x23/0x40   mm/filemap.c:2773
	iomap_write_begin+0x50/0xd0             fs/iomap.c:118
	iomap_write_actor+0xb5/0x1a0            fs/iomap.c:190
	? iomap_write_end+0x80/0x80             fs/iomap.c:150
	iomap_apply+0xb3/0x130                  fs/iomap.c:79
	iomap_file_buffered_write+0x68/0xa0     fs/iomap.c:243
	? iomap_write_end+0x80/0x80
	xfs_file_buffered_aio_write+0x132/0x390 [xfs]
	? remove_wait_queue+0x59/0x60
	xfs_file_write_iter+0x90/0x130 [xfs]
	__vfs_write+0xe5/0x140
	vfs_write+0xc7/0x1f0
	? syscall_trace_enter+0x1d0/0x380
	SyS_write+0x58/0xc0
	do_syscall_64+0x6c/0x200
	entry_SYSCALL64_slow_path+0x25/0x25

the oom victim has access to all memory reserves to make a forward
progress to exit easier.  But iomap_file_buffered_write and other
callers of iomap_apply loop to complete the full request.  We need to
check for fatal signals and back off with a short write instead.

As the iomap_apply delegates all the work down to the actor we have to
hook into those.  All callers that work with the page cache are calling
iomap_write_begin so we will check for signals there.  dax_iomap_actor
has to handle the situation explicitly because it copies data to the
userspace directly.  Other callers like iomap_page_mkwrite work on a
single page or iomap_fiemap_actor do not allocate memory based on the
given len.

Fixes: 68a9f5e700 ("xfs: implement iomap based buffered write path")
Link: http://lkml.kernel.org/r/20170201092706.9966-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>	[4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-03 14:13:19 -08:00
Linus Torvalds
1fc576b82b Three more miscellaneous nfsd bugfixes.
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJYk5bGAAoJECebzXlCjuG+vGcP/j2Sw74vbJX4ifkFQvUomOp/
 83IHinaax5RDIqryf903iabvKX1SuXpVuxgpxtaSCz+oS9fV5wv/28NdIrMwfICJ
 DC3I2xQ/osLUGHw1Td8BZ+Fv+P0Th+OGKwWCydp3Xejg/X+XUKrLs/Ex1/rHwTzJ
 y/BIAV7BU94HlDFxif7sKbv6mi/gpETlvJK4AAbSbvpd4f5JuBXPQBtJBxUPtwi1
 zM0pQzMoBgY5XzAZBWmQEYCJjTje5wqiueDHtteh8/X7FoLWtAcHsOmleglleKQn
 927eRCTrHt9ioToyZJM4GkH8eW/nOMpaKKvjXCTdoKsuEZyAnlPk0VZUAikfDvql
 dr9FznI21Geq/c9IeU3+1xSbe1I9eDO0L9qWp8prUikIZwI4kdEBN5z/0oMMcU4q
 IXw2lb46w8JD11GIIAGZIhZb63FdO75Ck4Z2GX6UUFqt246s26Go9yJIZEDfu5sL
 8FLmwgOYNMhFrSPAR9JmnBors5gNT9owNwieUB8IFvgMv1ajz2CWG2yvNO9Sq/SK
 a/HJJ7A1YvX0uSzsKsvO/j5S4cY73l2kWKX4NRqMFXIYzMzNGHvIvIB238tXAzZe
 Z5YaMycsjuRKe9VkP2lZQtVzl9qfnvkd5o6Tg3RkMZXkVOHMB/j2yratWB2XTl3h
 I2xAGjQFJ/Pn66pJWNe0
 =yQ4e
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-4.10-2' of git://linux-nfs.org/~bfields/linux

Pull nfsd fixes from Bruce Fields:
 "Three more miscellaneous nfsd bugfixes"

* tag 'nfsd-4.10-2' of git://linux-nfs.org/~bfields/linux:
  svcrpc: fix oops in absence of krb5 module
  nfsd: special case truncates some more
  NFSD: Fix a null reference case in find_or_create_lock_stateid()
2017-02-02 12:49:58 -08:00
Linus Torvalds
2883aaea36 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull fscache fixes from Al Viro.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fscache: Fix dead object requeue
  fscache: Clear outstanding writes when disabling a cookie
  FS-Cache: Initialise stores_lock in netfs cookie
2017-02-01 10:30:56 -08:00
Linus Torvalds
415f9b71d1 Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fix from Steve French:
 "A small cifs fix for stable"

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: initialize file_info_lock
2017-01-31 12:36:39 -08:00
David Howells
e26bfebdfc fscache: Fix dead object requeue
Under some circumstances, an fscache object can become queued such that it
fscache_object_work_func() can be called once the object is in the
OBJECT_DEAD state.  This results in the kernel oopsing when it tries to
invoke the handler for the state (which is hard coded to 0x2).

The way this comes about is something like the following:

 (1) The object dispatcher is processing a work state for an object.  This
     is done in workqueue context.

 (2) An out-of-band event comes in that isn't masked, causing the object to
     be queued, say EV_KILL.

 (3) The object dispatcher finishes processing the current work state on
     that object and then sees there's another event to process, so,
     without returning to the workqueue core, it processes that event too.
     It then follows the chain of events that initiates until we reach
     OBJECT_DEAD without going through a wait state (such as
     WAIT_FOR_CLEARANCE).

     At this point, object->events may be 0, object->event_mask will be 0
     and oob_event_mask will be 0.

 (4) The object dispatcher returns to the workqueue processor, and in due
     course, this sees that the object's work item is still queued and
     invokes it again.

 (5) The current state is a work state (OBJECT_DEAD), so the dispatcher
     jumps to it - resulting in an OOPS.

When I'm seeing this, the work state in (1) appears to have been either
LOOK_UP_OBJECT or CREATE_OBJECT (object->oob_table is
fscache_osm_lookup_oob).

The window for (2) is very small:

 (A) object->event_mask is cleared whilst the event dispatch process is
     underway - though there's no memory barrier to force this to the top
     of the function.

     The window, therefore is from the time the object was selected by the
     workqueue processor and made requeueable to the time the mask was
     cleared.

 (B) fscache_raise_event() will only queue the object if it manages to set
     the event bit and the corresponding event_mask bit was set.

     The enqueuement is then deferred slightly whilst we get a ref on the
     object and get the per-CPU variable for workqueue congestion.  This
     slight deferral slightly increases the probability by allowing extra
     time for the workqueue to make the item requeueable.

Handle this by giving the dead state a processor function and checking the
for the dead state address rather than seeing if the processor function is
address 0x2.  The dead state processor function can then set a flag to
indicate that it's occurred and give a warning if it occurs more than once
per object.

If this race occurs, an oops similar to the following is seen (note the RIP
value):

BUG: unable to handle kernel NULL pointer dereference at 0000000000000002
IP: [<0000000000000002>] 0x1
PGD 0
Oops: 0010 [#1] SMP
Modules linked in: ...
CPU: 17 PID: 16077 Comm: kworker/u48:9 Not tainted 3.10.0-327.18.2.el7.x86_64 #1
Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 12/27/2015
Workqueue: fscache_object fscache_object_work_func [fscache]
task: ffff880302b63980 ti: ffff880717544000 task.ti: ffff880717544000
RIP: 0010:[<0000000000000002>]  [<0000000000000002>] 0x1
RSP: 0018:ffff880717547df8  EFLAGS: 00010202
RAX: ffffffffa0368640 RBX: ffff880edf7a4480 RCX: dead000000200200
RDX: 0000000000000002 RSI: 00000000ffffffff RDI: ffff880edf7a4480
RBP: ffff880717547e18 R08: 0000000000000000 R09: dfc40a25cb3a4510
R10: dfc40a25cb3a4510 R11: 0000000000000400 R12: 0000000000000000
R13: ffff880edf7a4510 R14: ffff8817f6153400 R15: 0000000000000600
FS:  0000000000000000(0000) GS:ffff88181f420000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000002 CR3: 000000000194a000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
 ffffffffa0363695 ffff880edf7a4510 ffff88093f16f900 ffff8817faa4ec00
 ffff880717547e60 ffffffff8109d5db 00000000faa4ec18 0000000000000000
 ffff8817faa4ec18 ffff88093f16f930 ffff880302b63980 ffff88093f16f900
Call Trace:
 [<ffffffffa0363695>] ? fscache_object_work_func+0xa5/0x200 [fscache]
 [<ffffffff8109d5db>] process_one_work+0x17b/0x470
 [<ffffffff8109e4ac>] worker_thread+0x21c/0x400
 [<ffffffff8109e290>] ? rescuer_thread+0x400/0x400
 [<ffffffff810a5acf>] kthread+0xcf/0xe0
 [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
 [<ffffffff816460d8>] ret_from_fork+0x58/0x90
 [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jeremy McNicoll <jeremymc@redhat.com>
Tested-by: Frank Sorenson <sorenson@redhat.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-01-31 13:23:09 -05:00
David Howells
6bdded59c8 fscache: Clear outstanding writes when disabling a cookie
fscache_disable_cookie() needs to clear the outstanding writes on the
cookie it's disabling because they cannot be completed after.

Without this, fscache_nfs_open_file() gets stuck because it disables the
cookie when the file is opened for writing but can't uncache the pages till
afterwards - otherwise there's a race between the open routine and anyone
who already has it open R/O and is still reading from it.

Looking in /proc/pid/stack of the offending process shows:

[<ffffffffa0142883>] __fscache_wait_on_page_write+0x82/0x9b [fscache]
[<ffffffffa014336e>] __fscache_uncache_all_inode_pages+0x91/0xe1 [fscache]
[<ffffffffa01740fa>] nfs_fscache_open_file+0x59/0x9e [nfs]
[<ffffffffa01ccf41>] nfs4_file_open+0x17f/0x1b8 [nfsv4]
[<ffffffff8117350e>] do_dentry_open+0x16d/0x2b7
[<ffffffff811743ac>] vfs_open+0x5c/0x65
[<ffffffff81184185>] path_openat+0x785/0x8fb
[<ffffffff81184343>] do_filp_open+0x48/0x9e
[<ffffffff81174710>] do_sys_open+0x13b/0x1cb
[<ffffffff811747b9>] SyS_open+0x19/0x1b
[<ffffffff81001c44>] do_syscall_64+0x80/0x17a
[<ffffffff8165c2da>] return_from_SYSCALL_64+0x0/0x7a
[<ffffffffffffffff>] 0xffffffffffffffff

Reported-by: Jianhong Yin <jiyin@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-01-31 13:23:09 -05:00
David Howells
62deb8187d FS-Cache: Initialise stores_lock in netfs cookie
Initialise the stores_lock in fscache netfs cookies.  Technically, it
shouldn't be necessary, since the netfs cookie is an index and stores no
data, but initialising it anyway adds insignificant overhead.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-01-31 13:23:09 -05:00
Christoph Hellwig
41f53350a0 nfsd: special case truncates some more
Both the NFS protocols and the Linux VFS use a setattr operation with a
bitmap of attributs to set to set various file attributes including the
file size and the uid/gid.

The Linux syscalls never mixes size updates with unrelated updates like
the uid/gid, and some file systems like XFS and GFS2 rely on the fact
that truncates might not update random other attributes, and many other
file systems handle the case but do not update the different attributes
in the same transaction.  NFSD on the other hand passes the attributes
it gets on the wire more or less directly through to the VFS, leading to
updates the file systems don't expect.  XFS at least has an assert on
the allowed attributes, which caught an unusual NFS client setting the
size and group at the same time.

To handle this issue properly this switches nfsd to call vfs_truncate
for size changes, and then handle all other attributes through
notify_change.  As a side effect this also means less boilerplace code
around the size change as we can now reuse the VFS code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-01-31 12:29:24 -05:00
Kinglong Mee
d19fb70dd6 NFSD: Fix a null reference case in find_or_create_lock_stateid()
nfsd assigns the nfs4_free_lock_stateid to .sc_free in init_lock_stateid().

If nfsd doesn't go through init_lock_stateid() and put stateid at end,
there is a NULL reference to .sc_free when calling nfs4_put_stid(ns).

This patch let the nfs4_stid.sc_free assignment to nfs4_alloc_stid().

Cc: stable@vger.kernel.org
Fixes: 356a95ece7 "nfsd: clean up races in lock stateid searching..."
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-01-31 12:29:24 -05:00
Linus Torvalds
d56a5ca366 NFS client bugfixes for Linux 4.10
Stable patches:
 - NFSv4.1: Fix a deadlock in layoutget
 - NFSv4 must not bump sequence ids on NFS4ERR_MOVED errors
 - NFSv4 Fix a regression with OPEN EXCLUSIVE4 mode
 - Fix a memory leak when removing the SUNRPC module
 
 Bugfixes:
 - Fix a reference leak in _pnfs_return_layout
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJYjM6UAAoJEGcL54qWCgDy+WoP/2NdwqzOa3xxn8piMhLh/TLJ
 lxWfVmUZJ8UNVSVl45NDslt2V63P0gpITUiaVo27rPfjKV8pv8XTU7HkxCuXzj0m
 uo11jEErsYxbXELNDL2y1fmsPTygYvg4MAV3NK0inKv/ciu6TTMddoga+QQXGJPz
 QqQ7znTHyjDSTqZM1D5+f8EhbY7AIAzJbkgVr90WPPjsT6YtdlTQpMb1O9tlDtfj
 oYidIOxRphqLOYTdoD8NW4x19dlgTNIYyCgOFomCbpUTFKKK5RFjVy+bpml3NxV1
 oUldw8lhiUE8x/5J/CfbCMt1bPZ3yJeFWNghINOIF7cgGnvKEjw0uO2o8Lk6fMzi
 yF2aNoZcNQnyoHHCjsqK8t5dB51abwxlzeElrFB1u73B56CcXB91xEP0GsX1vFt1
 BIGCnfsOvtYPZxp/OX+B3AweAJ2DRuIvdze9cS2IeHBW9c5XdVdDmMK7CFeL9dc3
 YEsWmaWNWtDt41erxCvQ0ezkr3lPHXOcjwS440LIiVpr38/rzjp81+Tf+i2D1GzB
 XTsy9IHzaC2zLb0Vz5vOR3S10k4OiFtXT6ikoZLwgpTfI2gt146e0YHx3g90AiFh
 QYBRNLJrjSYVedjv2GGFLG1WWsHbhbP6dpmT3BwNmhZTzjPoqZ76f5QFIQUya4jt
 4Y2ksyhx7pOeOJy++6SN
 =S5qH
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-4.10-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 "Stable patches:
   - NFSv4.1: Fix a deadlock in layoutget
   - NFSv4 must not bump sequence ids on NFS4ERR_MOVED errors
   - NFSv4 Fix a regression with OPEN EXCLUSIVE4 mode
   - Fix a memory leak when removing the SUNRPC module

  Bugfixes:
   - Fix a reference leak in _pnfs_return_layout"

* tag 'nfs-for-4.10-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  pNFS: Fix a reference leak in _pnfs_return_layout
  nfs: Fix "Don't increment lock sequence ID after NFS4ERR_MOVED"
  SUNRPC: cleanup ida information when removing sunrpc module
  NFSv4.0: always send mode in SETATTR after EXCLUSIVE4
  nfs: Don't increment lock sequence ID after NFS4ERR_MOVED
  NFSv4.1: Fix a deadlock in layoutget
2017-01-28 11:50:17 -08:00
Linus Torvalds
3365135d43 Contained in this update:
- Fix race conditions in the CoW code
 - Fix some incorrect input validation checks
 - Avoid crashing fs by running out of space when freeing inodes
 - Fix toctou race wrt whether or not an inode has an attr
 - Fix build error on arm
 - Fix page refcount corruption when readahead fails
 - Don't corrupt userspace in the bmap ioctl
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCgAGBQJYi4RJAAoJEPh/dxk0SrTrsU4QAIZBUUSpvpwggyfbTcOb6QWb
 F/vBoj50f+cxZB2jxLrch4aRsvlhW6IkRnsKiciG4cQaDbjPbjhSMH4JEUUqrPvG
 TRTcBvT6rEbxnB3adIH2DVrDAaEEVpRSkPaV/vLjYEfJp8Nyv8yXb6U8zH//NeZQ
 Pwrwe0RX0bJRyAEBIRwBnTayMP6xIccCE9Ml7+sMG/UVnb0Fa8t5zg/9igfd0MrB
 xf3MdkW0fpFCcp1Bbby8cnDmTjjxEtB6OApL82UnSZ7l2/U5AHhiA0NgHreYuzt8
 47ezqEQfk+IWK5LY1c6V/vARKhVvM738jS2dG1tsFhnbTbq9yXA2yiCMdA+sKB7+
 wlRIuTq7tuhN4Lk/9eheXHR4xHKDbOKY+zWEWi/AlFRaWmld0otMykVC6wbp6soo
 1gYgbaCjJcJcResKYAdby92jqvIRONqknpUF2L0jOiGIPgz8rmjA6BIvymjaXEuO
 4MLfSjeVP4Ip2tDcaa0R3dSQ40lP778UQNiuqcKb1WODx1AyljJB+gemK0jE8kwN
 OEY7IBSs+wP/UBYN+XbYhoIGlJ4ckwyhIctl4bMvVOduQ40uASlyQS6hmqng5Df/
 NIFd+fCwuBCa45mwUJ2LPTzx3WndMyLv30z/ladtshV+WlUbu60yTzT4bIiQDcpZ
 CYALhDjBiCHzrs6rIhxW
 =Co7t
 -----END PGP SIGNATURE-----

Merge tag 'xfs-for-linus-4.10-rc6-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs uodates from Darrick Wong:
 "I have some more fixes this week: better input validation, corruption
  avoidance, build fixes, memory leak fixes, and a couple from Christoph
  to avoid an ENOSPC failure.

  Summary:
   - Fix race conditions in the CoW code
   - Fix some incorrect input validation checks
   - Avoid crashing fs by running out of space when freeing inodes
   - Fix toctou race wrt whether or not an inode has an attr
   - Fix build error on arm
   - Fix page refcount corruption when readahead fails
   - Don't corrupt userspace in the bmap ioctl"

* tag 'xfs-for-linus-4.10-rc6-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: prevent quotacheck from overloading inode lru
  xfs: fix bmv_count confusion w/ shared extents
  xfs: clear _XBF_PAGES from buffers when readahead page
  xfs: extsize hints are not unlikely in xfs_bmap_btalloc
  xfs: remove racy hasattr check from attr ops
  xfs: use per-AG reservations for the finobt
  xfs: only update mount/resv fields on success in __xfs_ag_resv_init
  xfs: verify dirblocklog correctly
  xfs: fix COW writeback race
2017-01-27 12:44:32 -08:00
Linus Torvalds
5906374446 Merge branch 'for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs updates from Chris Mason:
 "Some fixes that we've collected from the list.

  We still have one more pending to nail down a regression in lzo
  compression, but I wanted to get this batch out the door"

* 'for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: remove ->{get, set}_acl() from btrfs_dir_ro_inode_operations
  Btrfs: disable xattr operations on subvolume directories
  Btrfs: remove old tree_root case in btrfs_read_locked_inode()
  Btrfs: fix truncate down when no_holes feature is enabled
  Btrfs: Fix deadlock between direct IO and fast fsync
  btrfs: fix false enospc error when truncating heavily reflinked file
2017-01-27 12:41:46 -08:00
Linus Torvalds
2fb78e8940 Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "A set of fixes for this series. This contains:

   - Set of fixes for the nvme target code

   - A revert of patch from this merge window, causing a regression with
     WRITE_SAME on iSCSI targets at least.

   - A fix for a use-after-free in the new O_DIRECT bdev code.

   - Two fixes for the xen-blkfront driver"

* 'for-linus' of git://git.kernel.dk/linux-block:
  Revert "sd: remove __data_len hack for WRITE SAME"
  nvme-fc: use blk_rq_nr_phys_segments
  nvmet-rdma: Fix missing dma sync to nvme data structures
  nvmet: Call fatal_error from keep-alive timout expiration
  nvmet: cancel fatal error and flush async work before free controller
  nvmet: delete controllers deletion upon subsystem release
  nvmet_fc: correct logic in disconnect queue LS handling
  block: fix use after free in __blkdev_direct_IO
  xen-blkfront: correct maximum segment accounting
  xen-blkfront: feature flags handling adjustments
2017-01-27 12:36:39 -08:00
Brian Foster
e0d76fa447 xfs: prevent quotacheck from overloading inode lru
Quotacheck runs at mount time in situations where quota accounting must
be recalculated. In doing so, it uses bulkstat to visit every inode in
the filesystem. Historically, every inode processed during quotacheck
was released and immediately tagged for reclaim because quotacheck runs
before the superblock is marked active by the VFS. In other words,
the final iput() lead to an immediate ->destroy_inode() call, which
allowed the XFS background reclaim worker to start reclaiming inodes.

Commit 17c12bcd3 ("xfs: when replaying bmap operations, don't let
unlinked inodes get reaped") marks the XFS superblock active sooner as
part of the mount process to support caching inodes processed during log
recovery. This occurs before quotacheck and thus means all inodes
processed by quotacheck are inserted to the LRU on release.  The
s_umount lock is held until the mount has completed and thus prevents
the shrinkers from operating on the sb. This means that quotacheck can
excessively populate the inode LRU and lead to OOM conditions on systems
without sufficient RAM.

Update the quotacheck bulkstat handler to set XFS_IGET_DONTCACHE on
inodes processed by quotacheck. This causes ->drop_inode() to return 1
and in turn causes iput_final() to evict the inode. This preserves the
original quotacheck behavior and prevents it from overloading the LRU
and running out of memory.

CC: stable@vger.kernel.org # v4.9
Reported-by: Martin Svec <martin.svec@zoner.cz>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-01-27 09:32:30 -08:00
Omar Sandoval
57b59ed2e5 Btrfs: remove ->{get, set}_acl() from btrfs_dir_ro_inode_operations
Subvolume directory inodes can't have ACLs.

Cc: <stable@vger.kernel.org> # 4.9.x
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2017-01-26 15:48:56 -08:00
Omar Sandoval
1fdf41941b Btrfs: disable xattr operations on subvolume directories
When you snapshot a subvolume containing a subvolume, you get a
placeholder directory where the subvolume would be. These directory
inodes have ->i_ops set to btrfs_dir_ro_inode_operations. Previously,
these i_ops didn't include the xattr operation callbacks. The conversion
to xattr_handlers missed this case, leading to bogus attempts to set
xattrs on these inodes. This manifested itself as failures when running
delayed inodes.

To fix this, clear IOP_XATTR in ->i_opflags on these inodes.

Fixes: 6c6ef9f26e ("xattr: Stop calling {get,set,remove}xattr inode operations")
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Reported-by: Chris Murphy <lists@colorremedies.com>
Tested-by: Chris Murphy <lists@colorremedies.com>
Cc: <stable@vger.kernel.org> # 4.9.x
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2017-01-26 15:48:55 -08:00