Commit Graph

53440 Commits

Author SHA1 Message Date
Al Viro
18fbbfc2bf omfs_lookup(): report IO errors, use d_splice_alias()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-22 14:27:58 -04:00
Al Viro
04bb1ba141 orangefs_lookup: simplify
d_splice_alias() can handle NULL and ERR_PTR() for inode just fine...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-22 14:27:58 -04:00
Al Viro
0ed883fd80 openpromfs: switch to d_splice_alias()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-22 14:27:57 -04:00
Al Viro
b113a6d3cf xfs_vn_lookup: simplify a bit
have all post-xfs_lookup() branches converge on d_splice_alias()

Cc: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-22 14:27:57 -04:00
Al Viro
9a7dddcaff adfs_lookup: do not fail with ENOENT on negatives, use d_splice_alias()
Cc: Russell King <linux@armlinux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-22 14:27:56 -04:00
Al Viro
686bb96d1b adfs_lookup_byname: .. *is* taken care of in fs/namei.c
Cc: Russell King <linux@armlinux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-22 14:27:55 -04:00
Al Viro
8130c15176 romfs_lookup: switch to d_splice_alias()
... and hash negative lookups

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-22 14:27:55 -04:00
Al Viro
c1481700f4 qnx6_lookup: switch to d_splice_alias()
... and hash negative lookups

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-22 14:27:54 -04:00
Al Viro
191ac107f9 ubifs_lookup: use d_splice_alias()
code is simpler that way

Acked-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-22 14:27:54 -04:00
Al Viro
5bf3544970 sysv_lookup: use d_splice_alias()
code is simpler that way

Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-22 14:27:53 -04:00
Al Viro
b135dcea37 qnx4_lookup: use d_splice_alias()
code is simpler that way

Acked-by: Anders Larsen <al@alarsen.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-22 14:27:52 -04:00
Al Viro
b014951692 minix_lookup: use d_splice_alias()
code is simpler that way

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-22 14:27:52 -04:00
Al Viro
72ff0b038d freevxfs_lookup(): use d_splice_alias()
code is simpler that way
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-22 14:27:51 -04:00
Al Viro
d023b3a19f cramfs_lookup(): use d_splice_alias()
simpler code that way, actually

Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-22 14:27:51 -04:00
Al Viro
b455ecd4bb bfs_add_entry: pass name/len as qstr pointer
same story as with bfs_find_entry()

Cc: "Tigran A. Aivazian" <aivazian.tigran@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-22 14:27:50 -04:00
Al Viro
33ebdebece bfs_find_entry: pass name/len as qstr pointer
all callers feed something->name/something->len anyway

Cc: "Tigran A. Aivazian" <aivazian.tigran@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-22 14:27:49 -04:00
Al Viro
a596a23b9a bfs_lookup(): use d_splice_alias()
code is actually simpler that way.

Acked-by: "Tigran A. Aivazian" <aivazian.tigran@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-22 14:27:38 -04:00
Al Viro
837f3ec692 Merge branch 'work.misc' into work.lookup 2018-05-21 17:43:32 -04:00
Al Viro
baf10564fb aio: fix io_destroy(2) vs. lookup_ioctx() race
kill_ioctx() used to have an explicit RCU delay between removing the
reference from ->ioctx_table and percpu_ref_kill() dropping the refcount.
At some point that delay had been removed, on the theory that
percpu_ref_kill() itself contained an RCU delay.  Unfortunately, that was
the wrong kind of RCU delay and it didn't care about rcu_read_lock() used
by lookup_ioctx().  As the result, we could get ctx freed right under
lookup_ioctx().  Tejun has fixed that in a6d7cff472 ("fs/aio: Add explicit
RCU grace period when freeing kioctx"); however, that fix is not enough.

Suppose io_destroy() from one thread races with e.g. io_setup() from another;
CPU1 removes the reference from current->mm->ioctx_table[...] just as CPU2
has picked it (under rcu_read_lock()).  Then CPU1 proceeds to drop the
refcount, getting it to 0 and triggering a call of free_ioctx_users(),
which proceeds to drop the secondary refcount and once that reaches zero
calls free_ioctx_reqs().  That does
        INIT_RCU_WORK(&ctx->free_rwork, free_ioctx);
        queue_rcu_work(system_wq, &ctx->free_rwork);
and schedules freeing the whole thing after RCU delay.

In the meanwhile CPU2 has gotten around to percpu_ref_get(), bumping the
refcount from 0 to 1 and returned the reference to io_setup().

Tejun's fix (that queue_rcu_work() in there) guarantees that ctx won't get
freed until after percpu_ref_get().  Sure, we'd increment the counter before
ctx can be freed.  Now we are out of rcu_read_lock() and there's nothing to
stop freeing of the whole thing.  Unfortunately, CPU2 assumes that since it
has grabbed the reference, ctx is *NOT* going away until it gets around to
dropping that reference.

The fix is obvious - use percpu_ref_tryget_live() and treat failure as miss.
It's not costlier than what we currently do in normal case, it's safe to
call since freeing *is* delayed and it closes the race window - either
lookup_ioctx() comes before percpu_ref_kill() (in which case ctx->users
won't reach 0 until the caller of lookup_ioctx() drops it) or lookup_ioctx()
fails, ctx->users is unaffected and caller of lookup_ioctx() doesn't see
the object in question at all.

Cc: stable@kernel.org
Fixes: a6d7cff472 "fs/aio: Add explicit RCU grace period when freeing kioctx"
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-21 14:30:11 -04:00
Al Viro
5aa1437d2d ext2: fix a block leak
open file, unlink it, then use ioctl(2) to make it immutable or
append only.  Now close it and watch the blocks *not* freed...

Immutable/append-only checks belong in ->setattr().
Note: the bug is old and backport to anything prior to 737f2e93b9
("ext2: convert to use the new truncate convention") will need
these checks lifted into ext2_setattr().

Cc: stable@kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-21 14:30:11 -04:00
Al Viro
3819bb0d79 nfsd: vfs_mkdir() might succeed leaving dentry negative unhashed
That can (and does, on some filesystems) happen - ->mkdir() (and thus
vfs_mkdir()) can legitimately leave its argument negative and just
unhash it, counting upon the lookup to pick the object we'd created
next time we try to look at that name.

Some vfs_mkdir() callers forget about that possibility...

Acked-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-21 14:30:10 -04:00
Al Viro
9c3e9025a3 cachefiles: vfs_mkdir() might succeed leaving dentry negative unhashed
That can (and does, on some filesystems) happen - ->mkdir() (and thus
vfs_mkdir()) can legitimately leave its argument negative and just
unhash it, counting upon the lookup to pick the object we'd created
next time we try to look at that name.

Some vfs_mkdir() callers forget about that possibility...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-21 14:30:10 -04:00
Al Viro
7b745a4e40 unfuck sysfs_mount()
new_sb is left uninitialized in case of early failures in kernfs_mount_ns(),
and while IS_ERR(root) is true in all such cases, using IS_ERR(root) || !new_sb
is not a solution - IS_ERR(root) is true in some cases when new_sb is true.

Make sure new_sb is initialized (and matches the reality) in all cases and
fix the condition for dropping kobj reference - we want it done precisely
in those situations where the reference has not been transferred into a new
super_block instance.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-21 14:30:09 -04:00
Al Viro
82382acec0 kernfs: deal with kernfs_fill_super() failures
make sure that info->node is initialized early, so that kernfs_kill_sb()
can list_del() it safely.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-21 14:30:08 -04:00
Joe Perches
08a8f30868 cramfs: Fix IS_ENABLED typo
There's an extra C here...

Fixes: 99c18ce580 ("cramfs: direct memory access support")
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-21 14:30:08 -04:00
Al Viro
f4e4d434fe befs_lookup(): use d_splice_alias()
RTFS(Documentation/filesystems/nfs/Exporting) if you try to make
something exportable.

Fixes: ac632f5b63 "befs: add NFS export support"
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-21 14:30:07 -04:00
Al Viro
87fbd639c0 affs_lookup: switch to d_splice_alias()
Making something exportable takes more than providing ->s_export_ops.
In particular, ->lookup() *MUST* use d_splice_alias() instead of
d_add().

Reading Documentation/filesystems/nfs/Exporting would've been a good idea;
as it is, exporting AFFS is badly (and exploitably) broken.

Partially-Fixes: ed4433d723 "fs/affs: make affs exportable"
Acked-by: David Sterba <dsterba@suse.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-21 14:29:12 -04:00
Al Viro
30da870ce4 affs_lookup(): close a race with affs_remove_link()
we unlock the directory hash too early - if we are looking at secondary
link and primary (in another directory) gets removed just as we unlock,
we could have the old primary moved in place of the secondary, leaving
us to look into freed entry (and leaving our dentry with ->d_fsdata
pointing to a freed entry).

Cc: stable@vger.kernel.org # 2.4.4+
Acked-by: David Sterba <dsterba@suse.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-21 14:27:45 -04:00
Danilo Krummrich
030c7e0bb7 vfs: namei: use path_equal() in follow_dotdot()
Use path_equal() to detect whether we're already in root.

Signed-off-by: Danilo Krummrich <danilokrummrich@dk-develop.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-17 22:19:39 -04:00
Al Viro
2220c5b0a7 make xattr_getsecurity() static
many years overdue...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-14 09:51:34 -04:00
Al Viro
f6ddc16175 vfat: simplify checks in vfat_lookup()
vfat_d_anon_disconn() is called only if alias->d_parent is equal to
dentry->d_parent *and* it returns false unless alias->d_parent == alias.
But in that case alias is the directory we are doing lookup in, and
d_splice_alias() would've done the right thing.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-13 12:09:14 -04:00
Al Viro
61fec493c9 get rid of dead code in d_find_alias()
All "try disconnected alias if nothing else fits" logics in d_find_alias()
got accidentally disabled by Neil a while ago; for most of the callers it
was the right thing to do, so fixes belong in few callers that *do* want
disconnected aliases.  This just takes the now-dead code in d_find_alias()
out.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-13 12:08:32 -04:00
Dave Chinner
79f546a696 fs: don't scan the inode cache before SB_BORN is set
We recently had an oops reported on a 4.14 kernel in
xfs_reclaim_inodes_count() where sb->s_fs_info pointed to garbage
and so the m_perag_tree lookup walked into lala land.  It produces
an oops down this path during the failed mount:

  radix_tree_gang_lookup_tag+0xc4/0x130
  xfs_perag_get_tag+0x37/0xf0
  xfs_reclaim_inodes_count+0x32/0x40
  xfs_fs_nr_cached_objects+0x11/0x20
  super_cache_count+0x35/0xc0
  shrink_slab.part.66+0xb1/0x370
  shrink_node+0x7e/0x1a0
  try_to_free_pages+0x199/0x470
  __alloc_pages_slowpath+0x3a1/0xd20
  __alloc_pages_nodemask+0x1c3/0x200
  cache_grow_begin+0x20b/0x2e0
  fallback_alloc+0x160/0x200
  kmem_cache_alloc+0x111/0x4e0

The problem is that the superblock shrinker is running before the
filesystem structures it depends on have been fully set up. i.e.
the shrinker is registered in sget(), before ->fill_super() has been
called, and the shrinker can call into the filesystem before
fill_super() does it's setup work. Essentially we are exposed to
both use-after-free and use-before-initialisation bugs here.

To fix this, add a check for the SB_BORN flag in super_cache_count.
In general, this flag is not set until ->fs_mount() completes
successfully, so we know that it is set after the filesystem
setup has completed. This matches the trylock_super() behaviour
which will not let super_cache_scan() run if SB_BORN is not set, and
hence will not allow the superblock shrinker from entering the
filesystem while it is being set up or after it has failed setup
and is being torn down.

Cc: stable@kernel.org
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-11 15:37:57 -04:00
Al Viro
1e2e547a93 do d_instantiate/unlock_new_inode combinations safely
For anything NFS-exported we do _not_ want to unlock new inode
before it has grown an alias; original set of fixes got the
ordering right, but missed the nasty complication in case of
lockdep being enabled - unlock_new_inode() does
	lockdep_annotate_inode_mutex_key(inode)
which can only be done before anyone gets a chance to touch
->i_mutex.  Unfortunately, flipping the order and doing
unlock_new_inode() before d_instantiate() opens a window when
mkdir can race with open-by-fhandle on a guessed fhandle, leading
to multiple aliases for a directory inode and all the breakage
that follows from that.

	Correct solution: a new primitive (d_instantiate_new())
combining these two in the right order - lockdep annotate, then
d_instantiate(), then the rest of unlock_new_inode().  All
combinations of d_instantiate() with unlock_new_inode() should
be converted to that.

Cc: stable@kernel.org	# 2.6.29 and later
Tested-by: Mike Marshall <hubcap@omnibond.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-11 15:36:37 -04:00
Al Viro
1c18d2a15e it's SB_BORN, not MS_BORN...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-10 15:09:41 -04:00
Al Viro
3087147ba3 msdos_rmdir(): kill BS comment
it hadn't been checking for "busy" since 2.3.99-something and removing
that leaves us with "check if it's empty" followed by call of fat_dir_emtpy()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-04-20 00:01:02 -04:00
Al Viro
16a34adb93 Don't leak MNT_INTERNAL away from internal mounts
We want it only for the stuff created by SB_KERNMOUNT mounts, *not* for
their copies.  As it is, creating a deep stack of bindings of /proc/*/ns/*
somewhere in a new namespace and exiting yields a stack overflow.

Cc: stable@kernel.org
Reported-by: Alexander Aring <aring@mojatatu.com>
Bisected-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Tested-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Tested-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-04-19 23:52:15 -04:00
Tetsuo Handa
8e04944f0e mm,vmscan: Allow preallocating memory for register_shrinker().
syzbot is catching so many bugs triggered by commit 9ee332d99e
("sget(): handle failures of register_shrinker()"). That commit expected
that calling kill_sb() from deactivate_locked_super() without successful
fill_super() is safe, but the reality was different; some callers assign
attributes which are needed for kill_sb() after sget() succeeds.

For example, [1] is a report where sb->s_mode (which seems to be either
FMODE_READ | FMODE_EXCL | FMODE_WRITE or FMODE_READ | FMODE_EXCL) is not
assigned unless sget() succeeds. But it does not worth complicate sget()
so that register_shrinker() failure path can safely call
kill_block_super() via kill_sb(). Making alloc_super() fail if memory
allocation for register_shrinker() failed is much simpler. Let's avoid
calling deactivate_locked_super() from sget_userns() by preallocating
memory for the shrinker and making register_shrinker() in sget_userns()
never fail.

[1] https://syzkaller.appspot.com/bug?id=588996a25a2587be2e3a54e8646728fb9cae44e7

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: syzbot <syzbot+5a170e19c963a2e0df79@syzkaller.appspotmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-04-16 02:06:47 -04:00
Al Viro
659038428c orangefs_kill_sb(): deal with allocation failures
orangefs_fill_sb() might've failed to allocate ORANGEFS_SB(s); don't
oops in that case.

Cc: stable@kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-04-15 23:49:12 -04:00
Al Viro
c66b23c284 jffs2_kill_sb(): deal with failed allocations
jffs2_fill_super() might fail to allocate jffs2_sb_info;
jffs2_kill_sb() must survive that.

Cc: stable@kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-04-15 23:49:05 -04:00
Zev Weiss
2276271147 fs: avoid fdput() after failed fdget() in vfs_dedupe_file_range()
It's a fairly inconsequential bug, since fdput() won't actually try to
fput() the file due to fd.flags (and thus FDPUT_FPUT) being zero in
the failure case, but most other vfs code takes steps to avoid this.

Signed-off-by: Zev Weiss <zev@bewilderbeest.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-04-15 23:36:26 -04:00
Linus Torvalds
e37563bb6c for-4.17-part2-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAlrTQ4sACgkQxWXV+ddt
 WDti3Q/+MAeqsLTjvre2RQ3ka5hNyCuVftUIBmcP3YfJbt+xZYQyaewW4Xkfi3cm
 cbJE+zehzf5ag+RJhxk3OwFTvNfLGIO9asWs3b08NGUi6VzwL0/8B/iOdZPuHSAV
 TrecQIBE2Tp+xax9cQEnxav34D4dUtXNaDweGjp1MIIUkDneQP/I0vlTu7vafBgX
 UVxP6riL/MCs7sjTHGIPs0lv8L/fgdmo+dk5SnNuIPTOcFTQXgVrtHjw9IvbKWd4
 aq+sbNWoSrhXUfllbFg/wZqDe9tWn9E2f6m/H0ThSoNdxusSVgacOjFRYh20NKLW
 WGB8Amd/ItGtJwJ1CIypa7VX2U11UAi0XT7BeiK82rUNEJ6moRqFOXG861gRLoTZ
 SpH8uWO+e+CogfXob1KCndn5lot4AM2ZTkCqfrjpM35Nul72PZdne0CxNlmiRupY
 Fdt5GB+sg8plcMaRiYr++BbbHP5tggX1MrhLGEbx2XBs2eRdn+2Lv2I1Ig/U4NUb
 Vf+xk/tFLKGOTSZlbv7SV7ekXxG/3+7gAuL7A1XMETZCwBF4L3hwyW7CgkEBb3PC
 TqX8BwMaRpyp/FgW/QL6edjXZ3a64VaHIfqRPNks3lWWcCHVbzyiVPfEjx+AEJ/0
 abx6jXTJhLUBPuPxEgb5rsSv62RoxPoYHqCErrG95XwnZ8rSCts=
 =I0y0
 -----END PGP SIGNATURE-----

Merge tag 'for-4.17-part2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull more btrfs updates from David Sterba:
 "We have queued a few more fixes (error handling, log replay,
  softlockup) and the rest is SPDX updates that touche almost all files
  so the diffstat is long"

* tag 'for-4.17-part2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: Only check first key for committed tree blocks
  btrfs: add SPDX header to Kconfig
  btrfs: replace GPL boilerplate by SPDX -- sources
  btrfs: replace GPL boilerplate by SPDX -- headers
  Btrfs: fix loss of prealloc extents past i_size after fsync log replay
  Btrfs: clean up resources during umount after trans is aborted
  btrfs: Fix possible softlock on single core machines
  Btrfs: bail out on error during replay_dir_deletes
  Btrfs: fix NULL pointer dereference in log_dir_items
2018-04-15 18:08:35 -07:00
Linus Torvalds
09c9b0eaa0 SMB3 fixes, a few for stable, and some important cleanup work from Ronnie of the smb3 transport code
-----BEGIN PGP SIGNATURE-----
 
 iQGwBAABCAAaBQJa0mDhExxzbWZyZW5jaEBnbWFpbC5jb20ACgkQiiy9cAdyT1HA
 EQv+KxZAFONG/YGBIdoyJevnSobzZSTYgsrEmox28rIvWSFr6Xkt9K6WN4lki+Td
 nacByZBhyeKPp7CxdgjmiNunbGfBZJsWk5LwtgdE2jOLjM9CFq8Z7m2qmkXQ4IkJ
 EoH+NWpCUvEe9nj8oaTdSelR2OHH63SG5E5dK29eOB59ixcVe7qKUG29SpJks9qd
 mCdZyhYyuzh469CKGFeQ8qfWopBH+QFBs4SrpQ9d4liNtZ3W8zTsQ3ezX069AGlU
 Ef2xVTT1DtlZ4/6SiXBKlXGH3jjUg67vEGHzT3ZJwfLVNhTLEpKM/hsPiduSnjS8
 JqdVRdFLc2lQzr4HknhNR9yuE5wT5tZM4iZOjNDwT6Sef2Mt0QIBuYI6kb4mv6Qg
 aCVv7uXUMdvSrTk0mYiO1dXYozGUC+uFhJCuaiLax583o3ihp+/INDosQyqTbavZ
 amRBZpWeNysSJwwJu21RdJiM74+fasR/CDy11U94ssxplg1qE72M8DShK7kXtUIy
 g9ZW
 =Ah5z
 -----END PGP SIGNATURE-----

Merge tag '4.17-rc1SMB3-Fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
 "SMB3 fixes, a few for stable, and some important cleanup work from
  Ronnie of the smb3 transport code"

* tag '4.17-rc1SMB3-Fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: change validate_buf to validate_iov
  cifs: remove rfc1002 hardcoded constants from cifs_discard_remaining_data()
  cifs: Change SMB2_open to return an iov for the error parameter
  cifs: add resp_buf_size to the mid_q_entry structure
  smb3.11: replace a 4 with server->vals->header_preamble_size
  cifs: replace a 4 with server->vals->header_preamble_size
  cifs: add pdu_size to the TCP_Server_Info structure
  SMB311: Improve checking of negotiate security contexts
  SMB3: Fix length checking of SMB3.11 negotiate request
  CIFS: add ONCE flag for cifs_dbg type
  cifs: Use ULL suffix for 64-bit constant
  SMB3: Log at least once if tree connect fails during reconnect
  cifs: smb2pdu: Fix potential NULL pointer dereference
2018-04-15 18:06:22 -07:00
Linus Torvalds
18b7fd1c93 Merge branch 'akpm' (patches from Andrew)
Merge yet more updates from Andrew Morton:

 - various hotfixes

 - kexec_file updates and feature work

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (27 commits)
  kernel/kexec_file.c: move purgatories sha256 to common code
  kernel/kexec_file.c: allow archs to set purgatory load address
  kernel/kexec_file.c: remove mis-use of sh_offset field during purgatory load
  kernel/kexec_file.c: remove unneeded variables in kexec_purgatory_setup_sechdrs
  kernel/kexec_file.c: remove unneeded for-loop in kexec_purgatory_setup_sechdrs
  kernel/kexec_file.c: split up __kexec_load_puragory
  kernel/kexec_file.c: use read-only sections in arch_kexec_apply_relocations*
  kernel/kexec_file.c: search symbols in read-only kexec_purgatory
  kernel/kexec_file.c: make purgatory_info->ehdr const
  kernel/kexec_file.c: remove checks in kexec_purgatory_load
  include/linux/kexec.h: silence compile warnings
  kexec_file, x86: move re-factored code to generic side
  x86: kexec_file: clean up prepare_elf64_headers()
  x86: kexec_file: lift CRASH_MAX_RANGES limit on crash_mem buffer
  x86: kexec_file: remove X86_64 dependency from prepare_elf64_headers()
  x86: kexec_file: purge system-ram walking from prepare_elf64_headers()
  kexec_file,x86,powerpc: factor out kexec_file_ops functions
  kexec_file: make use of purgatory optional
  proc: revalidate misc dentries
  mm, slab: reschedule cache_reap() on the same CPU
  ...
2018-04-14 08:50:50 -07:00
Alexey Dobriyan
1da4d377f9 proc: revalidate misc dentries
If module removes proc directory while another process pins it by
chdir'ing to it, then subsequent recreation of proc entry and all
entries down the tree will not be visible to any process until pinning
process unchdir from directory and unpins everything.

Steps to reproduce:

	proc_mkdir("aaa", NULL);
	proc_create("aaa/bbb", ...);

		chdir("/proc/aaa");

	remove_proc_entry("aaa/bbb", NULL);
	remove_proc_entry("aaa", NULL);

	proc_mkdir("aaa", NULL);
	# inaccessible because "aaa" dentry still points
	# to the original "aaa".
	proc_create("aaa/bbb", ...);

Fix is to implement ->d_revalidate and ->d_delete.

Link: http://lkml.kernel.org/r/20180312201938.GA4871@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-13 17:10:27 -07:00
Linus Torvalds
48023102b7 Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs updates from Miklos Szeredi:
 "In addition to bug fixes and cleanups there are two new features from
  Amir:

   - Consistent inode number support for the case when layers are not
     all on the same filesystem (feature is dubbed "xino").

   - Optimize overlayfs file handle decoding. This one touches the
     exportfs interface to allow detecting the disconnected directory
     case"

* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: update documentation w.r.t "xino" feature
  ovl: add support for "xino" mount and config options
  ovl: consistent d_ino for non-samefs with xino
  ovl: consistent i_ino for non-samefs with xino
  ovl: constant st_ino for non-samefs with xino
  ovl: allocate anon bdev per unique lower fs
  ovl: factor out ovl_map_dev_ino() helper
  ovl: cleanup ovl_update_time()
  ovl: add WARN_ON() for non-dir redirect cases
  ovl: cleanup setting OVL_INDEX
  ovl: set d->is_dir and d->opaque for last path element
  ovl: Do not check for redirect if this is last layer
  ovl: lookup in inode cache first when decoding lower file handle
  ovl: do not try to reconnect a disconnected origin dentry
  ovl: disambiguate ovl_encode_fh()
  ovl: set lower layer st_dev only if setting lower st_ino
  ovl: fix lookup with middle layer opaque dir and absolute path redirects
  ovl: Set d->last properly during lookup
  ovl: set i_ino to the value of st_ino for NFS export
2018-04-13 16:55:41 -07:00
Qu Wenruo
5d41be6f70 btrfs: Only check first key for committed tree blocks
When looping btrfs/074 with many cpus (>= 8), it's possible to trigger
kernel warning due to first key verification:

[ 4239.523446] WARNING: CPU: 5 PID: 2381 at fs/btrfs/disk-io.c:460 btree_read_extent_buffer_pages+0x1ad/0x210
[ 4239.523830] Modules linked in:
[ 4239.524630] RIP: 0010:btree_read_extent_buffer_pages+0x1ad/0x210
[ 4239.527101] Call Trace:
[ 4239.527251]  read_tree_block+0x42/0x70
[ 4239.527434]  read_node_slot+0xd2/0x110
[ 4239.527632]  push_leaf_right+0xad/0x1b0
[ 4239.527809]  split_leaf+0x4ea/0x700
[ 4239.527988]  ? leaf_space_used+0xbc/0xe0
[ 4239.528192]  ? btrfs_set_lock_blocking_rw+0x99/0xb0
[ 4239.528416]  btrfs_search_slot+0x8cc/0xa40
[ 4239.528605]  btrfs_insert_empty_items+0x71/0xc0
[ 4239.528798]  __btrfs_run_delayed_refs+0xa98/0x1680
[ 4239.529013]  btrfs_run_delayed_refs+0x10b/0x1b0
[ 4239.529205]  btrfs_commit_transaction+0x33/0xaf0
[ 4239.529445]  ? start_transaction+0xa8/0x4f0
[ 4239.529630]  btrfs_alloc_data_chunk_ondemand+0x1b0/0x4e0
[ 4239.529833]  btrfs_check_data_free_space+0x54/0xa0
[ 4239.530045]  btrfs_delalloc_reserve_space+0x25/0x70
[ 4239.531907]  btrfs_direct_IO+0x233/0x3d0
[ 4239.532098]  generic_file_direct_write+0xcb/0x170
[ 4239.532296]  btrfs_file_write_iter+0x2bb/0x5f4
[ 4239.532491]  aio_write+0xe2/0x180
[ 4239.532669]  ? lock_acquire+0xac/0x1e0
[ 4239.532839]  ? __might_fault+0x3e/0x90
[ 4239.533032]  do_io_submit+0x594/0x860
[ 4239.533223]  ? do_io_submit+0x594/0x860
[ 4239.533398]  SyS_io_submit+0x10/0x20
[ 4239.533560]  ? SyS_io_submit+0x10/0x20
[ 4239.533729]  do_syscall_64+0x75/0x1d0
[ 4239.533979]  entry_SYSCALL_64_after_hwframe+0x42/0xb7
[ 4239.534182] RIP: 0033:0x7f8519741697

The problem here is, at btree_read_extent_buffer_pages() we don't have
acquired read/write lock on that extent buffer, only basic info like
level/bytenr is reliable.

So race condition leads to such false alert.

However in current call site, it's impossible to acquire proper lock
without race window.
To fix the problem, we only verify first key for committed tree blocks
(whose generation is no larger than fs_info->last_trans_committed), so
the content of such tree blocks will not change and there is no need to
get read/write lock.

Reported-by: Nikolay Borisov <nborisov@suse.com>
Fixes: 581c176041 ("btrfs: Validate child tree block's level and first key")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-04-13 16:16:15 +02:00
Ronnie Sahlberg
c1596ff524 cifs: change validate_buf to validate_iov
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-04-12 20:32:55 -05:00
Ronnie Sahlberg
05432e2938 cifs: remove rfc1002 hardcoded constants from cifs_discard_remaining_data()
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-04-12 20:32:53 -05:00
Ronnie Sahlberg
91cb74f514 cifs: Change SMB2_open to return an iov for the error parameter
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-04-12 20:32:50 -05:00