Commit Graph

845 Commits

Author SHA1 Message Date
Al Viro
c80567c82a do_last(): don't let a bogus return value from ->open() et.al. to confuse us
... into returning a positive to path_openat(), which would interpret that
as "symlink had been encountered" and proceed to corrupt memory, etc.
It can only happen due to a bug in some ->open() instance or in some LSM
hook, etc., so we report any such event *and* make sure it doesn't trick
us into further unpleasantness.

Cc: stable@vger.kernel.org # v3.6+, at least
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-02-27 19:17:33 -05:00
Al Viro
5955102c99 wrappers for ->i_mutex access
parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).

Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-22 18:04:28 -05:00
Linus Torvalds
33caf82acf Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs updates from Al Viro:
 "All kinds of stuff.  That probably should've been 5 or 6 separate
  branches, but by the time I'd realized how large and mixed that bag
  had become it had been too close to -final to play with rebasing.

  Some fs/namei.c cleanups there, memdup_user_nul() introduction and
  switching open-coded instances, burying long-dead code, whack-a-mole
  of various kinds, several new helpers for ->llseek(), assorted
  cleanups and fixes from various people, etc.

  One piece probably deserves special mention - Neil's
  lookup_one_len_unlocked().  Similar to lookup_one_len(), but gets
  called without ->i_mutex and tries to avoid ever taking it.  That, of
  course, means that it's not useful for any directory modifications,
  but things like getting inode attributes in nfds readdirplus are fine
  with that.  I really should've asked for moratorium on lookup-related
  changes this cycle, but since I hadn't done that early enough...  I
  *am* asking for that for the coming cycle, though - I'm going to try
  and get conversion of i_mutex to rwsem with ->lookup() done under lock
  taken shared.

  There will be a patch closer to the end of the window, along the lines
  of the one Linus had posted last May - mechanical conversion of
  ->i_mutex accesses to inode_lock()/inode_unlock()/inode_trylock()/
  inode_is_locked()/inode_lock_nested().  To quote Linus back then:

    -----
    |    This is an automated patch using
    |
    |        sed 's/mutex_lock(&\(.*\)->i_mutex)/inode_lock(\1)/'
    |        sed 's/mutex_unlock(&\(.*\)->i_mutex)/inode_unlock(\1)/'
    |        sed 's/mutex_lock_nested(&\(.*\)->i_mutex,[     ]*I_MUTEX_\([A-Z0-9_]*\))/inode_lock_nested(\1, I_MUTEX_\2)/'
    |        sed 's/mutex_is_locked(&\(.*\)->i_mutex)/inode_is_locked(\1)/'
    |        sed 's/mutex_trylock(&\(.*\)->i_mutex)/inode_trylock(\1)/'
    |
    |    with a very few manual fixups
    -----

  I'm going to send that once the ->i_mutex-affecting stuff in -next
  gets mostly merged (or when Linus says he's about to stop taking
  merges)"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
  nfsd: don't hold i_mutex over userspace upcalls
  fs:affs:Replace time_t with time64_t
  fs/9p: use fscache mutex rather than spinlock
  proc: add a reschedule point in proc_readfd_common()
  logfs: constify logfs_block_ops structures
  fcntl: allow to set O_DIRECT flag on pipe
  fs: __generic_file_splice_read retry lookup on AOP_TRUNCATED_PAGE
  fs: xattr: Use kvfree()
  [s390] page_to_phys() always returns a multiple of PAGE_SIZE
  nbd: use ->compat_ioctl()
  fs: use block_device name vsprintf helper
  lib/vsprintf: add %*pg format specifier
  fs: use gendisk->disk_name where possible
  poll: plug an unused argument to do_poll
  amdkfd: don't open-code memdup_user()
  cdrom: don't open-code memdup_user()
  rsxx: don't open-code memdup_user()
  mtip32xx: don't open-code memdup_user()
  [um] mconsole: don't open-code memdup_user_nul()
  [um] hostaudio: don't open-code memdup_user()
  ...
2016-01-12 17:11:47 -08:00
NeilBrown
bbddca8e8f nfsd: don't hold i_mutex over userspace upcalls
We need information about exports when crossing mountpoints during
lookup or NFSv4 readdir.  If we don't already have that information
cached, we may have to ask (and wait for) rpc.mountd.

In both cases we currently hold the i_mutex on the parent of the
directory we're asking rpc.mountd about.  We've seen situations where
rpc.mountd performs some operation on that directory that tries to take
the i_mutex again, resulting in deadlock.

With some care, we may be able to avoid that in rpc.mountd.  But it
seems better just to avoid holding a mutex while waiting on userspace.

It appears that lookup_one_len is pretty much the only operation that
needs the i_mutex.  So we could just drop the i_mutex elsewhere and do
something like

	mutex_lock()
	lookup_one_len()
	mutex_unlock()

In many cases though the lookup would have been cached and not required
the i_mutex, so it's more efficient to create a lookup_one_len() variant
that only takes the i_mutex when necessary.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-09 03:07:52 -05:00
Al Viro
62fb4a155f don't carry MAY_OPEN in op->acc_mode
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-04 10:28:40 -05:00
Al Viro
fceef393a5 switch ->get_link() to delayed_call, kill ->put_link()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-30 13:01:03 -05:00
Al Viro
d3883d4f93 teach page_get_link() to work in RCU mode
more or less along the lines of Neil's patchset, sans the insanity
around kmap().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-08 22:41:54 -05:00
Al Viro
6b2553918d replace ->follow_link() with new method that could stay in RCU mode
new method: ->get_link(); replacement of ->follow_link().  The differences
are:
	* inode and dentry are passed separately
	* might be called both in RCU and non-RCU mode;
the former is indicated by passing it a NULL dentry.
	* when called that way it isn't allowed to block
and should return ERR_PTR(-ECHILD) if it needs to be called
in non-RCU mode.

It's a flagday change - the old method is gone, all in-tree instances
converted.  Conversion isn't hard; said that, so far very few instances
do not immediately bail out when called in RCU mode.  That'll change
in the next commits.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-08 22:41:54 -05:00
Al Viro
21fc61c73c don't put symlink bodies in pagecache into highmem
kmap() in page_follow_link_light() needed to go - allowing to hold
an arbitrary number of kmaps for long is a great way to deadlocking
the system.

new helper (inode_nohighmem(inode)) needs to be used for pagecache
symlinks inodes; done for all in-tree cases.  page_follow_link_light()
instrumented to yell about anything missed.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-08 22:41:36 -05:00
Al Viro
e1a63bbc40 restore_nameidata(): no need to clear now->stack
microoptimization: in all callers *now is in the frame we are about to leave.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-06 21:18:27 -05:00
Al Viro
248fb5b955 namei.c: take "jump to root" into a new helper
... and use it both in path_init() (for absolute pathnames) and
get_link() (for absolute symlinks).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-06 21:18:21 -05:00
Al Viro
ef55d91700 path_init(): set nd->inode earlier in cwd-relative case
that allows to kill the recheck of nd->seq on the way out in
this case, and this check on the way out is left only for
absolute pathnames.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-06 21:18:16 -05:00
Al Viro
9e6697e26f namei.c: fold set_root_rcu() into set_root()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-06 21:18:10 -05:00
Mike Marshall
57e3715cfa typo in fs/namei.c comment
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-06 21:17:18 -05:00
Al Viro
aa80deab33 namei: page_getlink() and page_follow_link_light() are the same thing
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-06 20:43:27 -05:00
Al Viro
2788cc47f4 Don't reset ->total_link_count on nested calls of vfs_path_lookup()
we already zero it on outermost set_nameidata(), so initialization in
path_init() is pointless and wrong.  The same DoS exists on pre-4.2
kernels, but there a slightly different fix will be needed.

Cc: stable@vger.kernel.org # v4.2
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-06 12:33:02 -05:00
Linus Torvalds
ad804a0b2a Merge branch 'akpm' (patches from Andrew)
Merge second patch-bomb from Andrew Morton:

 - most of the rest of MM

 - procfs

 - lib/ updates

 - printk updates

 - bitops infrastructure tweaks

 - checkpatch updates

 - nilfs2 update

 - signals

 - various other misc bits: coredump, seqfile, kexec, pidns, zlib, ipc,
   dma-debug, dma-mapping, ...

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (102 commits)
  ipc,msg: drop dst nil validation in copy_msg
  include/linux/zutil.h: fix usage example of zlib_adler32()
  panic: release stale console lock to always get the logbuf printed out
  dma-debug: check nents in dma_sync_sg*
  dma-mapping: tidy up dma_parms default handling
  pidns: fix set/getpriority and ioprio_set/get in PRIO_USER mode
  kexec: use file name as the output message prefix
  fs, seqfile: always allow oom killer
  seq_file: reuse string_escape_str()
  fs/seq_file: use seq_* helpers in seq_hex_dump()
  coredump: change zap_threads() and zap_process() to use for_each_thread()
  coredump: ensure all coredumping tasks have SIGNAL_GROUP_COREDUMP
  signal: remove jffs2_garbage_collect_thread()->allow_signal(SIGCONT)
  signal: introduce kernel_signal_stop() to fix jffs2_garbage_collect_thread()
  signal: turn dequeue_signal_lock() into kernel_dequeue_signal()
  signals: kill block_all_signals() and unblock_all_signals()
  nilfs2: fix gcc uninitialized-variable warnings in powerpc build
  nilfs2: fix gcc unused-but-set-variable warnings
  MAINTAINERS: nilfs2: add header file for tracing
  nilfs2: add tracepoints for analyzing reading and writing metadata files
  ...
2015-11-07 14:32:45 -08:00
Linus Torvalds
75021d2859 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial updates from Jiri Kosina:
 "Trivial stuff from trivial tree that can be trivially summed up as:

   - treewide drop of spurious unlikely() before IS_ERR() from Viresh
     Kumar

   - cosmetic fixes (that don't really affect basic functionality of the
     driver) for pktcdvd and bcache, from Julia Lawall and Petr Mladek

   - various comment / printk fixes and updates all over the place"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
  bcache: Really show state of work pending bit
  hwmon: applesmc: fix comment typos
  Kconfig: remove comment about scsi_wait_scan module
  class_find_device: fix reference to argument "match"
  debugfs: document that debugfs_remove*() accepts NULL and error values
  net: Drop unlikely before IS_ERR(_OR_NULL)
  mm: Drop unlikely before IS_ERR(_OR_NULL)
  fs: Drop unlikely before IS_ERR(_OR_NULL)
  drivers: net: Drop unlikely before IS_ERR(_OR_NULL)
  drivers: misc: Drop unlikely before IS_ERR(_OR_NULL)
  UBI: Update comments to reflect UBI_METAONLY flag
  pktcdvd: drop null test before destroy functions
2015-11-07 13:05:44 -08:00
Michal Hocko
c62d25556b mm, fs: introduce mapping_gfp_constraint()
There are many places which use mapping_gfp_mask to restrict a more
generic gfp mask which would be used for allocations which are not
directly related to the page cache but they are performed in the same
context.

Let's introduce a helper function which makes the restriction explicit and
easier to track.  This patch doesn't introduce any functional changes.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Linus Torvalds
6de29ccb50 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull userns hardlink capability check fix from Eric Biederman:
 "This round just contains a single patch.  There has been a lot of
  other work this period but it is not quite ready yet, so I am pushing
  it until 4.5.

  The remaining change by Dirk Steinmetz wich fixes both Gentoo and
  Ubuntu containers allows hardlinks if we have the appropriate
  capabilities in the user namespace.  Security wise it is really a
  gimme as the user namespace root can already call setuid become that
  user and create the hardlink"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  namei: permit linking with CAP_FOWNER in userns
2015-11-05 15:20:56 -08:00
Dirk Steinmetz
f2ca379642 namei: permit linking with CAP_FOWNER in userns
Attempting to hardlink to an unsafe file (e.g. a setuid binary) from
within an unprivileged user namespace fails, even if CAP_FOWNER is held
within the namespace. This may cause various failures, such as a gentoo
installation within a lxc container failing to build and install specific
packages.

This change permits hardlinking of files owned by mapped uids, if
CAP_FOWNER is held for that namespace. Furthermore, it improves consistency
by using the existing inode_owner_or_capable(), which is aware of
namespaced capabilities as of 23adbe12ef ("fs,userns: Change
inode_capable to capable_wrt_inode_uidgid").

Signed-off-by: Dirk Steinmetz <public@rsjtdrjgfuzkfg.com>

This is hitting us in Ubuntu during some dpkg upgrades in containers.
When upgrading a file dpkg creates a hard link to the old file to back
it up before overwriting it. When packages upgrade suid files owned by a
non-root user the link isn't permitted, and the package upgrade fails.
This patch fixes our problem.

Tested-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2015-10-27 16:12:35 -05:00
Trond Myklebust
daf3761c9f namei: results of d_is_negative() should be checked after dentry revalidation
Leandro Awa writes:
 "After switching to version 4.1.6, our parallelized and distributed
  workflows now fail consistently with errors of the form:

  T34: ./regex.c:39:22: error: config.h: No such file or directory

  From our 'git bisect' testing, the following commit appears to be the
  possible cause of the behavior we've been seeing: commit 766c4cbfacd8"

Al Viro says:
 "What happens is that 766c4cbfac got the things subtly wrong.

  We used to treat d_is_negative() after lookup_fast() as "fall with
  ENOENT".  That was wrong - checking ->d_flags outside of ->d_seq
  protection is unreliable and failing with hard error on what should've
  fallen back to non-RCU pathname resolution is a bug.

  Unfortunately, we'd pulled the test too far up and ran afoul of
  another kind of staleness.  The dentry might have been absolutely
  stable from the RCU point of view (and we might be on UP, etc), but
  stale from the remote fs point of view.  If ->d_revalidate() returns
  "it's actually stale", dentry gets thrown away and the original code
  wouldn't even have looked at its ->d_flags.

  What we need is to check ->d_flags where 766c4cbfac does (prior to
  ->d_seq validation) but only use the result in cases where we do not
  discard this dentry outright"

Reported-by: Leandro Awa <lawa@nvidia.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=104911
Fixes: 766c4cbfac ("namei: d_is_negative() should be checked...")
Tested-by: Leandro Awa <lawa@nvidia.com>
Cc: stable@vger.kernel.org # v4.1+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-10-10 10:17:27 -07:00
Viresh Kumar
a1c83681d5 fs: Drop unlikely before IS_ERR(_OR_NULL)
IS_ERR(_OR_NULL) already contain an 'unlikely' compiler flag and there
is no need to do that again from its callers. Drop it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Jeff Layton <jlayton@poochiereds.net>
Reviewed-by: David Howells <dhowells@redhat.com>
Reviewed-by: Steve French <smfrench@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2015-09-29 15:13:58 +02:00
Masanari Iida
2a78b857d3 namei: fix warning while make xmldocs caused by namei.c
Fix the following warnings:

Warning(.//fs/namei.c:2422): No description found for parameter 'nd'
Warning(.//fs/namei.c:2422): Excess function parameter 'nameidata'
description in 'path_mountpoint'

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-10 13:29:01 -07:00
Eric W. Biederman
397d425dc2 vfs: Test for and handle paths that are unreachable from their mnt_root
In rare cases a directory can be renamed out from under a bind mount.
In those cases without special handling it becomes possible to walk up
the directory tree to the root dentry of the filesystem and down
from the root dentry to every other file or directory on the filesystem.

Like division by zero .. from an unconnected path can not be given
a useful semantic as there is no predicting at which path component
the code will realize it is unconnected.  We certainly can not match
the current behavior as the current behavior is a security hole.

Therefore when encounting .. when following an unconnected path
return -ENOENT.

- Add a function path_connected to verify path->dentry is reachable
  from path->mnt.mnt_root.  AKA to validate that rename did not do
  something nasty to the bind mount.

  To avoid races path_connected must be called after following a path
  component to it's next path component.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-08-21 03:20:10 -04:00
Al Viro
aa65fa35ba may_follow_link() should use nd->inode
Now that we can get there in RCU mode, we shouldn't play with
nd->path.dentry->d_inode - it's not guaranteed to be stable.
Use nd->inode instead.

Reported-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-08-04 23:23:50 -04:00
Al Viro
97242f99a0 link_path_walk(): be careful when failing with ENOTDIR
In RCU mode we might end up with dentry evicted just we check
that it's a directory.  In such case we should return ECHILD
rather than ENOTDIR, so that pathwalk would be retries in non-RCU
mode.

Breakage had been introduced in commit b18825a - prior to that
we were looking at nd->inode, which had been fetched before
verifying that ->d_seq was still valid.  That form of check
would only be satisfied if at some point the pathname prefix
would indeed have resolved to a non-directory.  The fix consists
of checking ->d_seq after we'd run into a non-directory dentry,
and failing with ECHILD in case of mismatch.

Note that all branches since 3.12 have that problem...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-08-01 20:18:38 -04:00
Al Viro
06d7137e5c namei: make set_root_rcu() return void
The only caller that cares about its return value can just
as easily pick it from nd->root_seq itself.  We used to just
calculate it and return to caller, but these days we are
storing it in nd->root_seq in all cases.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-06-29 12:07:04 -04:00
Al Viro
b853a16176 turn user_{path_at,path,lpath,path_dir}() into static inlines
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15 01:10:45 -04:00
Al Viro
9883d1855e namei: move saved_nd pointer into struct nameidata
these guys are always declared next to each other; might as well put
the former (pointer to previous instance) into the latter and simplify
the calling conventions for {set,restore}_nameidata()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15 01:10:45 -04:00
Al Viro
520ae68747 inline user_path_create()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15 01:10:44 -04:00
Al Viro
a2ec4a2d5c inline user_path_parent()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15 01:10:44 -04:00
Al Viro
76ae2a5ab1 namei: trim do_last() arguments
now that struct filename is stashed in nameidata we have no need to
pass it in

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15 01:10:43 -04:00
Al Viro
c8a53ee5ee namei: stash dfd and name into nameidata
fewer arguments to pass around...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15 01:10:43 -04:00
Al Viro
102b8af266 namei: fold path_cleanup() into terminate_walk()
they are always called next to each other; moreover,
terminate_walk() is more symmetrical that way.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15 01:10:42 -04:00
Al Viro
5c31b6cedb namei: saner calling conventions for filename_parentat()
a) make it reject ERR_PTR() for name
b) make it putname(name) on all other failure exits
c) make it return name on success

again, simplifies the callers

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15 01:10:42 -04:00
Al Viro
181c37b6e4 namei: saner calling conventions for filename_create()
a) make it reject ERR_PTR() for name
b) make it putname(name) upon return in all other cases.

seriously simplifies the callers...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15 01:10:42 -04:00
Al Viro
391172c46e namei: shift nameidata down into filename_parentat()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15 01:10:41 -04:00
Al Viro
abc9f5beb1 namei: make filename_lookup() reject ERR_PTR() passed as name
makes for much easier life in callers

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15 01:10:41 -04:00
Al Viro
9ad1aaa615 namei: shift nameidata inside filename_lookup()
pass root instead; non-NULL => copy to nd.root and
set LOOKUP_ROOT in flags

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15 01:10:40 -04:00
Al Viro
e4bd1c1a95 namei: move putname() call into filename_lookup()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15 01:10:40 -04:00
Al Viro
625b6d1054 namei: pass the struct path to store the result down into path_lookupat()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15 01:10:39 -04:00
Al Viro
18d8c86011 namei: uninline set_root{,_rcu}()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15 01:10:39 -04:00
Al Viro
aed434ada6 namei: be careful with mountpoint crossings in follow_dotdot_rcu()
Otherwise we are risking a hard error where nonlazy restart would be the right
thing to do; it's a very narrow race with mount --move and most of the time it
ends up being completely harmless, but it's possible to construct a case when
we'll get a bogus hard error instead of falling back to non-lazy walk...

For one thing, when crossing _into_ overmount of parent we need to check for
mount_lock bumps when we get NULL from __lookup_mnt() as well.

For another, and less exotically, we need to make sure that the data fetched
in follow_up_rcu() had been consistent.  ->mnt_mountpoint is pinned for as
long as it is a mountpoint, but we need to check mount_lock after fetching
to verify that.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15 01:10:38 -04:00
Al Viro
5a8d87e8ed namei: unlazy_walk() doesn't need to mess with current->fs anymore
now that we have ->root_seq, legitimize_path(&nd->root, nd->root_seq)
will do just fine...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15 01:10:36 -04:00
Al Viro
8f47a0167c namei: handle absolute symlinks without dropping out of RCU mode
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15 01:10:22 -04:00
Al Viro
8c1b456689 enable passing fast relative symlinks without dropping out of RCU mode
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15 01:06:28 -04:00
NeilBrown
8fa9dd2466 VFS/namei: make the use of touch_atime() in get_link() RCU-safe.
touch_atime is not RCU-safe, and so cannot be called on an RCU walk.
However, in situations where RCU-walk makes a difference, the symlink
will likely to accessed much more often than it is useful to update
the atime.

So split out the test of "Does the atime actually need to be updated"
into  atime_needs_update(), and have get_link() unlazy if it finds that
it will need to do that update.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15 01:06:27 -04:00
Al Viro
bc40aee053 namei: don't unlazy until get_link()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15 01:06:27 -04:00
Al Viro
7973387a2f namei: make unlazy_walk and terminate_walk handle nd->stack, add unlazy_link
We are almost done - primitives for leaving RCU mode are aware of nd->stack
now, a new primitive for going to non-RCU mode when we have a symlink on hands
added.

The thing we are heavily relying upon is that *any* unlazy failure will be
shortly followed by terminate_walk(), with no access to nameidata in between.
So it's enough to leave the things in a state terminate_walk() would cope with.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15 01:06:01 -04:00