License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01 14:07:57 +00:00
|
|
|
// SPDX-License-Identifier: GPL-2.0
|
2005-04-16 22:20:36 +00:00
|
|
|
/*
|
|
|
|
* linux/fs/readdir.c
|
|
|
|
*
|
|
|
|
* Copyright (C) 1995 Linus Torvalds
|
|
|
|
*/
|
|
|
|
|
2010-08-10 00:20:22 +00:00
|
|
|
#include <linux/stddef.h>
|
2007-05-08 07:29:02 +00:00
|
|
|
#include <linux/kernel.h>
|
2011-11-17 04:57:37 +00:00
|
|
|
#include <linux/export.h>
|
2005-04-16 22:20:36 +00:00
|
|
|
#include <linux/time.h>
|
|
|
|
#include <linux/mm.h>
|
|
|
|
#include <linux/errno.h>
|
|
|
|
#include <linux/stat.h>
|
|
|
|
#include <linux/file.h>
|
|
|
|
#include <linux/fs.h>
|
2014-06-04 23:05:41 +00:00
|
|
|
#include <linux/fsnotify.h>
|
2005-04-16 22:20:36 +00:00
|
|
|
#include <linux/dirent.h>
|
|
|
|
#include <linux/security.h>
|
|
|
|
#include <linux/syscalls.h>
|
|
|
|
#include <linux/unistd.h>
|
2017-04-08 22:10:08 +00:00
|
|
|
#include <linux/compat.h>
|
2016-12-24 19:46:01 +00:00
|
|
|
#include <linux/uaccess.h>
|
2005-04-16 22:20:36 +00:00
|
|
|
|
vfs: get rid of old '->iterate' directory operation
All users now just use '->iterate_shared()', which only takes the
directory inode lock for reading.
Filesystems that never got convered to shared mode now instead use a
wrapper that drops the lock, re-takes it in write mode, calls the old
function, and then downgrades the lock back to read mode.
This way the VFS layer and other callers no longer need to care about
filesystems that never got converted to the modern era.
The filesystems that use the new wrapper are ceph, coda, exfat, jfs,
ntfs, ocfs2, overlayfs, and vboxsf.
Honestly, several of them look like they really could just iterate their
directories in shared mode and skip the wrapper entirely, but the point
of this change is to not change semantics or fix filesystems that
haven't been fixed in the last 7+ years, but to finally get rid of the
dual iterators.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-05 19:25:01 +00:00
|
|
|
/*
|
|
|
|
* Some filesystems were never converted to '->iterate_shared()'
|
|
|
|
* and their directory iterators want the inode lock held for
|
|
|
|
* writing. This wrapper allows for converting from the shared
|
|
|
|
* semantics to the exclusive inode use.
|
|
|
|
*/
|
|
|
|
int wrap_directory_iterator(struct file *file,
|
|
|
|
struct dir_context *ctx,
|
|
|
|
int (*iter)(struct file *, struct dir_context *))
|
|
|
|
{
|
|
|
|
struct inode *inode = file_inode(file);
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* We'd love to have an 'inode_upgrade_trylock()' operation,
|
|
|
|
* see the comment in mmap_upgrade_trylock() in mm/memory.c.
|
|
|
|
*
|
|
|
|
* But considering this is for "filesystems that never got
|
|
|
|
* converted", it really doesn't matter.
|
|
|
|
*
|
|
|
|
* Also note that since we have to return with the lock held
|
|
|
|
* for reading, we can't use the "killable()" locking here,
|
|
|
|
* since we do need to get the lock even if we're dying.
|
|
|
|
*
|
|
|
|
* We could do the write part killably and then get the read
|
|
|
|
* lock unconditionally if it mattered, but see above on why
|
|
|
|
* this does the very simplistic conversion.
|
|
|
|
*/
|
|
|
|
up_read(&inode->i_rwsem);
|
|
|
|
down_write(&inode->i_rwsem);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Since we dropped the inode lock, we should do the
|
|
|
|
* DEADDIR test again. See 'iterate_dir()' below.
|
|
|
|
*
|
|
|
|
* Note that we don't need to re-do the f_pos games,
|
|
|
|
* since the file must be locked wrt f_pos anyway.
|
|
|
|
*/
|
|
|
|
ret = -ENOENT;
|
|
|
|
if (!IS_DEADDIR(inode))
|
|
|
|
ret = iter(file, ctx);
|
|
|
|
|
|
|
|
downgrade_write(&inode->i_rwsem);
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL(wrap_directory_iterator);
|
|
|
|
|
Convert filldir[64]() from __put_user() to unsafe_put_user()
We really should avoid the "__{get,put}_user()" functions entirely,
because they can easily be mis-used and the original intent of being
used for simple direct user accesses no longer holds in a post-SMAP/PAN
world.
Manually optimizing away the user access range check makes no sense any
more, when the range check is generally much cheaper than the "enable
user accesses" code that the __{get,put}_user() functions still need.
So instead of __put_user(), use the unsafe_put_user() interface with
user_access_{begin,end}() that really does generate better code these
days, and which is generally a nicer interface. Under some loads, the
multiple user writes that filldir() does are actually quite noticeable.
This also makes the dirent name copy use unsafe_put_user() with a couple
of macros. We do not want to make function calls with SMAP/PAN
disabled, and the code this generates is quite good when the
architecture uses "asm goto" for unsafe_put_user() like x86 does.
Note that this doesn't bother with the legacy cases. Nobody should use
them anyway, so performance doesn't really matter there.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-22 04:59:07 +00:00
|
|
|
/*
|
2024-06-02 00:47:30 +00:00
|
|
|
* Note the "unsafe_put_user()" semantics: we goto a
|
Convert filldir[64]() from __put_user() to unsafe_put_user()
We really should avoid the "__{get,put}_user()" functions entirely,
because they can easily be mis-used and the original intent of being
used for simple direct user accesses no longer holds in a post-SMAP/PAN
world.
Manually optimizing away the user access range check makes no sense any
more, when the range check is generally much cheaper than the "enable
user accesses" code that the __{get,put}_user() functions still need.
So instead of __put_user(), use the unsafe_put_user() interface with
user_access_{begin,end}() that really does generate better code these
days, and which is generally a nicer interface. Under some loads, the
multiple user writes that filldir() does are actually quite noticeable.
This also makes the dirent name copy use unsafe_put_user() with a couple
of macros. We do not want to make function calls with SMAP/PAN
disabled, and the code this generates is quite good when the
architecture uses "asm goto" for unsafe_put_user() like x86 does.
Note that this doesn't bother with the legacy cases. Nobody should use
them anyway, so performance doesn't really matter there.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-22 04:59:07 +00:00
|
|
|
* label for errors.
|
|
|
|
*/
|
|
|
|
#define unsafe_copy_dirent_name(_dst, _src, _len, label) do { \
|
|
|
|
char __user *dst = (_dst); \
|
|
|
|
const char *src = (_src); \
|
|
|
|
size_t len = (_len); \
|
uaccess: implement a proper unsafe_copy_to_user() and switch filldir over to it
In commit 9f79b78ef744 ("Convert filldir[64]() from __put_user() to
unsafe_put_user()") I made filldir() use unsafe_put_user(), which
improves code generation on x86 enormously.
But because we didn't have a "unsafe_copy_to_user()", the dirent name
copy was also done by hand with unsafe_put_user() in a loop, and it
turns out that a lot of other architectures didn't like that, because
unlike x86, they have various alignment issues.
Most non-x86 architectures trap and fix it up, and some (like xtensa)
will just fail unaligned put_user() accesses unconditionally. Which
makes that "copy using put_user() in a loop" not work for them at all.
I could make that code do explicit alignment etc, but the architectures
that don't like unaligned accesses also don't really use the fancy
"user_access_begin/end()" model, so they might just use the regular old
__copy_to_user() interface.
So this commit takes that looping implementation, turns it into the x86
version of "unsafe_copy_to_user()", and makes other architectures
implement the unsafe copy version as __copy_to_user() (the same way they
do for the other unsafe_xyz() accessor functions).
Note that it only does this for the copying _to_ user space, and we
still don't have a unsafe version of copy_from_user().
That's partly because we have no current users of it, but also partly
because the copy_from_user() case is slightly different and cannot
efficiently be implemented in terms of a unsafe_get_user() loop (because
gcc can't do asm goto with outputs).
It would be trivial to do this using "rep movsb", which would work
really nicely on newer x86 cores, but really badly on some older ones.
Al Viro is looking at cleaning up all our user copy routines to make
this all a non-issue, but for now we have this simple-but-stupid version
for x86 that works fine for the dirent name copy case because those
names are short strings and we simply don't need anything fancier.
Fixes: 9f79b78ef744 ("Convert filldir[64]() from __put_user() to unsafe_put_user()")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Reported-and-tested-by: Tony Luck <tony.luck@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-07 19:56:48 +00:00
|
|
|
unsafe_put_user(0, dst+len, label); \
|
|
|
|
unsafe_copy_to_user(dst, src, len, label); \
|
Convert filldir[64]() from __put_user() to unsafe_put_user()
We really should avoid the "__{get,put}_user()" functions entirely,
because they can easily be mis-used and the original intent of being
used for simple direct user accesses no longer holds in a post-SMAP/PAN
world.
Manually optimizing away the user access range check makes no sense any
more, when the range check is generally much cheaper than the "enable
user accesses" code that the __{get,put}_user() functions still need.
So instead of __put_user(), use the unsafe_put_user() interface with
user_access_{begin,end}() that really does generate better code these
days, and which is generally a nicer interface. Under some loads, the
multiple user writes that filldir() does are actually quite noticeable.
This also makes the dirent name copy use unsafe_put_user() with a couple
of macros. We do not want to make function calls with SMAP/PAN
disabled, and the code this generates is quite good when the
architecture uses "asm goto" for unsafe_put_user() like x86 does.
Note that this doesn't bother with the legacy cases. Nobody should use
them anyway, so performance doesn't really matter there.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-22 04:59:07 +00:00
|
|
|
} while (0)
|
|
|
|
|
|
|
|
|
2013-05-15 17:52:59 +00:00
|
|
|
int iterate_dir(struct file *file, struct dir_context *ctx)
|
2005-04-16 22:20:36 +00:00
|
|
|
{
|
2013-01-23 22:07:38 +00:00
|
|
|
struct inode *inode = file_inode(file);
|
2005-04-16 22:20:36 +00:00
|
|
|
int res = -ENOTDIR;
|
vfs: get rid of old '->iterate' directory operation
All users now just use '->iterate_shared()', which only takes the
directory inode lock for reading.
Filesystems that never got convered to shared mode now instead use a
wrapper that drops the lock, re-takes it in write mode, calls the old
function, and then downgrades the lock back to read mode.
This way the VFS layer and other callers no longer need to care about
filesystems that never got converted to the modern era.
The filesystems that use the new wrapper are ceph, coda, exfat, jfs,
ntfs, ocfs2, overlayfs, and vboxsf.
Honestly, several of them look like they really could just iterate their
directories in shared mode and skip the wrapper entirely, but the point
of this change is to not change semantics or fix filesystems that
haven't been fixed in the last 7+ years, but to finally get rid of the
dual iterators.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-05 19:25:01 +00:00
|
|
|
|
|
|
|
if (!file->f_op->iterate_shared)
|
2005-04-16 22:20:36 +00:00
|
|
|
goto out;
|
|
|
|
|
|
|
|
res = security_file_permission(file, MAY_READ);
|
|
|
|
if (res)
|
|
|
|
goto out;
|
|
|
|
|
2023-12-12 09:44:40 +00:00
|
|
|
res = fsnotify_file_perm(file, MAY_READ);
|
|
|
|
if (res)
|
|
|
|
goto out;
|
|
|
|
|
vfs: get rid of old '->iterate' directory operation
All users now just use '->iterate_shared()', which only takes the
directory inode lock for reading.
Filesystems that never got convered to shared mode now instead use a
wrapper that drops the lock, re-takes it in write mode, calls the old
function, and then downgrades the lock back to read mode.
This way the VFS layer and other callers no longer need to care about
filesystems that never got converted to the modern era.
The filesystems that use the new wrapper are ceph, coda, exfat, jfs,
ntfs, ocfs2, overlayfs, and vboxsf.
Honestly, several of them look like they really could just iterate their
directories in shared mode and skip the wrapper entirely, but the point
of this change is to not change semantics or fix filesystems that
haven't been fixed in the last 7+ years, but to finally get rid of the
dual iterators.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-05 19:25:01 +00:00
|
|
|
res = down_read_killable(&inode->i_rwsem);
|
2017-09-29 16:06:48 +00:00
|
|
|
if (res)
|
|
|
|
goto out;
|
2007-12-06 22:39:54 +00:00
|
|
|
|
2005-04-16 22:20:36 +00:00
|
|
|
res = -ENOENT;
|
|
|
|
if (!IS_DEADDIR(inode)) {
|
2013-05-23 01:44:23 +00:00
|
|
|
ctx->pos = file->f_pos;
|
vfs: get rid of old '->iterate' directory operation
All users now just use '->iterate_shared()', which only takes the
directory inode lock for reading.
Filesystems that never got convered to shared mode now instead use a
wrapper that drops the lock, re-takes it in write mode, calls the old
function, and then downgrades the lock back to read mode.
This way the VFS layer and other callers no longer need to care about
filesystems that never got converted to the modern era.
The filesystems that use the new wrapper are ceph, coda, exfat, jfs,
ntfs, ocfs2, overlayfs, and vboxsf.
Honestly, several of them look like they really could just iterate their
directories in shared mode and skip the wrapper entirely, but the point
of this change is to not change semantics or fix filesystems that
haven't been fixed in the last 7+ years, but to finally get rid of the
dual iterators.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-05 19:25:01 +00:00
|
|
|
res = file->f_op->iterate_shared(file, ctx);
|
2013-05-23 01:44:23 +00:00
|
|
|
file->f_pos = ctx->pos;
|
2014-06-04 23:05:41 +00:00
|
|
|
fsnotify_access(file);
|
2005-04-16 22:20:36 +00:00
|
|
|
file_accessed(file);
|
|
|
|
}
|
vfs: get rid of old '->iterate' directory operation
All users now just use '->iterate_shared()', which only takes the
directory inode lock for reading.
Filesystems that never got convered to shared mode now instead use a
wrapper that drops the lock, re-takes it in write mode, calls the old
function, and then downgrades the lock back to read mode.
This way the VFS layer and other callers no longer need to care about
filesystems that never got converted to the modern era.
The filesystems that use the new wrapper are ceph, coda, exfat, jfs,
ntfs, ocfs2, overlayfs, and vboxsf.
Honestly, several of them look like they really could just iterate their
directories in shared mode and skip the wrapper entirely, but the point
of this change is to not change semantics or fix filesystems that
haven't been fixed in the last 7+ years, but to finally get rid of the
dual iterators.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-05 19:25:01 +00:00
|
|
|
inode_unlock_shared(inode);
|
2005-04-16 22:20:36 +00:00
|
|
|
out:
|
|
|
|
return res;
|
|
|
|
}
|
2013-05-15 17:52:59 +00:00
|
|
|
EXPORT_SYMBOL(iterate_dir);
|
2005-04-16 22:20:36 +00:00
|
|
|
|
Make filldir[64]() verify the directory entry filename is valid
This has been discussed several times, and now filesystem people are
talking about doing it individually at the filesystem layer, so head
that off at the pass and just do it in getdents{64}().
This is partially based on a patch by Jann Horn, but checks for NUL
bytes as well, and somewhat simplified.
There's also commentary about how it might be better if invalid names
due to filesystem corruption don't cause an immediate failure, but only
an error at the end of the readdir(), so that people can still see the
filenames that are ok.
There's also been discussion about just how much POSIX strictly speaking
requires this since it's about filesystem corruption. It's really more
"protect user space from bad behavior" as pointed out by Jann. But
since Eric Biederman looked up the POSIX wording, here it is for context:
"From readdir:
The readdir() function shall return a pointer to a structure
representing the directory entry at the current position in the
directory stream specified by the argument dirp, and position the
directory stream at the next entry. It shall return a null pointer
upon reaching the end of the directory stream. The structure dirent
defined in the <dirent.h> header describes a directory entry.
From definitions:
3.129 Directory Entry (or Link)
An object that associates a filename with a file. Several directory
entries can associate names with the same file.
...
3.169 Filename
A name consisting of 1 to {NAME_MAX} bytes used to name a file. The
characters composing the name may be selected from the set of all
character values excluding the slash character and the null byte. The
filenames dot and dot-dot have special meaning. A filename is
sometimes referred to as a 'pathname component'."
Note that I didn't bother adding the checks to any legacy interfaces
that nobody uses.
Also note that if this ends up being noticeable as a performance
regression, we can fix that to do a much more optimized model that
checks for both NUL and '/' at the same time one word at a time.
We haven't really tended to optimize 'memchr()', and it only checks for
one pattern at a time anyway, and we really _should_ check for NUL too
(but see the comment about "soft errors" in the code about why it
currently only checks for '/')
See the CONFIG_DCACHE_WORD_ACCESS case of hash_name() for how the name
lookup code looks for pathname terminating characters in parallel.
Link: https://lore.kernel.org/lkml/20190118161440.220134-2-jannh@google.com/
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jann Horn <jannh@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-05 18:32:52 +00:00
|
|
|
/*
|
|
|
|
* POSIX says that a dirent name cannot contain NULL or a '/'.
|
|
|
|
*
|
|
|
|
* It's not 100% clear what we should really do in this case.
|
|
|
|
* The filesystem is clearly corrupted, but returning a hard
|
|
|
|
* error means that you now don't see any of the other names
|
|
|
|
* either, so that isn't a perfect alternative.
|
|
|
|
*
|
|
|
|
* And if you return an error, what error do you use? Several
|
|
|
|
* filesystems seem to have decided on EUCLEAN being the error
|
|
|
|
* code for EFSCORRUPTED, and that may be the error to use. Or
|
|
|
|
* just EIO, which is perhaps more obvious to users.
|
|
|
|
*
|
|
|
|
* In order to see the other file names in the directory, the
|
|
|
|
* caller might want to make this a "soft" error: skip the
|
|
|
|
* entry, and return the error at the end instead.
|
|
|
|
*
|
|
|
|
* Note that this should likely do a "memchr(name, 0, len)"
|
|
|
|
* check too, since that would be filesystem corruption as
|
|
|
|
* well. However, that case can't actually confuse user space,
|
|
|
|
* which has to do a strlen() on the name anyway to find the
|
|
|
|
* filename length, and the above "soft error" worry means
|
|
|
|
* that it's probably better left alone until we have that
|
|
|
|
* issue clarified.
|
2020-01-23 18:05:05 +00:00
|
|
|
*
|
|
|
|
* Note the PATH_MAX check - it's arbitrary but the real
|
|
|
|
* kernel limit on a possible path component, not NAME_MAX,
|
|
|
|
* which is the technical standard limit.
|
Make filldir[64]() verify the directory entry filename is valid
This has been discussed several times, and now filesystem people are
talking about doing it individually at the filesystem layer, so head
that off at the pass and just do it in getdents{64}().
This is partially based on a patch by Jann Horn, but checks for NUL
bytes as well, and somewhat simplified.
There's also commentary about how it might be better if invalid names
due to filesystem corruption don't cause an immediate failure, but only
an error at the end of the readdir(), so that people can still see the
filenames that are ok.
There's also been discussion about just how much POSIX strictly speaking
requires this since it's about filesystem corruption. It's really more
"protect user space from bad behavior" as pointed out by Jann. But
since Eric Biederman looked up the POSIX wording, here it is for context:
"From readdir:
The readdir() function shall return a pointer to a structure
representing the directory entry at the current position in the
directory stream specified by the argument dirp, and position the
directory stream at the next entry. It shall return a null pointer
upon reaching the end of the directory stream. The structure dirent
defined in the <dirent.h> header describes a directory entry.
From definitions:
3.129 Directory Entry (or Link)
An object that associates a filename with a file. Several directory
entries can associate names with the same file.
...
3.169 Filename
A name consisting of 1 to {NAME_MAX} bytes used to name a file. The
characters composing the name may be selected from the set of all
character values excluding the slash character and the null byte. The
filenames dot and dot-dot have special meaning. A filename is
sometimes referred to as a 'pathname component'."
Note that I didn't bother adding the checks to any legacy interfaces
that nobody uses.
Also note that if this ends up being noticeable as a performance
regression, we can fix that to do a much more optimized model that
checks for both NUL and '/' at the same time one word at a time.
We haven't really tended to optimize 'memchr()', and it only checks for
one pattern at a time anyway, and we really _should_ check for NUL too
(but see the comment about "soft errors" in the code about why it
currently only checks for '/')
See the CONFIG_DCACHE_WORD_ACCESS case of hash_name() for how the name
lookup code looks for pathname terminating characters in parallel.
Link: https://lore.kernel.org/lkml/20190118161440.220134-2-jannh@google.com/
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jann Horn <jannh@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-05 18:32:52 +00:00
|
|
|
*/
|
|
|
|
static int verify_dirent_name(const char *name, int len)
|
|
|
|
{
|
2020-01-23 18:05:05 +00:00
|
|
|
if (len <= 0 || len >= PATH_MAX)
|
Make filldir[64]() verify the directory entry filename is valid
This has been discussed several times, and now filesystem people are
talking about doing it individually at the filesystem layer, so head
that off at the pass and just do it in getdents{64}().
This is partially based on a patch by Jann Horn, but checks for NUL
bytes as well, and somewhat simplified.
There's also commentary about how it might be better if invalid names
due to filesystem corruption don't cause an immediate failure, but only
an error at the end of the readdir(), so that people can still see the
filenames that are ok.
There's also been discussion about just how much POSIX strictly speaking
requires this since it's about filesystem corruption. It's really more
"protect user space from bad behavior" as pointed out by Jann. But
since Eric Biederman looked up the POSIX wording, here it is for context:
"From readdir:
The readdir() function shall return a pointer to a structure
representing the directory entry at the current position in the
directory stream specified by the argument dirp, and position the
directory stream at the next entry. It shall return a null pointer
upon reaching the end of the directory stream. The structure dirent
defined in the <dirent.h> header describes a directory entry.
From definitions:
3.129 Directory Entry (or Link)
An object that associates a filename with a file. Several directory
entries can associate names with the same file.
...
3.169 Filename
A name consisting of 1 to {NAME_MAX} bytes used to name a file. The
characters composing the name may be selected from the set of all
character values excluding the slash character and the null byte. The
filenames dot and dot-dot have special meaning. A filename is
sometimes referred to as a 'pathname component'."
Note that I didn't bother adding the checks to any legacy interfaces
that nobody uses.
Also note that if this ends up being noticeable as a performance
regression, we can fix that to do a much more optimized model that
checks for both NUL and '/' at the same time one word at a time.
We haven't really tended to optimize 'memchr()', and it only checks for
one pattern at a time anyway, and we really _should_ check for NUL too
(but see the comment about "soft errors" in the code about why it
currently only checks for '/')
See the CONFIG_DCACHE_WORD_ACCESS case of hash_name() for how the name
lookup code looks for pathname terminating characters in parallel.
Link: https://lore.kernel.org/lkml/20190118161440.220134-2-jannh@google.com/
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jann Horn <jannh@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-05 18:32:52 +00:00
|
|
|
return -EIO;
|
2019-10-18 22:41:16 +00:00
|
|
|
if (memchr(name, '/', len))
|
Make filldir[64]() verify the directory entry filename is valid
This has been discussed several times, and now filesystem people are
talking about doing it individually at the filesystem layer, so head
that off at the pass and just do it in getdents{64}().
This is partially based on a patch by Jann Horn, but checks for NUL
bytes as well, and somewhat simplified.
There's also commentary about how it might be better if invalid names
due to filesystem corruption don't cause an immediate failure, but only
an error at the end of the readdir(), so that people can still see the
filenames that are ok.
There's also been discussion about just how much POSIX strictly speaking
requires this since it's about filesystem corruption. It's really more
"protect user space from bad behavior" as pointed out by Jann. But
since Eric Biederman looked up the POSIX wording, here it is for context:
"From readdir:
The readdir() function shall return a pointer to a structure
representing the directory entry at the current position in the
directory stream specified by the argument dirp, and position the
directory stream at the next entry. It shall return a null pointer
upon reaching the end of the directory stream. The structure dirent
defined in the <dirent.h> header describes a directory entry.
From definitions:
3.129 Directory Entry (or Link)
An object that associates a filename with a file. Several directory
entries can associate names with the same file.
...
3.169 Filename
A name consisting of 1 to {NAME_MAX} bytes used to name a file. The
characters composing the name may be selected from the set of all
character values excluding the slash character and the null byte. The
filenames dot and dot-dot have special meaning. A filename is
sometimes referred to as a 'pathname component'."
Note that I didn't bother adding the checks to any legacy interfaces
that nobody uses.
Also note that if this ends up being noticeable as a performance
regression, we can fix that to do a much more optimized model that
checks for both NUL and '/' at the same time one word at a time.
We haven't really tended to optimize 'memchr()', and it only checks for
one pattern at a time anyway, and we really _should_ check for NUL too
(but see the comment about "soft errors" in the code about why it
currently only checks for '/')
See the CONFIG_DCACHE_WORD_ACCESS case of hash_name() for how the name
lookup code looks for pathname terminating characters in parallel.
Link: https://lore.kernel.org/lkml/20190118161440.220134-2-jannh@google.com/
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jann Horn <jannh@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-05 18:32:52 +00:00
|
|
|
return -EIO;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2005-04-16 22:20:36 +00:00
|
|
|
/*
|
|
|
|
* Traditional linux readdir() handling..
|
|
|
|
*
|
|
|
|
* "count=1" is a special case, meaning that the buffer is one
|
|
|
|
* dirent-structure in size and that the code can't handle more
|
|
|
|
* anyway. Thus the special "fillonedir()" function for that
|
|
|
|
* case (the low-level handlers don't need to care about this).
|
|
|
|
*/
|
|
|
|
|
|
|
|
#ifdef __ARCH_WANT_OLD_READDIR
|
|
|
|
|
|
|
|
struct old_linux_dirent {
|
|
|
|
unsigned long d_ino;
|
|
|
|
unsigned long d_offset;
|
|
|
|
unsigned short d_namlen;
|
2023-06-20 17:30:36 +00:00
|
|
|
char d_name[];
|
2005-04-16 22:20:36 +00:00
|
|
|
};
|
|
|
|
|
|
|
|
struct readdir_callback {
|
2013-05-15 17:52:59 +00:00
|
|
|
struct dir_context ctx;
|
2005-04-16 22:20:36 +00:00
|
|
|
struct old_linux_dirent __user * dirent;
|
|
|
|
int result;
|
|
|
|
};
|
|
|
|
|
2022-08-16 15:57:56 +00:00
|
|
|
static bool fillonedir(struct dir_context *ctx, const char *name, int namlen,
|
2014-10-30 16:37:34 +00:00
|
|
|
loff_t offset, u64 ino, unsigned int d_type)
|
2005-04-16 22:20:36 +00:00
|
|
|
{
|
2014-10-30 16:37:34 +00:00
|
|
|
struct readdir_callback *buf =
|
|
|
|
container_of(ctx, struct readdir_callback, ctx);
|
2005-04-16 22:20:36 +00:00
|
|
|
struct old_linux_dirent __user * dirent;
|
2006-10-03 08:13:46 +00:00
|
|
|
unsigned long d_ino;
|
2005-04-16 22:20:36 +00:00
|
|
|
|
|
|
|
if (buf->result)
|
2022-08-16 15:57:56 +00:00
|
|
|
return false;
|
2021-04-17 16:27:04 +00:00
|
|
|
buf->result = verify_dirent_name(name, namlen);
|
2022-08-16 15:57:56 +00:00
|
|
|
if (buf->result)
|
|
|
|
return false;
|
2006-10-03 08:13:46 +00:00
|
|
|
d_ino = ino;
|
2008-08-12 04:28:24 +00:00
|
|
|
if (sizeof(d_ino) < sizeof(ino) && d_ino != ino) {
|
|
|
|
buf->result = -EOVERFLOW;
|
2022-08-16 15:57:56 +00:00
|
|
|
return false;
|
2008-08-12 04:28:24 +00:00
|
|
|
}
|
2005-04-16 22:20:36 +00:00
|
|
|
buf->result++;
|
|
|
|
dirent = buf->dirent;
|
2020-02-18 19:39:56 +00:00
|
|
|
if (!user_write_access_begin(dirent,
|
2005-04-16 22:20:36 +00:00
|
|
|
(unsigned long)(dirent->d_name + namlen + 1) -
|
|
|
|
(unsigned long)dirent))
|
|
|
|
goto efault;
|
2020-02-18 19:39:56 +00:00
|
|
|
unsafe_put_user(d_ino, &dirent->d_ino, efault_end);
|
|
|
|
unsafe_put_user(offset, &dirent->d_offset, efault_end);
|
|
|
|
unsafe_put_user(namlen, &dirent->d_namlen, efault_end);
|
|
|
|
unsafe_copy_dirent_name(dirent->d_name, name, namlen, efault_end);
|
|
|
|
user_write_access_end();
|
2022-08-16 15:57:56 +00:00
|
|
|
return true;
|
2020-02-18 19:39:56 +00:00
|
|
|
efault_end:
|
|
|
|
user_write_access_end();
|
2005-04-16 22:20:36 +00:00
|
|
|
efault:
|
|
|
|
buf->result = -EFAULT;
|
2022-08-16 15:57:56 +00:00
|
|
|
return false;
|
2005-04-16 22:20:36 +00:00
|
|
|
}
|
|
|
|
|
2009-01-14 13:14:34 +00:00
|
|
|
SYSCALL_DEFINE3(old_readdir, unsigned int, fd,
|
|
|
|
struct old_linux_dirent __user *, dirent, unsigned int, count)
|
2005-04-16 22:20:36 +00:00
|
|
|
{
|
|
|
|
int error;
|
2016-04-20 21:08:21 +00:00
|
|
|
struct fd f = fdget_pos(fd);
|
2013-05-23 02:22:04 +00:00
|
|
|
struct readdir_callback buf = {
|
|
|
|
.ctx.actor = fillonedir,
|
|
|
|
.dirent = dirent
|
|
|
|
};
|
2005-04-16 22:20:36 +00:00
|
|
|
|
2024-05-31 18:12:01 +00:00
|
|
|
if (!fd_file(f))
|
2012-04-21 22:40:32 +00:00
|
|
|
return -EBADF;
|
2005-04-16 22:20:36 +00:00
|
|
|
|
2024-05-31 18:12:01 +00:00
|
|
|
error = iterate_dir(fd_file(f), &buf.ctx);
|
2008-08-24 11:29:52 +00:00
|
|
|
if (buf.result)
|
2005-04-16 22:20:36 +00:00
|
|
|
error = buf.result;
|
|
|
|
|
2016-04-20 21:08:21 +00:00
|
|
|
fdput_pos(f);
|
2005-04-16 22:20:36 +00:00
|
|
|
return error;
|
|
|
|
}
|
|
|
|
|
|
|
|
#endif /* __ARCH_WANT_OLD_READDIR */
|
|
|
|
|
|
|
|
/*
|
|
|
|
* New, all-improved, singing, dancing, iBCS2-compliant getdents()
|
|
|
|
* interface.
|
|
|
|
*/
|
|
|
|
struct linux_dirent {
|
|
|
|
unsigned long d_ino;
|
|
|
|
unsigned long d_off;
|
|
|
|
unsigned short d_reclen;
|
2023-06-20 17:30:36 +00:00
|
|
|
char d_name[];
|
2005-04-16 22:20:36 +00:00
|
|
|
};
|
|
|
|
|
|
|
|
struct getdents_callback {
|
2013-05-15 17:52:59 +00:00
|
|
|
struct dir_context ctx;
|
2005-04-16 22:20:36 +00:00
|
|
|
struct linux_dirent __user * current_dir;
|
readdir: make user_access_begin() use the real access range
In commit 9f79b78ef744 ("Convert filldir[64]() from __put_user() to
unsafe_put_user()") I changed filldir to not do individual __put_user()
accesses, but instead use unsafe_put_user() surrounded by the proper
user_access_begin/end() pair.
That make them enormously faster on modern x86, where the STAC/CLAC
games make individual user accesses fairly heavy-weight.
However, the user_access_begin() range was not really the exact right
one, since filldir() has the unfortunate problem that it needs to not
only fill out the new directory entry, it also needs to fix up the
previous one to contain the proper file offset.
It's unfortunate, but the "d_off" field in "struct dirent" is _not_ the
file offset of the directory entry itself - it's the offset of the next
one. So we end up backfilling the offset in the previous entry as we
walk along.
But since x86 didn't really care about the exact range, and used to be
the only architecture that did anything fancy in user_access_begin() to
begin with, the filldir[64]() changes did something lazy, and even
commented on it:
/*
* Note! This range-checks 'previous' (which may be NULL).
* The real range was checked in getdents
*/
if (!user_access_begin(dirent, sizeof(*dirent)))
goto efault;
and it all worked fine.
But now 32-bit ppc is starting to also implement user_access_begin(),
and the fact that we faked the range to only be the (possibly not even
valid) previous directory entry becomes a problem, because ppc32 will
actually be using the range that is passed in for more than just "check
that it's user space".
This is a complete rewrite of Christophe's original patch.
By saving off the record length of the previous entry instead of a
pointer to it in the filldir data structures, we can simplify the range
check and the writing of the previous entry d_off field. No need for
any conditionals in the user accesses themselves, although we retain the
conditional EINTR checking for the "was this the first directory entry"
signal handling latency logic.
Fixes: 9f79b78ef744 ("Convert filldir[64]() from __put_user() to unsafe_put_user()")
Link: https://lore.kernel.org/lkml/a02d3426f93f7eb04960a4d9140902d278cab0bb.1579697910.git.christophe.leroy@c-s.fr/
Link: https://lore.kernel.org/lkml/408c90c4068b00ea8f1c41cca45b84ec23d4946b.1579783936.git.christophe.leroy@c-s.fr/
Reported-and-tested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-22 20:37:25 +00:00
|
|
|
int prev_reclen;
|
2005-04-16 22:20:36 +00:00
|
|
|
int count;
|
|
|
|
int error;
|
|
|
|
};
|
|
|
|
|
2022-08-16 15:57:56 +00:00
|
|
|
static bool filldir(struct dir_context *ctx, const char *name, int namlen,
|
2014-10-30 16:37:34 +00:00
|
|
|
loff_t offset, u64 ino, unsigned int d_type)
|
2005-04-16 22:20:36 +00:00
|
|
|
{
|
readdir: make user_access_begin() use the real access range
In commit 9f79b78ef744 ("Convert filldir[64]() from __put_user() to
unsafe_put_user()") I changed filldir to not do individual __put_user()
accesses, but instead use unsafe_put_user() surrounded by the proper
user_access_begin/end() pair.
That make them enormously faster on modern x86, where the STAC/CLAC
games make individual user accesses fairly heavy-weight.
However, the user_access_begin() range was not really the exact right
one, since filldir() has the unfortunate problem that it needs to not
only fill out the new directory entry, it also needs to fix up the
previous one to contain the proper file offset.
It's unfortunate, but the "d_off" field in "struct dirent" is _not_ the
file offset of the directory entry itself - it's the offset of the next
one. So we end up backfilling the offset in the previous entry as we
walk along.
But since x86 didn't really care about the exact range, and used to be
the only architecture that did anything fancy in user_access_begin() to
begin with, the filldir[64]() changes did something lazy, and even
commented on it:
/*
* Note! This range-checks 'previous' (which may be NULL).
* The real range was checked in getdents
*/
if (!user_access_begin(dirent, sizeof(*dirent)))
goto efault;
and it all worked fine.
But now 32-bit ppc is starting to also implement user_access_begin(),
and the fact that we faked the range to only be the (possibly not even
valid) previous directory entry becomes a problem, because ppc32 will
actually be using the range that is passed in for more than just "check
that it's user space".
This is a complete rewrite of Christophe's original patch.
By saving off the record length of the previous entry instead of a
pointer to it in the filldir data structures, we can simplify the range
check and the writing of the previous entry d_off field. No need for
any conditionals in the user accesses themselves, although we retain the
conditional EINTR checking for the "was this the first directory entry"
signal handling latency logic.
Fixes: 9f79b78ef744 ("Convert filldir[64]() from __put_user() to unsafe_put_user()")
Link: https://lore.kernel.org/lkml/a02d3426f93f7eb04960a4d9140902d278cab0bb.1579697910.git.christophe.leroy@c-s.fr/
Link: https://lore.kernel.org/lkml/408c90c4068b00ea8f1c41cca45b84ec23d4946b.1579783936.git.christophe.leroy@c-s.fr/
Reported-and-tested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-22 20:37:25 +00:00
|
|
|
struct linux_dirent __user *dirent, *prev;
|
2014-10-30 16:37:34 +00:00
|
|
|
struct getdents_callback *buf =
|
|
|
|
container_of(ctx, struct getdents_callback, ctx);
|
2006-10-03 08:13:46 +00:00
|
|
|
unsigned long d_ino;
|
2010-08-10 00:20:22 +00:00
|
|
|
int reclen = ALIGN(offsetof(struct linux_dirent, d_name) + namlen + 2,
|
|
|
|
sizeof(long));
|
readdir: make user_access_begin() use the real access range
In commit 9f79b78ef744 ("Convert filldir[64]() from __put_user() to
unsafe_put_user()") I changed filldir to not do individual __put_user()
accesses, but instead use unsafe_put_user() surrounded by the proper
user_access_begin/end() pair.
That make them enormously faster on modern x86, where the STAC/CLAC
games make individual user accesses fairly heavy-weight.
However, the user_access_begin() range was not really the exact right
one, since filldir() has the unfortunate problem that it needs to not
only fill out the new directory entry, it also needs to fix up the
previous one to contain the proper file offset.
It's unfortunate, but the "d_off" field in "struct dirent" is _not_ the
file offset of the directory entry itself - it's the offset of the next
one. So we end up backfilling the offset in the previous entry as we
walk along.
But since x86 didn't really care about the exact range, and used to be
the only architecture that did anything fancy in user_access_begin() to
begin with, the filldir[64]() changes did something lazy, and even
commented on it:
/*
* Note! This range-checks 'previous' (which may be NULL).
* The real range was checked in getdents
*/
if (!user_access_begin(dirent, sizeof(*dirent)))
goto efault;
and it all worked fine.
But now 32-bit ppc is starting to also implement user_access_begin(),
and the fact that we faked the range to only be the (possibly not even
valid) previous directory entry becomes a problem, because ppc32 will
actually be using the range that is passed in for more than just "check
that it's user space".
This is a complete rewrite of Christophe's original patch.
By saving off the record length of the previous entry instead of a
pointer to it in the filldir data structures, we can simplify the range
check and the writing of the previous entry d_off field. No need for
any conditionals in the user accesses themselves, although we retain the
conditional EINTR checking for the "was this the first directory entry"
signal handling latency logic.
Fixes: 9f79b78ef744 ("Convert filldir[64]() from __put_user() to unsafe_put_user()")
Link: https://lore.kernel.org/lkml/a02d3426f93f7eb04960a4d9140902d278cab0bb.1579697910.git.christophe.leroy@c-s.fr/
Link: https://lore.kernel.org/lkml/408c90c4068b00ea8f1c41cca45b84ec23d4946b.1579783936.git.christophe.leroy@c-s.fr/
Reported-and-tested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-22 20:37:25 +00:00
|
|
|
int prev_reclen;
|
2005-04-16 22:20:36 +00:00
|
|
|
|
Make filldir[64]() verify the directory entry filename is valid
This has been discussed several times, and now filesystem people are
talking about doing it individually at the filesystem layer, so head
that off at the pass and just do it in getdents{64}().
This is partially based on a patch by Jann Horn, but checks for NUL
bytes as well, and somewhat simplified.
There's also commentary about how it might be better if invalid names
due to filesystem corruption don't cause an immediate failure, but only
an error at the end of the readdir(), so that people can still see the
filenames that are ok.
There's also been discussion about just how much POSIX strictly speaking
requires this since it's about filesystem corruption. It's really more
"protect user space from bad behavior" as pointed out by Jann. But
since Eric Biederman looked up the POSIX wording, here it is for context:
"From readdir:
The readdir() function shall return a pointer to a structure
representing the directory entry at the current position in the
directory stream specified by the argument dirp, and position the
directory stream at the next entry. It shall return a null pointer
upon reaching the end of the directory stream. The structure dirent
defined in the <dirent.h> header describes a directory entry.
From definitions:
3.129 Directory Entry (or Link)
An object that associates a filename with a file. Several directory
entries can associate names with the same file.
...
3.169 Filename
A name consisting of 1 to {NAME_MAX} bytes used to name a file. The
characters composing the name may be selected from the set of all
character values excluding the slash character and the null byte. The
filenames dot and dot-dot have special meaning. A filename is
sometimes referred to as a 'pathname component'."
Note that I didn't bother adding the checks to any legacy interfaces
that nobody uses.
Also note that if this ends up being noticeable as a performance
regression, we can fix that to do a much more optimized model that
checks for both NUL and '/' at the same time one word at a time.
We haven't really tended to optimize 'memchr()', and it only checks for
one pattern at a time anyway, and we really _should_ check for NUL too
(but see the comment about "soft errors" in the code about why it
currently only checks for '/')
See the CONFIG_DCACHE_WORD_ACCESS case of hash_name() for how the name
lookup code looks for pathname terminating characters in parallel.
Link: https://lore.kernel.org/lkml/20190118161440.220134-2-jannh@google.com/
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jann Horn <jannh@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-05 18:32:52 +00:00
|
|
|
buf->error = verify_dirent_name(name, namlen);
|
|
|
|
if (unlikely(buf->error))
|
2022-08-16 15:57:56 +00:00
|
|
|
return false;
|
2005-04-16 22:20:36 +00:00
|
|
|
buf->error = -EINVAL; /* only used if we fail.. */
|
|
|
|
if (reclen > buf->count)
|
2022-08-16 15:57:56 +00:00
|
|
|
return false;
|
2006-10-03 08:13:46 +00:00
|
|
|
d_ino = ino;
|
2008-08-12 04:28:24 +00:00
|
|
|
if (sizeof(d_ino) < sizeof(ino) && d_ino != ino) {
|
|
|
|
buf->error = -EOVERFLOW;
|
2022-08-16 15:57:56 +00:00
|
|
|
return false;
|
2008-08-12 04:28:24 +00:00
|
|
|
}
|
readdir: make user_access_begin() use the real access range
In commit 9f79b78ef744 ("Convert filldir[64]() from __put_user() to
unsafe_put_user()") I changed filldir to not do individual __put_user()
accesses, but instead use unsafe_put_user() surrounded by the proper
user_access_begin/end() pair.
That make them enormously faster on modern x86, where the STAC/CLAC
games make individual user accesses fairly heavy-weight.
However, the user_access_begin() range was not really the exact right
one, since filldir() has the unfortunate problem that it needs to not
only fill out the new directory entry, it also needs to fix up the
previous one to contain the proper file offset.
It's unfortunate, but the "d_off" field in "struct dirent" is _not_ the
file offset of the directory entry itself - it's the offset of the next
one. So we end up backfilling the offset in the previous entry as we
walk along.
But since x86 didn't really care about the exact range, and used to be
the only architecture that did anything fancy in user_access_begin() to
begin with, the filldir[64]() changes did something lazy, and even
commented on it:
/*
* Note! This range-checks 'previous' (which may be NULL).
* The real range was checked in getdents
*/
if (!user_access_begin(dirent, sizeof(*dirent)))
goto efault;
and it all worked fine.
But now 32-bit ppc is starting to also implement user_access_begin(),
and the fact that we faked the range to only be the (possibly not even
valid) previous directory entry becomes a problem, because ppc32 will
actually be using the range that is passed in for more than just "check
that it's user space".
This is a complete rewrite of Christophe's original patch.
By saving off the record length of the previous entry instead of a
pointer to it in the filldir data structures, we can simplify the range
check and the writing of the previous entry d_off field. No need for
any conditionals in the user accesses themselves, although we retain the
conditional EINTR checking for the "was this the first directory entry"
signal handling latency logic.
Fixes: 9f79b78ef744 ("Convert filldir[64]() from __put_user() to unsafe_put_user()")
Link: https://lore.kernel.org/lkml/a02d3426f93f7eb04960a4d9140902d278cab0bb.1579697910.git.christophe.leroy@c-s.fr/
Link: https://lore.kernel.org/lkml/408c90c4068b00ea8f1c41cca45b84ec23d4946b.1579783936.git.christophe.leroy@c-s.fr/
Reported-and-tested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-22 20:37:25 +00:00
|
|
|
prev_reclen = buf->prev_reclen;
|
|
|
|
if (prev_reclen && signal_pending(current))
|
2022-08-16 15:57:56 +00:00
|
|
|
return false;
|
Convert filldir[64]() from __put_user() to unsafe_put_user()
We really should avoid the "__{get,put}_user()" functions entirely,
because they can easily be mis-used and the original intent of being
used for simple direct user accesses no longer holds in a post-SMAP/PAN
world.
Manually optimizing away the user access range check makes no sense any
more, when the range check is generally much cheaper than the "enable
user accesses" code that the __{get,put}_user() functions still need.
So instead of __put_user(), use the unsafe_put_user() interface with
user_access_{begin,end}() that really does generate better code these
days, and which is generally a nicer interface. Under some loads, the
multiple user writes that filldir() does are actually quite noticeable.
This also makes the dirent name copy use unsafe_put_user() with a couple
of macros. We do not want to make function calls with SMAP/PAN
disabled, and the code this generates is quite good when the
architecture uses "asm goto" for unsafe_put_user() like x86 does.
Note that this doesn't bother with the legacy cases. Nobody should use
them anyway, so performance doesn't really matter there.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-22 04:59:07 +00:00
|
|
|
dirent = buf->current_dir;
|
readdir: make user_access_begin() use the real access range
In commit 9f79b78ef744 ("Convert filldir[64]() from __put_user() to
unsafe_put_user()") I changed filldir to not do individual __put_user()
accesses, but instead use unsafe_put_user() surrounded by the proper
user_access_begin/end() pair.
That make them enormously faster on modern x86, where the STAC/CLAC
games make individual user accesses fairly heavy-weight.
However, the user_access_begin() range was not really the exact right
one, since filldir() has the unfortunate problem that it needs to not
only fill out the new directory entry, it also needs to fix up the
previous one to contain the proper file offset.
It's unfortunate, but the "d_off" field in "struct dirent" is _not_ the
file offset of the directory entry itself - it's the offset of the next
one. So we end up backfilling the offset in the previous entry as we
walk along.
But since x86 didn't really care about the exact range, and used to be
the only architecture that did anything fancy in user_access_begin() to
begin with, the filldir[64]() changes did something lazy, and even
commented on it:
/*
* Note! This range-checks 'previous' (which may be NULL).
* The real range was checked in getdents
*/
if (!user_access_begin(dirent, sizeof(*dirent)))
goto efault;
and it all worked fine.
But now 32-bit ppc is starting to also implement user_access_begin(),
and the fact that we faked the range to only be the (possibly not even
valid) previous directory entry becomes a problem, because ppc32 will
actually be using the range that is passed in for more than just "check
that it's user space".
This is a complete rewrite of Christophe's original patch.
By saving off the record length of the previous entry instead of a
pointer to it in the filldir data structures, we can simplify the range
check and the writing of the previous entry d_off field. No need for
any conditionals in the user accesses themselves, although we retain the
conditional EINTR checking for the "was this the first directory entry"
signal handling latency logic.
Fixes: 9f79b78ef744 ("Convert filldir[64]() from __put_user() to unsafe_put_user()")
Link: https://lore.kernel.org/lkml/a02d3426f93f7eb04960a4d9140902d278cab0bb.1579697910.git.christophe.leroy@c-s.fr/
Link: https://lore.kernel.org/lkml/408c90c4068b00ea8f1c41cca45b84ec23d4946b.1579783936.git.christophe.leroy@c-s.fr/
Reported-and-tested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-22 20:37:25 +00:00
|
|
|
prev = (void __user *) dirent - prev_reclen;
|
2020-04-03 07:20:51 +00:00
|
|
|
if (!user_write_access_begin(prev, reclen + prev_reclen))
|
readdir: make user_access_begin() use the real access range
In commit 9f79b78ef744 ("Convert filldir[64]() from __put_user() to
unsafe_put_user()") I changed filldir to not do individual __put_user()
accesses, but instead use unsafe_put_user() surrounded by the proper
user_access_begin/end() pair.
That make them enormously faster on modern x86, where the STAC/CLAC
games make individual user accesses fairly heavy-weight.
However, the user_access_begin() range was not really the exact right
one, since filldir() has the unfortunate problem that it needs to not
only fill out the new directory entry, it also needs to fix up the
previous one to contain the proper file offset.
It's unfortunate, but the "d_off" field in "struct dirent" is _not_ the
file offset of the directory entry itself - it's the offset of the next
one. So we end up backfilling the offset in the previous entry as we
walk along.
But since x86 didn't really care about the exact range, and used to be
the only architecture that did anything fancy in user_access_begin() to
begin with, the filldir[64]() changes did something lazy, and even
commented on it:
/*
* Note! This range-checks 'previous' (which may be NULL).
* The real range was checked in getdents
*/
if (!user_access_begin(dirent, sizeof(*dirent)))
goto efault;
and it all worked fine.
But now 32-bit ppc is starting to also implement user_access_begin(),
and the fact that we faked the range to only be the (possibly not even
valid) previous directory entry becomes a problem, because ppc32 will
actually be using the range that is passed in for more than just "check
that it's user space".
This is a complete rewrite of Christophe's original patch.
By saving off the record length of the previous entry instead of a
pointer to it in the filldir data structures, we can simplify the range
check and the writing of the previous entry d_off field. No need for
any conditionals in the user accesses themselves, although we retain the
conditional EINTR checking for the "was this the first directory entry"
signal handling latency logic.
Fixes: 9f79b78ef744 ("Convert filldir[64]() from __put_user() to unsafe_put_user()")
Link: https://lore.kernel.org/lkml/a02d3426f93f7eb04960a4d9140902d278cab0bb.1579697910.git.christophe.leroy@c-s.fr/
Link: https://lore.kernel.org/lkml/408c90c4068b00ea8f1c41cca45b84ec23d4946b.1579783936.git.christophe.leroy@c-s.fr/
Reported-and-tested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-22 20:37:25 +00:00
|
|
|
goto efault;
|
|
|
|
|
|
|
|
/* This might be 'dirent->d_off', but if so it will get overwritten */
|
|
|
|
unsafe_put_user(offset, &prev->d_off, efault_end);
|
Convert filldir[64]() from __put_user() to unsafe_put_user()
We really should avoid the "__{get,put}_user()" functions entirely,
because they can easily be mis-used and the original intent of being
used for simple direct user accesses no longer holds in a post-SMAP/PAN
world.
Manually optimizing away the user access range check makes no sense any
more, when the range check is generally much cheaper than the "enable
user accesses" code that the __{get,put}_user() functions still need.
So instead of __put_user(), use the unsafe_put_user() interface with
user_access_{begin,end}() that really does generate better code these
days, and which is generally a nicer interface. Under some loads, the
multiple user writes that filldir() does are actually quite noticeable.
This also makes the dirent name copy use unsafe_put_user() with a couple
of macros. We do not want to make function calls with SMAP/PAN
disabled, and the code this generates is quite good when the
architecture uses "asm goto" for unsafe_put_user() like x86 does.
Note that this doesn't bother with the legacy cases. Nobody should use
them anyway, so performance doesn't really matter there.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-22 04:59:07 +00:00
|
|
|
unsafe_put_user(d_ino, &dirent->d_ino, efault_end);
|
|
|
|
unsafe_put_user(reclen, &dirent->d_reclen, efault_end);
|
|
|
|
unsafe_put_user(d_type, (char __user *) dirent + reclen - 1, efault_end);
|
|
|
|
unsafe_copy_dirent_name(dirent->d_name, name, namlen, efault_end);
|
2020-04-03 07:20:51 +00:00
|
|
|
user_write_access_end();
|
Convert filldir[64]() from __put_user() to unsafe_put_user()
We really should avoid the "__{get,put}_user()" functions entirely,
because they can easily be mis-used and the original intent of being
used for simple direct user accesses no longer holds in a post-SMAP/PAN
world.
Manually optimizing away the user access range check makes no sense any
more, when the range check is generally much cheaper than the "enable
user accesses" code that the __{get,put}_user() functions still need.
So instead of __put_user(), use the unsafe_put_user() interface with
user_access_{begin,end}() that really does generate better code these
days, and which is generally a nicer interface. Under some loads, the
multiple user writes that filldir() does are actually quite noticeable.
This also makes the dirent name copy use unsafe_put_user() with a couple
of macros. We do not want to make function calls with SMAP/PAN
disabled, and the code this generates is quite good when the
architecture uses "asm goto" for unsafe_put_user() like x86 does.
Note that this doesn't bother with the legacy cases. Nobody should use
them anyway, so performance doesn't really matter there.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-22 04:59:07 +00:00
|
|
|
|
readdir: make user_access_begin() use the real access range
In commit 9f79b78ef744 ("Convert filldir[64]() from __put_user() to
unsafe_put_user()") I changed filldir to not do individual __put_user()
accesses, but instead use unsafe_put_user() surrounded by the proper
user_access_begin/end() pair.
That make them enormously faster on modern x86, where the STAC/CLAC
games make individual user accesses fairly heavy-weight.
However, the user_access_begin() range was not really the exact right
one, since filldir() has the unfortunate problem that it needs to not
only fill out the new directory entry, it also needs to fix up the
previous one to contain the proper file offset.
It's unfortunate, but the "d_off" field in "struct dirent" is _not_ the
file offset of the directory entry itself - it's the offset of the next
one. So we end up backfilling the offset in the previous entry as we
walk along.
But since x86 didn't really care about the exact range, and used to be
the only architecture that did anything fancy in user_access_begin() to
begin with, the filldir[64]() changes did something lazy, and even
commented on it:
/*
* Note! This range-checks 'previous' (which may be NULL).
* The real range was checked in getdents
*/
if (!user_access_begin(dirent, sizeof(*dirent)))
goto efault;
and it all worked fine.
But now 32-bit ppc is starting to also implement user_access_begin(),
and the fact that we faked the range to only be the (possibly not even
valid) previous directory entry becomes a problem, because ppc32 will
actually be using the range that is passed in for more than just "check
that it's user space".
This is a complete rewrite of Christophe's original patch.
By saving off the record length of the previous entry instead of a
pointer to it in the filldir data structures, we can simplify the range
check and the writing of the previous entry d_off field. No need for
any conditionals in the user accesses themselves, although we retain the
conditional EINTR checking for the "was this the first directory entry"
signal handling latency logic.
Fixes: 9f79b78ef744 ("Convert filldir[64]() from __put_user() to unsafe_put_user()")
Link: https://lore.kernel.org/lkml/a02d3426f93f7eb04960a4d9140902d278cab0bb.1579697910.git.christophe.leroy@c-s.fr/
Link: https://lore.kernel.org/lkml/408c90c4068b00ea8f1c41cca45b84ec23d4946b.1579783936.git.christophe.leroy@c-s.fr/
Reported-and-tested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-22 20:37:25 +00:00
|
|
|
buf->current_dir = (void __user *)dirent + reclen;
|
|
|
|
buf->prev_reclen = reclen;
|
2005-04-16 22:20:36 +00:00
|
|
|
buf->count -= reclen;
|
2022-08-16 15:57:56 +00:00
|
|
|
return true;
|
Convert filldir[64]() from __put_user() to unsafe_put_user()
We really should avoid the "__{get,put}_user()" functions entirely,
because they can easily be mis-used and the original intent of being
used for simple direct user accesses no longer holds in a post-SMAP/PAN
world.
Manually optimizing away the user access range check makes no sense any
more, when the range check is generally much cheaper than the "enable
user accesses" code that the __{get,put}_user() functions still need.
So instead of __put_user(), use the unsafe_put_user() interface with
user_access_{begin,end}() that really does generate better code these
days, and which is generally a nicer interface. Under some loads, the
multiple user writes that filldir() does are actually quite noticeable.
This also makes the dirent name copy use unsafe_put_user() with a couple
of macros. We do not want to make function calls with SMAP/PAN
disabled, and the code this generates is quite good when the
architecture uses "asm goto" for unsafe_put_user() like x86 does.
Note that this doesn't bother with the legacy cases. Nobody should use
them anyway, so performance doesn't really matter there.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-22 04:59:07 +00:00
|
|
|
efault_end:
|
2020-04-03 07:20:51 +00:00
|
|
|
user_write_access_end();
|
2005-04-16 22:20:36 +00:00
|
|
|
efault:
|
|
|
|
buf->error = -EFAULT;
|
2022-08-16 15:57:56 +00:00
|
|
|
return false;
|
2005-04-16 22:20:36 +00:00
|
|
|
}
|
|
|
|
|
2009-01-14 13:14:23 +00:00
|
|
|
SYSCALL_DEFINE3(getdents, unsigned int, fd,
|
|
|
|
struct linux_dirent __user *, dirent, unsigned int, count)
|
2005-04-16 22:20:36 +00:00
|
|
|
{
|
2012-08-28 16:52:22 +00:00
|
|
|
struct fd f;
|
2013-05-23 02:22:04 +00:00
|
|
|
struct getdents_callback buf = {
|
|
|
|
.ctx.actor = filldir,
|
|
|
|
.count = count,
|
|
|
|
.current_dir = dirent
|
|
|
|
};
|
2005-04-16 22:20:36 +00:00
|
|
|
int error;
|
|
|
|
|
2016-04-20 21:08:21 +00:00
|
|
|
f = fdget_pos(fd);
|
2024-05-31 18:12:01 +00:00
|
|
|
if (!fd_file(f))
|
2012-04-21 22:40:32 +00:00
|
|
|
return -EBADF;
|
2005-04-16 22:20:36 +00:00
|
|
|
|
2024-05-31 18:12:01 +00:00
|
|
|
error = iterate_dir(fd_file(f), &buf.ctx);
|
2008-08-24 11:29:52 +00:00
|
|
|
if (error >= 0)
|
|
|
|
error = buf.error;
|
readdir: make user_access_begin() use the real access range
In commit 9f79b78ef744 ("Convert filldir[64]() from __put_user() to
unsafe_put_user()") I changed filldir to not do individual __put_user()
accesses, but instead use unsafe_put_user() surrounded by the proper
user_access_begin/end() pair.
That make them enormously faster on modern x86, where the STAC/CLAC
games make individual user accesses fairly heavy-weight.
However, the user_access_begin() range was not really the exact right
one, since filldir() has the unfortunate problem that it needs to not
only fill out the new directory entry, it also needs to fix up the
previous one to contain the proper file offset.
It's unfortunate, but the "d_off" field in "struct dirent" is _not_ the
file offset of the directory entry itself - it's the offset of the next
one. So we end up backfilling the offset in the previous entry as we
walk along.
But since x86 didn't really care about the exact range, and used to be
the only architecture that did anything fancy in user_access_begin() to
begin with, the filldir[64]() changes did something lazy, and even
commented on it:
/*
* Note! This range-checks 'previous' (which may be NULL).
* The real range was checked in getdents
*/
if (!user_access_begin(dirent, sizeof(*dirent)))
goto efault;
and it all worked fine.
But now 32-bit ppc is starting to also implement user_access_begin(),
and the fact that we faked the range to only be the (possibly not even
valid) previous directory entry becomes a problem, because ppc32 will
actually be using the range that is passed in for more than just "check
that it's user space".
This is a complete rewrite of Christophe's original patch.
By saving off the record length of the previous entry instead of a
pointer to it in the filldir data structures, we can simplify the range
check and the writing of the previous entry d_off field. No need for
any conditionals in the user accesses themselves, although we retain the
conditional EINTR checking for the "was this the first directory entry"
signal handling latency logic.
Fixes: 9f79b78ef744 ("Convert filldir[64]() from __put_user() to unsafe_put_user()")
Link: https://lore.kernel.org/lkml/a02d3426f93f7eb04960a4d9140902d278cab0bb.1579697910.git.christophe.leroy@c-s.fr/
Link: https://lore.kernel.org/lkml/408c90c4068b00ea8f1c41cca45b84ec23d4946b.1579783936.git.christophe.leroy@c-s.fr/
Reported-and-tested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-22 20:37:25 +00:00
|
|
|
if (buf.prev_reclen) {
|
|
|
|
struct linux_dirent __user * lastdirent;
|
|
|
|
lastdirent = (void __user *)buf.current_dir - buf.prev_reclen;
|
|
|
|
|
2013-05-15 22:49:12 +00:00
|
|
|
if (put_user(buf.ctx.pos, &lastdirent->d_off))
|
2005-04-16 22:20:36 +00:00
|
|
|
error = -EFAULT;
|
|
|
|
else
|
|
|
|
error = count - buf.count;
|
|
|
|
}
|
2016-04-20 21:08:21 +00:00
|
|
|
fdput_pos(f);
|
2005-04-16 22:20:36 +00:00
|
|
|
return error;
|
|
|
|
}
|
|
|
|
|
|
|
|
struct getdents_callback64 {
|
2013-05-15 17:52:59 +00:00
|
|
|
struct dir_context ctx;
|
2005-04-16 22:20:36 +00:00
|
|
|
struct linux_dirent64 __user * current_dir;
|
readdir: make user_access_begin() use the real access range
In commit 9f79b78ef744 ("Convert filldir[64]() from __put_user() to
unsafe_put_user()") I changed filldir to not do individual __put_user()
accesses, but instead use unsafe_put_user() surrounded by the proper
user_access_begin/end() pair.
That make them enormously faster on modern x86, where the STAC/CLAC
games make individual user accesses fairly heavy-weight.
However, the user_access_begin() range was not really the exact right
one, since filldir() has the unfortunate problem that it needs to not
only fill out the new directory entry, it also needs to fix up the
previous one to contain the proper file offset.
It's unfortunate, but the "d_off" field in "struct dirent" is _not_ the
file offset of the directory entry itself - it's the offset of the next
one. So we end up backfilling the offset in the previous entry as we
walk along.
But since x86 didn't really care about the exact range, and used to be
the only architecture that did anything fancy in user_access_begin() to
begin with, the filldir[64]() changes did something lazy, and even
commented on it:
/*
* Note! This range-checks 'previous' (which may be NULL).
* The real range was checked in getdents
*/
if (!user_access_begin(dirent, sizeof(*dirent)))
goto efault;
and it all worked fine.
But now 32-bit ppc is starting to also implement user_access_begin(),
and the fact that we faked the range to only be the (possibly not even
valid) previous directory entry becomes a problem, because ppc32 will
actually be using the range that is passed in for more than just "check
that it's user space".
This is a complete rewrite of Christophe's original patch.
By saving off the record length of the previous entry instead of a
pointer to it in the filldir data structures, we can simplify the range
check and the writing of the previous entry d_off field. No need for
any conditionals in the user accesses themselves, although we retain the
conditional EINTR checking for the "was this the first directory entry"
signal handling latency logic.
Fixes: 9f79b78ef744 ("Convert filldir[64]() from __put_user() to unsafe_put_user()")
Link: https://lore.kernel.org/lkml/a02d3426f93f7eb04960a4d9140902d278cab0bb.1579697910.git.christophe.leroy@c-s.fr/
Link: https://lore.kernel.org/lkml/408c90c4068b00ea8f1c41cca45b84ec23d4946b.1579783936.git.christophe.leroy@c-s.fr/
Reported-and-tested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-22 20:37:25 +00:00
|
|
|
int prev_reclen;
|
2005-04-16 22:20:36 +00:00
|
|
|
int count;
|
|
|
|
int error;
|
|
|
|
};
|
|
|
|
|
2022-08-16 15:57:56 +00:00
|
|
|
static bool filldir64(struct dir_context *ctx, const char *name, int namlen,
|
2014-10-30 16:37:34 +00:00
|
|
|
loff_t offset, u64 ino, unsigned int d_type)
|
2005-04-16 22:20:36 +00:00
|
|
|
{
|
readdir: make user_access_begin() use the real access range
In commit 9f79b78ef744 ("Convert filldir[64]() from __put_user() to
unsafe_put_user()") I changed filldir to not do individual __put_user()
accesses, but instead use unsafe_put_user() surrounded by the proper
user_access_begin/end() pair.
That make them enormously faster on modern x86, where the STAC/CLAC
games make individual user accesses fairly heavy-weight.
However, the user_access_begin() range was not really the exact right
one, since filldir() has the unfortunate problem that it needs to not
only fill out the new directory entry, it also needs to fix up the
previous one to contain the proper file offset.
It's unfortunate, but the "d_off" field in "struct dirent" is _not_ the
file offset of the directory entry itself - it's the offset of the next
one. So we end up backfilling the offset in the previous entry as we
walk along.
But since x86 didn't really care about the exact range, and used to be
the only architecture that did anything fancy in user_access_begin() to
begin with, the filldir[64]() changes did something lazy, and even
commented on it:
/*
* Note! This range-checks 'previous' (which may be NULL).
* The real range was checked in getdents
*/
if (!user_access_begin(dirent, sizeof(*dirent)))
goto efault;
and it all worked fine.
But now 32-bit ppc is starting to also implement user_access_begin(),
and the fact that we faked the range to only be the (possibly not even
valid) previous directory entry becomes a problem, because ppc32 will
actually be using the range that is passed in for more than just "check
that it's user space".
This is a complete rewrite of Christophe's original patch.
By saving off the record length of the previous entry instead of a
pointer to it in the filldir data structures, we can simplify the range
check and the writing of the previous entry d_off field. No need for
any conditionals in the user accesses themselves, although we retain the
conditional EINTR checking for the "was this the first directory entry"
signal handling latency logic.
Fixes: 9f79b78ef744 ("Convert filldir[64]() from __put_user() to unsafe_put_user()")
Link: https://lore.kernel.org/lkml/a02d3426f93f7eb04960a4d9140902d278cab0bb.1579697910.git.christophe.leroy@c-s.fr/
Link: https://lore.kernel.org/lkml/408c90c4068b00ea8f1c41cca45b84ec23d4946b.1579783936.git.christophe.leroy@c-s.fr/
Reported-and-tested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-22 20:37:25 +00:00
|
|
|
struct linux_dirent64 __user *dirent, *prev;
|
2014-10-30 16:37:34 +00:00
|
|
|
struct getdents_callback64 *buf =
|
|
|
|
container_of(ctx, struct getdents_callback64, ctx);
|
2010-08-10 00:20:22 +00:00
|
|
|
int reclen = ALIGN(offsetof(struct linux_dirent64, d_name) + namlen + 1,
|
|
|
|
sizeof(u64));
|
readdir: make user_access_begin() use the real access range
In commit 9f79b78ef744 ("Convert filldir[64]() from __put_user() to
unsafe_put_user()") I changed filldir to not do individual __put_user()
accesses, but instead use unsafe_put_user() surrounded by the proper
user_access_begin/end() pair.
That make them enormously faster on modern x86, where the STAC/CLAC
games make individual user accesses fairly heavy-weight.
However, the user_access_begin() range was not really the exact right
one, since filldir() has the unfortunate problem that it needs to not
only fill out the new directory entry, it also needs to fix up the
previous one to contain the proper file offset.
It's unfortunate, but the "d_off" field in "struct dirent" is _not_ the
file offset of the directory entry itself - it's the offset of the next
one. So we end up backfilling the offset in the previous entry as we
walk along.
But since x86 didn't really care about the exact range, and used to be
the only architecture that did anything fancy in user_access_begin() to
begin with, the filldir[64]() changes did something lazy, and even
commented on it:
/*
* Note! This range-checks 'previous' (which may be NULL).
* The real range was checked in getdents
*/
if (!user_access_begin(dirent, sizeof(*dirent)))
goto efault;
and it all worked fine.
But now 32-bit ppc is starting to also implement user_access_begin(),
and the fact that we faked the range to only be the (possibly not even
valid) previous directory entry becomes a problem, because ppc32 will
actually be using the range that is passed in for more than just "check
that it's user space".
This is a complete rewrite of Christophe's original patch.
By saving off the record length of the previous entry instead of a
pointer to it in the filldir data structures, we can simplify the range
check and the writing of the previous entry d_off field. No need for
any conditionals in the user accesses themselves, although we retain the
conditional EINTR checking for the "was this the first directory entry"
signal handling latency logic.
Fixes: 9f79b78ef744 ("Convert filldir[64]() from __put_user() to unsafe_put_user()")
Link: https://lore.kernel.org/lkml/a02d3426f93f7eb04960a4d9140902d278cab0bb.1579697910.git.christophe.leroy@c-s.fr/
Link: https://lore.kernel.org/lkml/408c90c4068b00ea8f1c41cca45b84ec23d4946b.1579783936.git.christophe.leroy@c-s.fr/
Reported-and-tested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-22 20:37:25 +00:00
|
|
|
int prev_reclen;
|
2005-04-16 22:20:36 +00:00
|
|
|
|
Make filldir[64]() verify the directory entry filename is valid
This has been discussed several times, and now filesystem people are
talking about doing it individually at the filesystem layer, so head
that off at the pass and just do it in getdents{64}().
This is partially based on a patch by Jann Horn, but checks for NUL
bytes as well, and somewhat simplified.
There's also commentary about how it might be better if invalid names
due to filesystem corruption don't cause an immediate failure, but only
an error at the end of the readdir(), so that people can still see the
filenames that are ok.
There's also been discussion about just how much POSIX strictly speaking
requires this since it's about filesystem corruption. It's really more
"protect user space from bad behavior" as pointed out by Jann. But
since Eric Biederman looked up the POSIX wording, here it is for context:
"From readdir:
The readdir() function shall return a pointer to a structure
representing the directory entry at the current position in the
directory stream specified by the argument dirp, and position the
directory stream at the next entry. It shall return a null pointer
upon reaching the end of the directory stream. The structure dirent
defined in the <dirent.h> header describes a directory entry.
From definitions:
3.129 Directory Entry (or Link)
An object that associates a filename with a file. Several directory
entries can associate names with the same file.
...
3.169 Filename
A name consisting of 1 to {NAME_MAX} bytes used to name a file. The
characters composing the name may be selected from the set of all
character values excluding the slash character and the null byte. The
filenames dot and dot-dot have special meaning. A filename is
sometimes referred to as a 'pathname component'."
Note that I didn't bother adding the checks to any legacy interfaces
that nobody uses.
Also note that if this ends up being noticeable as a performance
regression, we can fix that to do a much more optimized model that
checks for both NUL and '/' at the same time one word at a time.
We haven't really tended to optimize 'memchr()', and it only checks for
one pattern at a time anyway, and we really _should_ check for NUL too
(but see the comment about "soft errors" in the code about why it
currently only checks for '/')
See the CONFIG_DCACHE_WORD_ACCESS case of hash_name() for how the name
lookup code looks for pathname terminating characters in parallel.
Link: https://lore.kernel.org/lkml/20190118161440.220134-2-jannh@google.com/
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jann Horn <jannh@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-05 18:32:52 +00:00
|
|
|
buf->error = verify_dirent_name(name, namlen);
|
|
|
|
if (unlikely(buf->error))
|
2022-08-16 15:57:56 +00:00
|
|
|
return false;
|
2005-04-16 22:20:36 +00:00
|
|
|
buf->error = -EINVAL; /* only used if we fail.. */
|
|
|
|
if (reclen > buf->count)
|
2022-08-16 15:57:56 +00:00
|
|
|
return false;
|
readdir: make user_access_begin() use the real access range
In commit 9f79b78ef744 ("Convert filldir[64]() from __put_user() to
unsafe_put_user()") I changed filldir to not do individual __put_user()
accesses, but instead use unsafe_put_user() surrounded by the proper
user_access_begin/end() pair.
That make them enormously faster on modern x86, where the STAC/CLAC
games make individual user accesses fairly heavy-weight.
However, the user_access_begin() range was not really the exact right
one, since filldir() has the unfortunate problem that it needs to not
only fill out the new directory entry, it also needs to fix up the
previous one to contain the proper file offset.
It's unfortunate, but the "d_off" field in "struct dirent" is _not_ the
file offset of the directory entry itself - it's the offset of the next
one. So we end up backfilling the offset in the previous entry as we
walk along.
But since x86 didn't really care about the exact range, and used to be
the only architecture that did anything fancy in user_access_begin() to
begin with, the filldir[64]() changes did something lazy, and even
commented on it:
/*
* Note! This range-checks 'previous' (which may be NULL).
* The real range was checked in getdents
*/
if (!user_access_begin(dirent, sizeof(*dirent)))
goto efault;
and it all worked fine.
But now 32-bit ppc is starting to also implement user_access_begin(),
and the fact that we faked the range to only be the (possibly not even
valid) previous directory entry becomes a problem, because ppc32 will
actually be using the range that is passed in for more than just "check
that it's user space".
This is a complete rewrite of Christophe's original patch.
By saving off the record length of the previous entry instead of a
pointer to it in the filldir data structures, we can simplify the range
check and the writing of the previous entry d_off field. No need for
any conditionals in the user accesses themselves, although we retain the
conditional EINTR checking for the "was this the first directory entry"
signal handling latency logic.
Fixes: 9f79b78ef744 ("Convert filldir[64]() from __put_user() to unsafe_put_user()")
Link: https://lore.kernel.org/lkml/a02d3426f93f7eb04960a4d9140902d278cab0bb.1579697910.git.christophe.leroy@c-s.fr/
Link: https://lore.kernel.org/lkml/408c90c4068b00ea8f1c41cca45b84ec23d4946b.1579783936.git.christophe.leroy@c-s.fr/
Reported-and-tested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-22 20:37:25 +00:00
|
|
|
prev_reclen = buf->prev_reclen;
|
|
|
|
if (prev_reclen && signal_pending(current))
|
2022-08-16 15:57:56 +00:00
|
|
|
return false;
|
Convert filldir[64]() from __put_user() to unsafe_put_user()
We really should avoid the "__{get,put}_user()" functions entirely,
because they can easily be mis-used and the original intent of being
used for simple direct user accesses no longer holds in a post-SMAP/PAN
world.
Manually optimizing away the user access range check makes no sense any
more, when the range check is generally much cheaper than the "enable
user accesses" code that the __{get,put}_user() functions still need.
So instead of __put_user(), use the unsafe_put_user() interface with
user_access_{begin,end}() that really does generate better code these
days, and which is generally a nicer interface. Under some loads, the
multiple user writes that filldir() does are actually quite noticeable.
This also makes the dirent name copy use unsafe_put_user() with a couple
of macros. We do not want to make function calls with SMAP/PAN
disabled, and the code this generates is quite good when the
architecture uses "asm goto" for unsafe_put_user() like x86 does.
Note that this doesn't bother with the legacy cases. Nobody should use
them anyway, so performance doesn't really matter there.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-22 04:59:07 +00:00
|
|
|
dirent = buf->current_dir;
|
readdir: make user_access_begin() use the real access range
In commit 9f79b78ef744 ("Convert filldir[64]() from __put_user() to
unsafe_put_user()") I changed filldir to not do individual __put_user()
accesses, but instead use unsafe_put_user() surrounded by the proper
user_access_begin/end() pair.
That make them enormously faster on modern x86, where the STAC/CLAC
games make individual user accesses fairly heavy-weight.
However, the user_access_begin() range was not really the exact right
one, since filldir() has the unfortunate problem that it needs to not
only fill out the new directory entry, it also needs to fix up the
previous one to contain the proper file offset.
It's unfortunate, but the "d_off" field in "struct dirent" is _not_ the
file offset of the directory entry itself - it's the offset of the next
one. So we end up backfilling the offset in the previous entry as we
walk along.
But since x86 didn't really care about the exact range, and used to be
the only architecture that did anything fancy in user_access_begin() to
begin with, the filldir[64]() changes did something lazy, and even
commented on it:
/*
* Note! This range-checks 'previous' (which may be NULL).
* The real range was checked in getdents
*/
if (!user_access_begin(dirent, sizeof(*dirent)))
goto efault;
and it all worked fine.
But now 32-bit ppc is starting to also implement user_access_begin(),
and the fact that we faked the range to only be the (possibly not even
valid) previous directory entry becomes a problem, because ppc32 will
actually be using the range that is passed in for more than just "check
that it's user space".
This is a complete rewrite of Christophe's original patch.
By saving off the record length of the previous entry instead of a
pointer to it in the filldir data structures, we can simplify the range
check and the writing of the previous entry d_off field. No need for
any conditionals in the user accesses themselves, although we retain the
conditional EINTR checking for the "was this the first directory entry"
signal handling latency logic.
Fixes: 9f79b78ef744 ("Convert filldir[64]() from __put_user() to unsafe_put_user()")
Link: https://lore.kernel.org/lkml/a02d3426f93f7eb04960a4d9140902d278cab0bb.1579697910.git.christophe.leroy@c-s.fr/
Link: https://lore.kernel.org/lkml/408c90c4068b00ea8f1c41cca45b84ec23d4946b.1579783936.git.christophe.leroy@c-s.fr/
Reported-and-tested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-22 20:37:25 +00:00
|
|
|
prev = (void __user *)dirent - prev_reclen;
|
2020-04-03 07:20:51 +00:00
|
|
|
if (!user_write_access_begin(prev, reclen + prev_reclen))
|
readdir: make user_access_begin() use the real access range
In commit 9f79b78ef744 ("Convert filldir[64]() from __put_user() to
unsafe_put_user()") I changed filldir to not do individual __put_user()
accesses, but instead use unsafe_put_user() surrounded by the proper
user_access_begin/end() pair.
That make them enormously faster on modern x86, where the STAC/CLAC
games make individual user accesses fairly heavy-weight.
However, the user_access_begin() range was not really the exact right
one, since filldir() has the unfortunate problem that it needs to not
only fill out the new directory entry, it also needs to fix up the
previous one to contain the proper file offset.
It's unfortunate, but the "d_off" field in "struct dirent" is _not_ the
file offset of the directory entry itself - it's the offset of the next
one. So we end up backfilling the offset in the previous entry as we
walk along.
But since x86 didn't really care about the exact range, and used to be
the only architecture that did anything fancy in user_access_begin() to
begin with, the filldir[64]() changes did something lazy, and even
commented on it:
/*
* Note! This range-checks 'previous' (which may be NULL).
* The real range was checked in getdents
*/
if (!user_access_begin(dirent, sizeof(*dirent)))
goto efault;
and it all worked fine.
But now 32-bit ppc is starting to also implement user_access_begin(),
and the fact that we faked the range to only be the (possibly not even
valid) previous directory entry becomes a problem, because ppc32 will
actually be using the range that is passed in for more than just "check
that it's user space".
This is a complete rewrite of Christophe's original patch.
By saving off the record length of the previous entry instead of a
pointer to it in the filldir data structures, we can simplify the range
check and the writing of the previous entry d_off field. No need for
any conditionals in the user accesses themselves, although we retain the
conditional EINTR checking for the "was this the first directory entry"
signal handling latency logic.
Fixes: 9f79b78ef744 ("Convert filldir[64]() from __put_user() to unsafe_put_user()")
Link: https://lore.kernel.org/lkml/a02d3426f93f7eb04960a4d9140902d278cab0bb.1579697910.git.christophe.leroy@c-s.fr/
Link: https://lore.kernel.org/lkml/408c90c4068b00ea8f1c41cca45b84ec23d4946b.1579783936.git.christophe.leroy@c-s.fr/
Reported-and-tested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-22 20:37:25 +00:00
|
|
|
goto efault;
|
|
|
|
|
|
|
|
/* This might be 'dirent->d_off', but if so it will get overwritten */
|
|
|
|
unsafe_put_user(offset, &prev->d_off, efault_end);
|
Convert filldir[64]() from __put_user() to unsafe_put_user()
We really should avoid the "__{get,put}_user()" functions entirely,
because they can easily be mis-used and the original intent of being
used for simple direct user accesses no longer holds in a post-SMAP/PAN
world.
Manually optimizing away the user access range check makes no sense any
more, when the range check is generally much cheaper than the "enable
user accesses" code that the __{get,put}_user() functions still need.
So instead of __put_user(), use the unsafe_put_user() interface with
user_access_{begin,end}() that really does generate better code these
days, and which is generally a nicer interface. Under some loads, the
multiple user writes that filldir() does are actually quite noticeable.
This also makes the dirent name copy use unsafe_put_user() with a couple
of macros. We do not want to make function calls with SMAP/PAN
disabled, and the code this generates is quite good when the
architecture uses "asm goto" for unsafe_put_user() like x86 does.
Note that this doesn't bother with the legacy cases. Nobody should use
them anyway, so performance doesn't really matter there.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-22 04:59:07 +00:00
|
|
|
unsafe_put_user(ino, &dirent->d_ino, efault_end);
|
|
|
|
unsafe_put_user(reclen, &dirent->d_reclen, efault_end);
|
|
|
|
unsafe_put_user(d_type, &dirent->d_type, efault_end);
|
|
|
|
unsafe_copy_dirent_name(dirent->d_name, name, namlen, efault_end);
|
2020-04-03 07:20:51 +00:00
|
|
|
user_write_access_end();
|
Convert filldir[64]() from __put_user() to unsafe_put_user()
We really should avoid the "__{get,put}_user()" functions entirely,
because they can easily be mis-used and the original intent of being
used for simple direct user accesses no longer holds in a post-SMAP/PAN
world.
Manually optimizing away the user access range check makes no sense any
more, when the range check is generally much cheaper than the "enable
user accesses" code that the __{get,put}_user() functions still need.
So instead of __put_user(), use the unsafe_put_user() interface with
user_access_{begin,end}() that really does generate better code these
days, and which is generally a nicer interface. Under some loads, the
multiple user writes that filldir() does are actually quite noticeable.
This also makes the dirent name copy use unsafe_put_user() with a couple
of macros. We do not want to make function calls with SMAP/PAN
disabled, and the code this generates is quite good when the
architecture uses "asm goto" for unsafe_put_user() like x86 does.
Note that this doesn't bother with the legacy cases. Nobody should use
them anyway, so performance doesn't really matter there.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-22 04:59:07 +00:00
|
|
|
|
readdir: make user_access_begin() use the real access range
In commit 9f79b78ef744 ("Convert filldir[64]() from __put_user() to
unsafe_put_user()") I changed filldir to not do individual __put_user()
accesses, but instead use unsafe_put_user() surrounded by the proper
user_access_begin/end() pair.
That make them enormously faster on modern x86, where the STAC/CLAC
games make individual user accesses fairly heavy-weight.
However, the user_access_begin() range was not really the exact right
one, since filldir() has the unfortunate problem that it needs to not
only fill out the new directory entry, it also needs to fix up the
previous one to contain the proper file offset.
It's unfortunate, but the "d_off" field in "struct dirent" is _not_ the
file offset of the directory entry itself - it's the offset of the next
one. So we end up backfilling the offset in the previous entry as we
walk along.
But since x86 didn't really care about the exact range, and used to be
the only architecture that did anything fancy in user_access_begin() to
begin with, the filldir[64]() changes did something lazy, and even
commented on it:
/*
* Note! This range-checks 'previous' (which may be NULL).
* The real range was checked in getdents
*/
if (!user_access_begin(dirent, sizeof(*dirent)))
goto efault;
and it all worked fine.
But now 32-bit ppc is starting to also implement user_access_begin(),
and the fact that we faked the range to only be the (possibly not even
valid) previous directory entry becomes a problem, because ppc32 will
actually be using the range that is passed in for more than just "check
that it's user space".
This is a complete rewrite of Christophe's original patch.
By saving off the record length of the previous entry instead of a
pointer to it in the filldir data structures, we can simplify the range
check and the writing of the previous entry d_off field. No need for
any conditionals in the user accesses themselves, although we retain the
conditional EINTR checking for the "was this the first directory entry"
signal handling latency logic.
Fixes: 9f79b78ef744 ("Convert filldir[64]() from __put_user() to unsafe_put_user()")
Link: https://lore.kernel.org/lkml/a02d3426f93f7eb04960a4d9140902d278cab0bb.1579697910.git.christophe.leroy@c-s.fr/
Link: https://lore.kernel.org/lkml/408c90c4068b00ea8f1c41cca45b84ec23d4946b.1579783936.git.christophe.leroy@c-s.fr/
Reported-and-tested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-22 20:37:25 +00:00
|
|
|
buf->prev_reclen = reclen;
|
|
|
|
buf->current_dir = (void __user *)dirent + reclen;
|
2005-04-16 22:20:36 +00:00
|
|
|
buf->count -= reclen;
|
2022-08-16 15:57:56 +00:00
|
|
|
return true;
|
readdir: make user_access_begin() use the real access range
In commit 9f79b78ef744 ("Convert filldir[64]() from __put_user() to
unsafe_put_user()") I changed filldir to not do individual __put_user()
accesses, but instead use unsafe_put_user() surrounded by the proper
user_access_begin/end() pair.
That make them enormously faster on modern x86, where the STAC/CLAC
games make individual user accesses fairly heavy-weight.
However, the user_access_begin() range was not really the exact right
one, since filldir() has the unfortunate problem that it needs to not
only fill out the new directory entry, it also needs to fix up the
previous one to contain the proper file offset.
It's unfortunate, but the "d_off" field in "struct dirent" is _not_ the
file offset of the directory entry itself - it's the offset of the next
one. So we end up backfilling the offset in the previous entry as we
walk along.
But since x86 didn't really care about the exact range, and used to be
the only architecture that did anything fancy in user_access_begin() to
begin with, the filldir[64]() changes did something lazy, and even
commented on it:
/*
* Note! This range-checks 'previous' (which may be NULL).
* The real range was checked in getdents
*/
if (!user_access_begin(dirent, sizeof(*dirent)))
goto efault;
and it all worked fine.
But now 32-bit ppc is starting to also implement user_access_begin(),
and the fact that we faked the range to only be the (possibly not even
valid) previous directory entry becomes a problem, because ppc32 will
actually be using the range that is passed in for more than just "check
that it's user space".
This is a complete rewrite of Christophe's original patch.
By saving off the record length of the previous entry instead of a
pointer to it in the filldir data structures, we can simplify the range
check and the writing of the previous entry d_off field. No need for
any conditionals in the user accesses themselves, although we retain the
conditional EINTR checking for the "was this the first directory entry"
signal handling latency logic.
Fixes: 9f79b78ef744 ("Convert filldir[64]() from __put_user() to unsafe_put_user()")
Link: https://lore.kernel.org/lkml/a02d3426f93f7eb04960a4d9140902d278cab0bb.1579697910.git.christophe.leroy@c-s.fr/
Link: https://lore.kernel.org/lkml/408c90c4068b00ea8f1c41cca45b84ec23d4946b.1579783936.git.christophe.leroy@c-s.fr/
Reported-and-tested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-22 20:37:25 +00:00
|
|
|
|
Convert filldir[64]() from __put_user() to unsafe_put_user()
We really should avoid the "__{get,put}_user()" functions entirely,
because they can easily be mis-used and the original intent of being
used for simple direct user accesses no longer holds in a post-SMAP/PAN
world.
Manually optimizing away the user access range check makes no sense any
more, when the range check is generally much cheaper than the "enable
user accesses" code that the __{get,put}_user() functions still need.
So instead of __put_user(), use the unsafe_put_user() interface with
user_access_{begin,end}() that really does generate better code these
days, and which is generally a nicer interface. Under some loads, the
multiple user writes that filldir() does are actually quite noticeable.
This also makes the dirent name copy use unsafe_put_user() with a couple
of macros. We do not want to make function calls with SMAP/PAN
disabled, and the code this generates is quite good when the
architecture uses "asm goto" for unsafe_put_user() like x86 does.
Note that this doesn't bother with the legacy cases. Nobody should use
them anyway, so performance doesn't really matter there.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-22 04:59:07 +00:00
|
|
|
efault_end:
|
2020-04-03 07:20:51 +00:00
|
|
|
user_write_access_end();
|
2005-04-16 22:20:36 +00:00
|
|
|
efault:
|
|
|
|
buf->error = -EFAULT;
|
2022-08-16 15:57:56 +00:00
|
|
|
return false;
|
2005-04-16 22:20:36 +00:00
|
|
|
}
|
|
|
|
|
2020-07-14 07:02:07 +00:00
|
|
|
SYSCALL_DEFINE3(getdents64, unsigned int, fd,
|
|
|
|
struct linux_dirent64 __user *, dirent, unsigned int, count)
|
2005-04-16 22:20:36 +00:00
|
|
|
{
|
2012-08-28 16:52:22 +00:00
|
|
|
struct fd f;
|
2013-05-23 02:22:04 +00:00
|
|
|
struct getdents_callback64 buf = {
|
|
|
|
.ctx.actor = filldir64,
|
|
|
|
.count = count,
|
|
|
|
.current_dir = dirent
|
|
|
|
};
|
2005-04-16 22:20:36 +00:00
|
|
|
int error;
|
|
|
|
|
2016-04-20 21:08:21 +00:00
|
|
|
f = fdget_pos(fd);
|
2024-05-31 18:12:01 +00:00
|
|
|
if (!fd_file(f))
|
2012-04-21 22:40:32 +00:00
|
|
|
return -EBADF;
|
2005-04-16 22:20:36 +00:00
|
|
|
|
2024-05-31 18:12:01 +00:00
|
|
|
error = iterate_dir(fd_file(f), &buf.ctx);
|
2008-08-24 11:29:52 +00:00
|
|
|
if (error >= 0)
|
|
|
|
error = buf.error;
|
readdir: make user_access_begin() use the real access range
In commit 9f79b78ef744 ("Convert filldir[64]() from __put_user() to
unsafe_put_user()") I changed filldir to not do individual __put_user()
accesses, but instead use unsafe_put_user() surrounded by the proper
user_access_begin/end() pair.
That make them enormously faster on modern x86, where the STAC/CLAC
games make individual user accesses fairly heavy-weight.
However, the user_access_begin() range was not really the exact right
one, since filldir() has the unfortunate problem that it needs to not
only fill out the new directory entry, it also needs to fix up the
previous one to contain the proper file offset.
It's unfortunate, but the "d_off" field in "struct dirent" is _not_ the
file offset of the directory entry itself - it's the offset of the next
one. So we end up backfilling the offset in the previous entry as we
walk along.
But since x86 didn't really care about the exact range, and used to be
the only architecture that did anything fancy in user_access_begin() to
begin with, the filldir[64]() changes did something lazy, and even
commented on it:
/*
* Note! This range-checks 'previous' (which may be NULL).
* The real range was checked in getdents
*/
if (!user_access_begin(dirent, sizeof(*dirent)))
goto efault;
and it all worked fine.
But now 32-bit ppc is starting to also implement user_access_begin(),
and the fact that we faked the range to only be the (possibly not even
valid) previous directory entry becomes a problem, because ppc32 will
actually be using the range that is passed in for more than just "check
that it's user space".
This is a complete rewrite of Christophe's original patch.
By saving off the record length of the previous entry instead of a
pointer to it in the filldir data structures, we can simplify the range
check and the writing of the previous entry d_off field. No need for
any conditionals in the user accesses themselves, although we retain the
conditional EINTR checking for the "was this the first directory entry"
signal handling latency logic.
Fixes: 9f79b78ef744 ("Convert filldir[64]() from __put_user() to unsafe_put_user()")
Link: https://lore.kernel.org/lkml/a02d3426f93f7eb04960a4d9140902d278cab0bb.1579697910.git.christophe.leroy@c-s.fr/
Link: https://lore.kernel.org/lkml/408c90c4068b00ea8f1c41cca45b84ec23d4946b.1579783936.git.christophe.leroy@c-s.fr/
Reported-and-tested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-22 20:37:25 +00:00
|
|
|
if (buf.prev_reclen) {
|
|
|
|
struct linux_dirent64 __user * lastdirent;
|
2013-05-15 22:49:12 +00:00
|
|
|
typeof(lastdirent->d_off) d_off = buf.ctx.pos;
|
readdir: make user_access_begin() use the real access range
In commit 9f79b78ef744 ("Convert filldir[64]() from __put_user() to
unsafe_put_user()") I changed filldir to not do individual __put_user()
accesses, but instead use unsafe_put_user() surrounded by the proper
user_access_begin/end() pair.
That make them enormously faster on modern x86, where the STAC/CLAC
games make individual user accesses fairly heavy-weight.
However, the user_access_begin() range was not really the exact right
one, since filldir() has the unfortunate problem that it needs to not
only fill out the new directory entry, it also needs to fix up the
previous one to contain the proper file offset.
It's unfortunate, but the "d_off" field in "struct dirent" is _not_ the
file offset of the directory entry itself - it's the offset of the next
one. So we end up backfilling the offset in the previous entry as we
walk along.
But since x86 didn't really care about the exact range, and used to be
the only architecture that did anything fancy in user_access_begin() to
begin with, the filldir[64]() changes did something lazy, and even
commented on it:
/*
* Note! This range-checks 'previous' (which may be NULL).
* The real range was checked in getdents
*/
if (!user_access_begin(dirent, sizeof(*dirent)))
goto efault;
and it all worked fine.
But now 32-bit ppc is starting to also implement user_access_begin(),
and the fact that we faked the range to only be the (possibly not even
valid) previous directory entry becomes a problem, because ppc32 will
actually be using the range that is passed in for more than just "check
that it's user space".
This is a complete rewrite of Christophe's original patch.
By saving off the record length of the previous entry instead of a
pointer to it in the filldir data structures, we can simplify the range
check and the writing of the previous entry d_off field. No need for
any conditionals in the user accesses themselves, although we retain the
conditional EINTR checking for the "was this the first directory entry"
signal handling latency logic.
Fixes: 9f79b78ef744 ("Convert filldir[64]() from __put_user() to unsafe_put_user()")
Link: https://lore.kernel.org/lkml/a02d3426f93f7eb04960a4d9140902d278cab0bb.1579697910.git.christophe.leroy@c-s.fr/
Link: https://lore.kernel.org/lkml/408c90c4068b00ea8f1c41cca45b84ec23d4946b.1579783936.git.christophe.leroy@c-s.fr/
Reported-and-tested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-22 20:37:25 +00:00
|
|
|
|
|
|
|
lastdirent = (void __user *) buf.current_dir - buf.prev_reclen;
|
2020-02-19 03:34:07 +00:00
|
|
|
if (put_user(d_off, &lastdirent->d_off))
|
2008-08-24 11:29:52 +00:00
|
|
|
error = -EFAULT;
|
|
|
|
else
|
|
|
|
error = count - buf.count;
|
2005-04-16 22:20:36 +00:00
|
|
|
}
|
2016-04-20 21:08:21 +00:00
|
|
|
fdput_pos(f);
|
2005-04-16 22:20:36 +00:00
|
|
|
return error;
|
|
|
|
}
|
2017-04-08 22:10:08 +00:00
|
|
|
|
|
|
|
#ifdef CONFIG_COMPAT
|
|
|
|
struct compat_old_linux_dirent {
|
|
|
|
compat_ulong_t d_ino;
|
|
|
|
compat_ulong_t d_offset;
|
|
|
|
unsigned short d_namlen;
|
2023-06-20 17:30:36 +00:00
|
|
|
char d_name[];
|
2017-04-08 22:10:08 +00:00
|
|
|
};
|
|
|
|
|
|
|
|
struct compat_readdir_callback {
|
|
|
|
struct dir_context ctx;
|
|
|
|
struct compat_old_linux_dirent __user *dirent;
|
|
|
|
int result;
|
|
|
|
};
|
|
|
|
|
2022-08-16 15:57:56 +00:00
|
|
|
static bool compat_fillonedir(struct dir_context *ctx, const char *name,
|
2017-04-08 22:10:08 +00:00
|
|
|
int namlen, loff_t offset, u64 ino,
|
|
|
|
unsigned int d_type)
|
|
|
|
{
|
|
|
|
struct compat_readdir_callback *buf =
|
|
|
|
container_of(ctx, struct compat_readdir_callback, ctx);
|
|
|
|
struct compat_old_linux_dirent __user *dirent;
|
|
|
|
compat_ulong_t d_ino;
|
|
|
|
|
|
|
|
if (buf->result)
|
2022-08-16 15:57:56 +00:00
|
|
|
return false;
|
2021-04-17 16:27:04 +00:00
|
|
|
buf->result = verify_dirent_name(name, namlen);
|
2022-08-16 15:57:56 +00:00
|
|
|
if (buf->result)
|
|
|
|
return false;
|
2017-04-08 22:10:08 +00:00
|
|
|
d_ino = ino;
|
|
|
|
if (sizeof(d_ino) < sizeof(ino) && d_ino != ino) {
|
|
|
|
buf->result = -EOVERFLOW;
|
2022-08-16 15:57:56 +00:00
|
|
|
return false;
|
2017-04-08 22:10:08 +00:00
|
|
|
}
|
|
|
|
buf->result++;
|
|
|
|
dirent = buf->dirent;
|
2020-02-18 19:39:56 +00:00
|
|
|
if (!user_write_access_begin(dirent,
|
2017-04-08 22:10:08 +00:00
|
|
|
(unsigned long)(dirent->d_name + namlen + 1) -
|
|
|
|
(unsigned long)dirent))
|
|
|
|
goto efault;
|
2020-02-18 19:39:56 +00:00
|
|
|
unsafe_put_user(d_ino, &dirent->d_ino, efault_end);
|
|
|
|
unsafe_put_user(offset, &dirent->d_offset, efault_end);
|
|
|
|
unsafe_put_user(namlen, &dirent->d_namlen, efault_end);
|
|
|
|
unsafe_copy_dirent_name(dirent->d_name, name, namlen, efault_end);
|
|
|
|
user_write_access_end();
|
2022-08-16 15:57:56 +00:00
|
|
|
return true;
|
2020-02-18 19:39:56 +00:00
|
|
|
efault_end:
|
|
|
|
user_write_access_end();
|
2017-04-08 22:10:08 +00:00
|
|
|
efault:
|
|
|
|
buf->result = -EFAULT;
|
2022-08-16 15:57:56 +00:00
|
|
|
return false;
|
2017-04-08 22:10:08 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
COMPAT_SYSCALL_DEFINE3(old_readdir, unsigned int, fd,
|
|
|
|
struct compat_old_linux_dirent __user *, dirent, unsigned int, count)
|
|
|
|
{
|
|
|
|
int error;
|
|
|
|
struct fd f = fdget_pos(fd);
|
|
|
|
struct compat_readdir_callback buf = {
|
|
|
|
.ctx.actor = compat_fillonedir,
|
|
|
|
.dirent = dirent
|
|
|
|
};
|
|
|
|
|
2024-05-31 18:12:01 +00:00
|
|
|
if (!fd_file(f))
|
2017-04-08 22:10:08 +00:00
|
|
|
return -EBADF;
|
|
|
|
|
2024-05-31 18:12:01 +00:00
|
|
|
error = iterate_dir(fd_file(f), &buf.ctx);
|
2017-04-08 22:10:08 +00:00
|
|
|
if (buf.result)
|
|
|
|
error = buf.result;
|
|
|
|
|
|
|
|
fdput_pos(f);
|
|
|
|
return error;
|
|
|
|
}
|
|
|
|
|
|
|
|
struct compat_linux_dirent {
|
|
|
|
compat_ulong_t d_ino;
|
|
|
|
compat_ulong_t d_off;
|
|
|
|
unsigned short d_reclen;
|
2023-06-20 17:30:36 +00:00
|
|
|
char d_name[];
|
2017-04-08 22:10:08 +00:00
|
|
|
};
|
|
|
|
|
|
|
|
struct compat_getdents_callback {
|
|
|
|
struct dir_context ctx;
|
|
|
|
struct compat_linux_dirent __user *current_dir;
|
2020-02-19 03:33:09 +00:00
|
|
|
int prev_reclen;
|
2017-04-08 22:10:08 +00:00
|
|
|
int count;
|
|
|
|
int error;
|
|
|
|
};
|
|
|
|
|
2022-08-16 15:57:56 +00:00
|
|
|
static bool compat_filldir(struct dir_context *ctx, const char *name, int namlen,
|
2017-04-08 22:10:08 +00:00
|
|
|
loff_t offset, u64 ino, unsigned int d_type)
|
|
|
|
{
|
2020-02-19 03:33:09 +00:00
|
|
|
struct compat_linux_dirent __user *dirent, *prev;
|
2017-04-08 22:10:08 +00:00
|
|
|
struct compat_getdents_callback *buf =
|
|
|
|
container_of(ctx, struct compat_getdents_callback, ctx);
|
|
|
|
compat_ulong_t d_ino;
|
|
|
|
int reclen = ALIGN(offsetof(struct compat_linux_dirent, d_name) +
|
|
|
|
namlen + 2, sizeof(compat_long_t));
|
2020-02-19 03:33:09 +00:00
|
|
|
int prev_reclen;
|
2017-04-08 22:10:08 +00:00
|
|
|
|
2020-02-19 03:33:09 +00:00
|
|
|
buf->error = verify_dirent_name(name, namlen);
|
|
|
|
if (unlikely(buf->error))
|
2022-08-16 15:57:56 +00:00
|
|
|
return false;
|
2017-04-08 22:10:08 +00:00
|
|
|
buf->error = -EINVAL; /* only used if we fail.. */
|
|
|
|
if (reclen > buf->count)
|
2022-08-16 15:57:56 +00:00
|
|
|
return false;
|
2017-04-08 22:10:08 +00:00
|
|
|
d_ino = ino;
|
|
|
|
if (sizeof(d_ino) < sizeof(ino) && d_ino != ino) {
|
|
|
|
buf->error = -EOVERFLOW;
|
2022-08-16 15:57:56 +00:00
|
|
|
return false;
|
2017-04-08 22:10:08 +00:00
|
|
|
}
|
2020-02-19 03:33:09 +00:00
|
|
|
prev_reclen = buf->prev_reclen;
|
|
|
|
if (prev_reclen && signal_pending(current))
|
2022-08-16 15:57:56 +00:00
|
|
|
return false;
|
2017-04-08 22:10:08 +00:00
|
|
|
dirent = buf->current_dir;
|
2020-02-19 03:33:09 +00:00
|
|
|
prev = (void __user *) dirent - prev_reclen;
|
|
|
|
if (!user_write_access_begin(prev, reclen + prev_reclen))
|
2017-04-08 22:10:08 +00:00
|
|
|
goto efault;
|
2020-02-19 03:33:09 +00:00
|
|
|
|
|
|
|
unsafe_put_user(offset, &prev->d_off, efault_end);
|
|
|
|
unsafe_put_user(d_ino, &dirent->d_ino, efault_end);
|
|
|
|
unsafe_put_user(reclen, &dirent->d_reclen, efault_end);
|
|
|
|
unsafe_put_user(d_type, (char __user *) dirent + reclen - 1, efault_end);
|
|
|
|
unsafe_copy_dirent_name(dirent->d_name, name, namlen, efault_end);
|
|
|
|
user_write_access_end();
|
|
|
|
|
|
|
|
buf->prev_reclen = reclen;
|
|
|
|
buf->current_dir = (void __user *)dirent + reclen;
|
2017-04-08 22:10:08 +00:00
|
|
|
buf->count -= reclen;
|
2022-08-16 15:57:56 +00:00
|
|
|
return true;
|
2020-02-19 03:33:09 +00:00
|
|
|
efault_end:
|
|
|
|
user_write_access_end();
|
2017-04-08 22:10:08 +00:00
|
|
|
efault:
|
|
|
|
buf->error = -EFAULT;
|
2022-08-16 15:57:56 +00:00
|
|
|
return false;
|
2017-04-08 22:10:08 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
COMPAT_SYSCALL_DEFINE3(getdents, unsigned int, fd,
|
|
|
|
struct compat_linux_dirent __user *, dirent, unsigned int, count)
|
|
|
|
{
|
|
|
|
struct fd f;
|
|
|
|
struct compat_getdents_callback buf = {
|
|
|
|
.ctx.actor = compat_filldir,
|
|
|
|
.current_dir = dirent,
|
|
|
|
.count = count
|
|
|
|
};
|
|
|
|
int error;
|
|
|
|
|
|
|
|
f = fdget_pos(fd);
|
2024-05-31 18:12:01 +00:00
|
|
|
if (!fd_file(f))
|
2017-04-08 22:10:08 +00:00
|
|
|
return -EBADF;
|
|
|
|
|
2024-05-31 18:12:01 +00:00
|
|
|
error = iterate_dir(fd_file(f), &buf.ctx);
|
2017-04-08 22:10:08 +00:00
|
|
|
if (error >= 0)
|
|
|
|
error = buf.error;
|
2020-02-19 03:33:09 +00:00
|
|
|
if (buf.prev_reclen) {
|
|
|
|
struct compat_linux_dirent __user * lastdirent;
|
|
|
|
lastdirent = (void __user *)buf.current_dir - buf.prev_reclen;
|
|
|
|
|
2017-04-08 22:10:08 +00:00
|
|
|
if (put_user(buf.ctx.pos, &lastdirent->d_off))
|
|
|
|
error = -EFAULT;
|
|
|
|
else
|
|
|
|
error = count - buf.count;
|
|
|
|
}
|
|
|
|
fdput_pos(f);
|
|
|
|
return error;
|
|
|
|
}
|
|
|
|
#endif
|