afs_listxattr() lists all the available special afs xattrs (i.e. those in
the "afs.*" space), no matter what type of server we're dealing with. But
OpenAFS servers, for example, cannot deal with some of the extra-capable
attributes that AuriStor (YFS) servers provide. Unfortunately, the
presence of the afs.yfs.* attributes causes errors[1] for anything that
tries to read them if the server is of the wrong type.
Fix the problem by removing afs_listxattr() so that none of the special
xattrs are listed (AFS doesn't support xattrs). It does mean, however,
that getfattr won't list them, though they can still be accessed with
getxattr() and setxattr().
This can be tested with something like:
getfattr -d -m ".*" /afs/example.com/path/to/file
With this change, none of the afs.* attributes should be visible.
Changes:
ver #2:
- Hide all of the afs.* xattrs, not just the ACL ones.
Fixes: ae46578b96 ("afs: Get YFS ACLs and information through xattrs")
Reported-by: Gaja Sophie Peters <gaja.peters@math.uni-hamburg.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Gaja Sophie Peters <gaja.peters@math.uni-hamburg.de>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: http://lists.infradead.org/pipermail/linux-afs/2021-March/003502.html [1]
Link: http://lists.infradead.org/pipermail/linux-afs/2021-March/003567.html # v1
Link: http://lists.infradead.org/pipermail/linux-afs/2021-March/003573.html # v2
If someone attempts to access YFS-related xattrs (e.g. afs.yfs.acl) on a
file on a non-YFS AFS server (such as OpenAFS), then the kernel will jump
to a NULL function pointer because the afs_fetch_acl_operation descriptor
doesn't point to a function for issuing an operation on a non-YFS
server[1].
Fix this by making afs_wait_for_operation() check that the issue_afs_rpc
method is set before jumping to it and setting -ENOTSUPP if not. This fix
also covers other potential operations that also only exist on YFS servers.
afs_xattr_get/set_yfs() then need to translate -ENOTSUPP to -ENODATA as the
former error is internal to the kernel.
The bug shows up as an oops like the following:
BUG: kernel NULL pointer dereference, address: 0000000000000000
[...]
Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
[...]
Call Trace:
afs_wait_for_operation+0x83/0x1b0 [kafs]
afs_xattr_get_yfs+0xe6/0x270 [kafs]
__vfs_getxattr+0x59/0x80
vfs_getxattr+0x11c/0x140
getxattr+0x181/0x250
? __check_object_size+0x13f/0x150
? __fput+0x16d/0x250
__x64_sys_fgetxattr+0x64/0xb0
do_syscall_64+0x49/0xc0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fb120a9defe
This was triggered with "cp -a" which attempts to copy xattrs, including
afs ones, but is easier to reproduce with getfattr, e.g.:
getfattr -d -m ".*" /afs/openafs.org/
Fixes: e49c7b2f6d ("afs: Build an abstraction around an "operation" concept")
Reported-by: Gaja Sophie Peters <gaja.peters@math.uni-hamburg.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Gaja Sophie Peters <gaja.peters@math.uni-hamburg.de>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
cc: linux-afs@lists.infradead.org
Link: http://lists.infradead.org/pipermail/linux-afs/2021-March/003498.html [1]
Link: http://lists.infradead.org/pipermail/linux-afs/2021-March/003566.html # v1
Link: http://lists.infradead.org/pipermail/linux-afs/2021-March/003572.html # v2
The posix acl permission checking helpers determine whether a caller is
privileged over an inode according to the acls associated with the
inode. Add helpers that make it possible to handle acls on idmapped
mounts.
The vfs and the filesystems targeted by this first iteration make use of
posix_acl_fix_xattr_from_user() and posix_acl_fix_xattr_to_user() to
translate basic posix access and default permissions such as the
ACL_USER and ACL_GROUP type according to the initial user namespace (or
the superblock's user namespace) to and from the caller's current user
namespace. Adapt these two helpers to handle idmapped mounts whereby we
either map from or into the mount's user namespace depending on in which
direction we're translating.
Similarly, cap_convert_nscap() is used by the vfs to translate user
namespace and non-user namespace aware filesystem capabilities from the
superblock's user namespace to the caller's user namespace. Enable it to
handle idmapped mounts by accounting for the mount's user namespace.
In addition the fileystems targeted in the first iteration of this patch
series make use of the posix_acl_chmod() and, posix_acl_update_mode()
helpers. Both helpers perform permission checks on the target inode. Let
them handle idmapped mounts. These two helpers are called when posix
acls are set by the respective filesystems to handle this case we extend
the ->set() method to take an additional user namespace argument to pass
the mount's user namespace down.
Link: https://lore.kernel.org/r/20210121131959.646623-9-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
The cleanup for the yfs_store_opaque_acl2_operation calls the wrong
function to destroy the ACL content buffer. It's an afs_acl struct, not
a yfs_acl struct - and the free function for latter may pass invalid
pointers to kfree().
Fix this by using the afs_acl_put() function. The yfs_acl_put()
function is then no longer used and can be removed.
general protection fault, probably for non-canonical address 0x7ebde00000000: 0000 [#1] SMP PTI
...
RIP: 0010:compound_head+0x0/0x11
...
Call Trace:
virt_to_cache+0x8/0x51
kfree+0x5d/0x79
yfs_free_opaque_acl+0x16/0x29
afs_put_operation+0x60/0x114
__vfs_setxattr+0x67/0x72
__vfs_setxattr_noperm+0x66/0xe9
vfs_setxattr+0x67/0xce
setxattr+0x14e/0x184
__do_sys_fsetxattr+0x66/0x8f
do_syscall_64+0x2d/0x3a
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: e49c7b2f6d ("afs: Build an abstraction around an "operation" concept")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The "op" pointer is freed earlier when we call afs_put_operation().
Fixes: e49c7b2f6d ("afs: Build an abstraction around an "operation" concept")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Colin Ian King <colin.king@canonical.com>
Turn the afs_operation struct into the main way that most fileserver
operations are managed. Various things are added to the struct, including
the following:
(1) All the parameters and results of the relevant operations are moved
into it, removing corresponding fields from the afs_call struct.
afs_call gets a pointer to the op.
(2) The target volume is made the main focus of the operation, rather than
the target vnode(s), and a bunch of op->vnode->volume are made
op->volume instead.
(3) Two vnode records are defined (op->file[]) for the vnode(s) involved
in most operations. The vnode record (struct afs_vnode_param)
contains:
- The vnode pointer.
- The fid of the vnode to be included in the parameters or that was
returned in the reply (eg. FS.MakeDir).
- The status and callback information that may be returned in the
reply about the vnode.
- Callback break and data version tracking for detecting
simultaneous third-parth changes.
(4) Pointers to dentries to be updated with new inodes.
(5) An operations table pointer. The table includes pointers to functions
for issuing AFS and YFS-variant RPCs, handling the success and abort
of an operation and handling post-I/O-lock local editing of a
directory.
To make this work, the following function restructuring is made:
(A) The rotation loop that issues calls to fileservers that can be found
in each function that wants to issue an RPC (such as afs_mkdir()) is
extracted out into common code, in a new file called fs_operation.c.
(B) The rotation loops, such as the one in afs_mkdir(), are replaced with
a much smaller piece of code that allocates an operation, sets the
parameters and then calls out to the common code to do the actual
work.
(C) The code for handling the success and failure of an operation are
moved into operation functions (as (5) above) and these are called
from the core code at appropriate times.
(D) The pseudo inode getting stuff used by the dynamic root code is moved
over into dynroot.c.
(E) struct afs_iget_data is absorbed into the operation struct and
afs_iget() expects to be given an op pointer and a vnode record.
(F) Point (E) doesn't work for the root dir of a volume, but we know the
FID in advance (it's always vnode 1, unique 1), so a separate inode
getter, afs_root_iget(), is provided to special-case that.
(G) The inode status init/update functions now also take an op and a vnode
record.
(H) The RPC marshalling functions now, for the most part, just take an
afs_operation struct as their only argument. All the data they need
is held there. The result delivery functions write their answers
there as well.
(I) The call is attached to the operation and then the operation core does
the waiting.
And then the new operation code is, for the moment, made to just initialise
the operation, get the appropriate vnode I/O locks and do the same rotation
loop as before.
This lays the foundation for the following changes in the future:
(*) Overhauling the rotation (again).
(*) Support for asynchronous I/O, where the fileserver rotation must be
done asynchronously also.
Signed-off-by: David Howells <dhowells@redhat.com>
As a prelude to implementing asynchronous fileserver operations in the afs
filesystem, rename struct afs_fs_cursor to afs_operation.
This struct is going to form the core of the operation management and is
going to acquire more members in later.
Signed-off-by: David Howells <dhowells@redhat.com>
sprintf and snprintf are fragile in future maintenance, switch to
using scnprintf to ensure no accidental Use After Free conditions
are introduced.
Signed-off-by: Mark Salyzyn <salyzyn@android.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: David Howells <dhowells@redhat.com>
Cc: linux-afs@lists.infradead.org
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: David Howells <dhowells@redhat.com>
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public licence as published by
the free software foundation either version 2 of the licence or at
your option any later version
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-or-later
has been chosen to replace the boilerplate/reference in 114 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190520170857.552531963@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When applying the status and callback in the response of an operation,
apply them in the same critical section so that there's no race between
checking the callback state and checking status-dependent state (such as
the data version).
Fix this by:
(1) Allocating a joint {status,callback} record (afs_status_cb) before
calling the RPC function for each vnode for which the RPC reply
contains a status or a status plus a callback. A flag is set in the
record to indicate if a callback was actually received.
(2) These records are passed into the RPC functions to be filled in. The
afs_decode_status() and yfs_decode_status() functions are removed and
the cb_lock is no longer taken.
(3) xdr_decode_AFSFetchStatus() and xdr_decode_YFSFetchStatus() no longer
update the vnode.
(4) xdr_decode_AFSCallBack() and xdr_decode_YFSCallBack() no longer update
the vnode.
(5) vnodes, expected data-version numbers and callback break counters
(cb_break) no longer need to be passed to the reply delivery
functions.
Note that, for the moment, the file locking functions still need
access to both the call and the vnode at the same time.
(6) afs_vnode_commit_status() is now given the cb_break value and the
expected data_version and the task of applying the status and the
callback to the vnode are now done here.
This is done under a single taking of vnode->cb_lock.
(7) afs_pages_written_back() is now called by afs_store_data() rather than
by the reply delivery function.
afs_pages_written_back() has been moved to before the call point and
is now given the first and last page numbers rather than a pointer to
the call.
(8) The indicator from YFS.RemoveFile2 as to whether the target file
actually got removed (status.abort_code == VNOVNODE) rather than
merely dropping a link is now checked in afs_unlink rather than in
xdr_decode_YFSFetchStatus().
Supplementary fixes:
(*) afs_cache_permit() now gets the caller_access mask from the
afs_status_cb object rather than picking it out of the vnode's status
record. afs_fetch_status() returns caller_access through its argument
list for this purpose also.
(*) afs_inode_init_from_status() now uses a write lock on cb_lock rather
than a read lock and now sets the callback inside the same critical
section.
Fixes: c435ee3455 ("afs: Overhaul the callback handling")
Signed-off-by: David Howells <dhowells@redhat.com>
Make certain RPC operations non-interruptible, including:
(*) Set attributes
(*) Store data
We don't want to get interrupted during a flush on close, flush on
unlock, writeback or an inode update, leaving us in a state where we
still need to do the writeback or update.
(*) Extend lock
(*) Release lock
We don't want to get lock extension interrupted as the file locks on
the server are time-limited. Interruption during lock release is less
of an issue since the lock is time-limited, but it's better to
complete the release to avoid a several-minute wait to recover it.
*Setting* the lock isn't a problem if it's interrupted since we can
just return to the user and tell them they were interrupted - at
which point they can elect to retry.
(*) Silly unlink
We want to remove silly unlink files if we can, rather than leaving
them for the salvager to clear up.
Note that whilst these calls are no longer interruptible, they do have
timeouts on them, so if the server stops responding the call will fail with
something like ETIME or ECONNRESET.
Without this, the following:
kAFS: Unexpected error from FS.StoreData -512
appears in dmesg when a pending store data gets interrupted and some
processes may just hang.
Additionally, make the code that checks/updates the server record ignore
failure due to interruption if the main call is uninterruptible and if the
server has an address list. The next op will check it again since the
expiration time on the old list has past.
Fixes: d2ddc776a4 ("afs: Overhaul volume and server record caching and fileserver rotation")
Reported-by: Jonathan Billings <jsbillings@jsbillings.org>
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
afs_xattr_get_yfs() tries to free yacl, which may hold an error value (say
if yfs_fs_fetch_opaque_acl() failed and returned an error).
Fix this by allocating yacl up front (since it's a fixed-length struct,
unlike afs_acl) and passing it in to the RPC function. This also allows
the flags to be placed in the object rather than passing them through to
the RPC function.
Fixes: ae46578b96 ("afs: Get YFS ACLs and information through xattrs")
Signed-off-by: David Howells <dhowells@redhat.com>
Fix incorrect error handling in afs_xattr_get_acl() where there appears to
be a redundant assignment before return, but in fact the return should be a
goto to the error handling at the end of the function.
Fixes: 260f082bae ("afs: Get an AFS3 ACL as an xattr")
Addresses-Coverity: ("Unused Value")
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Joe Perches <joe@perches.com>
Implement the setting of YFS ACLs in AFS through the interface of setting
the afs.yfs.acl extended attribute on the file.
Signed-off-by: David Howells <dhowells@redhat.com>
The YFS/AuriStor variant of AFS provides more capable ACLs and provides
per-volume ACLs and per-file ACLs as well as per-directory ACLs. It also
provides some extra information that can be retrieved through four ACLs:
(1) afs.yfs.acl
The YFS file ACL (not the same format as afs.acl).
(2) afs.yfs.vol_acl
The YFS volume ACL.
(3) afs.yfs.acl_inherited
"1" if a file's ACL is inherited from its parent directory, "0"
otherwise.
(4) afs.yfs.acl_num_cleaned
The number of of ACEs removed from the ACL by the server because the
PT entries were removed from the PTS database (ie. the subject is no
longer known).
Signed-off-by: David Howells <dhowells@redhat.com>
Implements the setting of ACLs in AFS by means of setting the
afs.acl extended attribute on the file.
Signed-off-by: Joe Gorse <jhgorse@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Implement an xattr on AFS files called "afs.acl" that retrieves a file's
ACL. It returns the raw AFS3 ACL from the result of calling FS.FetchACL,
leaving any interpretation to userspace.
Note that whilst YFS servers will respond to FS.FetchACL, this will render
a more-advanced YFS ACL down. Use "afs.yfs.acl" instead for that.
Signed-off-by: David Howells <dhowells@redhat.com>
The AFS3 FID is three 32-bit unsigned numbers and is represented as three
up-to-8-hex-digit numbers separated by colons to the afs.fid xattr.
However, with the advent of support for YFS, the FID is now a 64-bit volume
number, a 96-bit vnode/inode number and a 32-bit uniquifier (as before).
Whilst the sprintf in afs_xattr_get_fid() has been partially updated (it
currently ignores the upper 32 bits of the 96-bit vnode number), the size
of the stack-based buffer has not been increased to match, thereby allowing
stack corruption to occur.
Fix this by increasing the buffer size appropriately and conditionally
including the upper part of the vnode number if it is non-zero. The latter
requires the lower part to be zero-padded if the upper part is non-zero.
Fixes: 3b6492df41 ("afs: Increase to 64-bit volume ID and 96-bit vnode ID for YFS")
Signed-off-by: David Howells <dhowells@redhat.com>
Fix the ->get handlers for the afs.cell and afs.volume xattrs to pass the
source data size to memcpy() rather than target buffer size.
Overcopying the source data occasionally causes the kernel to oops.
Fixes: d3e3b7eac8 ("afs: Add metadata xattrs")
Signed-off-by: David Howells <dhowells@redhat.com>
Increase the sizes of the volume ID to 64 bits and the vnode ID (inode
number equivalent) to 96 bits to allow the support of YFS.
This requires the iget comparator to check the vnode->fid rather than i_ino
and i_generation as i_ino is not sufficiently capacious. It also requires
this data to be placed into the vnode cache key for fscache.
For the moment, just discard the top 32 bits of the vnode ID when returning
it though stat.
Signed-off-by: David Howells <dhowells@redhat.com>
The current code assumes that volumes and servers are per-cell and are
never shared, but this is not enforced, and, indeed, public cells do exist
that are aliases of each other. Further, an organisation can, say, set up
a public cell and a private cell with overlapping, but not identical, sets
of servers. The difference is purely in the database attached to the VL
servers.
The current code will malfunction if it sees a server in two cells as it
assumes global address -> server record mappings and that each server is in
just one cell.
Further, each server may have multiple addresses - and may have addresses
of different families (IPv4 and IPv6, say).
To this end, the following structural changes are made:
(1) Server record management is overhauled:
(a) Server records are made independent of cell. The namespace keeps
track of them, volume records have lists of them and each vnode
has a server on which its callback interest currently resides.
(b) The cell record no longer keeps a list of servers known to be in
that cell.
(c) The server records are now kept in a flat list because there's no
single address to sort on.
(d) Server records are now keyed by their UUID within the namespace.
(e) The addresses for a server are obtained with the VL.GetAddrsU
rather than with VL.GetEntryByName, using the server's UUID as a
parameter.
(f) Cached server records are garbage collected after a period of
non-use and are counted out of existence before purging is allowed
to complete. This protects the work functions against rmmod.
(g) The servers list is now in /proc/fs/afs/servers.
(2) Volume record management is overhauled:
(a) An RCU-replaceable server list is introduced. This tracks both
servers and their coresponding callback interests.
(b) The superblock is now keyed on cell record and numeric volume ID.
(c) The volume record is now tied to the superblock which mounts it,
and is activated when mounted and deactivated when unmounted.
This makes it easier to handle the cache cookie without causing a
double-use in fscache.
(d) The volume record is loaded from the VLDB using VL.GetEntryByNameU
to get the server UUID list.
(e) The volume name is updated if it is seen to have changed when the
volume is updated (the update is keyed on the volume ID).
(3) The vlocation record is got rid of and VLDB records are no longer
cached. Sufficient information is stored in the volume record, though
an update to a volume record is now no longer shared between related
volumes (volumes come in bundles of three: R/W, R/O and backup).
and the following procedural changes are made:
(1) The fileserver cursor introduced previously is now fleshed out and
used to iterate over fileservers and their addresses.
(2) Volume status is checked during iteration, and the server list is
replaced if a change is detected.
(3) Server status is checked during iteration, and the address list is
replaced if a change is detected.
(4) The abort code is saved into the address list cursor and -ECONNABORTED
returned in afs_make_call() if a remote abort happened rather than
translating the abort into an error message. This allows actions to
be taken depending on the abort code more easily.
(a) If a VMOVED abort is seen then this is handled by rechecking the
volume and restarting the iteration.
(b) If a VBUSY, VRESTARTING or VSALVAGING abort is seen then this is
handled by sleeping for a short period and retrying and/or trying
other servers that might serve that volume. A message is also
displayed once until the condition has cleared.
(c) If a VOFFLINE abort is seen, then this is handled as VBUSY for the
moment.
(d) If a VNOVOL abort is seen, the volume is rechecked in the VLDB to
see if it has been deleted; if not, the fileserver is probably
indicating that the volume couldn't be attached and needs
salvaging.
(e) If statfs() sees one of these aborts, it does not sleep, but
rather returns an error, so as not to block the umount program.
(5) The fileserver iteration functions in vnode.c are now merged into
their callers and more heavily macroised around the cursor. vnode.c
is removed.
(6) Operations on a particular vnode are serialised on that vnode because
the server will lock that vnode whilst it operates on it, so a second
op sent will just have to wait.
(7) Fileservers are probed with FS.GetCapabilities before being used.
This is where service upgrade will be done.
(8) A callback interest on a fileserver is set up before an FS operation
is performed and passed through to afs_make_call() so that it can be
set on the vnode if the operation returns a callback. The callback
interest is passed through to afs_iget() also so that it can be set
there too.
In general, record updating is done on an as-needed basis when we try to
access servers, volumes or vnodes rather than offloading it to work items
and special threads.
Notes:
(1) Pre AFS-3.4 servers are no longer supported, though this can be added
back if necessary (AFS-3.4 was released in 1998).
(2) VBUSY is retried forever for the moment at intervals of 1s.
(3) /proc/fs/afs/<cell>/servers no longer exists.
Signed-off-by: David Howells <dhowells@redhat.com>
Overhaul the way that the in-kernel AFS client keeps track of cells in the
following manner:
(1) Cells are now held in an rbtree to make walking them quicker and RCU
managed (though this is probably overkill).
(2) Cells now have a manager work item that:
(A) Looks after fetching and refreshing the VL server list.
(B) Manages cell record lifetime, including initialising and
destruction.
(B) Manages cell record caching whereby threads are kept around for a
certain time after last use and then destroyed.
(C) Manages the FS-Cache index cookie for a cell. It is not permitted
for a cookie to be in use twice, so we have to be careful to not
allow a new cell record to exist at the same time as an old record
of the same name.
(3) Each AFS network namespace is given a manager work item that manages
the cells within it, maintaining a single timer to prod cells into
updating their DNS records.
This uses the reduce_timer() facility to make the timer expire at the
soonest timed event that needs happening.
(4) When a module is being unloaded, cells and cell managers are now
counted out using dec_after_work() to make sure the module text is
pinned until after the data structures have been cleaned up.
(5) Each cell's VL server list is now protected by a seqlock rather than a
semaphore.
Signed-off-by: David Howells <dhowells@redhat.com>
Add xattrs to allow the user to get/set metadata in lieu of having pioctl()
available. The following xattrs are now available:
- "afs.cell"
The name of the cell in which the vnode's volume resides.
- "afs.fid"
The volume ID, vnode ID and vnode uniquifier of the file as three hex
numbers separated by colons.
- "afs.volume"
The name of the volume in which the vnode resides.
For example:
# getfattr -d -m ".*" /mnt/scratch
getfattr: Removing leading '/' from absolute path names
# file: mnt/scratch
afs.cell="mycell.myorg.org"
afs.fid="10000b:1:1"
afs.volume="scratch"
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>