linux

Author	SHA1	Message	Date
Dave Chinner	eac152b474	xfs: kill VN_DIRTY() Only one user of the macro and the dirty mapping check is redundant so just get rid of it. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-08-04 13:22:49 +10:00
Dave Chinner	ad3714b82c	xfs: dquot recovery needs verifiers dquot recovery should add verifiers to the dquot buffers that it recovers changes into. Unfortunately, it doesn't attached the verifiers to the buffers in a consistent manner. For example, xlog_recover_dquot_pass2() reads dquot buffers without a verifier and then writes it without ever having attached a verifier to the buffer. Further, dquot buffer recovery may write a dquot buffer that has not been modified, or indeed, shoul dbe written because quotas are not enabled and hence changes to the buffer were not replayed. In this case, we again write buffers without verifiers attached because that doesn't happen until after the buffer changes have been replayed. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-08-04 12:59:31 +10:00
Dave Chinner	5fd364fee8	xfs: quotacheck leaves dquot buffers without verifiers When running xfs/305, I noticed that quotacheck was flushing dquot buffers that did not have the xfs_dquot_buf_ops verifiers attached: XFS (vdb): _xfs_buf_ioapply: no ops on block 0x1dc8/0x1dc8 ffff880052489000: 44 51 01 04 00 00 65 b8 00 00 00 00 00 00 00 00 DQ....e......... ffff880052489010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ ffff880052489020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ ffff880052489030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ CPU: 1 PID: 2376 Comm: mount Not tainted 3.16.0-rc2-dgc+ #306 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 ffff88006fe38000 ffff88004a0ffae8 ffffffff81cf1cca 0000000000000001 ffff88004a0ffb88 ffffffff814d50ca 000010004a0ffc70 0000000000000000 ffff88006be56dc4 0000000000000021 0000000000001dc8 ffff88007c773d80 Call Trace: [<ffffffff81cf1cca>] dump_stack+0x45/0x56 [<ffffffff814d50ca>] _xfs_buf_ioapply+0x3ca/0x3d0 [<ffffffff810db520>] ? wake_up_state+0x20/0x20 [<ffffffff814d51f5>] ? xfs_bdstrat_cb+0x55/0xb0 [<ffffffff814d513b>] xfs_buf_iorequest+0x6b/0xd0 [<ffffffff814d51f5>] xfs_bdstrat_cb+0x55/0xb0 [<ffffffff814d53ab>] __xfs_buf_delwri_submit+0x15b/0x220 [<ffffffff814d6040>] ? xfs_buf_delwri_submit+0x30/0x90 [<ffffffff814d6040>] xfs_buf_delwri_submit+0x30/0x90 [<ffffffff8150f89d>] xfs_qm_quotacheck+0x17d/0x3c0 [<ffffffff81510591>] xfs_qm_mount_quotas+0x151/0x1e0 [<ffffffff814ed01c>] xfs_mountfs+0x56c/0x7d0 [<ffffffff814f0f12>] xfs_fs_fill_super+0x2c2/0x340 [<ffffffff811c9fe4>] mount_bdev+0x194/0x1d0 [<ffffffff814f0c50>] ? xfs_finish_flags+0x170/0x170 [<ffffffff814ef0f5>] xfs_fs_mount+0x15/0x20 [<ffffffff811ca8c9>] mount_fs+0x39/0x1b0 [<ffffffff811e4d67>] vfs_kern_mount+0x67/0x120 [<ffffffff811e757e>] do_mount+0x23e/0xad0 [<ffffffff8117abde>] ? __get_free_pages+0xe/0x50 [<ffffffff811e71e6>] ? copy_mount_options+0x36/0x150 [<ffffffff811e8103>] SyS_mount+0x83/0xc0 [<ffffffff81cfd40b>] tracesys+0xdd/0xe2 This was caused by dquot buffer readahead not attaching a verifier structure to the buffer when readahead was issued, resulting in the followup read of the buffer finding a valid buffer and so not attaching new verifiers to the buffer as part of the read. Also, when a verifier failure occurs, we then read the buffer without verifiers. Attach the verifiers manually after this read so that if the buffer is then written it will be verified that the corruption has been repaired. Further, when flushing a dquot we don't ask for a verifier when reading in the dquot buffer the dquot belongs to. Most of the time this isn't an issue because the buffer is still cached, but when it is not cached it will result in writing the dquot buffer without having the verfier attached. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-08-04 12:43:26 +10:00
Dave Chinner	67dc288c21	xfs: ensure verifiers are attached to recovered buffers Crash testing of CRC enabled filesystems has resulted in a number of reports of bad CRCs being detected after the filesystem was mounted. Errors such as the following were being seen: XFS (sdb3): Mounting V5 Filesystem XFS (sdb3): Starting recovery (logdev: internal) XFS (sdb3): Metadata CRC error detected at xfs_agf_read_verify+0x5a/0x100 [xfs], block 0x1 XFS (sdb3): Unmount and run xfs_repair XFS (sdb3): First 64 bytes of corrupted metadata buffer: ffff880136ffd600: 58 41 47 46 00 00 00 01 00 00 00 00 00 0f aa 40 XAGF...........@ ffff880136ffd610: 00 02 6d 53 00 02 77 f8 00 00 00 00 00 00 00 01 ..mS..w......... ffff880136ffd620: 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 03 ................ ffff880136ffd630: 00 00 00 04 00 08 81 d0 00 08 81 a7 00 00 00 00 ................ XFS (sdb3): metadata I/O error: block 0x1 ("xfs_trans_read_buf_map") error 74 numblks 1 The errors were typically being seen in AGF, AGI and their related btree block buffers some time after log recovery had run. Often it wasn't until later subsequent mounts that the problem was discovered. The common symptom was a buffer with the correct contents, but a CRC and an LSN that matched an older version of the contents. Some debug added to _xfs_buf_ioapply() indicated that buffers were being written without verifiers attached to them from log recovery, and Jan Kara isolated the cause to log recovery readahead an dit's interactions with buffers that had a more recent LSN on disk than the transaction being recovered. In this case, the buffer did not get a verifier attached, and os when the second phase of log recovery ran and recovered EFIs and unlinked inodes, the buffers were modified and written without the verifier running. Hence they had up to date contents, but stale LSNs and CRCs. Fix it by attaching verifiers to buffers we skip due to future LSN values so they don't escape into the buffer cache without the correct verifier attached. This patch is based on analysis and a patch from Jan Kara. cc: <stable@vger.kernel.org> Reported-by: Jan Kara <jack@suse.cz> Reported-by: Fanael Linithien <fanael4@gmail.com> Reported-by: Grozdan <neutrino8@gmail.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-08-04 12:43:06 +10:00
Dave Chinner	400b9d8875	xfs: catch buffers written without verifiers attached We recently had a bug where buffers were slipping through log recovery without any verifier attached to them. This was resulting in on-disk CRC mismatches for valid data. Add some warning code to catch this occurrence so that we catch such bugs during development rather than not being aware they exist. Note that we cannot do this verification unconditionally as non-CRC filesystems don't always attach verifiers to the buffers being written. e.g. during log recovery we cannot identify all the different types of buffers correctly on non-CRC filesystems, so we can't attach the correct verifiers in all cases and so we don't attach any. Hence we don't want on non-CRC filesystems to avoid spamming the logs with false indications. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-08-04 12:42:40 +10:00
Eric Sandeen	5ef828c415	xfs: avoid false quotacheck after unclean shutdown The commit `83e782e` xfs: Remove incore use of XFS_OQUOTA_ENFD and XFS_OQUOTA_CHKD added a new function xfs_sb_quota_from_disk() which swaps on-disk XFS_OQUOTA_* flags for in-core XFS_GQUOTA_* and XFS_PQUOTA_* flags after the superblock is read. However, if log recovery is required, the superblock is read again, and the modified in-core flags are re-read from disk, so we have XFS_OQUOTA_* flags in memory again. This causes the XFS_QM_NEED_QUOTACHECK() test to be true, because the XFS_OQUOTA_CHKD is still set, and not XFS_GQUOTA_CHKD or XFS_PQUOTA_CHKD. Change xfs_sb_from_disk to call xfs_sb_quota_from disk and always convert the disk flags to in-memory flags. Add a lower-level function which can be called with "false" to not convert the flags, so that the sb verifier can verify exactly what was on disk, per Brian Foster's suggestion. Reported-by: Cyril B. <cbay@excellency.fr> Signed-off-by: Eric Sandeen <sandeen@redhat.com>	2014-08-04 11:35:44 +10:00
Brian Foster	eedf32bfca	xfs: fix rounding error of fiemap length parameter The offset and length parameters are converted from bytes to basic blocks by xfs_vn_fiemap(). The BTOBB() converter rounds the value up to the nearest basic block. This leads to unexpected behavior when unaligned offsets are provided to FIEMAP. Fix the conversions of byte values to block values to cover the provided offsets. Round down the start offset to the nearest basic block. Calculate the end offset based on the provided values, round up and calculate length based on the start block offset. Reported-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-08-04 11:35:35 +10:00
Jie Liu	1e773c4989	xfs: introduce xfs_bulkstat_ag_ichunk Introduce xfs_bulkstat_ag_ichunk() to process inodes in chunk with a pointer to a formatter function that will iget the inode and fill in the appropriate structure. Refactor xfs_bulkstat() with it. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-08-04 11:22:31 +10:00
NeilBrown	f682a398b2	NFS: allow lockless access to access_cache The access cache is used during RCU-walk path lookups, so it is best to avoid locking if possible as taking a lock kills concurrency. The rbtree is not rcu-safe and cannot easily be made so. Instead we simply check the last (i.e. most recent) entry on the LRU list. If this doesn't match, then we return -ECHILD and retry in lock/refcount mode. This requires freeing the nfs_access_entry struct with rcu, and requires using rcu access primatives when adding entries to the lru, and when examining the last entry. Calling put_rpccred before kfree_rcu looks a bit odd, but as put_rpccred already provides rcu protection, we know that the cred will not actually be freed until the next grace period, so any concurrent access will be safe. This patch provides about 5% performance improvement on a stat-heavy synthetic work load with 4 threads on a 2-core CPU. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-08-03 17:14:13 -04:00
NeilBrown	1fa1e38447	NFS: teach nfs_lookup_verify_inode to handle LOOKUP_RCU It fails with -ECHILD rather than make an RPC call. This allows nfs_lookup_revalidate to call it in RCU-walk mode. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-08-03 17:14:13 -04:00
NeilBrown	912a108da7	NFS: teach nfs_neg_need_reval to understand LOOKUP_RCU This requires nfs_check_verifier to take an rcu_walk flag, and requires an rcu version of nfs_revalidate_inode which returns -ECHILD rather than making an RPC call. With this, nfs_lookup_revalidate can call nfs_neg_need_reval in RCU-walk mode. We can also move the LOOKUP_RCU check past the nfs_check_verifier() call in nfs_lookup_revalidate. If RCU_WALK prevents nfs_check_verifier or nfs_neg_need_reval from doing a full check, they return a status indicating that a revalidation is required. As this revalidation will not be possible in RCU_WALK mode, -ECHILD will ultimately be returned, which is the desired result. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-08-03 17:14:12 -04:00
NeilBrown	f3324a2a94	NFS: support RCU_WALK in nfs_permission() nfs_permission makes two calls which are not always safe in RCU_WALK, rpc_lookup_cred and nfs_do_access. The second can easily be made rcu-safe by aborting with -ECHILD before making the RPC call. The former can be made rcu-safe by calling rpc_lookup_cred_nonblock() instead. As this will almost always succeed, we use it even when RCU_WALK isn't being used as it still saves some spinlocks in a common case. We only fall back to rpc_lookup_cred() if rpc_lookup_cred_nonblock() fails and MAY_NOT_BLOCK isn't set. This optimisation (always trying rpc_lookup_cred_nonblock()) is particularly important when a security module is active. In that case inode_permission() may return -ECHILD from security_inode_permission() even though ->permission() succeeded in RCU_WALK mode. This leads to may_lookup() retrying inode_permission after performing unlazy_walk(). The spinlock that rpc_lookup_cred() takes is often more expensive than anything security_inode_permission() does, so that spinlock becomes the main bottleneck. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-08-03 17:14:12 -04:00
NeilBrown	d51ac1a8e9	NFS: prepare for RCU-walk support but pushing tests later in code. nfs_lookup_revalidate, nfs4_lookup_revalidate, and nfs_permission all need to understand and handle RCU-walk for NFS to gain the benefits of RCU-walk for cached information. Currently these functions all immediately return -ECHILD if the relevant flag (LOOKUP_RCU or MAY_NOT_BLOCK) is set. This patch pushes those tests later in the code so that we only abort immediately before we enter rcu-unsafe code. As subsequent patches make that rcu-unsafe code rcu-safe, several of these new tests will disappear. With this patch there are several paths through the code which will no longer return -ECHILD during an RCU-walk. However these are mostly error paths or other uninteresting cases. A noteworthy change in nfs_lookup_revalidate is that we don't take (or put) the reference to ->d_parent when LOOKUP_RCU is set. Rather we rcu_dereference ->d_parent, and check that ->d_inode is not NULL. We also check that ->d_parent hasn't changed after all the tests. In nfs4_lookup_revalidate we simply avoid testing LOOKUP_RCU on the path that only calls nfs_lookup_revalidate() as that function already performs the required test. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-08-03 17:14:11 -04:00
NeilBrown	49317a7fda	NFS: nfs4_lookup_revalidate: only evaluate parent if it will be used. nfs4_lookup_revalidate only uses 'parent' to get 'dir', and only uses 'dir' if 'inode == NULL'. So we don't need to find out what 'parent' or 'dir' is until we know that 'inode' is NULL. By moving 'dget_parent' inside the 'if', we can reduce the number of call sites for 'dput(parent)'. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-08-03 17:14:11 -04:00
Alexey Khoroshilov	1f70ef96b1	NFS: add checks for returned value of try_module_get() There is a couple of places in client code where returned value of try_module_get() is ignored. As a result there is a small chance to premature unload module because of unbalanced refcounting. The patch adds error handling in that places. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-08-03 17:14:10 -04:00
Weston Andros Adamson	411a99adff	nfs: clear_request_commit while holding i_lock Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-08-03 17:05:26 -04:00
Weston Andros Adamson	e6cf82d183	pnfs: add pnfs_put_lseg_async This is useful when lsegs need to be released while holding locks. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-08-03 17:05:25 -04:00
Weston Andros Adamson	02d1426c70	pnfs: find swapped pages on pnfs commit lists too nfs_page_find_head_request_locked looks through the regular nfs commit lists when the page is swapped out, but doesn't look through the pnfs commit lists. I'm not sure if anyone has hit any issues caused by this. Suggested-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-08-03 17:05:25 -04:00
Weston Andros Adamson	b412ddf066	nfs: fix comment and add warn_on for PG_INODE_REF Fix the comment in nfs_page.h for PG_INODE_REF to reflect that it's no longer set only on head requests. Also add a WARN_ON_ONCE in nfs_inode_remove_request as PG_INODE_REF should always be set. Suggested-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-08-03 17:05:25 -04:00
Weston Andros Adamson	e7029206ff	nfs: check wait_on_bit_lock err in page_group_lock Return errors from wait_on_bit_lock from nfs_page_group_lock. Add a bool argument @wait to nfs_page_group_lock. If true, loop over wait_on_bit_lock until it returns cleanly. If false, return the error from wait_on_bit_lock. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-08-03 17:05:24 -04:00
NeilBrown	4fa2c54b51	NFS: nfs4_do_open should add negative results to the dcache. If you have an NFSv4 mounted directory which does not container 'foo' and: ls -l foo ssh $server touch foo cat foo then the 'cat' will fail (usually, depending a bit on the various cache ages). This is correct as negative looks are cached by default. However with the same initial conditions: cat foo ssh $server touch foo cat foo will usually succeed. This is because an "open" does not add a negative dentry to the dcache, while a "lookup" does. This can have negative performance effects. When "gcc" searches for an include file, it will try to "open" the file in every director in the search path. Without caching of negative "open" results, this generates much more traffic to the server than it should (or than NFSv3 does). The root of the problem is that _nfs4_open_and_get_state() will call d_add_unique() on a positive result, but not on a negative result. Compare with nfs_lookup() which calls d_materialise_unique on both a positive result and on ENOENT. This patch adds a call d_add() in the ENOENT case for _nfs4_open_and_get_state() and also calls nfs_set_verifier(). With it, many fewer "open" requests for known-non-existent files are sent to the server. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-08-03 17:05:22 -04:00
Andrey Utkin	7a9e75a185	nfs3_list_one_acl(): check get_acl() result with IS_ERR_OR_NULL There was a check for result being not NULL. But get_acl() may return NULL, or ERR_PTR, or actual pointer. The purpose of the function where current change is done is to "list ACLs only when they are available", so any error condition of get_acl() mustn't be elevated, and returning 0 there is still valid. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=81111 Signed-off-by: Andrey Utkin <andrey.krieger.utkin@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Fixes: `74adf83f5d` (nfs: only show Posix ACLs in listxattr if actually...) Cc: stable@vger.kernel.org # 3.14+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-08-03 17:05:22 -04:00
Trond Myklebust	9806755c56	Merge branch 'nfs-rdma' of git://git.linux-nfs.org/projects/anna/nfs-rdma into linux-next * 'nfs-rdma' of git://git.linux-nfs.org/projects/anna/nfs-rdma: (916 commits) xprtrdma: Handle additional connection events xprtrdma: Remove RPCRDMA_PERSISTENT_REGISTRATION macro xprtrdma: Make rpcrdma_ep_disconnect() return void xprtrdma: Schedule reply tasklet once per upcall xprtrdma: Allocate each struct rpcrdma_mw separately xprtrdma: Rename frmr_wr xprtrdma: Disable completions for LOCAL_INV Work Requests xprtrdma: Disable completions for FAST_REG_MR Work Requests xprtrdma: Don't post a LOCAL_INV in rpcrdma_register_frmr_external() xprtrdma: Reset FRMRs after a flushed LOCAL_INV Work Request xprtrdma: Reset FRMRs when FAST_REG_MR is flushed by a disconnect xprtrdma: Properly handle exhaustion of the rb_mws list xprtrdma: Chain together all MWs in same buffer pool xprtrdma: Back off rkey when FAST_REG_MR fails xprtrdma: Unclutter struct rpcrdma_mr_seg xprtrdma: Don't invalidate FRMRs if registration fails xprtrdma: On disconnect, don't ignore pending CQEs xprtrdma: Update rkeys after transport reconnect xprtrdma: Limit data payload size for ALLPHYSICAL xprtrdma: Protect ia->ri_id when unmapping/invalidating MRs ...	2014-08-03 17:04:51 -04:00
Trond Myklebust	3a505845cd	NFS: Enforce an upper limit on the number of cached access call This may be used to limit the number of cached credentials building up inside the access cache. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-08-03 17:03:22 -04:00
Steve French	59b04c5df7	[CIFS] Fix incorrect hex vs. decimal in some debug print statements Joe Perches and Hans Wennborg noticed that various places in the kernel were printing decimal numbers with 0x prefix. printk("0x%d") or equivalent This fixes the instances of this in the cifs driver. CC: Hans Wennborg <hans@hanshq.net> CC: Joe Perches <joe@perches.com> Signed-off-by: Steve French <smfrench@gmail.com>	2014-08-02 21:16:48 -05:00
Chao Yu	70cfed88ef	f2fs: avoid skipping recover_inline_xattr after recover_inline_data When we recover data of inode in roll-forward procedure, and the inode has both inline data and inline xattr. We may skip recovering inline xattr if we recover inline data form node page first. This patch will fix the problem that we lost inline xattr data in above scenario. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-08-02 07:43:51 -07:00
Chao Yu	70407fad85	f2fs: add tracepoint for f2fs_direct_IO This patch adds a tracepoint for f2fs_direct_IO. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-08-02 07:34:46 -07:00
Steve French	81691503b2	Update cifs version to 2.04 Signed-off-by: Steve French <smfrench@gmail.com>	2014-08-02 01:23:04 -05:00
Pavel Shilovsky	21496687a7	CIFS: Fix STATUS_CANNOT_DELETE error mapping for SMB2 The existing mapping causes unlink() call to return error after delete operation. Changing the mapping to -EACCES makes the client process the call like CIFS protocol does - reset dos attributes with ATTR_READONLY flag masked off and retry the operation. Cc: stable@vger.kernel.org Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2014-08-02 01:23:04 -05:00
Pavel Shilovsky	b770ddfa26	CIFS: Optimize readpages in a short read case on reconnects by marking pages with a data from a partially received response up-to-date. This is suitable for non-signed connections. Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2014-08-02 01:23:04 -05:00
Pavel Shilovsky	d913ed17f0	CIFS: Optimize cifs_user_read() in a short read case on reconnects by filling the output buffer with a data got from a partially received response and requesting the remaining data from the server. This is suitable for non-signed connections. Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2014-08-02 01:23:04 -05:00
Pavel Shilovsky	fb8a3e5255	CIFS: Improve indentation in cifs_user_read() Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2014-08-02 01:23:04 -05:00
Pavel Shilovsky	2e8a05d802	CIFS: Fix possible buffer corruption in cifs_user_read() If there was a short read in the middle of the rdata list, we can end up with a corrupt output buffer. Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2014-08-02 01:23:04 -05:00
Pavel Shilovsky	b3160aebb4	CIFS: Count got bytes in read_into_pages() that let us know how many bytes we have already got before reconnect. Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2014-08-02 01:23:04 -05:00
Pavel Shilovsky	34a54d6177	CIFS: Use separate var for the number of bytes got in async read and don't mix it with the number of bytes that was requested. Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2014-08-02 01:23:04 -05:00
Pavel Shilovsky	3fabaa2746	CIFS: Indicate reconnect with ECONNABORTED error code that let us not mix it with EAGAIN. Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2014-08-02 01:23:03 -05:00
Pavel Shilovsky	bed9da0213	CIFS: Use multicredits for SMB 2.1/3 reads If we negotiate SMB 2.1 and higher version of the protocol and a server supports large read buffer size, we need to consume 1 credit per 65536 bytes. So, we need to know how many credits we have and obtain the required number of them before constructing a readdata structure in readpages and user read. Reviewed-by: Shirish Pargaonkar <spargaonkar@suse.com> Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2014-08-02 01:23:03 -05:00
Pavel Shilovsky	e374d90f8a	CIFS: Fix rsize usage for sync read If a server changes maximum buffer size for read requests (rsize) on reconnect we can fail on repeating with a big size buffer on -EAGAIN error in cifs_read. Fix this by checking rsize all the time before repeating requests. Reviewed-by: Shirish Pargaonkar <spargaonkar@suse.com> Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2014-08-02 01:23:03 -05:00
Pavel Shilovsky	25f402598d	CIFS: Fix rsize usage in user read If a server changes maximum buffer size for read (rsize) requests on reconnect we can fail on repeating with a big size buffer on -EAGAIN error in user read. Fix this by checking rsize all the time before repeating requests. Reviewed-by: Shirish Pargaonkar <spargaonkar@suse.com> Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2014-08-02 01:23:03 -05:00
Pavel Shilovsky	0ada36b244	CIFS: Separate page reading from user read Reviewed-by: Shirish Pargaonkar <spargaonkar@suse.com> Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2014-08-02 01:23:03 -05:00
Pavel Shilovsky	69cebd7560	CIFS: Fix rsize usage in readpages If a server changes maximum buffer size for read (rsize) requests on reconnect we can fail on repeating with a big size buffer on -EAGAIN error in readpages. Fix this by checking rsize all the time before repeating requests. Reviewed-by: Shirish Pargaonkar <spargaonkar@suse.com> Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2014-08-02 01:23:03 -05:00
Pavel Shilovsky	387eb92ac6	CIFS: Separate page search from readpages Reviewed-by: Shirish Pargaonkar <spargaonkar@suse.com> Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2014-08-02 01:23:03 -05:00
Pavel Shilovsky	cb7e9eabb2	CIFS: Use multicredits for SMB 2.1/3 writes If we negotiate SMB 2.1 and higher version of the protocol and a server supports large write buffer size, we need to consume 1 credit per 65536 bytes. So, we need to know how many credits we have and obtain the required number of them before constructing a writedata structure in writepages and iovec write. Reviewed-by: Shirish Pargaonkar <spargaonkar@suse.com> Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2014-08-02 01:23:03 -05:00
Pavel Shilovsky	6ec0b01b26	CIFS: Fix wsize usage in iovec write If a server change maximum buffer size for write (wsize) requests on reconnect we can fail on repeating with a big size buffer on -EAGAIN error in iovec write. Fix this by checking wsize all the time before repeating request in iovec write. Reviewed-by: Shirish Pargaonkar <spargaonkar@suse.com> Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2014-08-02 01:23:02 -05:00
Pavel Shilovsky	43de94eadf	CIFS: Separate writing from iovec write Reviewed-by: Shirish Pargaonkar <spargaonkar@suse.com> Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2014-08-02 01:23:02 -05:00
Pavel Shilovsky	66386c08be	CIFS: Separate filling pages from iovec write Reviewed-by: Shirish Pargaonkar <spargaonkar@suse.com> Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2014-08-02 01:23:02 -05:00
Pavel Shilovsky	7f6c50086a	CIFS: Fix cifs_writev_requeue when wsize changes If wsize changes on reconnect we need to use new writedata structure that for retrying. Reviewed-by: Shirish Pargaonkar <spargaonkar@suse.com> Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2014-08-02 01:23:02 -05:00
Pavel Shilovsky	66231a4796	CIFS: Fix wsize usage in writepages If a server change maximum buffer size for write (wsize) requests on reconnect we can fail on repeating with a big size buffer on -EAGAIN error in writepages. Fix this by checking wsize all the time before repeating request in writepages. Reviewed-by: Shirish Pargaonkar <spargaonkar@suse.com> Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2014-08-02 01:23:02 -05:00
Pavel Shilovsky	90ac1387c2	CIFS: Separate pages initialization from writepages Reviewed-by: Shirish Pargaonkar <spargaonkar@suse.com> Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2014-08-02 01:23:02 -05:00
Pavel Shilovsky	619aa48edb	CIFS: Separate page sending from writepages Reviewed-by: Shirish Pargaonkar <spargaonkar@suse.com> Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2014-08-02 01:23:02 -05:00
Steve French	27924075b5	Remove sparse build warning The recent session setup patch set (cifs-Separate-rawntlmssp-auth-from-CIFS_SessSetup.patch) had introduced a trivial sparse build warning. Signed-off-by: Steve French <smfrench@gmail.com> Cc: Sachin Prabhu <sprabhu@redhat.com>	2014-08-02 01:23:01 -05:00
Pavel Shilovsky	7e48ff8202	CIFS: Separate page processing from writepages Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Reviewed-by: Jeff Layton <jlayton@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2014-08-02 01:23:01 -05:00
Pavel Shilovsky	038bc961c3	CIFS: Fix async reading on reconnects If we get into read_into_pages() from cifs_readv_receive() and then loose a network, we issue cifs_reconnect that moves all mids to a private list and issue their callbacks. The callback of the async read request sets a mid to retry, frees it and wakes up a process that waits on the rdata completion. After the connection is established we return from read_into_pages() with a short read, use the mid that was freed before and try to read the remaining data from the a newly created socket. Both actions are not what we want to do. In reconnect cases (-EAGAIN) we should not mask off the error with a short read but should return the error code instead. Acked-by: Jeff Layton <jlayton@samba.org> Cc: stable@vger.kernel.org Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>	2014-08-02 01:23:01 -05:00
Jeff Layton	7abea1e8e8	nfsd: don't destroy client if mark_client_expired_locked fails If it fails, it means that the client is in use and so destroying it would be bad. Currently, the client_mutex prevents this from happening but once we remove it, we won't be able to do this. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-01 16:28:26 -04:00
Jeff Layton	97403d95e1	nfsd: move unhash_client_locked call into mark_client_expired_locked All the callers except for the fault injection code call it directly afterward, and in the fault injection case it won't hurt to do so anyway. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-01 16:28:25 -04:00
Jeff Layton	217526e7ec	nfsd: protect the close_lru list and oo_last_closed_stid with client_lock Currently, it's protected by the client_mutex. Move it so that the list and the fields in the openowner are protected by the client_lock. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-01 16:28:24 -04:00
Trond Myklebust	0a880a28f8	nfsd: Add lockdep assertions to document the nfs4_client/session locking Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-01 16:28:23 -04:00
Trond Myklebust	3e339f964b	nfsd: Ensure lookup_clientid() takes client_lock Ensure that the client lookup is done safely under the client_lock, so we're not relying on the client_mutex. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-01 16:28:23 -04:00
Trond Myklebust	6b10ad193d	nfsd: Protect nfsd4_destroy_clientid using client_lock ...instead of relying on the client_mutex. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-01 16:28:22 -04:00
Jeff Layton	d20c11d86d	nfsd: Protect session creation and client confirm using client_lock In particular, we want to ensure that the move_to_confirmed() is protected by the nn->client_lock spin lock, so that we can use that when looking up the clientid etc. instead of relying on the client_mutex. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-01 16:28:21 -04:00
Trond Myklebust	3dbacee6e1	nfsd: Protect unconfirmed client creation using client_lock ...instead of relying on the client_mutex. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-01 16:28:20 -04:00
Trond Myklebust	5cc40fd7b6	nfsd: Move create_client() call outside the lock For efficiency reasons, and because we want to use spin locks instead of relying on the client_mutex. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-01 16:28:20 -04:00
Trond Myklebust	425510f5c8	nfsd: Don't require client_lock in free_client The struct nfs_client is supposed to be invisible and unreferenced before it gets here. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-01 16:28:19 -04:00
Trond Myklebust	4864af97e0	nfsd: Ensure that the laundromat unhashes the client before releasing locks If we leave the client on the confirmed/unconfirmed tables, and leave the sessions visible on the sessionid_hashtbl, then someone might find them before we've had a chance to destroy them. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-01 16:28:18 -04:00
Trond Myklebust	4beb345b37	nfsd: Ensure struct nfs4_client is unhashed before we try to destroy it When we remove the client_mutex protection, we will need to ensure that it can't be found by other threads while we're destroying it. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-01 16:28:17 -04:00
J. Bruce Fields	83e452fee8	nfsd4: fix out of date comment Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-01 16:28:16 -04:00
Kinglong Mee	d9499a9571	NFSD: Decrease nfsd_users in nfsd_startup_generic fail A memory allocation failure could cause nfsd_startup_generic to fail, in which case nfsd_users wouldn't be incorrectly left elevated. After nfsd restarts nfsd_startup_generic will then succeed without doing anything--the first consequence is likely nfs4_start_net finding a bad laundry_wq and crashing. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Fixes: `4539f14981` "nfsd: replace boolean nfsd_up flag by users counter" Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-01 16:26:09 -04:00
Eric Biggers	6d2b6170c8	vfs: fix check for fallocate on active swapfile Fix the broken check for calling sys_fallocate() on an active swapfile, introduced by commit `0790b31b69` ("fs: disallow all fallocate operation on active swapfile"). Signed-off-by: Eric Biggers <ebiggers3@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-08-01 02:36:04 -04:00
Christoph Hellwig	af43647277	direct-io: fix AIO regression The direct-io.c rewrite to use the iov_iter infrastructure stopped updating the size field in struct dio_submit, and thus rendered the check for allowing asynchronous completions to always return false. Fix this by comparing it to the count of bytes in the iov_iter instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Tim Chen <tim.c.chen@linux.intel.com> Tested-by: Tim Chen <tim.c.chen@linux.intel.com>	2014-08-01 02:35:51 -04:00
Sachin Prabhu	cc87c47d9d	cifs: Separate rawntlmssp auth from CIFS_SessSetup() Separate rawntlmssp authentication from CIFS_SessSetup(). Also cleanup CIFS_SessSetup() since we no longer do any auth within it. Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Reviewed-by: Shirish Pargaonkar <spargaonkar@suse.com> Signed-off-by: Steve French <smfrench@gmail.com>	2014-07-31 23:11:15 -05:00
Sachin Prabhu	ee03c646dd	cifs: Split Kerberos authentication off CIFS_SessSetup() Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Reviewed-by: Shirish Pargaonkar <spargaonkar@suse.com> Signed-off-by: Steve French <smfrench@gmail.com>	2014-07-31 23:11:15 -05:00
Sachin Prabhu	583cf7afc7	cifs: Split ntlm and ntlmv2 authentication methods off CIFS_SessSetup() Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Reviewed-by: Shirish Pargaonkar <spargaonkar@suse.com> Signed-off-by: Steve French <smfrench@gmail.com>	2014-07-31 23:11:15 -05:00
Sachin Prabhu	80a0e63751	cifs: Split lanman auth from CIFS_SessSetup() In preparation for splitting CIFS_SessSetup() into smaller more manageable chunks, we first add helper functions. We then proceed to split out lanman auth out of CIFS_SessSetup() Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Reviewed-by: Shirish Pargaonkar <spargaonkar@suse.com> Signed-off-by: Steve French <smfrench@gmail.com>	2014-07-31 23:11:15 -05:00
Sachin Prabhu	6d81ed1ec2	cifs: replace code with free_rsp_buf() The functionality provided by free_rsp_buf() is duplicated in a number of places. Replace these instances with a call to free_rsp_buf(). Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Reviewed-by: Shirish Pargaonkar <spargaonkar@suse.com> Signed-off-by: Steve French <smfrench@gmail.com>	2014-07-31 23:11:15 -05:00
Eric W. Biederman	ffbc6f0ead	mnt: Change the default remount atime from relatime to the existing value Since March 2009 the kernel has treated the state that if no MS_..ATIME flags are passed then the kernel defaults to relatime. Defaulting to relatime instead of the existing atime state during a remount is silly, and causes problems in practice for people who don't specify any MS_...ATIME flags and to get the default filesystem atime setting. Those users may encounter a permission error because the default atime setting does not work. A default that does not work and causes permission problems is ridiculous, so preserve the existing value to have a default atime setting that is always guaranteed to work. Using the default atime setting in this way is particularly interesting for applications built to run in restricted userspace environments without /proc mounted, as the existing atime mount options of a filesystem can not be read from /proc/mounts. In practice this fixes user space that uses the default atime setting on remount that are broken by the permission checks keeping less privileged users from changing more privileged users atime settings. Cc: stable@vger.kernel.org Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2014-07-31 17:12:59 -07:00
Eric W. Biederman	9566d67428	mnt: Correct permission checks in do_remount While invesgiating the issue where in "mount --bind -oremount,ro ..." would result in later "mount --bind -oremount,rw" succeeding even if the mount started off locked I realized that there are several additional mount flags that should be locked and are not. In particular MNT_NOSUID, MNT_NODEV, MNT_NOEXEC, and the atime flags in addition to MNT_READONLY should all be locked. These flags are all per superblock, can all be changed with MS_BIND, and should not be changable if set by a more privileged user. The following additions to the current logic are added in this patch. - nosuid may not be clearable by a less privileged user. - nodev may not be clearable by a less privielged user. - noexec may not be clearable by a less privileged user. - atime flags may not be changeable by a less privileged user. The logic with atime is that always setting atime on access is a global policy and backup software and auditing software could break if atime bits are not updated (when they are configured to be updated), and serious performance degradation could result (DOS attack) if atime updates happen when they have been explicitly disabled. Therefore an unprivileged user should not be able to mess with the atime bits set by a more privileged user. The additional restrictions are implemented with the addition of MNT_LOCK_NOSUID, MNT_LOCK_NODEV, MNT_LOCK_NOEXEC, and MNT_LOCK_ATIME mnt flags. Taken together these changes and the fixes for MNT_LOCK_READONLY should make it safe for an unprivileged user to create a user namespace and to call "mount --bind -o remount,... ..." without the danger of mount flags being changed maliciously. Cc: stable@vger.kernel.org Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2014-07-31 17:12:34 -07:00
Eric W. Biederman	07b645589d	mnt: Move the test for MNT_LOCK_READONLY from change_mount_flags into do_remount There are no races as locked mount flags are guaranteed to never change. Moving the test into do_remount makes it more visible, and ensures all filesystem remounts pass the MNT_LOCK_READONLY permission check. This second case is not an issue today as filesystem remounts are guarded by capable(CAP_DAC_ADMIN) and thus will always fail in less privileged mount namespaces, but it could become an issue in the future. Cc: stable@vger.kernel.org Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2014-07-31 17:12:17 -07:00
Eric W. Biederman	a6138db815	mnt: Only change user settable mount flags in remount Kenton Varda <kenton@sandstorm.io> discovered that by remounting a read-only bind mount read-only in a user namespace the MNT_LOCK_READONLY bit would be cleared, allowing an unprivileged user to the remount a read-only mount read-write. Correct this by replacing the mask of mount flags to preserve with a mask of mount flags that may be changed, and preserve all others. This ensures that any future bugs with this mask and remount will fail in an easy to detect way where new mount flags simply won't change. Cc: stable@vger.kernel.org Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2014-07-31 17:11:54 -07:00
Jeff Layton	4ae098d327	nfsd: rename unhash_generic_stateid to unhash_ol_stateid ...to better match other functions that deal with open/lock stateids. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:31 -04:00
Jeff Layton	d83017f94c	nfsd: don't thrash the cl_lock while freeing an open stateid When we remove the client_mutex, we'll have a potential race between FREE_STATEID and CLOSE. The root of the problem is that we are walking the st_locks list, dropping the spinlock and then trying to release the persistent reference to the lockstateid. In between, a FREE_STATEID call can come along and take the lock, find the stateid and then try to put the reference. That leads to a double put. Fix this by not releasing the cl_lock in order to release each lock stateid. Use put_generic_stateid_locked to unhash them and gather them onto a list, and free_ol_stateid_reaplist to free any that end up on the list. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:31 -04:00
Jeff Layton	2c41beb0e5	nfsd: reduce cl_lock thrashing in release_openowner Releasing an openowner is a bit inefficient as it can potentially thrash the cl_lock if you have a lot of stateids attached to it. Once we remove the client_mutex, it'll also potentially be dangerous to do this. Add some functions to make it easier to defer the part of putting a generic stateid reference that needs to be done outside the cl_lock while doing the parts that must be done while holding it under a single lock. First we unhash each open stateid. Then we call put_generic_stateid_locked which will put the reference to an nfs4_ol_stateid. If it turns out to be the last reference, it'll go ahead and remove the stid from the IDR tree and put it onto the reaplist using the st_locks list_head. Then, after dropping the lock we'll call free_ol_stateid_reaplist to walk the list of stateids that are fully unhashed and ready to be freed, and free each of them. This function can sleep, so it must be done outside any spinlocks. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:30 -04:00
Jeff Layton	fc5a96c3b7	nfsd: close potential race in nfsd4_free_stateid Once we remove the client_mutex, it'll be possible for the sc_type of a lock stateid to change after it's found and checked, but before we can go to destroy it. If that happens, we can end up putting the persistent reference to the stateid more than once, and unhash it more than once. Fix this by unhashing the lock stateid prior to dropping the cl_lock but after finding it. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:29 -04:00
Jeff Layton	3c1c995cc2	nfsd: optimize destroy_lockowner cl_lock thrashing Reduce the cl_lock trashing in destroy_lockowner. Unhash all of the lockstateids on the lockowner's list. Put the reference under the lock and see if it was the last one. If so, then add it to a private list to be destroyed after we drop the lock. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:28 -04:00
Jeff Layton	a819ecc1bb	nfsd: add locking to stateowner release Once we remove the client_mutex, we'll need to properly protect the stateowner reference counts using the cl_lock. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:27 -04:00
Jeff Layton	882e9d25e1	nfsd: clean up and reorganize release_lockowner Do more within the main loop, and simplify the function a bit. Also, there's no need to take a stateowner reference unless we're going to call release_lockowner. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:27 -04:00
Trond Myklebust	d4f0489f38	nfsd: Move the open owner hash table into struct nfs4_client Preparation for removing the client_mutex. Convert the open owner hash table into a per-client table and protect it using the nfs4_client->cl_lock spin lock. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:26 -04:00
Trond Myklebust	c58c6610ec	nfsd: Protect adding/removing lock owners using client_lock Once we remove client mutex protection, we'll need to ensure that stateowner lookup and creation are atomic between concurrent compounds. Ensure that alloc_init_lock_stateowner checks the hashtable under the client_lock before adding a new element. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:25 -04:00
Trond Myklebust	7ffb588086	nfsd: Protect adding/removing open state owners using client_lock Once we remove client mutex protection, we'll need to ensure that stateowner lookup and creation are atomic between concurrent compounds. Ensure that alloc_init_open_stateowner checks the hashtable under the client_lock before adding a new element. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:24 -04:00
Jeff Layton	b401be22b5	nfsd: don't allow CLOSE to proceed until refcount on stateid drops Once we remove client_mutex protection, it'll be possible to have an in-flight operation using an openstateid when a CLOSE call comes in. If that happens, we can't just put the sc_file reference and clear its pointer without risking an oops. Fix this by ensuring that v4.0 CLOSE operations wait for the refcount to drop before proceeding to do so. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:23 -04:00
Jeff Layton	d3134b1049	nfsd: make openstateids hold references to their openowners Change it so that only openstateids hold persistent references to openowners. References can still be held by compounds in progress. With this, we can get rid of NFS4_OO_NEW. It's possible that we will create a new openowner in the process of doing the open, but something later fails. In the meantime, another task could find that openowner and start using it on a successful open. If that occurs we don't necessarily want to tear it down, just put the reference that the failing compound holds. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:23 -04:00
Jeff Layton	5adfd8850b	nfsd: clean up refcounting for lockowners Ensure that lockowner references are only held by lockstateids and operations that are in-progress. With this, we can get rid of release_lockowner_if_empty, which will be racy once we remove client_mutex protection. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:22 -04:00
Trond Myklebust	e4f1dd7fc2	nfsd: Make lock stateid take a reference to the lockowner A necessary step toward client_mutex removal. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:21 -04:00
Jeff Layton	8f4b54c53f	nfsd: add an operation for unhashing a stateowner Allow stateowners to be unhashed and destroyed when the last reference is put. The unhashing must be idempotent. In a future patch, we'll add some locking around it, but for now it's only protected by the client_mutex. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:20 -04:00
Jeff Layton	5db1c03feb	nfsd: clean up lockowner refcounting when finding them Ensure that when finding or creating a lockowner, that we get a reference to it. For now, we also take an extra reference when a lockowner is created that can be put when release_lockowner is called, but we'll remove that in a later patch once we change how references are held. Since we no longer destroy lockowners in the event of an error in nfsd4_lock, we must change how the seqid gets bumped in the lk_is_new case. Instead of doing so on creation, do it manually in nfsd4_lock. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:20 -04:00
Jeff Layton	58fb12e6a4	nfsd: Add a mutex to protect the NFSv4.0 open owner replay cache We don't want to rely on the client_mutex for protection in the case of NFSv4 open owners. Instead, we add a mutex that will only be taken for NFSv4.0 state mutating operations, and that will be released once the entire compound is done. Also, ensure that nfsd4_cstate_assign_replay/nfsd4_cstate_clear_replay take a reference to the stateowner when they are using it for NFSv4.0 open and lock replay caching. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:19 -04:00
Jeff Layton	6b180f0b57	nfsd: Add reference counting to state owners The way stateowners are managed today is somewhat awkward. They need to be explicitly destroyed, even though the stateids reference them. This will be particularly problematic when we remove the client_mutex. We may create a new stateowner and attempt to open a file or set a lock, and have that fail. In the meantime, another RPC may come in that uses that same stateowner and succeed. We can't have the first task tearing down the stateowner in that situation. To fix this, we need to change how stateowners are tracked altogether. Refcount them and only destroy them once all stateids that reference them have been destroyed. This patch starts by adding the refcounting necessary to do that. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:18 -04:00
Trond Myklebust	2d3f96689f	nfsd: Migrate the stateid reference into nfs4_find_stateid_by_type() Allow nfs4_find_stateid_by_type to take the stateid reference, while still holding the &cl->cl_lock. Necessary step toward client_mutex removal. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:17 -04:00
Trond Myklebust	fd9110113c	nfsd: Migrate the stateid reference into nfs4_lookup_stateid() Allow nfs4_lookup_stateid to take the stateid reference, instead of having all the callers do so. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:16 -04:00
Trond Myklebust	4cbfc9f704	nfsd: Migrate the stateid reference into nfs4_preprocess_seqid_op Allow nfs4_preprocess_seqid_op to take the stateid reference, instead of having all the callers do so. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:15 -04:00
Trond Myklebust	0667b1e9d8	nfsd: Add reference counting to nfs4_preprocess_confirmed_seqid_op Ensure that all the callers put the open stateid after use. Necessary step toward client_mutex removal. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:15 -04:00
Trond Myklebust	2585fc7958	nfsd: nfsd4_open_confirm() must reference the open stateid Ensure that nfsd4_open_confirm() keeps a reference to the open stateid until it is done working with it. Necessary step toward client_mutex removal. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:14 -04:00
Trond Myklebust	8a0b589d8f	nfsd: Prepare nfsd4_close() for open stateid referencing Prepare nfsd4_close for a future where nfs4_preprocess_seqid_op() hands it a fully referenced open stateid. Necessary step toward client_mutex removal. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:13 -04:00
Trond Myklebust	d6f2bc5dcf	nfsd: nfsd4_process_open2() must reference the open stateid Ensure that nfsd4_process_open2() keeps a reference to the open stateid until it is done working with it. Necessary step toward client_mutex removal. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:12 -04:00
Trond Myklebust	dcd94cc2e7	nfsd: nfsd4_process_open2() must reference the delegation stateid Ensure that nfsd4_process_open2() keeps a reference to the delegation stateid until it is done working with it. Necessary step toward client_mutex removal. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:11 -04:00
Trond Myklebust	67cb1279be	nfsd: Ensure that nfs4_open_delegation() references the delegation stateid Ensure that nfs4_open_delegation() keeps a reference to the delegation stateid until it is done working with it. Necessary step toward client_mutex removal. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:11 -04:00
Trond Myklebust	858cc57336	nfsd: nfsd4_locku() must reference the lock stateid Ensure that nfsd4_locku() keeps a reference to the lock stateid until it is done working with it. Necessary step toward client_mutex removal. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:10 -04:00
Trond Myklebust	3d0fabd5a4	nfsd: Add reference counting to lock stateids Ensure that nfsd4_lock() references the lock stateid while it is manipulating it. Not currently necessary, but will be once the client_mutex is removed. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:09 -04:00
Jeff Layton	1af71cc801	nfsd: ensure atomicity in nfsd4_free_stateid and nfsd4_validate_stateid Hold the cl_lock over the bulk of these functions. In addition to ensuring that they aren't freed prematurely, this will also help prevent a potential race that could be introduced later. Once we remove the client_mutex, it'll be possible for FREE_STATEID and CLOSE to race and for both to try to put the "persistent" reference to the stateid. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:08 -04:00
Jeff Layton	356a95ece7	nfsd: clean up races in lock stateid searching and creation Preparation for removal of the client_mutex. Currently, no lock aside from the client_mutex is held when calling find_lock_state. Ensure that the cl_lock is held by adding a lockdep assertion. Once we remove the client_mutex, it'll be possible for another thread to race in and insert a lock state for the same file after we search but before we insert a new one. Ensure that doesn't happen by redoing the search after allocating a new stid that we plan to insert. If one is found just put the one that was allocated. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:07 -04:00
Jeff Layton	1c755dc1ad	nfsd: Add locking to protect the state owner lists Change to using the clp->cl_lock for this. For now, there's a lot of cl_lock thrashing, but in later patches we'll eliminate that and close the potential races that can occur when releasing the cl_lock while walking the lists. For now, the client_mutex prevents those races. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:07 -04:00
Jeff Layton	b49e084d8c	nfsd: do filp_close in sc_free callback for lock stateids Releasing locks when we unhash the stateid instead of doing so only when the stateid is actually released will be problematic in later patches when we need to protect the unhashing with spinlocks. Move it into the sc_free operation instead. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:19:50 -04:00
Jeff Layton	4770d72201	nfsd4: use cl_lock to synchronize all stateid idr calls Currently, this is serialized by the client_mutex, which is slated for removal. Add finer-grained locking here. Also, do some cleanup around find_stateid to prepare for taking references. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Benny Halevy <bhalevy@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:19:25 -04:00
Trond Myklebust	11b9164ada	nfsd: Add a struct nfs4_file field to struct nfs4_stid All stateids are associated with a nfs4_file. Let's consolidate. Replace delegation->dl_file with the dl_stid.sc_file, and nfs4_ol_stateid->st_file with st_stid.sc_file. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 12:51:34 -04:00
Trond Myklebust	6011695da2	nfsd: Add reference counting to the lock and open stateids When we remove the client_mutex, we'll need to be able to ensure that these objects aren't destroyed while we're not holding locks. Add a ->free() callback to the struct nfs4_stid, so that we can release a reference to the stid without caring about the contents. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 12:43:53 -04:00
hujianyang	25601a3c97	UBIFS: Add log overlap assertions We use a circle area to record the log nodes in ubifs. This log area should not be overlapped. But after researching the code, I found some conditions may lead log head wraps log ltail. Although we've fixed the problems discovered, there may be some other issues still left. This patch adds assertions where lhead changes to next leb to make sure ltail is not wrapped. Signed-off-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2014-07-31 15:52:51 +03:00
Chao Yu	b3582c6892	f2fs: reduce competition among node page writes We do not need to block on ->node_write among different node page writers e.g. fsync/flush, unless we have a node page writer from write_checkpoint. So it's better use rw_semaphore instead of mutex type for ->node_write to promote performance. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-30 23:28:37 -07:00
Theodore Ts'o	86f0afd463	ext4: fix ext4_discard_allocated_blocks() if we can't allocate the pa struct If there is a failure while allocating the preallocation structure, a number of blocks can end up getting marked in the in-memory buddy bitmap, and then not getting released. This can result in the following corruption getting reported by the kernel: EXT4-fs error (device sda3): ext4_mb_generate_buddy:758: group 1126, 12793 clusters in bitmap, 12729 in gd In that case, we need to release the blocks using mb_free_blocks(). Tested: fs smoke test; also demonstrated that with injected errors, the file system is no longer getting corrupted Google-Bug-Id: 16657874 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org	2014-07-30 22:17:17 -04:00
Jaegeuk Kim	65b85ccce0	f2fs: fix coding style This patch fixes wrong coding style. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-30 17:25:54 -07:00
Dongho Sim	33be828ada	f2fs: remove redundant lines in allocate_data_block There are redundant lines in allocate_data_block. In this function, we call refresh_sit_entry with old seg and old curseg. After that, we call locate_dirty_segment with old curseg. But, the new address is always allocated from old curseg and we call locate_dirty_segment with old curseg in refresh_sit_entry. So, we do not need to call locate_dirty_segment with old curseg again. We've discussed like below: Jaegeuk said: "When considering SSR, we need to take care of the following scenario. - old segno : X - new address : Z - old curseg : Y This means, a new block is supposed to be written to Z from X. And Z is newly allocated in the same path from Y. In that case, we should trigger locate_dirty_segment for Y, since it was a current_segment and can be dirty owing to SSR. But that was not included in the dirty list." Changman said: "We already choosed old curseg(Y) and then we allocate new address(Z) from old curseg(Y). After that we call refresh_sit_entry(old address, new address). In the funcation, we call locate_dirty_segment with old seg and old curseg. So calling locate_dirty_segment after refresh_sit_entry again is redundant." Jaegeuk said: "Right. The new address is always allocated from old_curseg." Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Dongho Sim <dh.sim@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-30 15:38:59 -07:00
Jaegeuk Kim	24a9ee0fa3	f2fs: add tracepoint for f2fs_issue_flush This patch adds a tracepoint for f2fs_issue_flush. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-30 14:13:36 -07:00
Jaegeuk Kim	cf2271e781	f2fs: avoid retrying wrong recovery routine when error was occurred This patch eliminates the propagation of recovery errors to the next mount. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-30 14:13:35 -07:00
Jaegeuk Kim	61e0f2d0a5	f2fs: test before set/clear bits If the bit is already set, we don't need to reset it, and vice versa. Because we don't need to make the caches dirty for that. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-30 14:13:35 -07:00
Jaegeuk Kim	01229f5e1b	f2fs: fix wrong condition for unlikely This patch fixes the wrongly used unlikely condition. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-30 14:13:31 -07:00
Jaegeuk Kim	ea1aa12ca2	f2fs: enable in-place-update for fdatasync This patch enforces in-place-updates only when fdatasync is requested. If we adopt this in-place-updates for the fdatasync, we can skip to write the recovery information. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-30 14:13:23 -07:00
Jaegeuk Kim	6d99ba41a7	f2fs: skip unnecessary data writes during fsync This patch intends to improve the fsync performance by skipping remaining the recovery information, only when there is no data that we should recover. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-30 14:13:03 -07:00
David S. Miller	f139c74a8d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-30 13:25:49 -07:00
Jeff Layton	b3fbfe0e7a	nfsd: print status when nfsd4_open fails to open file it just created It's possible for nfsd to fail opening a file that it has just created. When that happens, we throw a WARN but it doesn't include any info about the error code. Print the status code to give us a bit more info. Our QA group hit some of these warnings under some very heavy stress testing. My suspicion is that they hit the file-max limit, but it's hard to know for sure. Go ahead and add a -ENFILE mapping to nfserr_serverfault to make the error more distinct (and correct). Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-29 23:08:38 -04:00
Eric W. Biederman	728dba3a39	namespaces: Use task_lock and not rcu to protect nsproxy The synchronous syncrhonize_rcu in switch_task_namespaces makes setns a sufficiently expensive system call that people have complained. Upon inspect nsproxy no longer needs rcu protection for remote reads. remote reads are rare. So optimize for same process reads and write by switching using rask_lock instead. This yields a simpler to understand lock, and a faster setns system call. In particular this fixes a performance regression observed by Rafael David Tinoco <rafael.tinoco@canonical.com>. This is effectively a revert of Pavel Emelyanov's commit `cf7b708c8d` Make access to task's nsproxy lighter from 2007. The race this originialy fixed no longer exists as do_notify_parent uses task_active_pid_ns(parent) instead of parent->nsproxy. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2014-07-29 18:08:50 -07:00
Christoph Hellwig	d5cf09bace	xfs: require 64-bit sector_t Trying to support tiny disks only and saving a bit memory might have made sense on an SGI O2 15 years ago, but is pretty pointless today. Remove the rarely tested codepath that uses various smaller in-memory types to reduce our test matrix and make the codebase a little bit smaller and less complicated. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-30 09:12:05 +10:00
Jeff Layton	650ecc8f8f	nfsd: remove dl_fh field from struct nfs4_delegation Now that the nfs4_file has a filehandle in it, we no longer need to keep a per-delegation copy of it. Switch to using the one in the nfs4_file instead. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-29 14:49:58 -04:00
Jeff Layton	f54fe962b8	nfsd: give block_delegation and delegation_blocked its own spinlock The state lock can be fairly heavily contended, and there's no reason that nfs4_file lookups and delegation_blocked should be mutually exclusive. Let's give the new block_delegation code its own spinlock. It does mean that we'll need to take a different lock in the delegation break code, but that's not generally as critical to performance. Cc: Neil Brown <neilb@suse.de> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-29 14:49:57 -04:00
Jeff Layton	0b26693c56	nfsd: clean up nfs4_set_delegation Move the alloc_init_deleg call into nfs4_set_delegation and change the function to return a pointer to the delegation or an IS_ERR return. This allows us to skip allocating a delegation if the file has already experienced a lease conflict. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-29 14:49:56 -04:00
Jeff Layton	4cf59221c7	nfsd: clean up arguments to nfs4_open_delegation No need to pass in a net pointer since we can derive that. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-29 14:49:55 -04:00
Jeff Layton	f9416e281e	nfsd: drop unused stp arg to alloc_init_deleg Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-29 14:49:54 -04:00
Trond Myklebust	02a3508dba	nfsd: Convert delegation counter to an atomic_long_t type We want to convert to an atomic type so that we don't need to lock across the call to alloc_init_deleg(). Then convert to a long type so that we match the size of 'max_delegations'. None of this is a problem today, but it will be once we remove client_mutex protection. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-29 14:49:54 -04:00
Jeff Layton	2d4a532d38	nfsd: ensure that clp->cl_revoked list is protected by clp->cl_lock Currently, both destroy_revoked_delegation and revoke_delegation manipulate the cl_revoked list without any locking aside from the client_mutex. Ensure that the clp->cl_lock is held when manipulating it, except for the list walking in destroy_client. At that point, the client should no longer be in use, and so it should be safe to walk the list without any locking. That also means that we don't need to do the list_splice_init there either. Also, the fact that revoke_delegation deletes dl_recall_lru list_head without any locking makes it difficult to know whether it's doing so safely in all cases. Move the list_del_init calls into the callers, and add a WARN_ON in the event that t's passed a delegation that has a non-empty list_head. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-29 14:49:53 -04:00
Jeff Layton	4269067696	nfsd: fully unhash delegations when revoking them Ensure that the delegations cannot be found by the laundromat etc once we add them to the various 'revoke' lists. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-29 14:49:52 -04:00
Trond Myklebust	f83388341b	nfsd: simplify stateid allocation and file handling Don't allow stateids to clear the open file pointer until they are being destroyed. In a later patches we'll want to rely on the fact that we have a valid file pointer when dealing with the stateid and this will save us from having to do a lot of NULL pointer checks before doing so. Also, move to allocating stateids with kzalloc and get rid of the explicit zeroing of fields. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-29 14:49:51 -04:00
David Howells	0ef1351523	AFS: Correctly assemble the client UUID Correctly assemble the client UUID by OR'ing in the flags rather than assigning them over the other components. Reported-by: Himangi Saraogi <himangi774@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-07-29 10:14:36 -07:00
Jaegeuk Kim	fff04f90c1	f2fs: add info of appended or updated data writes This patch introduces a inode number list in which represents inodes having appended data writes or updated data writes after last checkpoint. This will be used at fsync to determine whether the recovery information should be written or not. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-29 07:46:11 -07:00
Jaegeuk Kim	39efac41fb	f2fs: use radix_tree for ino management For better ino management, this patch replaces the data structure from list to radix tree. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-29 07:46:11 -07:00
Jaegeuk Kim	6451e041c8	f2fs: add infra for ino management This patch changes the naming of orphan-related data structures to use as inode numbers managed globally. Later, we can use this facility for managing any inode number lists. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-29 07:45:54 -07:00
Namjae Jeon	ee98fa3a8b	ext4: fix COLLAPSE RANGE test for bigalloc file systems Blocks in collapse range should be collapsed per cluster unit when bigalloc is enable. If bigalloc is not enable, EXT4_CLUSTER_SIZE will be same with EXT4_BLOCK_SIZE. With this bug fixed, patch enables COLLAPSE_RANGE for bigalloc, which fixes a large number of xfstest failures which use fsx. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2014-07-29 10:45:31 -04:00
Jaegeuk Kim	953e6cc6bc	f2fs: punch the core function for inode management This patch punches out the core functions to manage the inode numbers. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-29 07:40:25 -07:00
Jaegeuk Kim	0f7b2abd18	f2fs: add nobarrier mount option This patch adds a mount option, nobarrier, in f2fs. The assumption in here is that file system keeps the IO ordering, but doesn't care about cache flushes inside the storages. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-29 05:27:48 -07:00
David S. Miller	3fd0202a0d	Merge tag 'master-2014-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next John W. Linville says: ==================== pull request: wireless-next 2014-07-25 Please pull this batch of updates intended for the 3.17 stream! For the mac80211 bits, Johannes says: "We have a lot of TDLS patches, among them a fix that should make hwsim tests happy again. The rest, this time, is mostly small fixes." For the Bluetooth bits, Gustavo says: "Some more patches for 3.17. The most important change here is the move of the 6lowpan code to net/6lowpan. It has been agreed with Davem that this change will go through the bluetooth tree. The rest are mostly clean up and fixes." and, "Here follows some more patches for 3.17. These are mostly fixes to what we've sent to you before for next merge window." For the iwlwifi bits, Emmanuel says: "I have the usual amount of BT Coex stuff. Arik continues to work on TDLS and Ariej contributes a few things for HS2.0. I added a few more things to the firmware debugging infrastructure. Eran fixes a small bug - pretty normal content." And for the Atheros bits, Kalle says: "For ath6kl me and Jessica added support for ar6004 hw3.0, our latest version of ar6004. For ath10k Janusz added a printout so that it's easier to check what ath10k kconfig options are enabled. He also added a debugfs file to configure maximum amsdu and ampdu values. Also we had few fixes as usual." On top of that is the usual large batch of various driver updates -- brcmfmac, mwifiex, the TI drivers, and wil6210 all get some action. Rafał has also been very busy with b43 and related updates. Also, I pulled the wireless tree into this in order to resolve a merge conflict... P.S. The change to fs/compat_ioctl.c reflects a name change in a Bluetooth header file... ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-28 17:36:25 -07:00
Darrick J. Wong	40b163f1c4	ext4: check inline directory before converting Before converting an inline directory to a regular directory, check the directory entries to make sure they're not obviously broken. This helps us to avoid a BUG_ON if one of the dirents is trashed. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca>	2014-07-28 13:06:26 -04:00
Artem Bityutskiy	6390e99177	Revert "UBIFS: add a log overlap assertion" This reverts commit `545f7fdf6d`. Hujianyang's testing revealed that the patch is bogus. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2014-07-28 19:15:19 +03:00
Yan, Zheng	06fee30f6a	ceph: fix append mode write generic_write_checks() may update 'pos', so we need to pass 'pos' to ceph_sync_write() and ceph_sync_direct_write(); Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2014-07-28 13:29:33 +04:00
Ilya Dryomov	7e8a295295	ceph: fix sizeof(struct tYpO *) typo struct ceph_xattr -> struct ceph_inode_xattr Reported-by: Toralf Förster <toralf.foerster@gmx.de> Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-07-28 13:29:27 +04:00
Ilya Dryomov	1a295bd8c8	ceph: remove redundant memset(0) xattrs array of pointers is allocated with kcalloc() - no need to memset() it to 0 right after that. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-07-28 13:28:33 +04:00
Ingo Molnar	ca5bc6cd5d	Merge branch 'sched/urgent' into sched/core, to merge fixes before applying new changes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-07-28 10:03:00 +02:00
Dmitry Monakhov	6e2631463f	ext4: fix incorrect locking in move_extent_per_page If we have to copy data we must drop i_data_sem because of get_blocks() will be called inside mext_page_mkuptodate(), but later we must reacquire it again because we are about to change extent's tree Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>	2014-07-27 22:32:27 -04:00
Dmitry Monakhov	29faed1638	ext4: use correct depth value Inode's depth can be changed from here: ext4_ext_try_to_merge() ->ext4_ext_try_to_merge_up() We must use correct value. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>	2014-07-27 22:30:29 -04:00
Dmitry Monakhov	4b1f166071	ext4: add i_data_sem sanity check Each caller of ext4_ext_dirty must hold i_data_sem, The only exception is migration code, let's make it convenient. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>	2014-07-27 22:28:15 -04:00
Xiaoguang Wang	b27b1535ac	ext4: fix wrong size computation in ext4_mb_normalize_request() As the member fe_len defined in struct ext4_free_extent is expressed as number of clusters, the variable "size" computation is wrong, we need to first translate fe_len to block number, then to bytes. Signed-off-by: Xiaoguang Wang <wangxg.fnst@cn.fujitsu.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Lukas Czerner <lczerner@redhat.com>	2014-07-27 22:26:36 -04:00
Linus Torvalds	fbf08efa04	Merge branch 'vfs-for-3.16' of git://git.infradead.org/users/hch/vfs Pull vfs fixes from Christoph Hellwig: "A vfsmount leak fix, and a compile warning fix" * 'vfs-for-3.16' of git://git.infradead.org/users/hch/vfs: fs: umount on symlink leaks mnt count direct-io: fix uninitialized warning in do_direct_IO()	2014-07-27 09:43:41 -07:00
Linus Torvalds	0246544fc9	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse fixes from Miklos Szeredi: "These two pathes fix issues with the kernel-userspace protocol changes in v3.15" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: add FUSE_NO_OPEN_SUPPORT flag to INIT fuse: s_time_gran fix	2014-07-25 16:16:34 -07:00
Chao Yu	9d84795077	f2fs: fix to put root inode in error path of fill_super We should put root inode correctly in error path of fill_super, otherwise we may encounter a leak case of inode resource. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-25 08:19:57 -07:00
Chao Yu	dbf20cb259	f2fs: avoid use invalid mapping of node_inode when evict meta inode Andrey Tsyvarev reported: "Using memory error detector reveals the following use-after-free error in 3.15.0: AddressSanitizer: heap-use-after-free in f2fs_evict_inode Read of size 8 by thread T22279: [<ffffffffa02d8702>] f2fs_evict_inode+0x102/0x2e0 [f2fs] [<ffffffff812359af>] evict+0x15f/0x290 [< inlined >] iput+0x196/0x280 iput_final [<ffffffff812369a6>] iput+0x196/0x280 [<ffffffffa02dc416>] f2fs_put_super+0xd6/0x170 [f2fs] [<ffffffff81210095>] generic_shutdown_super+0xc5/0x1b0 [<ffffffff812105fd>] kill_block_super+0x4d/0xb0 [<ffffffff81210a86>] deactivate_locked_super+0x66/0x80 [<ffffffff81211c98>] deactivate_super+0x68/0x80 [<ffffffff8123cc88>] mntput_no_expire+0x198/0x250 [< inlined >] SyS_umount+0xe9/0x1a0 SYSC_umount [<ffffffff8123f1c9>] SyS_umount+0xe9/0x1a0 [<ffffffff81cc8df9>] system_call_fastpath+0x16/0x1b Freed by thread T3: [<ffffffffa02dc337>] f2fs_i_callback+0x27/0x30 [f2fs] [< inlined >] rcu_process_callbacks+0x2d6/0x930 __rcu_reclaim [< inlined >] rcu_process_callbacks+0x2d6/0x930 rcu_do_batch [< inlined >] rcu_process_callbacks+0x2d6/0x930 invoke_rcu_callbacks [< inlined >] rcu_process_callbacks+0x2d6/0x930 __rcu_process_callbacks [<ffffffff810fd266>] rcu_process_callbacks+0x2d6/0x930 [<ffffffff8107cce2>] __do_softirq+0x142/0x380 [<ffffffff8107cf50>] run_ksoftirqd+0x30/0x50 [<ffffffff810b2a87>] smpboot_thread_fn+0x197/0x280 [<ffffffff810a8238>] kthread+0x148/0x160 [<ffffffff81cc8d4c>] ret_from_fork+0x7c/0xb0 Allocated by thread T22276: [<ffffffffa02dc7dd>] f2fs_alloc_inode+0x2d/0x170 [f2fs] [<ffffffff81235e2a>] iget_locked+0x10a/0x230 [<ffffffffa02d7495>] f2fs_iget+0x35/0xa80 [f2fs] [<ffffffffa02e2393>] f2fs_fill_super+0xb53/0xff0 [f2fs] [<ffffffff81211bce>] mount_bdev+0x1de/0x240 [<ffffffffa02dbce0>] f2fs_mount+0x10/0x20 [f2fs] [<ffffffff81212a85>] mount_fs+0x55/0x220 [<ffffffff8123c026>] vfs_kern_mount+0x66/0x200 [< inlined >] do_mount+0x2b4/0x1120 do_new_mount [<ffffffff812400d4>] do_mount+0x2b4/0x1120 [< inlined >] SyS_mount+0xb2/0x110 SYSC_mount [<ffffffff812414a2>] SyS_mount+0xb2/0x110 [<ffffffff81cc8df9>] system_call_fastpath+0x16/0x1b The buggy address ffff8800587866c8 is located 48 bytes inside of 680-byte region [ffff880058786698, ffff880058786940) Memory state around the buggy address: ffff880058786100: ffffffff ffffffff ffffffff ffffffff ffff880058786200: ffffffff ffffffff ffffffrr rrrrrrrr ffff880058786300: rrrrrrrr rrffffff ffffffff ffffffff ffff880058786400: ffffffff ffffffff ffffffff ffffffff ffff880058786500: ffffffff ffffffff ffffffff fffffffr >ffff880058786600: rrrrrrrr rrrrrrrr rrrfffff ffffffff ^ ffff880058786700: ffffffff ffffffff ffffffff ffffffff ffff880058786800: ffffffff ffffffff ffffffff ffffffff ffff880058786900: ffffffff rrrrrrrr rrrrrrrr rrrr.... ffff880058786a00: ........ ........ ........ ........ ffff880058786b00: ........ ........ ........ ........ Legend: f - 8 freed bytes r - 8 redzone bytes . - 8 allocated bytes x=1..7 - x allocated bytes + (8-x) redzone bytes Investigation shows, that f2fs_evict_inode, when called for 'meta_inode', uses invalidate_mapping_pages() for 'node_inode'. But 'node_inode' is deleted before 'meta_inode' in f2fs_put_super via iput(). It seems that in common usage scenario this use-after-free is benign, because 'node_inode' remains partially valid data even after kmem_cache_free(). But things may change if, while 'meta_inode' is evicted in one f2fs filesystem, another (mounted) f2fs filesystem requests inode from cache, and formely 'node_inode' of the first filesystem is returned." Nids for both meta_inode and node_inode are reservation, so it's not necessary for us to invalidate pages which will never be allocated. To fix this issue, let's skipping needlessly invalidating pages for {meta,node}_inode in f2fs_evict_inode. Reported-by: Andrey Tsyvarev <tsyvarev@ispras.ru> Tested-by: Andrey Tsyvarev <tsyvarev@ispras.ru> Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-25 08:19:55 -07:00
Chao Yu	32f9bc25cb	f2fs: support ->rename2() Now new interface ->rename2() is added to VFS, here are related description: https://lkml.org/lkml/2014/2/7/873 https://lkml.org/lkml/2014/2/7/758 This patch adds function f2fs_rename2() to support ->rename2() including handling both RENAME_EXCHANGE and RENAME_NOREPLACE flag. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-25 08:14:08 -07:00
Huang Ying	79e35dc3c2	f2fs: add f2fs_balance_fs for direct IO Otherwise, if a large amount of direct IO writes were done, the segment allocation may be failed because no enough segments are gced. Changes: v2: add f2fs_balance_fs into __get_data_block instead of f2fs_direct_IO. Signed-off-by: Huang, Ying <ying.huang@intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-25 08:13:58 -07:00
Gu Zheng	00fefb9cf2	aio: use iovec array rather than the single one Previously, we only offer a single iovec to handle all the read/write cases, so the PREADV/PWRITEV request always need to alloc more iovec buffer when copying user vectors. If we use a tmp iovec array rather than the single one, some small PREADV/PWRITEV workloads(vector size small than the tmp buffer) will not need to alloc more iovec buffer when copying user vectors. Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>	2014-07-24 10:59:40 -04:00
Gu Zheng	2be4e7deec	aio: fix some comments The function comments of aio_run_iocb and aio_read_events are out of date, so fix them here. Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>	2014-07-24 10:59:40 -04:00
Gu Zheng	8dc4379e17	aio: use the macro rather than the inline magic number Replace the inline magic number with the ready-made macro(AIO_RING_MAGIC), just clean up. Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>	2014-07-24 10:59:40 -04:00
Gu Zheng	b53f1f82fb	aio: remove the needless registration of ring file's private_data Remove the registration of ring file's private_data, we do not use it. Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>	2014-07-24 10:59:40 -04:00
Eric Paris	7d8b6c6375	CAPABILITIES: remove undefined caps from all processes This is effectively a revert of `7b9a7ec565` plus fixing it a different way... We found, when trying to run an application from an application which had dropped privs that the kernel does security checks on undefined capability bits. This was ESPECIALLY difficult to debug as those undefined bits are hidden from /proc/$PID/status. Consider a root application which drops all capabilities from ALL 4 capability sets. We assume, since the application is going to set eff/perm/inh from an array that it will clear not only the defined caps less than CAP_LAST_CAP, but also the higher 28ish bits which are undefined future capabilities. The BSET gets cleared differently. Instead it is cleared one bit at a time. The problem here is that in security/commoncap.c::cap_task_prctl() we actually check the validity of a capability being read. So any task which attempts to 'read all things set in bset' followed by 'unset all things set in bset' will not even attempt to unset the undefined bits higher than CAP_LAST_CAP. So the 'parent' will look something like: CapInh: 0000000000000000 CapPrm: 0000000000000000 CapEff: 0000000000000000 CapBnd: ffffffc000000000 All of this 'should' be fine. Given that these are undefined bits that aren't supposed to have anything to do with permissions. But they do... So lets now consider a task which cleared the eff/perm/inh completely and cleared all of the valid caps in the bset (but not the invalid caps it couldn't read out of the kernel). We know that this is exactly what the libcap-ng library does and what the go capabilities library does. They both leave you in that above situation if you try to clear all of you capapabilities from all 4 sets. If that root task calls execve() the child task will pick up all caps not blocked by the bset. The bset however does not block bits higher than CAP_LAST_CAP. So now the child task has bits in eff which are not in the parent. These are 'meaningless' undefined bits, but still bits which the parent doesn't have. The problem is now in cred_cap_issubset() (or any operation which does a subset test) as the child, while a subset for valid cap bits, is not a subset for invalid cap bits! So now we set durring commit creds that the child is not dumpable. Given it is 'more priv' than its parent. It also means the parent cannot ptrace the child and other stupidity. The solution here: 1) stop hiding capability bits in status This makes debugging easier! 2) stop giving any task undefined capability bits. it's simple, it you don't put those invalid bits in CAP_FULL_SET you won't get them in init and you won't get them in any other task either. This fixes the cap_issubset() tests and resulting fallout (which made the init task in a docker container untraceable among other things) 3) mask out undefined bits when sys_capset() is called as it might use ~0, ~0 to denote 'all capabilities' for backward/forward compatibility. This lets 'capsh --caps="all=eip" -- -c /bin/bash' run. 4) mask out undefined bit when we read a file capability off of disk as again likely all bits are set in the xattr for forward/backward compatibility. This lets 'setcap all+pe /bin/bash; /bin/bash' run Signed-off-by: Eric Paris <eparis@redhat.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Andrew Vagin <avagin@openvz.org> Cc: Andrew G. Morgan <morgan@kernel.org> Cc: Serge E. Hallyn <serge.hallyn@canonical.com> Cc: Kees Cook <keescook@chromium.org> Cc: Steve Grubb <sgrubb@redhat.com> Cc: Dan Walsh <dwalsh@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: James Morris <james.l.morris@oracle.com>	2014-07-24 21:53:47 +10:00
James Morris	4ca332e11d	Merge tag 'keys-next-20140722' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs into next	2014-07-24 21:36:19 +10:00
Jie Liu	74dc93a908	xfs: fix uflags detection at xfs_fs_rm_xquota We are intended to check up uflags against FS_PROJ_QUOTA rather than FS_USER_UQUOTA once more, it looks to me like a typo, but might cause the project quota metadata space can not be removed. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-24 21:27:17 +10:00
Jie Liu	7b409a7d6f	xfs: remove XFS_IS_OQUOTA_ON macros Remove the XFS_IS_OQUOTA_ON macros as it is obsoleted. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-24 21:27:16 +10:00
Eric Sandeen	54aa61f82d	xfs: tidy up xfs_set_inode32 xfs_set_inode32() caught my eye because it had weird spacing around the "-1's". In cleaning that up, I realized that the assignment in the declaration of "ino" is never used; it's rewritten before it gets read. Drop the ino initializer from its declaration since it's not used, and move the agino initialization into the body of the function, mostly so that we can have pretty whitespace and not exceed 80 columns. :) Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-24 20:53:10 +10:00
Eric Sandeen	9de67c3ba9	xfs: allow inode allocations in post-growfs disk space Today, if we perform an xfs_growfs which adds allocation groups, mp->m_maxagi is not properly updated when the growfs is complete. Therefore inodes will continue to be allocated only in the AGs which existed prior to the growfs, and the new space won't be utilized. This is because of this path in xfs_growfs_data_private(): xfs_growfs_data_private xfs_initialize_perag(mp, nagcount, &nagimax); if (mp->m_flags & XFS_MOUNT_32BITINODES) index = xfs_set_inode32(mp); else index = xfs_set_inode64(mp); if (maxagi) maxagi = index; where xfs_set_inode iterates over the (old) agcount in mp->m_sb.sb_agblocks, which has not yet been updated in the growfs path. So "index" will be returned based on the old agcount, not the new one, and new AGs are not available for inode allocation. Fix this by explicitly passing the proper AG count (which xfs_initialize_perag() already has) down another level, so that xfs_set_inode* can make the proper decision about acceptable AGs for inode allocation in the potentially newly-added AGs. This has been broken since 3.7, when these two xfs_set_inode* functions were added in commit `2d2194f`. Prior to that, we looped over "agcount" not sb_agblocks in these calculations. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-24 20:51:54 +10:00
Jie Liu	eb866bbf09	xfs: mark xfs_qm_quotacheck as static xfs_qm_quotacheck() is not used outside of xfs_qm.c. Mark it static and move it around in the file to avoid a forward declaration. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-24 20:49:57 +10:00
Mark Tinguely	5c18717ea2	xfs: fix cil push sequence after log recovery When the CIL checkpoint is fully written to the log, the LSN of the checkpoint commit record is written into the CIL context structure. This allows log force waiters to correctly detect when the checkpoint they are waiting on have been fully written into the log buffers. However, the initial context after mount is initialised with a non-zero commit LSN, so appears to waiters as though it is complete even though it may not have even been pushed, let alone written to the log buffers. Hence a log force immediately after a filesystem is mounted may not behave correctly, nor does commit record ordering if multiple CIL pushes interleave immediately after mount. To fix this, make sure the initial context commit LSN is not touched until the first checkpointis actually pushed. [dchinner: rewrite commit message] Signed-off-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-24 20:49:40 +10:00
Vasily Averin	295dc39d94	fs: umount on symlink leaks mnt count Currently umount on symlink blocks following umount: /vz is separate mount # ls /vz/ -al \| grep test drwxr-xr-x. 2 root root 4096 Jul 19 01:14 testdir lrwxrwxrwx. 1 root root 11 Jul 19 01:16 testlink -> /vz/testdir # umount -l /vz/testlink umount: /vz/testlink: not mounted (expected) # lsof /vz # umount /vz umount: /vz: device is busy. (unexpected) In this case mountpoint_last() gets an extra refcount on path->mnt Signed-off-by: Vasily Averin <vvs@openvz.org> Acked-by: Ian Kent <raven@themaw.net> Acked-by: Jeff Layton <jlayton@primarydata.com> Cc: stable@vger.kernel.org Signed-off-by: Christoph Hellwig <hch@lst.de>	2014-07-24 06:18:12 -04:00
Boaz Harrosh	6fcc5420bf	direct-io: fix uninitialized warning in do_direct_IO() The following warnings: fs/direct-io.c: In function ‘__blockdev_direct_IO’: fs/direct-io.c:1011:12: warning: ‘to’ may be used uninitialized in this function [-Wmaybe-uninitialized] fs/direct-io.c:913:16: note: ‘to’ was declared here fs/direct-io.c:1011:12: warning: ‘from’ may be used uninitialized in this function [-Wmaybe-uninitialized] fs/direct-io.c:913:10: note: ‘from’ was declared here are false positive because dio_get_page() either fails, or sets both 'from' and 'to'. Paul Bolle said ... Maybe it's better to move initializing "to" and "from" out of dio_get_page(). That _might_ make it easier for both the the reader and the compiler to understand what's going on. Something like this: Christoph Hellwig said ... The fix of moving the code definitively looks nicer, while I think uninitialized_var is horrible wart that won't get anywhere near my code. Boaz Harrosh: I agree with Christoph and Paul Signed-off-by: Boaz Harrosh <boaz@plexistor.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2014-07-24 06:17:07 -04:00
Brian Foster	f074051ff5	xfs: squash prealloc while over quota free space as well From: Brian Foster <bfoster@redhat.com> Commit `4d559a3b` introduced heavy prealloc. squashing to catch the case of requesting too large a prealloc on smaller filesystems, leading to repeated flush and retry cycles that occur on ENOSPC. Now that we issue eofblocks scans on EDQUOT/ENOSPC, squash the prealloc against the minimum available free space across all applicable quotas as well to avoid a similar problem of repeated eofblocks scans. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-24 19:56:08 +10:00
Brian Foster	dc06f398f0	xfs: run an eofblocks scan on ENOSPC/EDQUOT From: Brian Foster <bfoster@redhat.com> Speculative preallocation and and the associated throttling metrics assume we're working with large files on large filesystems. Users have reported inefficiencies in these mechanisms when we happen to be dealing with large files on smaller filesystems. This can occur because while prealloc throttling is aggressive under low free space conditions, it is not active until we reach 5% free space or less. For example, a 40GB filesystem has enough space for several files large enough to have multi-GB preallocations at any given time. If those files are slow growing, they might reserve preallocation for long periods of time as well as avoid the background scanner due to frequent modification. If a new file is written under these conditions, said file has no access to this already reserved space and premature ENOSPC is imminent. To handle this scenario, modify the buffered write ENOSPC handling and retry sequence to invoke an eofblocks scan. In the smaller filesystem scenario, the eofblocks scan resets the usage of preallocation such that when the 5% free space threshold is met, throttling effectively takes over to provide fair and efficient preallocation until legitimate ENOSPC. The eofblocks scan is selective based on the nature of the failure. For example, an EDQUOT failure in a particular quota will use a filtered scan for that quota. Because we don't know which quota might have caused an allocation failure at any given time, we include each applicable quota determined to be under low free space conditions in the scan. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-24 19:49:28 +10:00
Brian Foster	f452639792	xfs: support a union-based filter for eofblocks scans From: Brian Foster <bfoster@redhat.com> The eofblocks scan inode filter uses intersection logic by default. E.g., specifying both user and group quota ids filters out inodes that are not covered by both the specified user and group quotas. This is suitable for behavior exposed to userspace. Scans that are initiated from within the kernel might require more broad semantics, such as scanning all inodes under each quota associated with an inode to alleviate low free space conditions in each. Create the XFS_EOF_FLAGS_UNION flag to support a conditional union-based filtering algorithm for eofblocks scans. This flag is intentionally left out of the valid mask as it is not supported for scans initiated from userspace. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-24 19:44:28 +10:00
Brian Foster	5400da7dc0	xfs: add scan owner field to xfs_eofblocks From: Brian Foster <bfoster@redhat.com> The scan owner field represents an optional inode number that is responsible for the current scan. The purpose is to identify that an inode is under iolock and as such, the iolock shouldn't be attempted when trimming eofblocks. This is an internal only field. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-24 19:40:22 +10:00
Jie Liu	f3d1e58743	xfs: introduce xfs_bulkstat_grab_ichunk From: Jie Liu <jeff.liu@oracle.com> Introduce xfs_bulkstat_grab_ichunk() to look up an inode chunk in where the given inode resides, then grab the record. Update the data for the pointed-to record if the inode was not the last in the chunk and there are some left allocated, return the grabbed inode count on success. Refactor xfs_bulkstat() with it. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-24 18:42:21 +10:00
Jie Liu	4b8fdfecd8	xfs: introduce xfs_bulkstat_ichunk_ra From: Jie Liu <jeff.liu@oracle.com> Introduce xfs_bulkstat_ichunk_ra() to loop over all clusters in the next inode chunk, then performs readahead if there are any allocated inodes in that cluster. Refactor xfs_bulkstat() with it. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-24 18:41:18 +10:00
Jie Liu	d4c2734875	xfs: fix error handling at xfs_bulkstat From: Jie Liu <jeff.liu@oracle.com> We should not ignore the btree operation errors at xfs_bulkstat() but to propagate them if any. This patch fix two places in this function and the remaining things will be fixed with code refactoring thereafter. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-24 18:40:43 +10:00
Jie Liu	296dfd7fdb	xfs: remove redundant user buffer count checks at xfs_bulkstat From: Jie Liu <jeff.liu@oracle.com> Remove the redundant user buffer and count checks as it has already been validated at xfs_ioc_bulkstat(). Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-24 18:40:26 +10:00
Himangi Saraogi	08a0f24e4c	ceph: replace comma with a semicolon Replace a comma between expression statements by a semicolon. This changes the semantics of the code, but given the current indentation appears to be what is intended. A simplified version of the Coccinelle semantic patch that performs this transformation is as follows: // <smpl> @r@ expression e1,e2; @@ e1 -, +; e2; // </smpl> Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>	2014-07-24 12:04:46 +04:00
Jie Liu	c7cb51dcb0	xfs: fix error handling at xfs_inumbers From: Jie Liu <jeff.liu@oracle.com> To fetch the file system number tables, we currently just ignore the errors and proceed to loop over the next AG or bump agino to the next chunk in case of btree operations failed, that is not properly because those errors might hint us potential file system problems. This patch rework xfs_inumbers() to handle the btree operation errors as well as the loop conditions. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-24 12:18:47 +10:00
Jie Liu	549fa00679	xfs: consolidate xfs_inumbers From: Jie Liu <jeff.liu@oracle.com> Consolidate xfs_inumbers() to make the formatter function return correct error and make the source code looks a bit neat. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-24 12:11:47 +10:00
Christoph Hellwig	d716f8eedb	xfs: remove xfs_bulkstat_single From: Christoph Hellwig <hch@lst.de> xfs_bukstat_one doesn't have any failure case that would go away when called through xfs_bulkstat, so remove the fallback and the now unessecary xfs_bulkstat_single function. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-24 12:07:15 +10:00
Jie Liu	8fe657760d	xfs: remove redundant stat assignment in xfs_bulkstat_one_int From: Jie Liu <jeff.liu@oracle.com> Remove the redundant BULKSTAT_RV_NOTHING assignment in case of call xfs_iget() failed at xfs_bulkstat_one_int(). Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-07-24 11:33:28 +10:00
Linus Torvalds	82e13c71bc	Merge branch 'for-3.16' of git://linux-nfs.org/~bfields/linux Pull nfsd bugfix from Bruce Fields: "Another regression from the xdr encoding rewrite" * 'for-3.16' of git://linux-nfs.org/~bfields/linux: NFSD: Fix crash encoding lock reply on 32-bit	2014-07-23 17:55:11 -07:00
Hugh Dickins	4e66d445d0	simple_xattr: permit 0-size extended attributes If a filesystem uses simple_xattr to support user extended attributes, LTP setxattr01 and xfstests generic/062 fail with "Cannot allocate memory": simple_xattr_alloc()'s wrap-around test mistakenly excludes values of zero size. Fix that off-by-one (but apparently no filesystem needs them yet). Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Jeff Layton <jlayton@poochiereds.net> Cc: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-07-23 15:10:55 -07:00
Silesh C V	aed8adb768	coredump: fix the setting of PF_DUMPCORE Commit `079148b919` ("coredump: factor out the setting of PF_DUMPCORE") cleaned up the setting of PF_DUMPCORE by removing it from all the linux_binfmt->core_dump() and moving it to zap_threads().But this ended up clearing all the previously set flags. This causes issues during core generation when tsk->flags is checked again (eg. for PF_USED_MATH to dump floating point registers). Fix this. Signed-off-by: Silesh C V <svellattu@mvista.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Mandeep Singh Baines <msb@chromium.org> Cc: <stable@vger.kernel.org> [3.10+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-07-23 15:10:54 -07:00
Thomas Gleixner	5eaaed4fe2	fs: lockd: Use ktime_get_ns() Replace the ever recurring: ts = ktime_get_ts(); ns = timespec_to_ns(&ts); with ns = ktime_get_ns(); Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Trond Myklebust <trond.myklebust@primarydata.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: John Stultz <john.stultz@linaro.org>	2014-07-23 15:01:44 -07:00
Jeff Layton	f9c00c3ab4	nfsd: Do not let nfs4_file pin the struct inode Remove the fi_inode field in struct nfs4_file in order to remove the possibility of struct nfs4_file pinning the inode when it does not have any open state. The only place we still need to get to an inode is in check_for_locks, so change it to use find_any_file and use the inode from any that it finds. If it doesn't find one, then just assume there aren't any. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-23 16:35:24 -04:00
Trond Myklebust	b07c54a4a3	nfsd: nfs4_check_fh - make it actually check the filehandle ...instead of just checking the inode that corresponds to it. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-23 16:35:24 -04:00
Trond Myklebust	ca94321783	nfsd: Use the filehandle to look up the struct nfs4_file instead of inode This makes more sense anyway since an inode pointer value can change even when the filehandle doesn't. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-23 16:35:24 -04:00
Trond Myklebust	e2cf80d73f	nfsd: Store the filehandle with the struct nfs4_file For use when we may not have a struct inode. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-23 16:35:23 -04:00
Himangi Saraogi	fc8e5a644c	nfsd4: convert comma to semicolon Replace a comma between expression statements by a semicolon. This changes the semantics of the code, but given the current indentation appears to be what is intended. A simplified version of the Coccinelle semantic patch that performs this transformation is as follows: // <smpl> @r@ expression e1,e2; @@ e1 -, +; e2; // </smpl> Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-23 14:20:49 -04:00
Jeff Layton	2f6ce8e73c	nfsd: ensure that st_access_bmap and st_deny_bmap are initialized to 0 Open stateids must be initialized with the st_access_bmap and st_deny_bmap set to 0, so that nfs4_get_vfs_file can properly record their state in old_access_bmap and old_deny_bmap. This bug was introduced in commit `baeb4ff0e5` (nfsd: make deny mode enforcement more efficient and close races in it) and was causing the refcounts to end up incorrect when nfs4_get_vfs_file returned an error after bumping the refcounts. This made it impossible to unmount the underlying filesystem after running pynfs tests that involve deny modes. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-23 14:20:47 -04:00
Thomas Gleixner	57e0be041d	sched: Make task->real_start_time nanoseconds based Simplify the only user of this data by removing the timespec conversion. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>	2014-07-23 10:18:05 -07:00
Thomas Gleixner	53cc7bad37	timerfd: Use ktime_mono_to_real() We have a few other use cases of ktime_get_monotonic_offset() which can be optimized with ktime_mono_to_real(). The timerfd code uses the offset only for comparison, so we can use ktime_mono_to_real(0) for this as well. Funny enough text size shrinks with that on ARM and x8664 !? Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>	2014-07-23 10:18:02 -07:00
Kinglong Mee	f98bac5a30	NFSD: Fix crash encoding lock reply on 32-bit Commit `8c7424cff6` "nfsd4: don't try to encode conflicting owner if low on space" forgot to free conf->data in nfsd4_encode_lockt and before sign conf->data to NULL in nfsd4_encode_lock_denied, causing a leak. Worse, kfree() can be called on an uninitialized pointer in the case of a succesful lock (or one that fails for a reason other than a conflict). (Note that lock->lk_denied.ld_owner.data appears it should be zero here, until you notice that it's one arm of a union the other arm of which is written to in the succesful case by the memcpy(&lock->lk_resp_stateid, &lock_stp->st_stid.sc_stateid, sizeof(stateid_t)); in nfsd4_lock(). In the 32-bit case this overwrites ld_owner.data.) Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Fixes: `8c7424cff6` ""nfsd4: don't try to encode conflicting owner if low on space" Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-23 10:31:56 -04:00
David Howells	633706a2ee	Merge branch 'keys-fixes' into keys-next Signed-off-by: David Howells <dhowells@redhat.com>	2014-07-22 21:55:45 +01:00
David Howells	f9167789df	KEYS: user: Use key preparsing Make use of key preparsing in user-defined and logon keys so that quota size determination can take place prior to keyring locking when a key is being added. Also the idmapper key types need to change to match as they use the user-defined key type routines. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Steve Dickson <steved@redhat.com> Acked-by: Jeff Layton <jlayton@primarydata.com>	2014-07-22 21:46:17 +01:00
Jeff Layton	d55a166c96	nfsd: bump dl_time when unhashing delegation There's a potential race between a lease break and DELEGRETURN call. Suppose a lease break comes in and queues the workqueue job for a delegation, but it doesn't run just yet. Then, a DELEGRETURN comes in finds the delegation and calls destroy_delegation on it to unhash it and put its primary reference. Next, the workqueue job runs and queues the delegation back onto the del_recall_lru list, issues the CB_RECALL and puts the final reference. With that, the final reference to the delegation is put, but it's still on the LRU list. When we go to unhash a delegation, it's because we intend to get rid of it soon afterward, so we don't want lease breaks to mess with it once that occurs. Fix this by bumping the dl_time whenever we unhash a delegation, to ensure that lease breaks don't monkey with it. I believe this is a regression due to commit `02e1215f9f` (nfsd: Avoid taking state_lock while holding inode lock in nfsd_break_one_deleg). Prior to that, the state_lock was held in the lm_break callback itself, and that would have prevented this race. Cc: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-22 15:34:47 -04:00
Andrew Gallagher	d7afaec0b5	fuse: add FUSE_NO_OPEN_SUPPORT flag to INIT Here some additional changes to set a capability flag so that clients can detect when it's appropriate to return -ENOSYS from open. This amends the following commit introduced in 3.14: `7678ac5061` fuse: support clients that don't implement 'open' However we can only add the flag to 3.15 and later since there was no protocol version update in 3.14. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: <stable@vger.kernel.org> # v3.15+	2014-07-22 16:37:43 +02:00
Miklos Szeredi	a800bad366	fuse: s_time_gran fix Default s_time_gran is 1, don't overwrite that if userspace didn't explicitly specify one. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: <stable@vger.kernel.org> # v3.15+	2014-07-22 16:37:42 +02:00
Benjamin LaHaise	be6fb451a2	aio: remove no longer needed preempt_disable() Based on feedback from Jens Axboe on `263782c1c9`, clean up get/put_reqs_available() to remove the no longer needed preempt_disable() and preempt_enable() pair. Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Cc: Jens Axboe <axboe@kernel.dk>	2014-07-22 09:56:56 -04:00
Trond Myklebust	72c0b0fb9f	nfsd: Move the delegation reference counter into the struct nfs4_stid We will want to add reference counting to the lock stateid and open stateids too in later patches. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-21 17:03:00 -04:00
Jeff Layton	417c6629b2	nfsd: fix race that grants unrecallable delegation If nfs4_setlease succesfully acquires a new delegation, then another task breaks the delegation before we reach hash_delegation_locked, then the breaking task will see an empty fi_delegations list and do nothing. The client will receive an open reply incorrectly granting a delegation and will never receive a recall. Move more of the delegation fields to be protected by the fi_lock. It's more granular than the state_lock and in later patches we'll want to be able to rely on it in addition to the state_lock. Attempt to acquire a delegation. If that succeeds, take the spinlocks and then check to see if the file has had a conflict show up since then. If it has, then we assume that the lease is no longer valid and that we shouldn't hand out a delegation. There's also one more potential (but very unlikely) problem. If the lease is broken before the delegation is hashed, then it could leak. In the event that the fi_delegations list is empty, reset the fl_break_time to jiffies so that it's cleaned up ASAP by the normal lease handling code. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-21 16:31:17 -04:00
Greg Kroah-Hartman	90125edbc4	Merge 3.16-rc6 into driver-core-next We want the platform changes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2014-07-21 10:07:25 -07:00
J. Bruce Fields	57a3714421	nfsd4: CREATE_SESSION should update backchannel immediately nfsd4_probe_callback kicks off some work that will eventually run nfsd4_process_cb_update and update the session flags. In theory we could process a following SEQUENCE call before that update happens resulting in flags that don't accurately represent, for example, the lack of a backchannel. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-21 12:30:50 -04:00
Brian Norris	d0d5864676	Linux 3.16-rc6 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJTzJFGAAoJEHm+PkMAQRiGNzQH/087gQch5K+A2HKvPzjUXq57 G82DJHLONMMq8+NY3Vqhp8g2V8zRbXGJEvMJMsyuscO37Vo7ADcrYo8lqY9w5bIl h+Zarhkqz0rqRs2SfMMIVzdd2W7MzL+lqj3GplGPxHztw0+qk7PRKILx6eRppGaH JaD4NfkD5+1vfve/2d1ze9D5pCiw6PFNzjesKZxScQhNhIyLdRamfSTY4r9XeURo CxpwjphEYfvAcgc39mwzEHPHyKSqULu0By6R8FXQpJ9QjVtzcGEiF+cPqGncpZOR 5ZSyU5e1CpBl9w8o6Lm9ewXmaCSnBU/VFrOwWvZrXfokZedXBOz7KdShU93XFjU= =0VJM -----END PGP SIGNATURE----- Merge tag 'v3.16-rc6' into MTD development branch Linux 3.16-rc6	2014-07-21 00:01:16 -07:00
Linus Torvalds	da83fc6e0f	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "We have two more fixes in my for-linus branch. I was hoping to also include a fix for a btrfs deadlock with compression enabled, but we're still nailing that one down" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: btrfs: test for valid bdev before kobj removal in btrfs_rm_device Btrfs: fix abnormal long waiting in fsync	2014-07-20 20:21:05 -07:00
Linus Torvalds	90d51d5606	NFS client fixes for Linux 3.16 Highlights include; - Stable fix for an NFSv3 posix ACL regression - Multiple fixes for regressions to the NFS generic read/write code - Fix page splitting bugs that come into play when a small rsize/wsize read/write needs to be sent again (due to error conditions or page redirty). - Fix nfs_wb_page_cancel, which is called by the "invalidatepage" method - Fix 2 compile warnings about unused variables. - Fix a performance issue affecting unstable writes. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJTys5JAAoJEGcL54qWCgDygakP/1JOfQZzyHNRml5aDVRLtfp/ QQpn8ne3jjEov+0BhzSxqkpHP+8fCcF0UgD5asEnbM83ruoB8EUHErXdq7kRkAYf COpDZAww4sPXUszG/xGqIED503DKbf69Ds5pQMh8g71hfQpw04UmMQYbnRNBP2bI Z5eu0Xiwaf3MVRaxHMXVy9sl7in/cQBvrXnKLTIYOLA0U5bdCI9JWT5+qbOUCC/3 YuYm5EjM+OMyhvEWyVDXFp9kmw0vtBS8FwAXYKBjwfJLNl8dGuERWKlFDPNOxdpZ QrOBhUH73d2tgvTHzUg/RRtpA6mKOfznU3SQK3muP188/2/sbb4y/RChwv+Ignat YqWFgbDTz1idKvjj+Vzcd7eL3HO9Kono/YkAG8i5xmOlSeenzDra1lDAbuB+zNzd oLVY1AJp8+13c74gaCDurxJ7fq6Fth97eqxdo8fdQH8Mn6m6qRm2V57rOo59eFQS bX6EN4ja8ashrKprCvlhXDGJmQKZWJMEGoTXJCj5w+HBklONV6jdbyzGFKT+Zmhg IP+USsaLJDswZNpdWq8Zb9RthfFAURKEQEemwV5eLQNtTa+xQ22ZOF2ycrbBqsED etCCm8gyNukl2RCAxGfLrtpBytlcNMmcRKGktKqO1SAAua3IP33wGD7zql8rwGCZ m9XMXUkVKiHoErl3jZ4T =qklz -----END PGP SIGNATURE----- Merge tag 'nfs-for-3.16-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client fixes from Trond Myklebust: "Apologies for the relative lateness of this pull request, however the commits fix some issues with the NFS read/write code updates in 3.16-rc1 that can cause serious Oopsing when using small r/wsize. The delay was mainly due to extra testing to make sure that the fixes behave correctly. Highlights include; - Stable fix for an NFSv3 posix ACL regression - Multiple fixes for regressions to the NFS generic read/write code: - Fix page splitting bugs that come into play when a small rsize/wsize read/write needs to be sent again (due to error conditions or page redirty) - Fix nfs_wb_page_cancel, which is called by the "invalidatepage" method - Fix 2 compile warnings about unused variables - Fix a performance issue affecting unstable writes" * tag 'nfs-for-3.16-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFS: Don't reset pg_moreio in __nfs_pageio_add_request NFS: Remove 2 unused variables nfs: handle multiple reqs in nfs_wb_page_cancel nfs: handle multiple reqs in nfs_page_async_flush nfs: change find_request to find_head_request nfs: nfs_page should take a ref on the head req nfs: mark nfs_page reqs with flag for extra ref nfs: only show Posix ACLs in listxattr if actually present	2014-07-20 19:55:44 -07:00
Yan, Zheng	d0d0db2268	ceph: check zero length in ceph_sync_read() Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2014-07-21 10:17:05 +08:00
Eric Sandeen	0bfaa9c5cb	btrfs: test for valid bdev before kobj removal in btrfs_rm_device commit `99994cd` btrfs: dev delete should remove sysfs entry added a btrfs_kobj_rm_device, which dereferences device->bdev... right after we check whether device->bdev might be NULL. I don't honestly know if it's possible to have a NULL device->bdev here, but assuming that it is (given the test), we need to move the kobject removal to be under that test. (Coverity spotted this) Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-07-19 11:49:44 -07:00
Liu Bo	98ce2deda2	Btrfs: fix abnormal long waiting in fsync xfstests generic/127 detected this problem. With commit `7fc34a62ca`, now fsync will only flush data within the passed range. This is the cause of the above problem, -- btrfs's fsync has a stage called 'sync log' which will wait for all the ordered extents it've recorded to finish. In xfstests/generic/127, with mixed operations such as truncate, fallocate, punch hole, and mapwrite, we get some pre-allocated extents, and mapwrite will mmap, and then msync. And I find that msync will wait for quite a long time (about 20s in my case), thanks to ftrace, it turns out that the previous fallocate calls 'btrfs_wait_ordered_range()' to flush dirty pages, but as the range of dirty pages may be larger than 'btrfs_wait_ordered_range()' wants, there can be some ordered extents created but not getting corresponding pages flushed, then they're left in memory until we fsync which runs into the stage 'sync log', and fsync will just wait for the system writeback thread to flush those pages and get ordered extents finished, so the latency is inevitable. This adds a flush similar to btrfs_start_ordered_extent() in btrfs_wait_logged_extents() to fix that. Reviewed-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-07-19 11:49:44 -07:00
Artem Bityutskiy	545f7fdf6d	UBIFS: add a log overlap assertion Add an assertion which checkes that the head of the log never overlaps with the tail of the log. Suggested-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2014-07-19 09:55:19 +03:00
Artem Bityutskiy	f1cb705acc	UBIFS: remove unnecessary check Remove the "if (c->lhead_offs == 0)" check because is unnecessary, since at that point the log head offset is guaranteed to be zero due to the previous operation. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2014-07-19 09:54:25 +03:00
Artem Bityutskiy	07e19dff63	UBIFS: remove mst_mutex The 'mst_mutex' is not needed since because 'ubifs_write_master()' is only called on the mount path and commit path. The mount path is sequential and there is no parallelism, and the commit path is also serialized - there is only one commit going on at a time. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2014-07-19 09:53:52 +03:00
Fabian Frederick	39274a1e8d	UBIFS: kernel-doc warning fix s/data/timer Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2014-07-19 09:53:52 +03:00
Fabian Frederick	d4eb08ff0a	UBIFS: replace seq_printf by seq_puts Fix checkpatch warnings: "WARNING: Prefer seq_puts to seq_printf" Andrew Morton wrote: " - puts is presumably faster - puts doesn't go rogue if you accidentally pass it a "%". - this patch actually made fs/ubifs/super.o 12 bytes smaller. Perhaps because seq_printf() is a varargs function, forcing the caller to pass args on the stack instead of in registers. " Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2014-07-19 09:53:52 +03:00
Fabian Frederick	86b4c14de3	UBIFS: replace countsize kzalloc by kcalloc kcalloc manages countsizeof overflow. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2014-07-19 09:53:52 +03:00
Fabian Frederick	ef13f01828	UBIFS: kernel-doc warning fix No grouped argument in drop_last_node. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2014-07-19 09:53:52 +03:00
hujianyang	6dcfb80264	UBIFS: fix error path in create_default_filesystem() In the end of 'create_default_filesystem()' we need to check the return value of 'ubifs_write_node()' to ensure that we have successfully written the 'cs_node'. Signed-off-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2014-07-19 09:53:52 +03:00
Artem Bityutskiy	f2b6521aa1	UBIFS: fix spelling of "scanned" Randy Dunlap pointed that we should use "scanned" instead of "scaned". This patch makes the correction. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2014-07-19 09:53:51 +03:00
Seunghun Lee	d685c41215	UBIFS: fix some comments This patch fixes some comments about return type. Signed-off-by: Seunghun Lee <waydi1@gmail.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2014-07-19 09:53:51 +03:00
hujianyang	c7b5bb0beb	UBIFS: remove useless @ecc in struct ubifs_scan_leb We set @ecc in ubifs_scan_leb only if leb_read returns EBADMSG and do not use it any more. This patch removes this variable and adds comments about EBADMSG handling. Artem: re-phrase commentaries Signed-off-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2014-07-19 09:53:51 +03:00
hujianyang	b793a8c888	UBIFS: remove useless statements This patch removes useless and duplicate statements. Signed-off-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2014-07-19 09:53:51 +03:00
hujianyang	ce6ebdb87e	UBIFS: Add missing break statements in dbg_chk_pnode() This is a minor fix. These two branches in 'dbg_chk_pnode()' are dealing with different conditions. Although there is no fault in current state, I think adding "break"s in each end of branch is better. Signed-off-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2014-07-19 09:53:51 +03:00
hujianyang	5a95741a57	UBIFS: fix error handling in dump_lpt_leb() This patch checks the return value of 'ubifs_unpack_nnode()'. If this function returns an error, 'nnode' may not be initialized, so just print an error message and break. Signed-off-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2014-07-19 09:53:51 +03:00
Kees Cook	c2e1f2e30d	seccomp: implement SECCOMP_FILTER_FLAG_TSYNC Applying restrictive seccomp filter programs to large or diverse codebases often requires handling threads which may be started early in the process lifetime (e.g., by code that is linked in). While it is possible to apply permissive programs prior to process start up, it is difficult to further restrict the kernel ABI to those threads after that point. This change adds a new seccomp syscall flag to SECCOMP_SET_MODE_FILTER for synchronizing thread group seccomp filters at filter installation time. When calling seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC, filter) an attempt will be made to synchronize all threads in current's threadgroup to its new seccomp filter program. This is possible iff all threads are using a filter that is an ancestor to the filter current is attempting to synchronize to. NULL filters (where the task is running as SECCOMP_MODE_NONE) are also treated as ancestors allowing threads to be transitioned into SECCOMP_MODE_FILTER. If prctrl(PR_SET_NO_NEW_PRIVS, ...) has been set on the calling thread, no_new_privs will be set for all synchronized threads too. On success, 0 is returned. On failure, the pid of one of the failing threads will be returned and no filters will have been applied. The race conditions against another thread are: - requesting TSYNC (already handled by sighand lock) - performing a clone (already handled by sighand lock) - changing its filter (already handled by sighand lock) - calling exec (handled by cred_guard_mutex) The clone case is assisted by the fact that new threads will have their seccomp state duplicated from their parent before appearing on the tasklist. Holding cred_guard_mutex means that seccomp filters cannot be assigned while in the middle of another thread's exec (potentially bypassing no_new_privs or similar). The call to de_thread() may kill threads waiting for the mutex. Changes across threads to the filter pointer includes a barrier. Based on patches by Will Drewry. Suggested-by: Julien Tinnes <jln@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Andy Lutomirski <luto@amacapital.net>	2014-07-18 12:13:40 -07:00
Kees Cook	1d4457f999	sched: move no_new_privs into new atomic flags Since seccomp transitions between threads requires updates to the no_new_privs flag to be atomic, the flag must be part of an atomic flag set. This moves the nnp flag into a separate task field, and introduces accessors. Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Andy Lutomirski <luto@amacapital.net>	2014-07-18 12:13:38 -07:00
Linus Torvalds	f839719122	This patch set contains two minor docs/spelling fixes, some fixes for flock, a change to use GFP_NOFS to avoid recursion on a rarely used code path and a fix for a race relating to the glock lru. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.15 (GNU/Linux) iQIcBAABAgAGBQJTyPZQAAoJEMrg3m4a/8jSFBEQAKSnJQUP9MSxVwNBrgOiybXW kQd8RYs7cdt33i97C3Im9xSVktPz4HKTvuwHyvNV1oyWScfWSyqCgC//cU+/zlYV wJDZWIASNoQheY6UfxR6TeBPZo9Hgq7RQRGj4h1ttag9+b8Zz9aV5TCxcoh28ULF 629TyNwg4xdiEKX2xZusDwGCoHn5f5l9pAa5MyPrcyPzn1lOJP1lz++Lci2nqC4g DvA/KzQzDLQ2lKXdSd95avwQxnHqmeCTvClPmK9GgONrt66tqq6CcCLB1jPRE7/O J7f0VWy/PEeo8ot+9siiA380EvM6hWvJx5Fuen/Qb9dQ5sgsJMkvgbqlHK6zB/i3 Je6Qq+aVPz3qktmXdyEagpXfZAQAxy0PUWezQBQH8HIlhwKMGC1QaFgMoAFIks1Y S38IBHCwlymytWYdVaRhyUOnlzzaSyeYROzs7hZoxRRUilge5rPkrqtv4HWLSRtZ rGFEid181+qTO2TyoiMRY2oR3U0PHfbE9Dhv5Pu9caTl55kj9eAGwvqnOn6IpyvF eiUoWOnDYFO+8sxFKPYFndglEZx0zBU6B/7axyQ3qam3BojTJwKh+2+4TqauM0zo 4ehwJEzVmV21sbyMfUHCKTQEkW8OjQ+EkxAEmGhp4IODNwZ3vPfFBdhFi3fBipqO WhDmeDmOddb9cCoQG8WZ =VTve -----END PGP SIGNATURE----- Merge tag 'gfs2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes Pull gfs2 fixes from Steven Whitehouse: "This patch set contains two minor docs/spelling fixes, some fixes for flock, a change to use GFP_NOFS to avoid recursion on a rarely used code path and a fix for a race relating to the glock lru" * tag 'gfs2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes: GFS2: fs/gfs2/rgrp.c: kernel-doc warning fixes GFS2: memcontrol: Spelling s/invlidate/invalidate/ GFS2: Allow caching of glocks for flock GFS2: Allow flocks to use normal glock dq rather than dq_wait GFS2: replace count*size kzalloc by kcalloc GFS2: Use GFP_NOFS when allocating glocks GFS2: Fix race in glock lru glock disposal GFS2: Only wait for demote when last holder is dequeued	2014-07-18 06:26:04 -10:00
Linus Torvalds	847f56eb0e	xfs: fixes for 3.15-rc5 Fixes for low memory perforamnce regressions and a quota inode handling regression. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJTyFtkAAoJEK3oKUf0dfodAioP/0nsIq3be9zch0/Jjf9aduON 1aMaUE/9p4g/zu0f96ld3GI5/guRVllZ/0qJFyYAnAgOGs1hGARQfOuNnHBjgdLZ D261JbT9z8d8inQ7BMSc3EBBQ2CAZsAwRmg6UbeaWBE4hjlJ03RGWBE/aMMh10Wh t2fWUoeYWZWLi+Gfa+lPpnTESH7nBK5cW+daC16I1fU9Z/RcXAl+pCN2s5Ls6h7G 1mQNfhAuII+/ydB7UWXL6SQ7/sDvDwNvedMLljwg6oYu/riSG2kYPZKW6O9qwL6T z9onPg5lEQMjWlbl6qzNo6OT3pAs37vssG0zVw2ZS8GjZ84edRzw2PGctJAFu2Hh sWqmtYGNKBjtjnxJ4zlRfeBkwpHbZGlLyOwzKoDlyQ8j9KZ8v8lsKyEpJK6/XJNG 1rMJZV5twu+xvZUwf0zkg0tuxoT/3T3kbIHsFkEaJmQW7jrxTvdybW/rp6KHSbcb rzCpuZ5Ghh9qss+EeCv3k2nnWjysDP4kSwQMZ0zCedzvDTmga2TMw//MUwaM+i7M D7Raq4Qcs2updrFk2j9OyML2hi49KuPTtEu2OC7ObfxvBsZgSbTvyw1Vq/rsiDM5 FZMV/giKRoCFpRpp7xF+db0zkBC2xDU9tGz196dzGtg7rvp6Z5401mS8fAr9H/LJ D2Wf2OXx3oss9v4rrO7N =rnN8 -----END PGP SIGNATURE----- Merge tag 'xfs-for-linus-3.16-rc5' of git://oss.sgi.com/xfs/xfs Pull xfs fixes from Dave Chinner: "Fixes for low memory perforamnce regressions and a quota inode handling regression. These are regression fixes for issues recently introduced - the change in the stack switch location is fairly important, so I've held off sending this update until I was sure that it still addresses the stack usage problem the original solved. So while the commits in the xfs tree are recent, it has been under tested for several weeks now" * tag 'xfs-for-linus-3.16-rc5' of git://oss.sgi.com/xfs/xfs: xfs: null unused quota inodes when quota is on xfs: refine the allocation stack switch Revert "xfs: block allocation work needs to be kswapd aware"	2014-07-18 06:21:43 -10:00
Chuck Lever	3c45ddf823	svcrdma: Select NFSv4.1 backchannel transport based on forward channel The current code always selects XPRT_TRANSPORT_BC_TCP for the back channel, even when the forward channel was not TCP (eg, RDMA). When a 4.1 mount is attempted with RDMA, the server panics in the TCP BC code when trying to send CB_NULL. Instead, construct the transport protocol number from the forward channel transport or'd with XPRT_TRANSPORT_BC. Transports that do not support bi-directional RPC will not have registered a "BC" transport, causing create_backchannel_client() to fail immediately. Fixes: https://bugzilla.linux-nfs.org/show_bug.cgi?id=265 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-18 11:35:45 -04:00
Fabian Frederick	27ff6a0f7f	GFS2: fs/gfs2/rgrp.c: kernel-doc warning fixes Cc: cluster-devel@redhat.com Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-07-18 11:15:14 +01:00
Geert Uytterhoeven	6b49d1d9c3	GFS2: memcontrol: Spelling s/invlidate/invalidate/ Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: cluster-devel@redhat.com Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-07-18 11:14:31 +01:00
Bob Peterson	97a4f1d765	GFS2: Allow caching of glocks for flock This patch removes the GLF_NOCACHE flag from the glocks associated with flocks. There should be no good reason not to cache glocks for flocks: they only force the glock to be demoted before they can be reacquired, which can slow down performance and even cause glock hangs, especially in cases where the flocks are held in Shared (SH) mode. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-07-18 11:14:12 +01:00
Bob Peterson	5bef3e7cf1	GFS2: Allow flocks to use normal glock dq rather than dq_wait This patch allows flock glocks to use a non-blocking dequeue rather than dq_wait. It also reverts the previous patch I had posted regarding dq_wait. The reverted patch isn't necessarily a bad idea, but I decided this might avoid unforeseen side effects, and was therefore safer. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-07-18 11:13:56 +01:00
Fabian Frederick	6ec43b1838	GFS2: replace countsize kzalloc by kcalloc kcalloc manages countsizeof overflow. Cc: cluster-devel@redhat.com Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-07-18 11:13:38 +01:00
Steven Whitehouse	fe0bbd2986	GFS2: Use GFP_NOFS when allocating glocks Normally GFP_KERNEL is ok here, but there is now a rarely used code path relating to deallocation of unlinked inodes (in certain corner cases) which if hit at times of memory shortage can cause recursion while trying to free memory. One solution would be to try and move the gfs2_glock_get() call so that it is no longer called while another glock is held, but that doesn't look at all easy, so GFP_NOFS is the best solution for the time being. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-07-18 11:13:12 +01:00
Steven Whitehouse	94a09a3999	GFS2: Fix race in glock lru glock disposal We must not leave items on the LRU list with GLF_LOCK set, since they can be removed if the glock is brought back into use, which may then potentially result in a hang, waiting for GLF_LOCK to clear. It doesn't happen very often, since it requires a glock that has not been used for a long time to be brought back into use at the same moment that the shrinker is part way through disposing of glocks. The fix is to set GLF_LOCK at a later time, when we already know that the other locks can be obtained. Also, we now only release the lru_lock in case a resched is needed, rather than on every iteration. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-07-18 11:12:51 +01:00
Bob Peterson	79272b3562	GFS2: Only wait for demote when last holder is dequeued Function gfs2_glock_dq_wait is supposed to dequeue a glock and then wait for the lock to be demoted. The problem is, if this is a shared lock, its demote will depend on the other holders, which means you might end up waiting forever because the other process is blocked. This problem is especially apparent when dealing with nested flocks. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-07-18 11:12:14 +01:00
Cyrill Gorcunov	5442e9fbd7	timerfd: Implement timerfd_ioctl method to restore timerfd_ctx::ticks, v3 The read() of timerfd files allows to fetch the number of timer ticks while there is no way to set it back from userspace. To restore the timer's state as it was at checkpoint moment we need a path to bring @ticks back. Initially I thought about writing ticks back via write() interface but it seems such API is somehow obscure. Instead implement timerfd_ioctl() method with TFD_IOC_SET_TICKS command which allows to adjust @ticks into non-zero value waking up the waiters. I wrapped code with CONFIG_CHECKPOINT_RESTORE which can be dropped off if there users except c/r camp appear. v2 (by akpm@): - Use define timerfd_ioctl NULL for non c/r config v3: - Use copy_from_user for @ticks fetching since not all arch support get_user for 8 byte argument Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Andrey Vagin <avagin@openvz.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christopher Covington <cov@codeaurora.org> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Vladimir Davydov <vdavydov@parallels.com> Link: http://lkml.kernel.org/r/20140715215703.285617923@openvz.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-07-18 11:49:57 +02:00
Cyrill Gorcunov	af9c4957cf	timerfd: Implement show_fdinfo method For checkpoint/restore of timerfd files we need to know how exactly the timer were armed, to be able to recreate it on restore stage. Thus implement show_fdinfo method which provides enough information for that. One of significant changes I think is the addition of @settime_flags member. Currently there are two flags TFD_TIMER_ABSTIME and TFD_TIMER_CANCEL_ON_SET, and the second can be found from @might_cancel variable but in case if the flags will be extended in future we most probably will have to somehow remember them explicitly anyway so I guss doing that right now won't hurt. To not bloat the timerfd_ctx structure I've converted @expired to short integer and defined @settime_flags as short too. v2 (by avagin@, vdavydov@ and tglx@): - Add it_value/it_interval fields - Save flags being used in timerfd_setup in context v3 (by tglx@): - don't forget to use CONFIG_PROC_FS v4 (by akpm@): -Use define timerfd_show NULL for non c/r config Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Andrey Vagin <avagin@openvz.org> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Vladimir Davydov <vdavydov@parallels.com> Link: http://lkml.kernel.org/r/20140715215703.114365649@openvz.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-07-18 11:49:57 +02:00
J. Bruce Fields	5d6031ca74	nfsd4: zero op arguments beyond the 8th compound op The first 8 ops of the compound are zeroed since they're a part of the argument that's zeroed by the memset(rqstp->rq_argp, 0, procp->pc_argsize); in svc_process_common(). But we handle larger compounds by allocating the memory on the fly in nfsd4_decode_compound(). Other than code recently fixed by `01529e3f81` "NFSD: Fix memory leak in encoding denied lock", I don't know of any examples of code depending on this initialization. But it definitely seems possible, and I'd rather be safe. Compounds this long are unusual so I'm much more worried about failure in this poorly tested cases than about an insignificant performance hit. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-17 16:20:39 -04:00
Jeff Layton	ae4b884fc6	nfsd: silence sparse warning about accessing credentials sparse says: fs/nfsd/auth.c:31:38: warning: incorrect type in argument 1 (different address spaces) fs/nfsd/auth.c:31:38: expected struct cred const cred fs/nfsd/auth.c:31:38: got struct cred const [noderef] <asn:4>real_cred Add a new accessor for the ->real_cred and use that to fetch the pointer. Accessing current->real_cred directly is actually quite safe since we know that they can't go away so this is mostly a cosmetic fixup to silence sparse. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-17 16:15:35 -04:00
David Howells	0c7774abb4	KEYS: Allow special keys (eg. DNS results) to be invalidated by CAP_SYS_ADMIN Special kernel keys, such as those used to hold DNS results for AFS, CIFS and NFS and those used to hold idmapper results for NFS, used to be 'invalidateable' with key_revoke(). However, since the default permissions for keys were reduced: Commit: `96b5c8fea6` KEYS: Reduce initial permissions on keys it has become impossible to do this. Add a key flag (KEY_FLAG_ROOT_CAN_INVAL) that will permit a key to be invalidated by root. This should not be used for system keyrings as the garbage collector will try and remove any invalidate key. For system keyrings, KEY_FLAG_ROOT_CAN_CLEAR can be used instead. After this, from userspace, keyctl_invalidate() and "keyctl invalidate" can be used by any possessor of CAP_SYS_ADMIN (typically root) to invalidate DNS and idmapper keys. Invalidated keys are immediately garbage collected and will be immediately rerequested if needed again. Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Steve Dickson <steved@redhat.com>	2014-07-17 20:45:08 +01:00

... 3 4 5 6 7 ...

37446 Commits