The sequence operation is not cached; always encode the sequence operation on
a replay from the slot table and session values. This simplifies the sessions
replay logic in nfsd4_proc_compound.
If this is a replay of a compound that was specified not to be cached, return
NFS4ERR_RETRY_UNCACHED_REP.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This function is only used for SEQUENCE replay.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Instead of trying to share the generic 4.1 reply cache code for the
CREATE_SESSION reply cache, it's simpler to handle CREATE_SESSION
separately.
The nfs41 single slot clientid DRC holds the results of create session
processing. CREATE_SESSION can be preceded by a SEQUENCE operation
(an embedded CREATE_SESSION) and the create session single slot cache must be
maintained. nfsd4_replay_cache_entry() and nfsd4_store_cache_entry() do not
implement the replay of an embedded CREATE_SESSION.
The clientid DRC slot does not need the inuse, cachethis or other fields that
the multiple slot session cache uses. Replace the clientid DRC cache struct
nfs4_slot cache with a new nfsd4_clid_slot cache. Save the xdr struct
nfsd4_create_session into the cache at the end of processing, and on a replay,
replace the struct for the replay request with the cached version all while
under the state lock.
nfsd4_proc_compound will handle both the solo and embedded CREATE_SESSION case
via the normal use of encode_operation.
Errors that do not change the create session cache:
A create session NFS4ERR_STALE_CLIENTID error means that a client record
(and associated create session slot) could not be found and therefore can't
be changed. NFS4ERR_SEQ_MISORDERED errors do not change the slot cache.
All other errors get cached.
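The caching rules above can be sketched in plain C. This is only an
illustration, not the nfsd code: the enum, the structs and create_session()
are hypothetical stand-ins, and the "replay if the seqid matches, new request
if it is one ahead" rule is assumed here rather than taken from this patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; not the kernel's definitions. */
enum cs_status {
	CS_OK = 0,
	CS_STALE_CLIENTID,	/* stands in for NFS4ERR_STALE_CLIENTID */
	CS_SEQ_MISORDERED,	/* stands in for NFS4ERR_SEQ_MISORDERED */
	CS_OTHER_ERROR		/* any other error; this one is cached */
};

struct cs_result {		/* stands in for struct nfsd4_create_session */
	uint32_t seqid;
	uint32_t session_id;
	enum cs_status status;
};

struct clid_slot {		/* stands in for struct nfsd4_clid_slot */
	uint32_t seqid;		/* seqid of the last cached request */
	struct cs_result cached;/* result saved at the end of processing */
};

/*
 * Handle one CREATE_SESSION against the client's single slot.
 * 'slot' is NULL when no client record was found.
 * 'fresh' is the result the server would compute for a new request.
 */
static enum cs_status create_session(struct clid_slot *slot, uint32_t seqid,
				     const struct cs_result *fresh,
				     struct cs_result *out)
{
	assert(fresh != NULL && out != NULL);
	memset(out, 0, sizeof(*out));

	if (!slot) {
		/* No client record, so there is no slot to change. */
		out->status = CS_STALE_CLIENTID;
		return out->status;
	}
	if (seqid == slot->seqid) {
		/* Replay: hand back the cached result unchanged. */
		*out = slot->cached;
		return out->status;
	}
	if (seqid != slot->seqid + 1) {
		/* Misordered: the slot cache is left untouched. */
		out->status = CS_SEQ_MISORDERED;
		return out->status;
	}
	/* New request: cache the result, success or error alike. */
	*out = *fresh;
	out->seqid = seqid;
	slot->seqid = seqid;
	slot->cached = *out;
	return out->status;
}
```

A caller would hold whatever lock protects the slot around the whole call, so
that the lookup and the cache update are seen as one step.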
Remove the clientid DRC specific check in nfs4svc_encode_compoundres to
put the session only if cstate.session is set which will now always be true.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
For separation of session slot and clientid slot processing.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
NFSD_SLOT_CACHE_SIZE is the size of all encoded operation responses
(excluding the sequence operation) that we want to cache.
For now, keep NFSD_SLOT_CACHE_SIZE at PAGE_SIZE. It will be reduced
when the DRC is changed from page based to memory based.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This fixes a leak which would eventually lock out new clients.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
kmemleak produces the following warning
unreferenced object 0xc9ec02a0 (size 8):
comm "cat", pid 19048, jiffies 730243
backtrace:
[<c01bf970>] create_object+0x100/0x240
[<c01bfadb>] kmemleak_alloc+0x2b/0x60
[<c01bcd4b>] __kmalloc+0x14b/0x270
[<c02fd027>] write_pool_threads+0x87/0x1d0
[<c02fcc08>] nfsctl_transaction_write+0x58/0x70
[<c02fcc6f>] nfsctl_transaction_read+0x4f/0x60
[<c01c2574>] vfs_read+0x94/0x150
[<c01c297d>] sys_read+0x3d/0x70
[<c0102d6b>] sysenter_do_call+0x12/0x32
[<ffffffff>] 0xffffffff
write_pool_threads() only frees nthreads on error paths, in the success case
we leak it.
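The general shape of such a fix is a single exit path that frees the buffer on
success as well as on error. The following is a userspace sketch of that
pattern only; sum_pool_threads() and its arguments are made up for the
example and do not mirror write_pool_threads().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper: copy per-pool thread counts into a temporary
 * array, validate them, and report the total. The temporary array is
 * released on every path, including the successful one.
 */
static int sum_pool_threads(const int *in, int npools, int *total)
{
	int *nthreads;
	int rv = 0;
	int i;

	assert(in != NULL && total != NULL);
	if (npools <= 0)
		return -EINVAL;

	nthreads = calloc((size_t)npools, sizeof(*nthreads));
	if (!nthreads)
		return -ENOMEM;

	*total = 0;
	for (i = 0; i < npools; i++) {
		if (in[i] < 0) {
			rv = -EINVAL;
			goto out_free;
		}
		nthreads[i] = in[i];
		*total += nthreads[i];
	}

out_free:
	/* Reached on success too, so the buffer is never leaked. */
	free(nthreads);
	return rv;
}
```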
Signed-off-by: Eric Sesterhenn <eric.sesterhenn@lsexperts.de>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The version 4.1 DRC memory limit and tracking variables are server wide and
session specific. Replace struct svc_serv fields with globals.
Stop using the svc_serv sv_lock.
Add a spinlock to serialize access to the DRC limit management variables which
change on session creation and deletion (usage counter) or (future)
administrative action to adjust the total DRC memory limit.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
The ACL in the 'open' and 'create' operations is decoded but never used.
It should be set as the initial ACL for the object according to RFC3530.
If an error occurs when setting the ACL, just clear the ACL bit in the
returned attr bitmap.
Signed-off-by: Yu Zhiguo <yuzg@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (23 commits)
switch xfs to generic acl caching helpers
helpers for acl caching + switch to those
switch shmem to inode->i_acl
switch reiserfs to inode->i_acl
switch reiserfs to usual conventions for caching ACLs
reiserfs: minimal fix for ACL caching
switch nilfs2 to inode->i_acl
switch btrfs to inode->i_acl
switch jffs2 to inode->i_acl
switch jfs to inode->i_acl
switch ext4 to inode->i_acl
switch ext3 to inode->i_acl
switch ext2 to inode->i_acl
add caching of ACLs in struct inode
fs: Add new pre-allocation ioctls to vfs for compatibility with legacy xfs ioctls
cleanup __writeback_single_inode
... and the same for vfsmount id/mount group id
Make allocation of anon devices cheaper
update Documentation/filesystems/Locking
devpts: remove module-related code
...
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6:
udf: remove redundant tests on unsigned
udf: Use device size when drive reported bogus number of written blocks
helpers: get_cached_acl(inode, type), set_cached_acl(inode, type, acl),
forget_cached_acl(inode, type).
ubifs/xattr.c needed includes reordered, the rest is a plain switchover.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
reiserfs uses NULL as "unknown" and ERR_PTR(-ENODATA) as "no ACL";
several codepaths store the former instead of the latter.
All those codepaths go through iset_acl() and all cases when it's
called with NULL acl are for the second variety, so the minimal
fix is to teach iset_acl() to deal with that.
Proper fix is to switch to more usual conventions and avoid back
and forth between internally used ERR_PTR(-ENODATA) and NULL
expected by the rest of the kernel.
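As an illustration of the minimal fix, the store helper can translate a NULL
argument into the "no ACL" marker before caching it. This is a userspace
sketch under stated assumptions: ERR_PTR() and PTR_ERR() are re-created here,
struct acl is a dummy type, and iset_acl_sketch() leaves out the locking and
reference counting a real helper would need.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct acl { int dummy; };	/* placeholder for a real ACL object */

/* Userspace imitations of the kernel's error-pointer helpers. */
static void *ERR_PTR(long err)
{
	return (void *)(intptr_t)err;
}

static long PTR_ERR(const void *p)
{
	return (long)(intptr_t)p;
}

/*
 * In the cached slot, NULL means "unknown" and ERR_PTR(-ENODATA)
 * means "no ACL". Callers pass NULL to say "no ACL", so translate
 * it here instead of storing the "unknown" value by mistake.
 */
static void iset_acl_sketch(struct acl **slot, struct acl *acl)
{
	assert(slot != NULL);
	*slot = acl ? acl : ERR_PTR(-ENODATA);
}
```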
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This patch adds ioctls to vfs for compatibility with legacy XFS
pre-allocation ioctls (XFS_IOC_*RESVP*). The implementation
effectively invokes sys_fallocate for the new ioctls.
Also handles the compat_ioctl case.
Note: These legacy ioctls are also implemented by OCFS2.
[AV: folded fixes from hch]
Signed-off-by: Ankit Jain <me@ankitjain.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
There is no reason for the split between __writeback_single_inode and
__sync_single_inode, the former just does a couple of checks before
tail-calling the latter. So merge the two, and while we're at it split
out the I_SYNC waiting case for data integrity writers, as it's a
logically separate function. Finally rename __writeback_single_inode to
writeback_single_inode.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Standard trick - add a new variable (start) such that
for each n < start n is known to be busy. Allocation can
skip checking everything in [0..start) and if it returns
n, we can set start to n + 1. Freeing below start sets
start to what we'd just freed.
Of course, it still sucks if we do something like
free 0
allocate
allocate
in a loop - still O(n^2) time. However, on saner loads it
improves things a lot and the entire thing is not worth
the trouble of switching to something with better worst-case
behaviour.
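A small sketch of the trick, using a plain array of busy flags instead of the
allocator the kernel actually uses. MAX_IDS, struct id_pool, id_alloc() and
id_free() are names invented for this example.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_IDS 64		/* hypothetical, small ID space */

struct id_pool {
	bool busy[MAX_IDS];
	int start;		/* every n < start is known to be busy */
};

/* Return the lowest free ID, or -1 if the pool is exhausted. */
static int id_alloc(struct id_pool *p)
{
	int n;

	/* Everything in [0..start) is busy, so begin the scan at start. */
	for (n = p->start; n < MAX_IDS; n++) {
		if (!p->busy[n]) {
			p->busy[n] = true;
			p->start = n + 1;
			return n;
		}
	}
	return -1;
}

static void id_free(struct id_pool *p, int n)
{
	assert(n >= 0 && n < MAX_IDS);
	p->busy[n] = false;
	/* Freeing below start moves start down to the freed ID. */
	if (n < p->start)
		p->start = n;
}
```

With IDs 0..2 busy, freeing 0 and then allocating twice returns 0 and then 3;
the second call has to rescan 1 and 2, which is the bad case described above.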
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
These days, the devpts filesystem is closely integrated with the pty
memory management, and cannot be built as a module, even less removed
from the kernel. Accordingly, remove all module-related stuff from
this filesystem.
[ v2: only remove code that's actually dead ]
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
commit 2a73787110 "Cache root in nameidata"
introduced a new member nd->root, but forgot to put it in do_filp_open().
Signed-off-by: J. R. Okajima <hooanon05@yahoo.co.jp>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reiserfs doesn't use lock_super anywhere internally, and ->remount_fs
which calls reiserfs_resize does have it currently but also expects it
to be held on return, so there's no business for the unlock_super here.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Edward Shishkin <edward.shishkin@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
first_block and goal are unsigned. When negative they are wrapped and caught by
the other test.
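To illustrate why the removed tests were redundant: for an unsigned value a
"< 0" comparison is never true, and a negative input converted to unsigned
becomes a very large number that the upper-bound check rejects anyway.
block_in_range() below is a made-up helper, not the udf code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical range check. A separate "block < 0" test would always
 * be false for an unsigned type; a wrapped negative value is caught
 * by the comparison against nblocks instead.
 */
static bool block_in_range(uint32_t block, uint32_t nblocks)
{
	assert(nblocks > 0);	/* an empty range has no valid blocks */
	return block < nblocks;
}
```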
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
ocfs2/trivial: Wrap ocfs2_sysfile_cluster_lock_key within define.
ocfs2: Add lockdep annotations
vfs: Set special lockdep map for dirs only if not set by fs
ocfs2: Disable orphan scanning for local and hard-ro mounts
ocfs2: Do not initialize lvb in ocfs2_orphan_scan_lock_res_init()
ocfs2: Stop orphan scan as early as possible during umount
ocfs2: Fix ocfs2_osb_dump()
ocfs2: Pin journal head before accessing jh->b_committed_data
ocfs2: Update atime in splice read if necessary.
ocfs2: Provide the ocfs2_dlm_lvb_valid() stack API.
As noted in the previous patch, the NFSv4 client mount code currently
has several limitations. If the mount path contains symlinks, or
referrals, or even if it just contains a '..', then the client code in
nfs4_path_walk() will fail with an error.
This patch replaces the nfs4_path_walk()-based lookup with a helper
function that sets up a private namespace to represent the namespace on the
server, then uses the ordinary VFS and NFS path lookup code to walk down the
mount path in that namespace.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The purpose of this patch is to improve the remote mount path lookup
support for distributed filesystems such as the NFSv4 client.
When given a mount command of the form "mount server:/foo/bar /mnt", the
NFSv4 client is required to look up the filehandle for "server:/", and
then look up each component of the remote mount path "foo/bar" in order
to find the directory that is actually going to be mounted on /mnt.
Following that remote mount path may involve following symlinks,
crossing server-side mount points and even following referrals to
filesystem volumes on other servers.
Since the standard VFS path lookup code already supports walking paths
that contain all these features (using in-kernel automounts for
following referrals) we would like to be able to reuse that rather than
duplicate the full path traversal functionality in the NFSv4 client code.
This patch therefore defines a VFS helper function create_mnt_ns(), that
sets up a temporary filesystem namespace and attaches a root filesystem to
it. It exports the create_mnt_ns() and put_mnt_ns() functions for use by
filesystem modules.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In order to allow modules to use it without having to export vfsmount_lock.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.infradead.org/mtd-2.6: (63 commits)
mtd: OneNAND: Allow setting of boundary information when built as module
jffs2: leaking jffs2_summary in function jffs2_scan_medium
mtd: nand: Fix memory leak on txx9ndfmc probe failure.
mtd: orion_nand: use burst reads with double word accesses
mtd/nand: s3c6400 support for s3c2410 driver
[MTD] [NAND] S3C2410: Use DIV_ROUND_UP
[MTD] [NAND] S3C2410: Deal with unaligned lengths in S3C2440 buffer read/write
[MTD] [NAND] S3C2410: Allow the machine code to get the BBT table from NAND
[MTD] [NAND] S3C2410: Added a kerneldoc for s3c2410_nand_set
mtd: physmap_of: Add multiple regions and concatenation support
mtd: nand: max_retries off by one in mxc_nand
mtd: nand: s3c2410_nand_setrate(): use correct macros for 2412/2440
mtd: onenand: add bbt_wait & unlock_all as replaceable for some platform
mtd: Flex-OneNAND support
mtd: nand: add OMAP2/OMAP3 NAND driver
mtd: maps: Blackfin async: fix memory leaks in probe/remove funcs
mtd: uclinux: mark local stuff static
mtd: uclinux: do not allow to be built as a module
mtd: uclinux: allow systems to override map addr/size
mtd: blackfin NFC: fix hang when using NAND on BF527-EZKITs
...
Actually ocfs2_sysfile_cluster_lock_key is only used if we enable
CONFIG_DEBUG_LOCK_ALLOC. Wrap it so that we can avoid a build
warning.
fs/ocfs2/sysfile.c:53: warning: ‘ocfs2_sysfile_cluster_lock_key’
defined but not used
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Add lockdep support to OCFS2. The support also covers all of the cluster
locks except for open locks, journal locks, and local quotafile locks. These
are special because they are acquired for a node, not for a particular process
and lockdep cannot deal with this type of locking.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Some filesystems need to set lockdep map for i_mutex differently for
different directories. For example OCFS2 has system directories (for
orphan inode tracking and for gathering all system files like journal
or quota files into a single place) which have different
locking rules than standard directories. For a filesystem setting
lockdep map is naturally done when the inode is read but we have to
modify unlock_new_inode() not to overwrite the lockdep map the filesystem
has set.
Acked-by: peterz@infradead.org
CC: mingo@redhat.com
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Local and Hard-RO mounts do not need orphan scanning.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>