linux

Author	SHA1	Message	Date
Olga Kornievskaia	608207e888	rpc: pass target name down to rpc level on callbacks The rpc client needs to know the principal that the setclientid was done as, so it can tell gssd who to authenticate to. Signed-off-by: Olga Kornievskaia <aglo@citi.umich.edu> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 16:17:40 -05:00
Olga Kornievskaia	68e76ad0ba	nfsd: pass client principal name in rsc downcall Two principals are involved in krb5 authentication: the target, who we authenticate to (normally the name of the server, like nfs/server.citi.umich.edu@CITI.UMICH.EDU), and the source, we we authenticate as (normally a user, like bfields@UMICH.EDU) In the case of NFSv4 callbacks, the target of the callback should be the source of the client's setclientid call, and the source should be the nfs server's own principal. Therefore we allow svcgssd to pass down the name of the principal that just authenticated, so that on setclientid we can store that principal name with the new client, to be used later on callbacks. Signed-off-by: Olga Kornievskaia <aglo@citi.umich.edu> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 16:17:15 -05:00
Andy Adamson	cf8cdbe5bd	NFS: remove unused status from encode routines Signed-off-by: Andy Adamson<andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 16:06:18 -05:00
Andy Adamson	d017931cff	NFS: increment number of operations in each encode routine Signed-off-by: Andy Adamson<andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 16:06:17 -05:00
Benny Halevy	49c2559e29	NFS: fix comment placement in nfs4xdr.c Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 16:06:16 -05:00
Andy Adamson	05d564fe00	NFS: fix tabs in nfs4xdr.c Signed-off-by: Andy Adamson<andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 16:06:15 -05:00
Andy Adamson	6c0195a468	NFS: remove white space from nfs4xdr.c Clean-up Signed-off-by: Andy Adamson<andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 16:06:15 -05:00
Benny Halevy	374130770e	nfs: remove incorrect usage of nfs4 compound response hdr.status 3 call sites look at hdr.status before returning success. hdr.status must be zero in this case so there's no point in this. Currently, hdr.status is correctly processed at decode_op_hdr time if the op status cannot be decoded. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 16:06:14 -05:00
Benny Halevy	aadf615211	nfs: return compound hdr.status when there are no op replies When there are no op replies encoded in the compound reply hdr.status still contains the overall status of the compound rpc. This can happen, e.g., when the server returns a NFS4ERR_MINOR_VERS_MISMATCH error. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 16:06:13 -05:00
Trond Myklebust	027b6ca021	NFSv4: Fix an infinite loop in the NFS state recovery code Marten Gajda <marten.gajda@fernuni-hagen.de> states: I tracked the problem down to the function nfs4_do_open_expired. Within this function _nfs4_open_expired is called and may return -NFS4ERR_DELAY. When a further call to _nfs4_open_expired is executed and does not return -NFS4ERR_DELAY the "exception.retry" variable is not reset to 0, causing the loop to iterate again (and as long as err != -NFS4ERR_DELAY, probably forever) Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 16:04:13 -05:00
Peter Staubach	64672d55d9	optimize attribute timeouts for "noac" and "actimeo=0" Hi. I've been looking at a bugzilla which describes a problem where a customer was advised to use either the "noac" or "actimeo=0" mount options to solve a consistency problem that they were seeing in the file attributes. It turned out that this solution did not work reliably for them because sometimes, the local attribute cache was believed to be valid and not timed out. (With an attribute cache timeout of 0, the cache should always appear to be timed out.) In looking at this situation, it appears to me that the problem is that the attribute cache timeout code has an off-by-one error in it. It is assuming that the cache is valid in the region, [read_cache_jiffies, read_cache_jiffies + attrtimeo]. The cache should be considered valid only in the region, [read_cache_jiffies, read_cache_jiffies + attrtimeo). With this change, the options, "noac" and "actimeo=0", work as originally expected. This problem was previously addressed by special casing the attrtimeo == 0 case. However, since the problem is only an off- by-one error, the cleaner solution is address the off-by-one error and thus, not require the special case. Thanx... ps Signed-off-by: Peter Staubach <staubach@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 15:21:56 -05:00
Trond Myklebust	dc0b027dfa	NFSv4: Convert the open and close ops to use fmode Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 15:21:56 -05:00
Trond Myklebust	7a50c60e46	NFS: Use delegations to optimise ACCESS calls Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 15:21:55 -05:00
Trond Myklebust	15860ab1d7	NFSv4: Ensure that we set the verifier when revalidating delegated dentries This ensures that we don't have to look up the dentry again after we return the delegation if we know that the directory didn't change. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 15:21:54 -05:00
Trond Myklebust	5584c30630	NFSv4: Clean up is_atomic_open() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 15:21:54 -05:00
Trond Myklebust	bd7bf9d540	NFSv4: Convert delegation->type field to fmode_t Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 15:21:53 -05:00
Trond Myklebust	9082a5cc1e	NFSv4: Fix up delegation callbacks Currently, the callback server is listening on IPv6 if it is enabled. This means that IPv4 addresses will always be mapped. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 15:21:53 -05:00
Trond Myklebust	b7391f44f2	NFSv4: Return unreferenced delegations more promptly If the client is not using a delegation, the right thing to do is to return it as soon as possible. This helps reduce the amount of state the server has to track, as well as reducing the potential for conflicts with other clients. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 15:21:52 -05:00
Trond Myklebust	6411bd4a47	NFSv4: Clean up the asynchronous delegation return Reuse the state management thread in order to return delegations when we get a callback. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 15:21:51 -05:00
Trond Myklebust	b0d3ded1a2	NFSv4: Clean up nfs_expire_all_delegations() Let the actual delegreturn stuff be run in the state manager thread rather than allocating a separate kthread. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 15:21:50 -05:00
Trond Myklebust	0d62f85a81	NFSv4: Fix a BAD_SEQUENCEID condition. We really shouldn't be resetting the sequence ids when doing state expiration recovery, since we don't know if the server still remembers our previous state owners. There are servers out there that do attempt to preserve client state even if the lease has expired. Such a server would only release that state if a conflicting OPEN request occurs. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 15:21:49 -05:00
Trond Myklebust	f3c76491e7	NFSv4: Don't exit the state management if there are still tasks to do Fix up a potential race... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 15:21:48 -05:00
Trond Myklebust	e005e8041c	NFSv4: Rename the state reclaimer thread It is really a more general purpose state management thread at this point. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 15:21:48 -05:00
Trond Myklebust	707fb4b324	NFSv4: Clean up NFS4ERR_CB_PATH_DOWN error management... Add a delegation cleanup phase to the state management loop, and do the NFS4ERR_CB_PATH_DOWN recovery there. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 15:21:47 -05:00
Trond Myklebust	515d861177	NFSv4: Clean up the support for returning multiple delegations Add a flag to mark delegations as requiring return, then run a garbage collector. In the future, this will allow for more flexible delegation management, where delegations may be marked for return if it turns out that they are not being referenced. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 15:21:46 -05:00
Trond Myklebust	9e33bed552	NFSv4: Add recovery for individual stateids NFSv4 defines a number of state errors which the client does not currently handle. Among those we should worry about are: NFS4ERR_ADMIN_REVOKED - the server's administrator revoked our locks and/or delegations. NFS4ERR_BAD_STATEID - the client and server are out of sync, possibly due to a delegation return racing with an OPEN request. NFS4ERR_OPENMODE - the client attempted to do something not sanctioned by the open mode of the stateid. Should normally just occur as a result of a delegation return race. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 15:21:46 -05:00
Trond Myklebust	95d35cb4c4	NFSv4: Remove nfs_client->cl_sem Now that we're using the flags to indicate state that needs to be recovered, as well as having implemented proper refcounting and spinlocking on the state and open_owners, we can get rid of nfs_client->cl_sem. The only remaining case that was dubious was the file locking, and that case is now covered by the nfsi->rwsem. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 15:21:45 -05:00
Trond Myklebust	19e03c570e	NFSv4: Ensure that file unlock requests don't conflict with state recovery The unlock path is currently failing to take the nfs_client->cl_sem read lock, and hence the recovery path may see locks disappear from underneath it. Also ensure that it takes the nfs_inode->rwsem read lock so that it there is no conflict with delegation recalls. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 15:21:44 -05:00
Trond Myklebust	65de872ed6	NFS: Remove the unnecessary argument to nfs4_wait_clnt_recover() ...and move some code around in order to clear out an unnecessary forward declaration. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 15:21:44 -05:00
Trond Myklebust	fe1d81952e	NFSv4: Ensure that nfs4_reclaim_open_state() doesn't depend on cl_sem Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 15:21:43 -05:00
Trond Myklebust	7eff03aec9	NFSv4: Add a recovery marking scheme for state owners Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 15:21:43 -05:00
Trond Myklebust	0f605b5600	NFSv4: Don't tell server we rebooted when not necessary Instead of doing a full setclientid, try doing a RENEW call first. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 15:21:42 -05:00
Trond Myklebust	e598d843c0	NFSv4: Remove redundant RENEW calls if we know the lease has expired Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 15:21:42 -05:00
Trond Myklebust	b79a4a1b45	NFSv4: Fix state recovery when the client runs over the grace period If the client for some reason is not able to recover all its state within the time allotted for the grace period, and the server reboots again, the client is not allowed to recover the state that was 'lost' using reboot recovery. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 15:21:41 -05:00
Trond Myklebust	6dc9d57af9	NFSv4: Callers to nfs4_get_renew_cred() need to hold nfs_client->cl_lock Ditto for nfs4_get_setclientid_cred(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 15:21:41 -05:00
Trond Myklebust	0286001430	NFSv4: Clean up for the state loss reclaimer Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 15:21:40 -05:00
Trond Myklebust	15c831bf1a	NFS: Use atomic bitops when changing struct nfs_delegation->flags Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 15:21:39 -05:00
Trond Myklebust	86e8948998	NFSv4: Fix up the dereferencing of delegation->inode Without an extra lock, we cannot just assume that the delegation->inode is valid when we're traversing the rcu-protected nfs_client lists. Use the delegation->lock to ensure that it is truly valid. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 15:21:39 -05:00
Trond Myklebust	343104308a	NFSv4: Fix up another delegation related race When we can update_open_stateid(), we need to be certain that we don't race with a delegation return. While we could do this by grabbing the nfs_client->cl_lock, a dedicated spin lock in the delegation structure will scale better. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 15:21:38 -05:00
Chuck Lever	0cb2659b81	NLM: allow lockd requests from an unprivileged port If the admin has specified the "noresvport" option for an NFS mount point, the kernel's NFS client uses an unprivileged source port for the main NFS transport. The kernel's lockd client should use an unprivileged port in this case as well. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 15:21:38 -05:00
Chuck Lever	50a737f86d	NFS: "[no]resvport" mount option changes mountd client too If the admin has specified the "noresvport" option for an NFS mount point, the kernel's NFS client uses an unprivileged source port for the main NFS transport. The kernel's mountd client should use an unprivileged port in this case as well. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 15:21:37 -05:00
Chuck Lever	d740351bf0	NFS: add "[no]resvport" mount option The standard default security setting for NFS is AUTH_SYS. An NFS client connects to NFS servers via a privileged source port and a fixed standard destination port (2049). The client sends raw uid and gid numbers to identify users making NFS requests, and the server assumes an appropriate authority on the client has vetted these values because the source port is privileged. On Linux, by default in-kernel RPC services use a privileged port in the range between 650 and 1023 to avoid using source ports of well- known IP services. Using such a small range limits the number of NFS mount points and the number of unique NFS servers to which a client can connect concurrently. An NFS client can use unprivileged source ports to expand the range of source port numbers, allowing more concurrent server connections and more NFS mount points. Servers must explicitly allow NFS connections from unprivileged ports for this to work. In the past, bumping the value of the sunrpc.max_resvport sysctl on the client would permit the NFS client to use unprivileged ports. Bumping this setting also changes the maximum port number used by other in-kernel RPC services, some of which still required a port number less than 1023. This is exacerbated by the way source port numbers are chosen by the Linux RPC client, which starts at the top of the range and works downwards. It means that bumping the maximum means all RPC services requesting a source port will likely get an unprivileged port instead of a privileged one. Changing this setting effects all NFS mount points on a client. A sysadmin could not selectively choose which mount points would use non-privileged ports and which could not. Lastly, this mechanism of expanding the limit on the number of NFS mount points was entirely undocumented. To address the need for the NFS client to use a large range of source ports without interfering with the activity of other in-kernel RPC services, we introduce a new NFS mount option. This option explicitly tells only the NFS client to use a non-privileged source port when communicating with the NFS server for one specific mount point. This new mount option is called "resvport," like the similar NFS mount option on FreeBSD and Mac OS X. A sister patch for nfs-utils will be submitted that documents this new option in nfs(5). The default setting for this new mount option requires the NFS client to use a privileged port, as before. Explicitly specifying the "noresvport" mount option allows the NFS client to use an unprivileged source port for this mount point when connecting to the NFS server port. This mount option is supported only for text-based NFS mounts. [ Sidebar: it is widely known that security mechanisms based on the use of privileged source ports are ineffective. However, the NFS client can combine the use of unprivileged ports with the use of secure authentication mechanisms, such as Kerberos. This allows a large number of connections and mount points while ensuring a useful level of security. Eventually we may change the default setting for this option depending on the security flavor used for the mount. For example, if the mount is using only AUTH_SYS, then the default setting will be "resvport;" if the mount is using a strong security flavor such as krb5, the default setting will be "noresvport." ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> [Trond.Myklebust@netapp.com: Fixed a bug whereby nfs4_init_client() was being called with incorrect arguments.] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 15:21:37 -05:00
Chuck Lever	542fcc334a	NFS: move nfs_server flag initialization Make it possible for the NFSv4 mount set up logic to pass mount option flags down the stack to nfs_create_rpc_client(). This is immediately useful if we want NFS mount options to modulate settings of the underlying RPC transport, but it may be useful at some later point if other parts of the NFSv4 mount initialization logic want to know what the mount options are. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 15:21:36 -05:00
Chuck Lever	4a01b8a4ee	NFS: expand flags passed to nfs_create_rpc_client() The nfs_create_rpc_client() function sets up an RPC client for an NFS mount point. Add an option that allows it to set up an RPC transport from an unprivileged port. Instead of having nfs_create_rpc_client()'s callers retain local knowledge about how to set up an RPC client, create a couple of flag arguments to control the use of RPC_CLNT_CREATE flags. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 15:21:35 -05:00
Chuck Lever	c5d120f8e8	NFS: introduce nfs_mount_info struct for calling nfs_mount() Clean up: convert nfs_mount() to take a single data structure argument to make it simpler to add more arguments. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 15:21:35 -05:00
Chuck Lever	146ec944bb	NFS: Move declaration of nfs_mount() to fs/nfs/internal.h Clean up: The nfs_mount() function is not to be used outside of the NFS client. Move its public declaration to fs/nfs/internal.h. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 15:21:34 -05:00
Chuck Lever	7b5d2b98e1	NFS: rename nfs_path variable Clean up: I'm about to move the declaration of nfs_mount into fs/nfs/internal.h and include it in fs/nfs/nfsroot.c. There's a conflicting definition of nfs_path in fs/nfs/internal.h and fs/nfs/nfsroot.c, so rename the private one. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 15:21:34 -05:00
Jeff Layton	df94f000c4	lockd: convert reclaimer thread to kthread interface My understanding is that there is a push to turn the kernel_thread interface into a non-exported symbol and move all kernel threads to use the kthread API. This patch changes lockd to use kthread_run to spawn the reclaimer thread. I've made the assumption here that the extra module references taken when we spawn this thread are unnecessary and removed them. I've also added a KERN_ERR printk that pops if the thread can't be spawned to warn the admin that the locks won't be reclaimed. In the future, it would be nice to be able to notify userspace that locks have been lost (probably by implementing SIGLOST), and adding some good policies about how long we should reattempt to reclaim the locks. Finally, I removed a comment about memory leaks that I believe is obsolete and added a new one to clarify the result of sending a SIGKILL to the reclaimer thread. As best I can tell, doing so doesn't actually cause a memory leak. I consider this patch 2.6.29 material. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 15:21:33 -05:00
Trond Myklebust	2de59872a7	LOCKD: Make lockd_up() and lockd_down() exported GPL-only Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 15:21:33 -05:00
Trond Myklebust	d716f0b8a5	SUNRPC: nfsacl_encode/nfsacl_decode should be exported as GPL-only Again, this has never been intended as a public abi for out-of-tree modules. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 15:21:32 -05:00
Wu Fengguang	136221fc32	nfs: remove redundant tests on reading new pages aops->readpages() and its NFS helper readpage_async_filler() will only be called to do readahead I/O for newly allocated pages. So it's not necessary to test for the always 0 dirty/uptodate page flags. The removal of nfs_wb_page() call also fixes a readahead bug: the NFS readahead has been synchronous since 2.6.23, because that call will clear PG_readahead, which is the reminder for asynchronous readahead. More background: the PG_readahead page flag is shared with PG_reclaim, one for read path and the other for write path. clear_page_dirty_for_io() unconditionally clears PG_readahead to prevent possible readahead residuals, assuming itself to be always called in the write path. However, NFS is one and the only exception in that it _always_ calls clear_page_dirty_for_io() in the read path, i.e. for readpages()/readpage(). Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Wu Fengguang <wfg@linux.intel.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-12-23 15:21:30 -05:00
Christoph Hellwig	ad1ad968f4	[XFS] handle unaligned data in xfs_bmbt_disk_get_all In libxfs xfs_bmbt_disk_get_all needs to handle unaligned data and thus has been updated to use get_unaligned_be64. In kernelspace we don't strictly need it as the routine is only used for tracing and xfsidbg, but let's keep the two implementations in sync. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-23 11:54:46 +11:00
Christoph Hellwig	efc557570d	[XFS] avoid memory allocations in xfs_fs_vcmn_err xfs_fs_vcmn_err can be called under a spinlock, but does a sleeping memory allocation to create buffer for it's internal sprintf. Fortunately it's the only caller of icmn_err, so we can merge the two and have one single static buffer and spinlock protecting it. While we're at it make sure we proper __attribute__ format annotations so that the compiler can detect mismatched format strings. Reported-by: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-22 18:02:01 +11:00
Lachlan McIlroy	9f6c92b9cc	[XFS] Fix speculative allocation beyond eof Speculative allocation beyond eof doesn't work properly. It was broken some time ago after a code cleanup that moved what is now xfs_iomap_eof_align_last_fsb() and xfs_iomap_eof_want_preallocate() out of xfs_iomap_write_delay() into separate functions. The code used to use the current file size in various checks but got changed to be max(file_size, i_new_size). Since i_new_size is the result of 'offset + count' then in xfs_iomap_eof_want_preallocate() the check for '(offset + count) <= isize' will always be true. ie if 'offset + count' is > ip->i_size then isize will be i_new_size and equal to 'offset + count'. This change fixes all the places that used to use the current file size. Reviewed-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-22 17:56:49 +11:00
Lachlan McIlroy	4fdc778179	[XFS] Remove XFS_BUF_SHUT() and friends Code does nothing so remove it. Reviewed-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-22 17:52:58 +11:00
Lachlan McIlroy	d415867e0a	[XFS] Use the incore inode size in xfs_file_readdir() We should be using the incore inode size here not the linux inode size. The incore inode size is always up to date for directories whereas the linux inode size is not updated for directories. We've hit assertions in xfs_bmap() and traced it back to the linux inode size being zero but the incore size being correct. Reviewed-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-22 17:50:56 +11:00
Ingo Molnar	826e08b015	sched: fix warning in fs/proc/base.c Stephen Rothwell reported this new (harmless) build warning on platforms that define u64 to long: fs/proc/base.c: In function 'proc_pid_schedstat': fs/proc/base.c:352: warning: format '%llu' expects type 'long long unsigned int', but argument 3 has type 'u64' asm-generic/int-l64.h platforms strike again: that file should be eliminated. Fix it by casting the parameters to long long. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-22 07:41:06 +01:00
Lachlan McIlroy	27a0464a6c	[XFS] Fix merge conflict in fs/xfs/xfs_rename.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: fs/xfs/xfs_rename.c Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-22 17:34:26 +11:00
Julia Lawall	f1d9e4586e	fs/9p: change simple_strtol to simple_strtoul Since v9ses->uid is unsigned, it would seem better to use simple_strtoul that simple_strtol. A simplified version of the semantic patch that makes this change is as follows: (http://www.emn.fr/x-info/coccinelle/) // <smpl> @r2@ long e; position p; @@ e = simple_strtol@p(...) @@ position p != r2.p; type T; T e; @@ e = - simple_strtol@p + simple_strtoul (...) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Acked-by: Eric Van Hensbergen <ericvh@gmail.com>	2008-12-19 16:50:22 -06:00
Wu Fengguang	7dd0cdc51c	9p: convert d_iname references to d_name.name d_iname is rubbish for long file names. Use d_name.name in printks instead. Signed-off-by: Wu Fengguang <wfg@linux.intel.com> Acked-by: Eric Van Hensbergen <ericvh@gmail.com>	2008-12-19 16:47:40 -06:00
Duane Griffin	6ff232070a	9p: Remove potentially bad parameter from function entry debug print. Signed-off-by: Duane Griffin <duaneg@dghda.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2008-12-19 16:45:21 -06:00
James Morris	12204e24b1	security: pass mount flags to security_sb_kern_mount() Pass mount flags to security_sb_kern_mount(), so security modules can determine if a mount operation is being performed by the kernel. Signed-off-by: James Morris <jmorris@namei.org> Acked-by: Stephen Smalley <sds@tycho.nsa.gov>	2008-12-20 09:02:39 +11:00
Ingo Molnar	30cd324e97	Merge branches 'tracing/ftrace', 'tracing/ring-buffer' and 'tracing/urgent' into tracing/core Conflicts: include/linux/ftrace.h	2008-12-19 09:42:40 +01:00
Ken Chen	9c2c48020e	schedstat: consolidate per-task cpu runtime stats Impact: simplify code When we turn on CONFIG_SCHEDSTATS, per-task cpu runtime is accumulated twice. Once in task->se.sum_exec_runtime and once in sched_info.cpu_time. These two stats are exactly the same. Given that task->se.sum_exec_runtime is always accumulated by the core scheduler, sched_info can reuse that data instead of duplicate the accounting. Signed-off-by: Ken Chen <kenchen@google.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-18 13:54:01 +01:00
Paul Mackerras	c280266a32	Merge branch 'linux-2.6' into next	2008-12-18 11:06:12 +11:00
Linus Torvalds	0bc77ecbe4	Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: ocfs2: Add JBD2 compat feature bit. ocfs2: Always update xattr search when creating bucket.	2008-12-17 15:01:23 -08:00
Jeff Layton	331c313510	cifs: fix buffer overrun in parse_DFS_referrals While testing a kernel with memory poisoning enabled, I saw some warnings about the redzone getting clobbered when chasing DFS referrals. The buffer allocation for the unicode converted version of the searchName is too small and needs to take null termination into account. Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-12-17 14:59:55 -08:00
Joel Becker	a97721894a	ocfs2: Add JBD2 compat feature bit. Define the OCFS2_FEATURE_COMPAT_JBD2 bit in the filesystem header. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-12-16 18:26:16 -08:00
Tao Ma	83099bc647	ocfs2: Always update xattr search when creating bucket. When we create xattr bucket during the process of xattr set, we always need to update the ocfs2_xattr_search since even if the bucket size is the same as block size, the offset will change because of the removal of the ocfs2_xattr_block header. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-12-16 14:07:37 -08:00
Dave Kleikamp	d69e83d99c	jfs: ensure symlinks are NUL-terminated This is an alternate fix for a bug reported and fixed by Duane Griffin. Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Reported-by: Duane Griffin <duaneg@dghda.com>	2008-12-16 10:21:34 -06:00
KOSAKI Motohiro	13bd41bc22	proc: enclose desc variable of show_stat() in CONFIG_SPARSE_IRQ Impact: restructure code to fix compiler warning commit `240d367b4e` moved desc usage point into #ifdef CONFIG_SPARSE_IRQ. Eliminate the desc variable, otherwise following warning happens: fs/proc/stat.c: In function 'show_stat': fs/proc/stat.c:31: warning: unused variable 'desc' [ akpm: cleaned up the patch to remove #ifdef ] Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-16 11:24:14 +01:00
Anton Vorontsov	6b82b3e4b5	powerpc: Remove `have_of' global variable The `have_of' variable is a relic from the arch/ppc time, it isn't useful nowadays. Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-12-16 15:52:57 +11:00
David S. Miller	eb14f01959	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/e1000e/ich8lan.c	2008-12-15 20:03:50 -08:00
Oleg Nesterov	8187926bda	posix-timers: simplify de_thread()->exit_itimers() path Impact: simplify code de_thread() postpones release_task(leader) until after exit_itimers(). This was needed because !SIGEV_THREAD_ID timers could use ->group_leader without get_task_struct(). With the recent changes we can release the leader earlier and simplify the code. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-12-12 17:01:03 +01:00
Lachlan McIlroy	4d9d4ebf5d	Merge branch 'master' of git+ssh://git.melbourne.sgi.com/git/xfs	2008-12-12 15:28:02 +11:00
Lachlan McIlroy	cfbe52672f	[XFS] set b_error from bio error in xfs_buf_bio_end_io Preserve any error returned by the bio layer. Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-12 15:27:25 +11:00
Christoph Hellwig	c4cd747ee6	[XFS] use inode_change_ok for setattr permission checking Instead of implementing our own checks use inode_change_ok to check for necessary permission in setattr. There is a slight change in behaviour as inode_change_ok doesn't allow i_mode updates to add the suid or sgid without superuser privilegues while the old XFS code just stripped away those bits from the file mode. (First sent on Semptember 29th) Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-11 13:15:10 +11:00
Christoph Hellwig	4d4be482a4	[XFS] add a FMODE flag to make XFS invisible I/O less hacky XFS has a mode called invisble I/O that doesn't update any of the timestamps. It's used for HSM-style applications and exposed through the nasty open by handle ioctl. Instead of doing directly assignment of file operations that set an internal flag for it add a new FMODE_NOCMTIME flag that we can check in the normal file operations. (addition of the generic VFS flag has been ACKed by Al as an interims solution) Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-11 13:14:41 +11:00
Christoph Hellwig	6d73cf133c	[XFS] resync headers with libxfs - xfs_sb.h add the XFS_SB_VERSION2_PARENTBIT features2 that has been around in userspace for some time - xfs_inode.h: move a few things out of __KERNEL__ that are needed by userspace - xfs_mount.h: only include xfs_sync.h under __KERNEL__ - xfs_inode.c: minor whitespace fixup. I accidentaly changes this when importing this file for use by userspace. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-11 13:14:17 +11:00
Christoph Hellwig	2175dd9574	[XFS] simplify projid check in xfs_rename Check for the project ID after attaching all inodes to the transaction. That way the unlock in the error case is done by the transaction subsystem, which guaratees that is uses the right flags (which was wrong from day one of this check), and avoids having special code unlocking an array of inodes with potential duplicates. Attaching the inode first is the method used by xfs_rename and the other namespace methods all other error that require multiple locked inodes. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-11 13:13:52 +11:00
Christoph Hellwig	15ac08a8b2	[XFS] replace b_fspriv with b_mount Replace the b_fspriv pointer and it's ugly accessors with a properly types xfs_mount pointer. Also switch log reocvery over to it instead of using b_fspriv for the mount pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-11 13:13:33 +11:00
Linus Torvalds	f4fd2c5b6f	Merge branch 'to-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-roland * 'to-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-roland: tracehook: exec double-reporting fix	2008-12-10 14:40:21 -08:00
Hugh Dickins	9c24624727	KSYM_SYMBOL_LEN fixes Miles Lane tailing /sys files hit a BUG which Pekka Enberg has tracked to my `966c8c12dc` sprint_symbol(): use less stack exposing a bug in slub's list_locations() - kallsyms_lookup() writes a 0 to namebuf[KSYM_NAME_LEN-1], but that was beyond the end of page provided. The 100 slop which list_locations() allows at end of page looks roughly enough for all the other stuff it might print after the symbol before it checks again: break out KSYM_SYMBOL_LEN earlier than before. Latencytop and ftrace and are using KSYM_NAME_LEN buffers where they need KSYM_SYMBOL_LEN buffers, and vmallocinfo a 2*KSYM_NAME_LEN buffer where it wants a KSYM_SYMBOL_LEN buffer: fix those before anyone copies them. [akpm@linux-foundation.org: ftrace.h needs module.h] Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc Miles Lane <miles.lane@gmail.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Steven Rostedt <srostedt@redhat.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-12-10 08:01:54 -08:00
Dmitri Monakhov	6ee5a399d6	inotify: fix IN_ONESHOT unmount event watcher On umount two event will be dispatched to watcher: 1: inotify_dev_queue_event(.., IN_UNMOUNT,..) 2: remove_watch(watch, dev) ->inotify_dev_queue_event(.., IN_IGNORED, ..) But if watcher has IN_ONESHOT bit set then the watcher will be released inside first event. Which result in accessing invalid object later. IMHO it is not pure regression. This bug wasn't triggered while initial inotify interface testing phase because of another bug in IN_ONESHOT handling logic :) commit `ac74c00e49` Author: Ulisses Furquim <ulissesf@gmail.com> Date: Fri Feb 8 04:18:16 2008 -0800 inotify: fix check for one-shot watches before destroying them As the IN_ONESHOT bit is never set when an event is sent we must check it in the watch's mask and not in the event's mask. TESTCASE: mkdir mnt mount -ttmpfs none mnt mkdir mnt/d ./inotify mnt/d& umount mnt ## << lockup or crash here TESTSOURCE: /* gcc -oinotify inotify.c / #include <stdio.h> #include <stdlib.h> #include <sys/inotify.h> int main(int argc, char argv) { char buf[1024]; struct inotify_event ie; char p; int i; ssize_t l; p = argv[1]; i = inotify_init(); inotify_add_watch(i, p, ~0); l = read(i, buf, sizeof(buf)); printf("read %d bytes\n", l); ie = (struct inotify_event ) buf; printf("event mask: %d\n", ie->mask); return 0; } Signed-off-by: Dmitri Monakhov <dmonakhov@openvz.org> Cc: John McCutchan <ttb@tentacle.dhs.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Robert Love <rlove@google.com> Cc: Ulisses Furquim <ulissesf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-12-10 08:01:53 -08:00
Matt Mackall	49c50342c7	pagemap: fix 32-bit pagemap regression The large pages fix from `bcf8039ed4` broke 32-bit pagemap by pulling the pagemap entry code out into a function with the wrong return type. Pagemap entries are 64 bits on all systems and unsigned long is only 32 bits on 32-bit systems. Signed-off-by: Matt Mackall <mpm@selenic.com> Reported-by: Doug Graham <dgraham@nortel.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Cc: <stable@kernel.org> [2.6.26.x, 2.6.27.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-12-10 08:01:53 -08:00
Andrew Morton	02d2116887	revert "percpu_counter: new function percpu_counter_sum_and_set" Revert commit `e8ced39d5e` Author: Mingming Cao <cmm@us.ibm.com> Date: Fri Jul 11 19:27:31 2008 -0400 percpu_counter: new function percpu_counter_sum_and_set As described in revert "percpu counter: clean up percpu_counter_sum_and_set()" the new percpu_counter_sum_and_set() is racy against updates to the cpu-local accumulators on other CPUs. Revert that change. This means that ext4 will be slow again. But correct. Reported-by: Eric Dumazet <dada1@cosmosbay.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mingming Cao <cmm@us.ibm.com> Cc: <linux-ext4@vger.kernel.org> Cc: <stable@kernel.org> [2.6.27.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-12-10 08:01:52 -08:00
Andrew Morton	71c5576fbd	revert "percpu counter: clean up percpu_counter_sum_and_set()" Revert commit `1f7c14c62c` Author: Mingming Cao <cmm@us.ibm.com> Date: Thu Oct 9 12:50:59 2008 -0400 percpu counter: clean up percpu_counter_sum_and_set() Before this patch we had the following: percpu_counter_sum(): return the percpu_counter's value percpu_counter_sum_and_set(): return the percpu_counter's value, copying that value into the central value and zeroing the per-cpu counters before returning. After this patch, percpu_counter_sum_and_set() has gone, and percpu_counter_sum() gets the old percpu_counter_sum_and_set() functionality. Problem is, as Eric points out, the old percpu_counter_sum_and_set() functionality was racy and wrong. It zeroes out counters on "other" cpus, without holding any locks which will prevent races agaist updates from those other CPUS. This patch reverts `1f7c14c62c`. This means that percpu_counter_sum_and_set() still has the race, but percpu_counter_sum() does not. Note that this is not a simple revert - ext4 has since started using percpu_counter_sum() for its dirty_blocks counter as well. Note that this revert patch changes percpu_counter_sum() semantics. Before the patch, a call to percpu_counter_sum() will bring the counter's central counter mostly up-to-date, so a following percpu_counter_read() will return a close value. After this patch, a call to percpu_counter_sum() will leave the counter's central accumulator unaltered, so a subsequent call to percpu_counter_read() can now return a significantly inaccurate result. If there is any code in the tree which was introduced after `e8ced39d5e` was merged, and which depends upon the new percpu_counter_sum() semantics, that code will break. Reported-by: Eric Dumazet <dada1@cosmosbay.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mingming Cao <cmm@us.ibm.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-12-10 08:01:52 -08:00
Roland McGrath	85f334666a	tracehook: exec double-reporting fix The patch `6341c39` "tracehook: exec" introduced a small regression in 2.6.27 regarding binfmt_misc exec event reporting. Since the reporting is now done in the common search_binary_handler() function, an exec of a misc binary will result in two (or possibly multiple) exec events being reported, instead of just a single one, because the misc handler contains a recursive call to search_binary_handler. To add to the confusion, if PTRACE_O_TRACEEXEC is not active, the multiple SIGTRAP signals will in fact cause only a single ptrace intercept, as the signals are not queued. However, if PTRACE_O_TRACEEXEC is on, the debugger will actually see multiple ptrace intercepts (PTRACE_EVENT_EXEC). The test program included below demonstrates the problem. This change fixes the bug by calling tracehook_report_exec() only in the outermost search_binary_handler() call (bprm->recursion_depth == 0). The additional change to restore bprm->recursion_depth after each binfmt load_binary call is actually superfluous for this bug, since we test the value saved on entry to search_binary_handler(). But it keeps the use of of the depth count to its most obvious expected meaning. Depending on what binfmt handlers do in certain cases, there could have been false-positive tests for recursion limits before this change. /* Test program using PTRACE_O_TRACEEXEC. This forks and exec's the first argument with the rest of the arguments, while ptrace'ing. It expects to see one PTRACE_EVENT_EXEC stop and then a successful exit, with no other signals or events in between. Test for kernel doing two PTRACE_EVENT_EXEC stops for a binfmt_misc exec: $ gcc -g traceexec.c -o traceexec $ sudo sh -c 'echo :test:M::foobar::/bin/cat: > /proc/sys/fs/binfmt_misc/register' $ echo 'foobar test' > ./foobar $ chmod +x ./foobar $ ./traceexec ./foobar; echo $? ==> good <== foobar test 0 $ ==> bad <== foobar test unexpected status 0x4057f != 0 3 $ / #include <stdio.h> #include <sys/types.h> #include <sys/wait.h> #include <sys/ptrace.h> #include <unistd.h> #include <signal.h> #include <stdlib.h> static void wait_for (pid_t child, int expect) { int status; pid_t p = wait (&status); if (p != child) { perror ("wait"); exit (2); } if (status != expect) { fprintf (stderr, "unexpected status %#x != %#x\n", status, expect); exit (3); } } int main (int argc, char argv) { pid_t child = fork (); if (child < 0) { perror ("fork"); return 127; } else if (child == 0) { ptrace (PTRACE_TRACEME); raise (SIGUSR1); execv (argv[1], &argv[1]); perror ("execve"); _exit (127); } wait_for (child, W_STOPCODE (SIGUSR1)); if (ptrace (PTRACE_SETOPTIONS, child, 0L, (void ) (long) PTRACE_O_TRACEEXEC) != 0) { perror ("PTRACE_SETOPTIONS"); return 4; } if (ptrace (PTRACE_CONT, child, 0L, 0L) != 0) { perror ("PTRACE_CONT"); return 5; } wait_for (child, W_STOPCODE (SIGTRAP \| (PTRACE_EVENT_EXEC << 8))); if (ptrace (PTRACE_CONT, child, 0L, 0L) != 0) { perror ("PTRACE_CONT"); return 6; } wait_for (child, W_EXITCODE (0, 0)); return 0; } Reported-by: Arnd Bergmann <arnd@arndb.de> CC: Ulrich Weigand <ulrich.weigand@de.ibm.com> Signed-off-by: Roland McGrath <roland@redhat.com>	2008-12-09 19:36:38 -08:00
Lachlan McIlroy	e055f13a6d	[XFS] Remove unused tracing code None of this code appears to be used anywhere so remove it. Reviewed-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-10 11:51:54 +11:00
J. Bruce Fields	a4f4d6df53	EXPORTFS: handle NULL returns from fh_to_dentry()/fh_to_parent() While `440037287c` "[PATCH] switch all filesystems over to d_obtain_alias" removed some cases where fh_to_dentry() and fh_to_parent() could return NULL, there are still a few NULL returns left in individual filesystems. Thus it was a mistake for that commit to remove the handling of NULL returns in the callers. Revert those parts of `440037287c` which removed the NULL handling. (We could, alternatively, modify all implementations to return -ESTALE instead of NULL, but that proves to require fixing a number of filesystems, and in some cases it's arguably more natural to return NULL.) Thanks to David for original patch and Linus, Christoph, and Hugh for review. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Cc: David Howells <dhowells@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-12-08 19:49:32 -08:00
Yinghai Lu	240d367b4e	sparseirq: fix Alpha build failure Impact: build fix on Alpha -tip testing found this build failure on the Alpha defconfig: /home/mingo/tip/fs/proc/stat.c: In function 'show_stat': /home/mingo/tip/fs/proc/stat.c:48: error: implicit declaration of function 'for_each_irq_desc' /home/mingo/tip/fs/proc/stat.c:48: error: expected ';' before '{' token can not use irq_desc() in stat.c on older architectures. Signed-off-by: Yinghai Lu <yinghai@kernel.orgg> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-09 04:16:54 +01:00
Yinghai Lu	0b8f1efad3	sparse irq_desc[] array: core kernel and x86 changes Impact: new feature Problem on distro kernels: irq_desc[NR_IRQS] takes megabytes of RAM with NR_CPUS set to large values. The goal is to be able to scale up to much larger NR_IRQS value without impacting the (important) common case. To solve this, we generalize irq_desc[NR_IRQS] to an (optional) array of irq_desc pointers. When CONFIG_SPARSE_IRQ=y is used, we use kzalloc_node to get irq_desc, this also makes the IRQ descriptors NUMA-local (to the site that calls request_irq()). This gets rid of the irq_cfg[] static array on x86 as well: irq_cfg now uses desc->chip_data for x86 to store irq_cfg. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-08 14:31:51 +01:00
Jonathan Corbet	218d11a8b0	Fix a race condition in FASYNC handling Changeset `a238b790d5` (Call fasync() functions without the BKL) introduced a race which could leave file->f_flags in a state inconsistent with what the underlying driver/filesystem believes. Revert that change, and also fix the same races in ioctl_fioasync() and ioctl_fionbio(). This is a minimal, short-term fix; the real fix will not involve the BKL. Reported-by: Oleg Nesterov <oleg@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-12-05 15:35:10 -08:00
Ingo Molnar	970987beb9	Merge branches 'tracing/ftrace', 'tracing/function-graph-tracer' and 'tracing/urgent' into tracing/core	2008-12-05 14:45:22 +01:00
Linus Torvalds	bbeba4c35c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/bdev * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/bdev: [PATCH] fix bogus argument of blkdev_put() in pktcdvd [PATCH 2/2] documnt FMODE_ constants [PATCH 1/2] kill FMODE_NDELAY_NOW [PATCH] clean up blkdev_get a little bit [PATCH] Fix block dev compat ioctl handling [PATCH] kill obsolete temporary comment in swsusp_close()	2008-12-04 21:45:44 -08:00
Dave Chinner	576a488a27	[XFS] Fix hang after disallowed rename across directory quota domains When project quota is active and is being used for directory tree quota control, we disallow rename outside the current directory tree. This requires a check to be made after all the inodes involved in the rename are locked. We fail to unlock the inodes correctly if we disallow the rename when the target is outside the current directory tree. This results in a hang on the next access to the inodes involved in failed rename. Reported-by: Arkadiusz Miskiewicz <arekm@maven.pl> Signed-off-by: Dave Chinner <david@fromorbit.com> Tested-by: Arkadiusz Miskiewicz <arekm@maven.pl> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-05 15:39:13 +11:00
Lachlan McIlroy	14d676f56f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6	2008-12-05 15:27:43 +11:00
Lachlan McIlroy	797eaed40e	[XFS] Remove unnecessary assertion Hit this assert because an inode was tagged with XFS_ICI_RECLAIM_TAG but not XFS_IRECLAIMABLE\|XFS_IRECLAIM. This is because xfs_iget_cache_hit() first clears XFS_IRECLAIMABLE and then calls __xfs_inode_clear_reclaim_tag() while only holding the pag_ici_lock in read mode so we can race with xfs_reclaim_inodes_ag(). Looks like xfs_reclaim_inodes_ag() will do the right thing anyway so just remove the assert. Thanks to Christoph for pointing out where the problem was. Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Reviewed-by: Christoph Hellwig <hch@infradead.org>	2008-12-05 14:15:49 +11:00
Lachlan McIlroy	a5b429d41f	[XFS] Remove unused variable in ktrace_free() entries_size is probably left over from when we used to pass the size to kmem_free(). Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Reviewed-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Eric Sandeen <sandeen@sandeen.net>	2008-12-05 13:31:51 +11:00
Lachlan McIlroy	c6422617a1	[XFS] Check return value of xfs_buf_get_noaddr() We check the return value of all other calls to xfs_buf_get_noaddr(). Make sense to do it here too. Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Reviewed-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Eric Sandeen <sandeen@sandeen.net>	2008-12-05 13:16:15 +11:00
Dave Chinner	6a0775a991	[XFS] Fix hang after disallowed rename across directory quota domains When project quota is active and is being used for directory tree quota control, we disallow rename outside the current directory tree. This requires a check to be made after all the inodes involved in the rename are locked. We fail to unlock the inodes correctly if we disallow the rename when the target is outside the current directory tree. This results in a hang on the next access to the inodes involved in failed rename. Reported-by: Arkadiusz Miskiewicz <arekm@maven.pl> Signed-off-by: Dave Chinner <david@fromorbit.com> Tested-by: Arkadiusz Miskiewicz <arekm@maven.pl> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-05 12:50:04 +11:00
Christoph Hellwig	8bb57320f3	[XFS] Fix compile with CONFIG_COMPAT enabled Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-05 11:23:10 +11:00
Christoph Hellwig	fd4ce1acd0	[PATCH 1/2] kill FMODE_NDELAY_NOW Update FMODE_NDELAY before each ioctl call so that we can kill the magic FMODE_NDELAY_NOW. It would be even better to do this directly in setfl(), but for that we'd need to have FMODE_NDELAY for all files, not just block special files. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-12-04 04:22:57 -05:00
Christoph Hellwig	ebbefc011e	[PATCH] clean up blkdev_get a little bit The way the bd_claim for the FMODE_EXCL case is implemented is rather confusing. Clean it up to the most logical style. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-12-04 04:22:56 -05:00
Ingo Molnar	b8307db247	Merge commit 'v2.6.28-rc7' into tracing/core	2008-12-04 09:07:19 +01:00
James Morris	ec98ce480a	Merge branch 'master' into next Conflicts: fs/nfsd/nfs4recover.c Manually fixed above to use new creds API functions, e.g. nfs4_save_creds(). Signed-off-by: James Morris <jmorris@namei.org>	2008-12-04 17:16:36 +11:00
Christoph Hellwig	5a8d0f3c7a	move inode tracing out of xfs_vnode. Move the inode tracing into xfs_iget.c / xfs_inode.h and kill xfs_vnode.c now that it's empty. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-04 15:39:25 +11:00
Christoph Hellwig	25e41b3d52	move vn_iowait / vn_iowake into xfs_aops.c The whole machinery to wait on I/O completion is related to the I/O path and should be there instead of in xfs_vnode.c. Also give the functions more descriptive names. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-04 15:39:24 +11:00
Christoph Hellwig	583fa586f0	kill vn_ioerror There's just one caller of this helper, and it's much cleaner to just merge the xfs_do_force_shutdown call into it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-04 15:39:24 +11:00
Christoph Hellwig	f95099ba5a	kill xfs_unmount_flush There's almost nothing left in this function, instead remove the IRELE on the real times inodes and the call to XFS_QM_UNMOUNT into xfs_unmountfs. For the regular unmount case that means it now also happenes after dmapi notification, but otherwise there is no difference in behaviour. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-04 15:39:24 +11:00
Christoph Hellwig	e57481dc26	no explicit xfs_iflush for special inodes during unmount Currently we explicitly call xfs_iflush on the quota, real-time and root inodes from xfs_unmount_flush. But we just called xfs_sync_inodes with SYNC_ATTR and do an XFS_bflush aka xfs_flush_buftarg to make sure all inodes are on disk already, so there is no need for these special cases. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-04 15:39:23 +11:00
Christoph Hellwig	070c4616ec	use xfs_trans_ijoin in xfs_trans_iget Use xfs_trans_ijoin in xfs_trans_iget in case we need to join an inode into a transaction instead of opencoding it. Based on a discussion with and an incomplete patch from Niv Sardi. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-04 15:39:23 +11:00
Christoph Hellwig	b56757becf	remove leftovers of shared read-only support We never supported shared read-only filesystems, so remove the dead code left over from IRIX for it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-04 15:39:23 +11:00
Christoph Hellwig	e88f11abe0	remove unused m_inode_quiesce member from struct xfs_mount Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-04 15:39:22 +11:00
Christoph Hellwig	6bd16ff270	kill dead inode flags There are a few inode flags around that aren't used anywhere, so remove them. Also update xfsidbg to display all used inode flags correctly. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-04 15:39:22 +11:00
Christoph Hellwig	5efcbb853b	cleanup xfs_sb.h feature flag helpers The various inlines in xfs_sb.h that deal with the superblock version and fature flags were converted from macros a while ago, and this show by the odd coding style full of useless braces and backslashes and the avoidance of conditionals. Clean these up to look like normal C code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-04 15:39:22 +11:00
Christoph Hellwig	df6771bde1	kill dead quota flags Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-04 15:39:22 +11:00
Christoph Hellwig	63ad2a5c4c	remove dead code from sv_t implementation Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-04 15:39:21 +11:00
Christoph Hellwig	39e2defe73	reduce l_icloglock roundtrips All but one caller of xlog_state_want_sync drop and re-acquire l_icloglock around the call to it, just so that xlog_state_want_sync can acquire and drop it. Move all lock operation out of l_icloglock and assert that the lock is held when it is called. Note that it would make sense to extende this scheme to xlog_state_release_iclog, but the locking in there is more complicated and we'd like to keep the atomic_dec_and_lock optmization for those callers not having l_icloglock yet. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-04 15:39:21 +11:00
Christoph Hellwig	d9424b3c4a	stop using igrab in xfs_vn_link ->link is guranteed to get an already reference inode passed so we can do a simple increment of i_count instead of using igrab and thus avoid banging on the global inode_lock. This is what most filesystems already do. Also move the increment after the call to xfs_link to simplify error handling. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-04 15:39:21 +11:00
Christoph Hellwig	5d765b976c	kill xfs_buf_iostart xfs_buf_iostart is a "shared" helper for xfs_buf_read_flags, xfs_bawrite, and xfs_bdwrite - except that there isn't much shared code but rather special cases for each caller. So remove this function and move the functionality to the caller. xfs_bawrite and xfs_bdwrite are now big enough to be moved out of line and the xfs_buf_read_flags is moved into a new helper called _xfs_buf_read. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-04 15:39:20 +11:00
Christoph Hellwig	5cafdeb289	cleanup the inode reclaim path Merge xfs_iextract and xfs_idestroy into xfs_ireclaim as they are never called individually. Also rewrite most comments in this area as they were severly out of date. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-04 15:39:20 +11:00
Christoph Hellwig	ccd0be6cfc	remove unused prototypes for xfs_ihash_init / xfs_ihash_free Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-04 15:39:20 +11:00
Christoph Hellwig	73e6335c14	remove unused behvavior cruft in xfs_super.h Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-04 15:39:19 +11:00
Christoph Hellwig	2234d54d3d	remove useless mnt_want_write call in xfs_write When mnt_want_write was introduced a call to it was added around xfs_ichgtime, but there is no need for this because a file can't be open read/write on a r/o mount, and a mount can't degrade r/o while we still have files open for writing. As the mnt_want_write changes were never merged into the CVS tree this patch is for mainline only. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-04 15:39:19 +11:00
Christoph Hellwig	ddcd856d81	[XFS] fix compile on 32 bit systems The recent compat patches make xfs_file.c include xfs_ioctl32.h unconditional, which breaks the build on 32 bit systems which don't have the various compat defintions. Remove the include and move the defintion of xfs_file_compat_ioctl to xfs_ioctl.h so that we can avoid including all the compat defintions in xfs_file.c Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-04 13:07:29 +11:00
Linus Torvalds	2433c41789	Merge branch 'for-2.6.28' of git://linux-nfs.org/~bfields/linux * 'for-2.6.28' of git://linux-nfs.org/~bfields/linux: NLM: client-side nlm_lookup_host() should avoid matching on srcaddr nfsd: use of unitialized list head on error exit in nfs4recover.c Add a reference to sunrpc in svc_addsock nfsd: clean up grace period on early exit	2008-12-03 16:40:37 -08:00
David S. Miller	aa2ba5f108	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/ixgbe/ixgbe_main.c drivers/net/smc91x.c	2008-12-02 19:50:27 -08:00
Linus Torvalds	51eaaa6776	Merge branch 'linux-next' of git://git.infradead.org/ubifs-2.6 * 'linux-next' of git://git.infradead.org/ubifs-2.6: UBIFS: pre-allocate bulk-read buffer UBIFS: do not allocate too much UBIFS: do not print scary memory allocation warnings UBIFS: allow for gaps when dirtying the LPT UBIFS: fix compilation warnings MAINTAINERS: change UBI/UBIFS git tree URLs UBIFS: endian handling fixes and annotations UBIFS: remove printk	2008-12-02 15:56:55 -08:00
sandeen@sandeen.net	e5d412f178	[XFS] Reorder xfs_ioctl32.c for some tidiness Put things in IMHO a more readable order, now that it's all done; add some comments. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-02 17:18:21 +11:00
sandeen@sandeen.net	710d62aaaf	[XFS] Hook up compat XFS_IOC_FSSETDM_BY_HANDLE ioctl handler Add a compat handler for XFS_IOC_FSSETDM_BY_HANDLE. I haven't tested this, lacking dmapi tools to do so (unless xfsqa magically gets this somehow?) Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-02 17:17:43 +11:00
sandeen@sandeen.net	28750975ac	[XFS] Hook up compat XFS_IOC_ATTRMULTI_BY_HANDLE ioctl handler Add a compat handler for XFS_IOC_ATTRMULTI_BY_HANDLE Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-02 17:17:07 +11:00
sandeen@sandeen.net	ebeecd2b04	[XFS] Hook up compat XFS_IOC_ATTRLIST_BY_HANDLE ioctl handler Add a compat handler for XFS_IOC_ATTRLIST_BY_HANDLE Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-02 17:16:45 +11:00
sandeen@sandeen.net	af819d2763	[XFS] Fix compat XFS_IOC_FSBULKSTAT_SINGLE ioctl The XFS_IOC_FSBULKSTAT_SINGLE ioctl passes in the desired inode number, while XFS_IOC_FSBULKSTAT passes in the previous/last-stat'd inode number. The compat handler wasn't differentiating these, so when a XFS_IOC_FSBULKSTAT_SINGLE request for inode 128 was sent in, stat information for 131 was sent out. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-02 17:16:24 +11:00
sandeen@sandeen.net	65fbaf2489	[XFS] Fix xfs_bulkstat_one size checks & error handling The 32-bit xfs_blkstat_one handler was failing because a size check checked whether the remaining (32-bit) user buffer was less than the (64-bit) bulkstat buffer, and failed with ENOMEM if so. Move this check into the respective handlers so that they check the correct sizes. Also, the formatters were returning negative errors or positive bytes copied; this was odd in the positive error value world of xfs, and handled wrong by at least some of the callers, which treated the bytes returned as an error value. Move the bytes-used assignment into the formatters. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-02 17:16:03 +11:00
sandeen@sandeen.net	2ee4fa5cb7	[XFS] Make the bulkstat_one compat ioctl handling more sane Currently the compat formatter was handled by passing in "private_data" for the xfs_bulkstat_one formatter, which was really just another formatter... IMHO this got confusing. Instead, just make a new xfs_bulkstat_one_compat formatter for xfs_bulkstat, and call it via a wrapper. Also, don't translate the ioctl nrs into their native counterparts, that just clouds the issue; we're in a compat handler anyway, just switch on the 32-bit cmds. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-02 17:15:36 +11:00
sandeen@sandeen.net	471d591031	[XFS] Add compat handlers for data & rt growfs ioctls The args for XFS_IOC_FSGROWFSDATA and XFS_IOC_FSGROWFSRTA have padding on the end on intel, so add arg copyin functions, and then just call the growfs ioctl helpers. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-02 17:15:09 +11:00
sandeen@sandeen.net	e94fc4a43e	[XFS] Add compat handlers for swapext ioctl The big hitter here was the bstat field, which contains different sized time_t on 32 vs. 64 bit. Add a copyin function to translate the 32-bit arg to 64-bit, and call the swapext ioctl helper. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-02 17:10:04 +11:00
sandeen@sandeen.net	d5547f9fee	[XFS] Clean up some existing compat ioctl calls Create a new xfs_ioctl.h file which has prototypes for ioctl helpers that may be called in compat mode. Change several compat ioctl cases which are IOW to simply copy in the userspace argument, then call the common ioctl helper. This also fixes xfs_compat_ioc_fsgeometry_v1(), which had it backwards before; it copied in an (empty) arg, then copied out the native result, which probably corrupted userspace. It should be translating on the copyout. Also, a bit of formatting cleanup for consistency, and conversion of all error returns to use XFS_ERROR(). Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-02 17:09:43 +11:00
sandeen@sandeen.net	ffae263a64	[XFS] Move compat ioctl structs & numbers into xfs_ioctl32.h This makes the c file less cluttered and a bit more readable. Consistently name the ioctl number macros with "_32" and the compatibility stuctures with "_compat." Rename the helpers which simply copy in the arg with "_copyin" for easy identification. Finally, for a few of the existing helpers, modify them so that they directly call the native ioctl helper after userspace argument fixup. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-02 17:08:44 +11:00
sandeen@sandeen.net	743bb4650d	[XFS] Move copy_from_user calls out of ioctl helpers into ioctl switch. Moving the copy_from_user out of some of the ioctl helpers will make it easier for the compat ioctl switch to copy in the right struct, then just pass to the underlying helper. Also, move common access checks into the helpers themselves, and out of the native ioctl switch code, to reduce code duplication between native & compat ioctl callers. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-02 17:08:01 +11:00
Randy Dunlap	0380155363	ntfs: don't fool kernel-doc kernel-doc handles macros now (it has for quite some time), so change the ntfs_debug() macro's kernel-doc to be just before the macro instead of before a phony function prototype. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Anton Altaparmakov <aia21@cantab.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-12-01 19:55:25 -08:00
Davide Libenzi	7ef9964e6d	epoll: introduce resource usage limits It has been thought that the per-user file descriptors limit would also limit the resources that a normal user can request via the epoll interface. Vegard Nossum reported a very simple program (a modified version attached) that can make a normal user to request a pretty large amount of kernel memory, well within the its maximum number of fds. To solve such problem, default limits are now imposed, and /proc based configuration has been introduced. A new directory has been created, named /proc/sys/fs/epoll/ and inside there, there are two configuration points: max_user_instances = Maximum number of devices - per user max_user_watches = Maximum number of "watched" fds - per user The current default for "max_user_watches" limits the memory used by epoll to store "watches", to 1/32 of the amount of the low RAM. As example, a 256MB 32bit machine, will have "max_user_watches" set to roughly 90000. That should be enough to not break existing heavy epoll users. The default value for "max_user_instances" is set to 128, that should be enough too. This also changes the userspace, because a new error code can now come out from EPOLL_CTL_ADD (-ENOSPC). The EMFILE from epoll_create() was already listed, so that should be ok. [akpm@linux-foundation.org: use get_current_user()] Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: <stable@kernel.org> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Reported-by: Vegard Nossum <vegardno@ifi.uio.no> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-12-01 19:55:24 -08:00
Mark Fasheh	d6b58f89f7	ocfs2: fix regression in ocfs2_read_blocks_sync() We're panicing in ocfs2_read_blocks_sync() if a jbd-managed buffer is seen. At first glance, this seems ok but in reality it can happen. My test case was to just run 'exorcist'. A struct inode is being pushed out of memory but is then re-read at a later time, before the buffer has been checkpointed by jbd. This causes a BUG to be hit in ocfs2_read_blocks_sync(). Reviewed-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-12-01 14:46:58 -08:00
Coly Li	07d9a3954a	ocfs2: fix return value set in init_dlmfs_fs() In init_dlmfs_fs(), if calling kmem_cache_create() failed, the code will use return value from calling bdi_init(). The correct behavior should be set status as -ENOMEM before going to "bail:". Signed-off-by: Coly Li <coyli@suse.de> Acked-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-12-01 14:46:55 -08:00
David Teigland	07f9eebcdf	ocfs2: fix wake_up in unlock_ast In ocfs2_unlock_ast(), call wake_up() on lockres before releasing the spin lock on it. As soon as the spin lock is released, the lockres can be freed. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-12-01 14:46:45 -08:00
David Teigland	66f502a416	ocfs2: initialize stack_user lvbptr The locking_state dump, ocfs2_dlm_seq_show, reads the lvb on locks where it has not yet been initialized by a lock call. Signed-off-by: David Teigland <teigland@redhat.com> Acked-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-12-01 14:46:39 -08:00
Coly Li	3b5da0189c	ocfs2: comments typo fix This patch fixes two typos in comments of ocfs2. Signed-off-by: Coly Li <coyli@suse.de> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-12-01 14:46:31 -08:00
Christoph Hellwig	0e446673a1	[XFS] fix error handling in xlog_recover_process_one_iunlink If we fail after xfs_iget we have to drop the reference count, spotted by Dave Chinner. Also remove some useless asserts and stop trying to deal with di_mode == 0 inodes because never gets those without passing the IGET_CREATE flag to xfs_iget. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-01 11:38:22 +11:00
Christoph Hellwig	24f211bad0	[XFS] move inode allocation out xfs_iread Allocate the inode in xfs_iget_cache_miss and pass it into xfs_iread. This simplifies the error handling and allows xfs_iread to be shared with userspace which already uses these semantics. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-01 11:38:17 +11:00

1 2 3 4 5 ...

11148 Commits