Commit Graph

34219 Commits

Author SHA1 Message Date
Pavel Emelyanov
48f6a7a511 posix-timers: Introduce /proc/PID/timers file
Currently the kernel doesn't provide any API for getting info about what
posix timers are configured by processes. It's implied that a process
which configured some timers knows what it did. However, for external
tools it's impossible to get this information. In particular, it is
critical for the checkpoint-restore project to have this info.

Introduce a per-pid proc file with information about posix
timers. Since these timers are shared between threads, this file is
present on the tgid level only; there is no such file in tid subdirs.

The file format is expected to be "/proc/<pid>/smaps"-like,
i.e. each timer will occupy several lines to allow for future
extension.

Each new timer entry starts with the

ID: <number>

line which is added by this patch.
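
As an illustration of how an external tool might consume this format, here
is a minimal Python sketch. It assumes only what the message states: every
entry starts with an "ID: <number>" line. The handling of further
"key: value" lines is a guess at how later extensions could look, and
parse_timers() is a hypothetical helper, not part of any existing tool.

```python
def parse_timers(text):
    """Split /proc/<pid>/timers-style text into one dict per timer.

    Assumption: each entry begins with an "ID: <number>" line; any
    following "key: value" lines belong to that entry.
    """
    timers = []
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "ID":
            # A new "ID:" line opens a new timer entry.
            timers.append({"ID": int(value)})
        elif timers:
            # Unknown keys are kept as strings on the current entry.
            timers[-1][key] = value
    return timers


print(parse_timers("ID: 0\nID: 1\n"))
```

Keeping unknown keys instead of rejecting them matches the stated goal of
leaving room for future extension.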

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Matthew Helsley <matt.helsley@gmail.com>
Link: http://lkml.kernel.org/r/513DA00D.6070009@parallels.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-04-17 20:51:01 +02:00
Tom Gundersen
a9499fa7cd efi: split efisubsystem from efivars
This registers /sys/firmware/efi/{,systab,efivars/} whenever EFI is enabled
and the system is booted with EFI.

This allows
 *) userspace to check for the existence of /sys/firmware/efi as a way
    to determine whether or not it is running on an EFI system.
 *) 'mount -t efivarfs none /sys/firmware/efi/efivars' without manually
    loading any modules.
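
The userspace check from the first point can be sketched in a few lines of
Python. The function name booted_with_efi() and its sysfs_root parameter
are hypothetical; the parameter exists only so the check can be exercised
against a scratch directory.

```python
import os


def booted_with_efi(sysfs_root="/sys"):
    """Return True if <sysfs_root>/firmware/efi exists as a directory."""
    return os.path.isdir(os.path.join(sysfs_root, "firmware", "efi"))


print(booted_with_efi())
```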

[ Also, move the efivar API into vars.c and unconditionally compile it.
  This allows us to move efivars.c, which now only contains the sysfs
  variable code, into the firmware/efi directory. Note that the efivars.c
  filename is kept to maintain backwards compatibility with the old
  efivars.ko module. With this patch it is now possible for efivarfs
  to be built without CONFIG_EFI_VARS - Matt ]

Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Mike Waychison <mikew@google.com>
Cc: Kay Sievers <kay@vrfy.org>
Cc: Jeremy Kerr <jk@ozlabs.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Chun-Yi Lee <jlee@suse.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Tobias Powalowski <tpowa@archlinux.org>
Signed-off-by: Tom Gundersen <teg@jklm.no>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-04-17 13:27:06 +01:00
Matt Fleming
d68772b7c8 efivarfs: Move to fs/efivarfs
Now that efivarfs uses the efivar API, move it out of efivars.c and
into fs/efivarfs where it belongs. This move will eventually allow us
to enable the efivarfs code without having to also enable
CONFIG_EFI_VARS, and vice versa.

Furthermore, things like,

    mount -t efivarfs none /sys/firmware/efi/efivars

will now work if efivarfs is built as a module without requiring the
use of MODULE_ALIAS(), which would have been necessary when the
efivarfs code was part of efivars.c.

Cc: Matthew Garrett <matthew.garrett@nebula.com>
Cc: Jeremy Kerr <jk@ozlabs.org>
Reviewed-by: Tom Gundersen <teg@jklm.no>
Tested-by: Tom Gundersen <teg@jklm.no>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-04-17 13:25:09 +01:00
Maxim Patlasov
722d2bea8c fuse: implement exclusive wakeup for blocked_waitq
The patch solves the thundering herd problem. Since previous patches ensured
that only allocations for background requests may block, it's safe to wake up one
waiter. Whoever it is, it will wake up another one in request_end() afterwards.
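
The difference between waking everyone and waking one waiter can be shown
with a toy model in Python. This is not the FUSE code: drain() is a
hypothetical simulation in which one slot becomes free per round and each
woken waiter either takes it or goes back to sleep.

```python
from collections import deque


def drain(num_waiters, exclusive):
    """Count wakeups until every waiter has obtained a slot.

    One slot is freed per round. With exclusive=True only the first
    waiter is woken; otherwise all waiters wake and all but one go
    back to sleep (the thundering herd).
    """
    waiting = deque(range(num_waiters))
    wakeups = 0
    while waiting:
        if exclusive:
            woken = [waiting.popleft()]
        else:
            woken = list(waiting)
            waiting.clear()
        wakeups += len(woken)
        # The first woken waiter takes the slot; the rest re-queue.
        waiting.extend(woken[1:])
    return wakeups


print(drain(4, exclusive=True), drain(4, exclusive=False))
```

In this model the exclusive variant needs one wakeup per waiter, while the
wake-all variant grows quadratically with the number of waiters.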

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-04-17 12:31:45 +02:00
Maxim Patlasov
0aada88476 fuse: skip blocking on allocations of synchronous requests
A task may have at most one synchronous request allocated. So these
requests need not be otherwise limited.

The patch re-works fuse_get_req() to follow this idea.

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-04-17 12:31:45 +02:00
Maxim Patlasov
796523fb24 fuse: add flag fc->initialized
The existing flag fc->blocked is used to suspend request allocation both when
many background requests have been submitted and during the period before
init_reply arrives from userspace. The next patch will skip blocking allocations
of synchronous requests (disregarding fc->blocked). This is mostly OK, but
we still need to suspend allocations if init_reply has not arrived yet. The
patch introduces the flag fc->initialized, which will serve this purpose.
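
The rule this series is working towards can be written as a small
predicate. The Python below is only a sketch of the description above:
must_block() is a hypothetical name, and the three booleans stand in for
fc->initialized, fc->blocked and the request type.

```python
def must_block(initialized, blocked, for_background):
    """Decide whether a request allocation has to wait.

    Everything waits until init_reply has arrived; after that only
    background allocations wait while the connection is blocked.
    """
    return (not initialized) or (for_background and blocked)


for args in [(False, False, False), (True, True, False), (True, True, True)]:
    print(args, must_block(*args))
```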

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-04-17 12:31:44 +02:00
Maxim Patlasov
8b41e6715e fuse: make request allocations for background processing explicit
There are two types of processing requests in FUSE: synchronous (via
fuse_request_send()) and asynchronous (via adding to fc->bg_queue).

Fortunately, the type of processing is always known in advance, at the time
of request allocation. This preparatory patch uses this fact to make
fuse_get_req() aware of the type. The next patches will use it.

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-04-17 12:31:44 +02:00
Fengguang Wu
ba138435d1 nfsd4: put_client_renew_locked can be static
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-16 22:15:00 -04:00
J. Bruce Fields
9aeb5aeeb0 nfsd4: remove unused macro
Clean up a piece I forgot to remove in
9411b1d4c7 "nfsd4: cleanup handling of
nfsv4.0 closed stateid's".

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-16 21:51:55 -04:00
Trond Myklebust
549b19cc9f NFSv4: Record the OPEN create mode used in the nfs4_opendata structure
If we're doing NFSv4.1 against a server that has persistent sessions,
then we should not need to call SETATTR in order to reset the file
attributes immediately after doing an exclusive create.

Note that since the create mode depends on the type of session that
has been negotiated with the server, we should not choose the
mode until after we've got a session slot.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-04-16 18:58:26 -04:00
Jeff Liu
7fe3258c50 xfs: Update xfs_log_commit_cil() comments
The xfs_log_commit_iclog() function was removed by commit 93b8a585:
	xfs: remove the deprecated nodelaylog option

Since Linux 3.3 only delayed logging is supported, so
xfs_trans_commit() only ever calls xfs_log_commit_cil(). Remove the
comments that no longer apply.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-04-16 13:20:03 -05:00
Jeff Liu
d4fd0e92fb xfs: Remove the obsolete XLOG_CIL_HARD_SPACE_LIMIT() macros
There are no more users of this macro, so it's time to kill it dead.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-04-16 13:18:33 -05:00
fanchaoting
53584f6652 nfsd4: remove some useless code
The "list_empty(&oo->oo_owner.so_stateids)" check is always true, so remove it.

Signed-off-by: fanchaoting <fanchaoting@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-16 10:59:31 -04:00
J. Bruce Fields
3bd64a5ba1 nfsd4: implement SEQ4_STATUS_RECALLABLE_STATE_REVOKED
A 4.1 server must notify a client that has had any state revoked using
the SEQ4_STATUS_RECALLABLE_STATE_REVOKED flag.  The client can figure
out exactly which state is the problem using CHECK_STATEID and then free
it using FREE_STATEID.  The status flag will be unset once all such
revoked stateids are freed.

Our server's only recallable state is delegations.  So we keep with each
4.1 client a list of delegations that have timed out and been recalled,
but haven't yet been freed by FREE_STATEID.
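
The bookkeeping described above can be modelled in Python as follows. The
Client class and its method names are hypothetical, and the flag value
0x40 is taken from RFC 5661 as an assumption rather than from the patch
itself.

```python
# Assumed value of the flag, following RFC 5661.
SEQ4_STATUS_RECALLABLE_STATE_REVOKED = 0x00000040


class Client:
    """Toy model of a 4.1 client's list of revoked delegations."""

    def __init__(self):
        self.revoked = set()

    def revoke_delegation(self, stateid):
        # A delegation that timed out after being recalled lands here.
        self.revoked.add(stateid)

    def free_stateid(self, stateid):
        # FREE_STATEID drops the entry; unknown stateids are ignored.
        self.revoked.discard(stateid)

    def sequence_status(self):
        # The flag stays set for as long as any revoked state remains.
        return SEQ4_STATUS_RECALLABLE_STATE_REVOKED if self.revoked else 0


clp = Client()
clp.revoke_delegation("deleg-1")
print(hex(clp.sequence_status()))
clp.free_stateid("deleg-1")
print(hex(clp.sequence_status()))
```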

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-16 10:59:30 -04:00
Linus Torvalds
bb33db7a07 Merge branches 'timers-urgent-for-linus', 'irq-urgent-for-linus' and 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull {timer,irq,core} fixes from Thomas Gleixner:

 - timer: bug fix for a cpu hotplug race.

 - irq: single bugfix for a wrong return value, which prevents the
   calling function from invoking the software fallback.

 - core: bugfix which plugs two race conditions which can cause hotplug
   per cpu threads to end up on the wrong cpu.

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  hrtimer: Don't reinitialize a cpu_base lock on CPU_UP

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip: gic: fix irq_trigger return

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  kthread: Prevent unpark race which puts threads on the wrong cpu
2013-04-15 07:03:01 -07:00
Greg Kroah-Hartman
0d1d392f01 Merge 3.9-rc7 into driver-core-next
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-04-14 18:37:05 -07:00
Linus Torvalds
3792a64fde Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull one more btrfs fix from Chris Mason:
 "This has a recent fix from Josef for our tree log replay code.  It
  fixes problems where the inode counter for the number of bytes in the
  file wasn't getting updated properly during fsync replay.

  The commit did get rebased this morning, but it was only to clean up
  the subject line.  The code hasn't changed."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: make sure nbytes are right after log replay
2013-04-14 10:52:54 -07:00
Trond Myklebust
98f98cf571 NFSv4.1: Set the RPC_CLNT_CREATE_INFINITE_SLOTS flag for NFSv4.1 transports
This ensures that the RPC layer doesn't override the NFS session
negotiation.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-04-14 12:59:28 -04:00
Suleiman Souhlal
5b55d70833 vfs: Revert spurious fix to spinning prevention in prune_icache_sb
Revert commit 62a3ddef61 ("vfs: fix spinning prevention in prune_icache_sb").

This commit doesn't look right: since we are looking at the tail of the
list (sb->s_inode_lru.prev) if we want to skip an inode, we should put
it back at the head of the list instead of the tail, otherwise we will
keep spinning on it.
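
The argument can be checked with a toy LRU in Python. This is a sketch, not
the VFS code: scan() is a hypothetical helper, the right end of the deque
plays the role of the list tail, and "busy" stands for inodes that have to
be skipped.

```python
from collections import deque


def scan(lru, busy, nr_to_scan, requeue_at_head):
    """Scan from the tail of lru and return the items examined.

    Skipped (busy) items are put back at the head or at the tail,
    depending on requeue_at_head; all other items are reclaimed.
    """
    examined = []
    for _ in range(nr_to_scan):
        if not lru:
            break
        item = lru.pop()  # take the item at the tail
        examined.append(item)
        if item in busy:
            if requeue_at_head:
                lru.appendleft(item)
            else:
                lru.append(item)  # stays at the tail, seen again next
            continue
        # Not busy: the item is reclaimed and leaves the list.
    return examined


print(scan(deque(["a", "b", "c"]), {"c"}, 3, requeue_at_head=True))
print(scan(deque(["a", "b", "c"]), {"c"}, 3, requeue_at_head=False))
```

With the skipped item requeued at the head, the scan moves on to the other
entries; with it requeued at the tail, the same entry is examined on every
iteration.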

Discovered when investigating why prune_icache_sb came top in perf
reports of a swapping load.

Signed-off-by: Suleiman Souhlal <suleiman@google.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org # v3.2+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-13 16:13:55 -07:00
Josef Bacik
4bc4bee459 Btrfs: make sure nbytes are right after log replay
While trying to track down a tree log replay bug I noticed that fsck was always
complaining about nbytes not being right for our fsynced file.  That is because
the new fsync stuff doesn't wait for ordered extents to complete, so the inode's
nbytes are not necessarily updated properly when we log it.  So to fix this we
need to set nbytes to whatever it is on the inode that is on disk, so when we
replay the extents we can just add the bytes that are being added as we replay
the extent.  This makes it work for the case that we have the wrong nbytes or
the case that we logged everything and nbytes is actually correct.  With this
I'm no longer getting nbytes errors out of btrfsck.

Cc: stable@vger.kernel.org
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-04-13 07:35:06 -04:00
Linus Torvalds
0b1fd266bf Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull CIFS fix from Steve French:
 "Fixes a regression in cifs in which a password which begins with a
  comma is parsed incorrectly as a blank password"

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: Allow passwords which begin with a delimitor
2013-04-12 15:18:20 -07:00
Trond Myklebust
b570a975ed NFSv4: Fix handling of revoked delegations by setattr
Currently, _nfs4_do_setattr() will use the delegation stateid if no
writeable open file stateid is available.
If the server revokes that delegation stateid, then the call to
nfs4_handle_exception() will fail to handle the error due to the
lack of a struct nfs4_state, and will just convert the error into
an EIO.

This patch just removes the requirement that we must have a
struct nfs4_state in order to invalidate the delegation and
retry.

Reported-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-04-12 15:21:15 -04:00
Masanari Iida
a895d57da0 treewide: Fix typo in printks
Correct spelling typos in printk and comments.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-04-12 15:21:36 +02:00
Thomas Gleixner
f2530dc71c kthread: Prevent unpark race which puts threads on the wrong cpu
The smpboot threads rely on the park/unpark mechanism which binds per
cpu threads on a particular core. However, the functionality is racy:

CPU0	       	 	CPU1  	     	    CPU2
unpark(T)				    wake_up_process(T)
  clear(SHOULD_PARK)	T runs
			leave parkme() due to !SHOULD_PARK  
  bind_to(CPU2)		BUG_ON(wrong CPU)						    

We cannot let the tasks move themselves to the target CPU as one of
those tasks is actually the migration thread itself, which requires
that it starts running on the target cpu right away.

The solution to this problem is to prevent wakeups in park mode which
are not from unpark(). That way we can guarantee that the association
of the task to the target cpu is working correctly.

Add a new task state (TASK_PARKED) which prevents other wakeups and
use this state explicitly for the unpark wakeup.
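
The idea of a dedicated state that only unpark() may wake can be modelled
in Python. The bit values and helper names below are illustrative
assumptions for this sketch and are not the kernel's definitions.

```python
# Illustrative state bits, not the kernel's actual values.
TASK_RUNNING = 0
TASK_INTERRUPTIBLE = 1
TASK_UNINTERRUPTIBLE = 2
TASK_PARKED = 4
TASK_NORMAL = TASK_INTERRUPTIBLE | TASK_UNINTERRUPTIBLE


class Task:
    def __init__(self):
        self.state = TASK_RUNNING


def wake_up_state(task, mask):
    """Wake the task only if its current state is covered by mask."""
    if task.state & mask:
        task.state = TASK_RUNNING
        return True
    return False


def wake_up_process(task):
    # An ordinary wakeup does not include TASK_PARKED in its mask.
    return wake_up_state(task, TASK_NORMAL)


def unpark(task):
    # Only the unpark path asks for TASK_PARKED explicitly.
    return wake_up_state(task, TASK_PARKED)


t = Task()
t.state = TASK_PARKED
print(wake_up_process(t), unpark(t))
```

Because a stray wake_up_process() no longer matches the parked state, the
thread cannot leave parkme() before unpark() has bound it to its CPU.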

Peter noticed: Also, since the task state is visible to userspace and
all the parked tasks are still in the PID space, it's a good hint in ps
and friends that these tasks aren't really there for the moment.

The migration thread has another related issue.

CPU0	      	     	 CPU1
Bring up CPU2
create_thread(T)
park(T)
 wait_for_completion()
			 parkme()
			 complete()
sched_set_stop_task()
			 schedule(TASK_PARKED)

The sched_set_stop_task() call is issued while the task is on the
runqueue of CPU1 and that confuses the hell out of the stop_task class
on that cpu. So we need the same synchronization before
sched_set_stop_task().

Reported-by: Dave Jones <davej@redhat.com>
Reported-and-tested-by: Dave Hansen <dave@sr71.net>
Reported-and-tested-by: Borislav Petkov <bp@alien8.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: dhillf@gmail.com
Cc: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1304091635430.21884@ionos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-04-12 14:18:43 +02:00
Jan Kara
7b001d6a0c ext4: clear buffer_uninit flag when submitting IO
Currently no one clears the buffer_uninit flag. This results in writeback
needlessly marking io_end as needing extent conversion and scanning the
extent tree for extents to convert. So clear the buffer_uninit flag once the
buffer is submitted for IO and the flag is transformed into the
EXT4_IO_END_UNWRITTEN flag.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
2013-04-12 00:03:19 -04:00
Jan Kara
4eec708d26 ext4: use io_end for multiple bios
Change writeback path to create just one io_end structure for the
extent to which we submit IO and share it among bios writing that
extent. This prevents needless splitting and joining of unwritten
extents when they cannot be submitted as a single bio.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Dmitry Monakhov <dmonakhov@openvz.org>
Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
2013-04-11 23:56:53 -04:00
Jan Kara
0058f9658c ext4: make ext4_bio_write_page() use BH_Async_Write flags
So far ext4_bio_write_page() attached all the pages to ext4_io_end
structure.  This makes that structure pretty heavy (1 KB for pointers
+ 16 bytes per page attached to the bio).  Also later we would like to
share ext4_io_end structure among several bios in case IO to a single
extent needs to be split among several bios and pointing to pages from
ext4_io_end makes this complex.

We remove page pointers from ext4_io_end and use pointers from bio
itself instead.  This isn't as easy when blocksize < pagesize because
then we can have several bios in flight for a single page and we have
to be careful when to call end_page_writeback().  However this is a
known problem already solved by block_write_full_page() /
end_buffer_async_write() so we mimic its behavior here.  We mark
buffers going to disk with BH_Async_Write flag and in
ext4_bio_end_io() we check whether there are any buffers with
BH_Async_Write flag left.  If there are not, we can call
end_page_writeback().
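
The completion rule for the blocksize < pagesize case can be sketched in
Python. Page, submit_page() and end_buffer_io() are hypothetical stand-ins
for the page, the submission path and ext4_bio_end_io(); the list of
booleans models the per-buffer BH_Async_Write flag.

```python
class Page:
    """Toy page holding one async-write flag per buffer."""

    def __init__(self, nr_buffers):
        self.async_write = [False] * nr_buffers
        self.writeback = False


def submit_page(page, buffers_to_write):
    """Mark the buffers going to disk and start page writeback."""
    page.writeback = True
    for i in buffers_to_write:
        page.async_write[i] = True


def end_buffer_io(page, i):
    """Completion for one buffer, possibly from a different bio."""
    page.async_write[i] = False
    if not any(page.async_write):
        # No flagged buffers left: models end_page_writeback().
        page.writeback = False


page = Page(nr_buffers=4)
submit_page(page, [0, 2])
end_buffer_io(page, 0)
print(page.writeback)
end_buffer_io(page, 2)
print(page.writeback)
```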

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Dmitry Monakhov <dmonakhov@openvz.org>
Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
2013-04-11 23:48:32 -04:00
Lukas Czerner
e1091b157c ext4: Use kstrtoul() instead of parse_strtoul()
In parse_strtoul() we're still using deprecated simple_strtoul().  Remove
parse_strtoul() altogether and replace it with kstrtoul()
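
The practical difference between the two styles of parsing can be sketched
in Python. lenient_parse() and strict_parse() are hypothetical helpers that
imitate the behaviour in spirit only: decimal input, no overflow handling,
and the assumption that the strict variant tolerates a single trailing
newline.

```python
DIGITS = "0123456789"


def lenient_parse(s):
    """simple_strtoul-like: use the leading digits, ignore the rest."""
    digits = ""
    for ch in s:
        if ch not in DIGITS:
            break
        digits += ch
    return int(digits) if digits else 0


def strict_parse(s):
    """kstrtoul-like: the whole string must be a number."""
    if s.endswith("\n"):
        s = s[:-1]  # tolerate one trailing newline
    if not s or any(ch not in DIGITS for ch in s):
        raise ValueError("not a valid unsigned integer: %r" % s)
    return int(s)


print(lenient_parse("12abc"), strict_parse("12\n"))
```

The strict variant turns input such as "12abc" into an error instead of
silently accepting 12.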

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-04-11 23:37:19 -04:00
Dmitry Monakhov
7e8b12c60a ext4: defragmentation code cleanup
- grab_cache_page_write_begin() may not wait on page's writeback since
  (1d1d1a7672). But it is still reasonable to wait on page's writeback
  here in order to be on the safe side.

- Fix a typo: pass 'length' instead of 'end' to __block_write_begin()
  https://bugzilla.kernel.org/show_bug.cgi?id=56241

TESTCASE: git://oss.sgi.com/xfs/cmds/xfstests.git
MKFS_OPTIONS="-b1024" ; ./check ext4/304

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Akira Fujita <a-fujita@rs.jp.nec.com>
2013-04-11 23:24:58 -04:00
Lukas Czerner
43e50f5086 ext4: do not convert to indirect with bigalloc enabled
With the bigalloc feature enabled we do not support indirect addressing at all,
so we have to prevent conversion from extent addressing to indirect addressing
in this case. The problem was introduced by the commit
"ext4: support simple conversion of extent-mapped inodes to use i_blocks"

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-04-11 10:54:46 -04:00
Andy Adamson
b9536ad521 NFSv4 release the sequence id in the return on close case
Otherwise we deadlock if state recovery is initiated while we
sleep.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-04-11 09:39:53 -04:00
Lukas Czerner
0d14b098ce ext4: move ext4_ind_migrate() into migrate.c
Move ext4_ind_migrate() into migrate.c file since it makes much more
sense and ext4_ext_migrate() is there as well.

Also fix tiny style problem - add spaces around "=" in "i=0".

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-04-10 23:32:52 -04:00
Sachin Prabhu
c369c9a4a7 cifs: Allow passwords which begin with a delimitor
Fixes a regression in cifs_parse_mount_options where a password
which begins with a delimiter is parsed incorrectly as being a blank
password.
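
To make the failure mode concrete, here is a Python sketch of extracting a
password from a comma-separated option string. It assumes that a doubled
delimiter inside the password stands for one literal delimiter;
extract_password() is a hypothetical helper and does not mirror
cifs_parse_mount_options line by line.

```python
def extract_password(options, delim=","):
    """Return the value of "password=" from options, or None.

    Assumption: a doubled delimiter is a literal delimiter, and a
    single delimiter ends the password.
    """
    key = "password="
    start = options.find(key)
    if start < 0:
        return None
    i = start + len(key)
    out = []
    while i < len(options):
        if options[i] == delim:
            if i + 1 < len(options) and options[i + 1] == delim:
                # Doubled delimiter: keep one and continue.
                out.append(delim)
                i += 2
                continue
            break  # single delimiter ends the password
        out.append(options[i])
        i += 1
    return "".join(out)


print(repr(extract_password("user=bob,password=,,secret,ro")))
print(repr(extract_password("user=bob,password=,ro")))
```

A parser that stops at the first delimiter without looking at the next
character would report a blank password for the first input.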

Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2013-04-10 15:54:14 -05:00
Jeff Layton
314d7cc05d nfs: remove unnecessary check for NULL inode->i_flock from nfs_delegation_claim_locks
The second check was added in commit 65b62a29 but it will never be true.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-04-10 15:40:31 -04:00
Linus Torvalds
51de017007 Merge tag 'nfs-for-3.9-5' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull another nfs fixlet from Trond Myklebust:
 "I suddenly noticed that a one-line issue that I _thought_ I had fixed
  with the nfs41_walk_client_list patch was apparently still there in
  the pull request I sent earlier today.  I'm very sorry for not
  catching that in time.

   - Fix a brain fart in nfs41_walk_client_list"

* tag 'nfs-for-3.9-5' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFSv4: Doh! Typo in the fix to nfs41_walk_client_list
2013-04-10 10:26:49 -07:00
Trond Myklebust
eb04e0ac19 NFSv4: Doh! Typo in the fix to nfs41_walk_client_list
Make sure that we set the status to 0 on success. Missed in testing
because it never appears when doing multiple mounts to _different_
servers.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: <stable@vger.kernel.org> # 3.7.x: 7b1f1fd: NFSv4/4.1: Fix bugs in nfs4[01]_walk_client_list
2013-04-10 12:57:29 -04:00
Linus Torvalds
f94eeb423b Merge tag 'nfs-for-3.9-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
 - fix for memory corruption issues in nfs4[01]_walk_client_list (stable)
 - fix for an Oopsable bug in rpc_clone_client (stable)
 - another state manager deadlock in the NFSv4 open code
 - memory leaks in nfs4_discover_server_trunking and rpc_new_client

* tag 'nfs-for-3.9-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFSv4: Fix another potential state manager deadlock
  SUNRPC: Fix a potential memory leak in rpc_new_client
  NFSv4/4.1: Fix bugs in nfs4[01]_walk_client_list
  NFSv4: Fix a memory leak in nfs4_discover_server_trunking
  SUNRPC: Remove extra xprt_put()
2013-04-10 09:00:51 -07:00
Steven Whitehouse
7bd8b2eb32 GFS2: Add origin indicator to glock demote tracing
This adds the origin indicator to the trace point for glock
demotion, so that it is possible to see where demote requests
have come from.

Note that requests generated from the demote_rq sysfs interface
will show as remote, since they are intended to replicate
exactly the effect of a demote request from a remote node. It
is still possible to tell these apart by looking at the process
which initiated the demote request.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-04-10 10:32:05 +01:00
Steven Whitehouse
81ffbf654f GFS2: Add origin indicator to glock callbacks
This patch adds a bool indicating whether the demote
request was originated locally or remotely. This is then
used by the iopen ->go_callback() to make 100% sure that
it will only respond to remote callbacks.

Since ->evict_inode() uses GL_NOCACHE when it attempts to
get an exclusive lock on the iopen lock, this may result
in extra scheduling of the workqueue in case that the
exclusive promotion request failed. This patch prevents
that from happening.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-04-10 10:26:55 +01:00
Theodore Ts'o
d6a771056b ext4: fix miscellaneous big endian warnings
None of these result in any bug, but they make sparse complain.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-04-09 23:59:55 -04:00
Dmitry Monakhov
171a7f21a7 ext4: fix big-endian bug in metadata checksum calculations
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
2013-04-09 23:56:48 -04:00
Dmitry Monakhov
0b65349ebc ext4: fix big-endian bug in extent migration code
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
2013-04-09 23:56:44 -04:00
Dmitry Monakhov
8c8e0ca622 ext4: fix usless declarations
This patch should fix sparse complaints about shadowed declarations.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-04-09 22:48:36 -04:00
Lukas Czerner
27dd438542 ext4: introduce reserved space
Currently, in an ENOSPC condition, when writing into unwritten space or
punching a hole, we might need to split the extent and grow the extent tree.
However, since we cannot allocate any new metadata blocks, we'll have to
zero out the unwritten part of the extent or the punched-out part of the
extent, or in the worst case return ENOSPC even though the user actually
does not allocate any space.

Also, in the delalloc path we do reserve metadata and data blocks for the
time we're going to write out. However, metadata block reservation is
very tricky, especially since we expect that logical connectivity implies
physical connectivity. That might not be the case, and hence we
might end up allocating more metadata blocks than previously reserved.
So in the future, metadata reservation checks should be removed since we
cannot ensure that we do not under-reserve.

And this is where reserved space comes into the picture. When mounting
the file system we slice off a little bit of the file system space (2%
or 4096 clusters, whichever is smaller) which can be then used for the
cases mentioned above to prevent costly zeroout, or unexpected ENOSPC.

The number of reserved clusters can be set via sysfs; however, it can
never be bigger than the number of free clusters in the file system.
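
The sizing rule can be written out as a short Python sketch. Both function
names are hypothetical, and the sketch assumes that the 2% is taken of the
total cluster count and that an oversized sysfs value is simply rejected.

```python
def default_reserved_clusters(total_clusters):
    """2% of the clusters or 4096, whichever is smaller."""
    return min(total_clusters // 50, 4096)


def set_reserved_clusters(requested, free_clusters):
    """Validate a value written via sysfs (assumed behaviour)."""
    if requested > free_clusters:
        raise ValueError("more reserved clusters than free clusters")
    return requested


print(default_reserved_clusters(1000))
print(default_reserved_clusters(1000000))
```

For a small file system the 2% term dominates (20 clusters out of 1000),
while large file systems hit the 4096-cluster cap.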

Note that this patch fixes the failure of xfstest 274 as expected.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
2013-04-09 22:11:22 -04:00
J. Bruce Fields
23340032e6 nfsd4: clean up validate_stateid
The logic here is better expressed with a switch statement.

While we're here, CLOSED stateids (or stateids of an unknown type--which
would indicate a server bug) should probably return nfserr_bad_stateid,
though this behavior shouldn't affect any non-buggy client.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-09 17:42:28 -04:00
J. Bruce Fields
06b332a522 nfsd4: check backchannel attributes on create_session
Make sure the client gives us an adequate backchannel.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-09 16:53:56 -04:00
J. Bruce Fields
55c760cfc4 nfsd4: fix forechannel attribute negotiation
Negotiation of the 4.1 session forechannel attributes is a mess.  Fix:

	- Move it all into check_forechannel_attrs instead of spreading
	  it between that, alloc_session, and init_forechannel_attrs.
	- set a minimum "slotsize" so that our drc memory limits apply
	  even for small maxresponsesize_cached.  This also fixes some
	  bugs when slotsize becomes <= 0.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-09 16:43:44 -04:00
J. Bruce Fields
373cd4098a nfsd4: cleanup check_forechannel_attrs
Pass this struct by reference, not by value, and return an error instead
of a boolean to allow for future additions.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-09 15:49:50 -04:00
Linus Torvalds
e8f2b548de Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
 "A nasty bug in fs/namespace.c caught by Andrey + a couple of less
  serious unpleasantness - ecryptfs misc device playing hopeless games
  with try_module_get() and palinfo procfs support being...  not quite
  correctly done, to be polite."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  mnt: release locks on error path in do_loopback
  palinfo fixes
  procfs: add proc_remove_subtree()
  ecryptfs: close rmmod race
2013-04-09 12:22:49 -07:00
Al Viro
05c0ae21c0 try a saner locking for pde_opener...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-04-09 15:16:52 -04:00