Commit Graph

11446 Commits

Author SHA1 Message Date
Oleg Nesterov
8187926bda posix-timers: simplify de_thread()->exit_itimers() path
Impact: simplify code

de_thread() postpones release_task(leader) until after exit_itimers().
This was needed because !SIGEV_THREAD_ID timers could use ->group_leader
without get_task_struct(). With the recent changes we can release the
leader earlier and simplify the code.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-12-12 17:01:03 +01:00
Lachlan McIlroy
4d9d4ebf5d Merge branch 'master' of git+ssh://git.melbourne.sgi.com/git/xfs 2008-12-12 15:28:02 +11:00
Lachlan McIlroy
cfbe52672f [XFS] set b_error from bio error in xfs_buf_bio_end_io
Preserve any error returned by the bio layer.

Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Tim Shimmin <tes@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-12-12 15:27:25 +11:00
Christoph Hellwig
c4cd747ee6 [XFS] use inode_change_ok for setattr permission checking
Instead of implementing our own checks use inode_change_ok to check for
necessary permission in setattr.  There is a slight change in behaviour
as inode_change_ok doesn't allow i_mode updates to add the suid or sgid
without superuser privilegues while the old XFS code just stripped away
those bits from the file mode.

(First sent on Semptember 29th)

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-12-11 13:15:10 +11:00
Christoph Hellwig
4d4be482a4 [XFS] add a FMODE flag to make XFS invisible I/O less hacky
XFS has a mode called invisble I/O that doesn't update any of the
timestamps.  It's used for HSM-style applications and exposed through
the nasty open by handle ioctl.

Instead of doing directly assignment of file operations that set an
internal flag for it add a new FMODE_NOCMTIME flag that we can check
in the normal file operations.

(addition of the generic VFS flag has been ACKed by Al as an interims
 solution)

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-12-11 13:14:41 +11:00
Christoph Hellwig
6d73cf133c [XFS] resync headers with libxfs
- xfs_sb.h add the XFS_SB_VERSION2_PARENTBIT features2 that has been
   around in userspace for some time
 - xfs_inode.h: move a few things out of __KERNEL__ that are needed by
   userspace
 - xfs_mount.h: only include xfs_sync.h under __KERNEL__
 - xfs_inode.c: minor whitespace fixup.  I accidentaly changes this when
   importing this file for use by userspace.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-12-11 13:14:17 +11:00
Christoph Hellwig
2175dd9574 [XFS] simplify projid check in xfs_rename
Check for the project ID after attaching all inodes to the transaction.
That way the unlock in the error case is done by the transaction subsystem,
which guaratees that is uses the right flags (which was wrong from day one
of this check), and avoids having special code unlocking an array of inodes
with potential duplicates.  Attaching the inode first is the method used
by xfs_rename and the other namespace methods all other error that require
multiple locked inodes.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-12-11 13:13:52 +11:00
Christoph Hellwig
15ac08a8b2 [XFS] replace b_fspriv with b_mount
Replace the b_fspriv pointer and it's ugly accessors with a properly types
xfs_mount pointer.  Also switch log reocvery over to it instead of using
b_fspriv for the mount pointer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-12-11 13:13:33 +11:00
Linus Torvalds
f4fd2c5b6f Merge branch 'to-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-roland
* 'to-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-roland:
  tracehook: exec double-reporting fix
2008-12-10 14:40:21 -08:00
Hugh Dickins
9c24624727 KSYM_SYMBOL_LEN fixes
Miles Lane tailing /sys files hit a BUG which Pekka Enberg has tracked
to my 966c8c12dc sprint_symbol(): use
less stack exposing a bug in slub's list_locations() -
kallsyms_lookup() writes a 0 to namebuf[KSYM_NAME_LEN-1], but that was
beyond the end of page provided.

The 100 slop which list_locations() allows at end of page looks roughly
enough for all the other stuff it might print after the symbol before
it checks again: break out KSYM_SYMBOL_LEN earlier than before.

Latencytop and ftrace and are using KSYM_NAME_LEN buffers where they
need KSYM_SYMBOL_LEN buffers, and vmallocinfo a 2*KSYM_NAME_LEN buffer
where it wants a KSYM_SYMBOL_LEN buffer: fix those before anyone copies
them.

[akpm@linux-foundation.org: ftrace.h needs module.h]
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc Miles Lane <miles.lane@gmail.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Steven Rostedt <srostedt@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-10 08:01:54 -08:00
Dmitri Monakhov
6ee5a399d6 inotify: fix IN_ONESHOT unmount event watcher
On umount two event will be dispatched to watcher:

1: inotify_dev_queue_event(.., IN_UNMOUNT,..)
2: remove_watch(watch, dev)
    ->inotify_dev_queue_event(.., IN_IGNORED, ..)

But if watcher has IN_ONESHOT bit set then the watcher will be released
inside first event.  Which result in accessing invalid object later.  IMHO
it is not pure regression.  This bug wasn't triggered while initial
inotify interface testing phase because of another bug in IN_ONESHOT
handling logic :)

  commit ac74c00e49
  Author: Ulisses Furquim <ulissesf@gmail.com>
  Date:   Fri Feb 8 04:18:16 2008 -0800
    inotify: fix check for one-shot watches before destroying them
    As the IN_ONESHOT bit is never set when an event is sent we must check it
    in the watch's mask and not in the event's mask.

TESTCASE:
mkdir mnt
mount -ttmpfs none mnt
mkdir mnt/d
./inotify mnt/d&
umount mnt ## << lockup or crash here

TESTSOURCE:
/* gcc -oinotify inotify.c */
#include <stdio.h>
#include <stdlib.h>
#include <sys/inotify.h>

int main(int argc, char **argv)
{
        char buf[1024];
        struct inotify_event *ie;
        char *p;
        int i;
        ssize_t l;

        p = argv[1];
        i = inotify_init();
        inotify_add_watch(i, p, ~0);

        l = read(i, buf, sizeof(buf));
        printf("read %d bytes\n", l);
        ie = (struct inotify_event *) buf;
        printf("event mask: %d\n", ie->mask);
	return 0;
}

Signed-off-by: Dmitri Monakhov <dmonakhov@openvz.org>
Cc: John McCutchan <ttb@tentacle.dhs.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Robert Love <rlove@google.com>
Cc: Ulisses Furquim <ulissesf@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-10 08:01:53 -08:00
Matt Mackall
49c50342c7 pagemap: fix 32-bit pagemap regression
The large pages fix from bcf8039ed4 broke 32-bit pagemap by pulling the
pagemap entry code out into a function with the wrong return type.
Pagemap entries are 64 bits on all systems and unsigned long is only 32
bits on 32-bit systems.

Signed-off-by: Matt Mackall <mpm@selenic.com>
Reported-by: Doug Graham <dgraham@nortel.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: <stable@kernel.org>		[2.6.26.x, 2.6.27.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-10 08:01:53 -08:00
Andrew Morton
02d2116887 revert "percpu_counter: new function percpu_counter_sum_and_set"
Revert

    commit e8ced39d5e
    Author: Mingming Cao <cmm@us.ibm.com>
    Date:   Fri Jul 11 19:27:31 2008 -0400

        percpu_counter: new function percpu_counter_sum_and_set

As described in

	revert "percpu counter: clean up percpu_counter_sum_and_set()"

the new percpu_counter_sum_and_set() is racy against updates to the
cpu-local accumulators on other CPUs.  Revert that change.

This means that ext4 will be slow again.  But correct.

Reported-by: Eric Dumazet <dada1@cosmosbay.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mingming Cao <cmm@us.ibm.com>
Cc: <linux-ext4@vger.kernel.org>
Cc: <stable@kernel.org>		[2.6.27.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-10 08:01:52 -08:00
Andrew Morton
71c5576fbd revert "percpu counter: clean up percpu_counter_sum_and_set()"
Revert

    commit 1f7c14c62c
    Author: Mingming Cao <cmm@us.ibm.com>
    Date:   Thu Oct 9 12:50:59 2008 -0400

        percpu counter: clean up percpu_counter_sum_and_set()

Before this patch we had the following:

percpu_counter_sum(): return the percpu_counter's value

percpu_counter_sum_and_set(): return the percpu_counter's value, copying
that value into the central value and zeroing the per-cpu counters before
returning.

After this patch, percpu_counter_sum_and_set() has gone, and
percpu_counter_sum() gets the old percpu_counter_sum_and_set()
functionality.

Problem is, as Eric points out, the old percpu_counter_sum_and_set()
functionality was racy and wrong.  It zeroes out counters on "other" cpus,
without holding any locks which will prevent races agaist updates from
those other CPUS.

This patch reverts 1f7c14c62c.  This means
that percpu_counter_sum_and_set() still has the race, but
percpu_counter_sum() does not.

Note that this is not a simple revert - ext4 has since started using
percpu_counter_sum() for its dirty_blocks counter as well.

Note that this revert patch changes percpu_counter_sum() semantics.

Before the patch, a call to percpu_counter_sum() will bring the counter's
central counter mostly up-to-date, so a following percpu_counter_read()
will return a close value.

After this patch, a call to percpu_counter_sum() will leave the counter's
central accumulator unaltered, so a subsequent call to
percpu_counter_read() can now return a significantly inaccurate result.

If there is any code in the tree which was introduced after
e8ced39d5e was merged, and which depends
upon the new percpu_counter_sum() semantics, that code will break.

Reported-by: Eric Dumazet <dada1@cosmosbay.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mingming Cao <cmm@us.ibm.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-10 08:01:52 -08:00
Roland McGrath
85f334666a tracehook: exec double-reporting fix
The patch 6341c39 "tracehook: exec" introduced a small regression in
2.6.27 regarding binfmt_misc exec event reporting.  Since the reporting
is now done in the common search_binary_handler() function, an exec
of a misc binary will result in two (or possibly multiple) exec events
being reported, instead of just a single one, because the misc handler
contains a recursive call to search_binary_handler.

To add to the confusion, if PTRACE_O_TRACEEXEC is not active, the multiple
SIGTRAP signals will in fact cause only a single ptrace intercept, as the
signals are not queued.  However, if PTRACE_O_TRACEEXEC is on, the debugger
will actually see multiple ptrace intercepts (PTRACE_EVENT_EXEC).

The test program included below demonstrates the problem.

This change fixes the bug by calling tracehook_report_exec() only in the
outermost search_binary_handler() call (bprm->recursion_depth == 0).

The additional change to restore bprm->recursion_depth after each binfmt
load_binary call is actually superfluous for this bug, since we test the
value saved on entry to search_binary_handler().  But it keeps the use of
of the depth count to its most obvious expected meaning.  Depending on what
binfmt handlers do in certain cases, there could have been false-positive
tests for recursion limits before this change.

    /* Test program using PTRACE_O_TRACEEXEC.
       This forks and exec's the first argument with the rest of the arguments,
       while ptrace'ing.  It expects to see one PTRACE_EVENT_EXEC stop and
       then a successful exit, with no other signals or events in between.

       Test for kernel doing two PTRACE_EVENT_EXEC stops for a binfmt_misc exec:

       $ gcc -g traceexec.c -o traceexec
       $ sudo sh -c 'echo :test:M::foobar::/bin/cat: > /proc/sys/fs/binfmt_misc/register'
       $ echo 'foobar test' > ./foobar
       $ chmod +x ./foobar
       $ ./traceexec ./foobar; echo $?
       ==> good <==
       foobar test
       0
       $
       ==> bad <==
       foobar test
       unexpected status 0x4057f != 0
       3
       $

    */

    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <sys/ptrace.h>
    #include <unistd.h>
    #include <signal.h>
    #include <stdlib.h>

    static void
    wait_for (pid_t child, int expect)
    {
      int status;
      pid_t p = wait (&status);
      if (p != child)
	{
	  perror ("wait");
	  exit (2);
	}
      if (status != expect)
	{
	  fprintf (stderr, "unexpected status %#x != %#x\n", status, expect);
	  exit (3);
	}
    }

    int
    main (int argc, char **argv)
    {
      pid_t child = fork ();

      if (child < 0)
	{
	  perror ("fork");
	  return 127;
	}
      else if (child == 0)
	{
	  ptrace (PTRACE_TRACEME);
	  raise (SIGUSR1);
	  execv (argv[1], &argv[1]);
	  perror ("execve");
	  _exit (127);
	}

      wait_for (child, W_STOPCODE (SIGUSR1));

      if (ptrace (PTRACE_SETOPTIONS, child,
		  0L, (void *) (long) PTRACE_O_TRACEEXEC) != 0)
	{
	  perror ("PTRACE_SETOPTIONS");
	  return 4;
	}

      if (ptrace (PTRACE_CONT, child, 0L, 0L) != 0)
	{
	  perror ("PTRACE_CONT");
	  return 5;
	}

      wait_for (child, W_STOPCODE (SIGTRAP | (PTRACE_EVENT_EXEC << 8)));

      if (ptrace (PTRACE_CONT, child, 0L, 0L) != 0)
	{
	  perror ("PTRACE_CONT");
	  return 6;
	}

      wait_for (child, W_EXITCODE (0, 0));

      return 0;
    }

Reported-by: Arnd Bergmann <arnd@arndb.de>
CC: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Signed-off-by: Roland McGrath <roland@redhat.com>
2008-12-09 19:36:38 -08:00
Lachlan McIlroy
e055f13a6d [XFS] Remove unused tracing code
None of this code appears to be used anywhere so remove it.

Reviewed-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-12-10 11:51:54 +11:00
J. Bruce Fields
a4f4d6df53 EXPORTFS: handle NULL returns from fh_to_dentry()/fh_to_parent()
While 440037287c "[PATCH] switch all filesystems over to
d_obtain_alias" removed some cases where fh_to_dentry() and
fh_to_parent() could return NULL, there are still a few NULL returns
left in individual filesystems.  Thus it was a mistake for that commit
to remove the handling of NULL returns in the callers.

Revert those parts of 440037287c which removed the NULL handling.

(We could, alternatively, modify all implementations to return -ESTALE
instead of NULL, but that proves to require fixing a number of
filesystems, and in some cases it's arguably more natural to return
NULL.)

Thanks to David for original patch and Linus, Christoph, and Hugh for
review.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: David Howells <dhowells@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-08 19:49:32 -08:00
Yinghai Lu
240d367b4e sparseirq: fix Alpha build failure
Impact: build fix on Alpha

-tip testing found this build failure on the Alpha defconfig:

/home/mingo/tip/fs/proc/stat.c: In function 'show_stat':
/home/mingo/tip/fs/proc/stat.c:48: error: implicit declaration of function 'for_each_irq_desc'
/home/mingo/tip/fs/proc/stat.c:48: error: expected ';' before '{' token

can not use irq_desc() in stat.c on older architectures.

Signed-off-by: Yinghai Lu <yinghai@kernel.orgg>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-09 04:16:54 +01:00
Yinghai Lu
0b8f1efad3 sparse irq_desc[] array: core kernel and x86 changes
Impact: new feature

Problem on distro kernels: irq_desc[NR_IRQS] takes megabytes of RAM with
NR_CPUS set to large values. The goal is to be able to scale up to much
larger NR_IRQS value without impacting the (important) common case.

To solve this, we generalize irq_desc[NR_IRQS] to an (optional) array of
irq_desc pointers.

When CONFIG_SPARSE_IRQ=y is used, we use kzalloc_node to get irq_desc,
this also makes the IRQ descriptors NUMA-local (to the site that calls
request_irq()).

This gets rid of the irq_cfg[] static array on x86 as well: irq_cfg now
uses desc->chip_data for x86 to store irq_cfg.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-08 14:31:51 +01:00
Jonathan Corbet
218d11a8b0 Fix a race condition in FASYNC handling
Changeset a238b790d5 (Call fasync()
functions without the BKL) introduced a race which could leave
file->f_flags in a state inconsistent with what the underlying
driver/filesystem believes.  Revert that change, and also fix the same
races in ioctl_fioasync() and ioctl_fionbio().

This is a minimal, short-term fix; the real fix will not involve the
BKL.

Reported-by: Oleg Nesterov <oleg@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-05 15:35:10 -08:00
Ingo Molnar
970987beb9 Merge branches 'tracing/ftrace', 'tracing/function-graph-tracer' and 'tracing/urgent' into tracing/core 2008-12-05 14:45:22 +01:00
Linus Torvalds
bbeba4c35c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/bdev
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/bdev:
  [PATCH] fix bogus argument of blkdev_put() in pktcdvd
  [PATCH 2/2] documnt FMODE_ constants
  [PATCH 1/2] kill FMODE_NDELAY_NOW
  [PATCH] clean up blkdev_get a little bit
  [PATCH] Fix block dev compat ioctl handling
  [PATCH] kill obsolete temporary comment in swsusp_close()
2008-12-04 21:45:44 -08:00
Dave Chinner
576a488a27 [XFS] Fix hang after disallowed rename across directory quota domains
When project quota is active and is being used for directory tree
quota control, we disallow rename outside the current directory
tree. This requires a check to be made after all the inodes
involved in the rename are locked. We fail to unlock the inodes
correctly if we disallow the rename when the target is outside the
current directory tree. This results in a hang on the next access
to the inodes involved in failed rename.

Reported-by: Arkadiusz Miskiewicz <arekm@maven.pl>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Tested-by: Arkadiusz Miskiewicz <arekm@maven.pl>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-12-05 15:39:13 +11:00
Lachlan McIlroy
14d676f56f Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 2008-12-05 15:27:43 +11:00
Lachlan McIlroy
797eaed40e [XFS] Remove unnecessary assertion
Hit this assert because an inode was tagged with XFS_ICI_RECLAIM_TAG but
not XFS_IRECLAIMABLE|XFS_IRECLAIM.  This is because xfs_iget_cache_hit()
first clears XFS_IRECLAIMABLE and then calls __xfs_inode_clear_reclaim_tag()
while only holding the pag_ici_lock in read mode so we can race with
xfs_reclaim_inodes_ag().  Looks like xfs_reclaim_inodes_ag() will do the
right thing anyway so just remove the assert.

Thanks to Christoph for pointing out where the problem was.

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
2008-12-05 14:15:49 +11:00
Lachlan McIlroy
a5b429d41f [XFS] Remove unused variable in ktrace_free()
entries_size is probably left over from when we used to pass the
size to kmem_free().

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
2008-12-05 13:31:51 +11:00
Lachlan McIlroy
c6422617a1 [XFS] Check return value of xfs_buf_get_noaddr()
We check the return value of all other calls to xfs_buf_get_noaddr().
Make sense to do it here too.

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
2008-12-05 13:16:15 +11:00
Dave Chinner
6a0775a991 [XFS] Fix hang after disallowed rename across directory quota domains
When project quota is active and is being used for directory tree
quota control, we disallow rename outside the current directory
tree. This requires a check to be made after all the inodes
involved in the rename are locked. We fail to unlock the inodes
correctly if we disallow the rename when the target is outside the
current directory tree. This results in a hang on the next access
to the inodes involved in failed rename.

Reported-by: Arkadiusz Miskiewicz <arekm@maven.pl>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Tested-by: Arkadiusz Miskiewicz <arekm@maven.pl>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-12-05 12:50:04 +11:00
Christoph Hellwig
8bb57320f3 [XFS] Fix compile with CONFIG_COMPAT enabled
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-12-05 11:23:10 +11:00
Alexey Dobriyan
995be04548 UBIFS: fix section mismatch
This patch fixes the following section mismatch:

WARNING: fs/ubifs/ubifs.o(.init.text+0xec): Section mismatch in reference from the function init_module() to the function .exit.text:ubifs_compressors_exit()

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-12-04 17:53:10 +02:00
Christoph Hellwig
fd4ce1acd0 [PATCH 1/2] kill FMODE_NDELAY_NOW
Update FMODE_NDELAY before each ioctl call so that we can kill the
magic FMODE_NDELAY_NOW.  It would be even better to do this directly
in setfl(), but for that we'd need to have FMODE_NDELAY for all files,
not just block special files.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-12-04 04:22:57 -05:00
Christoph Hellwig
ebbefc011e [PATCH] clean up blkdev_get a little bit
The way the bd_claim for the FMODE_EXCL case is implemented is rather
confusing.  Clean it up to the most logical style.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-12-04 04:22:56 -05:00
Ingo Molnar
b8307db247 Merge commit 'v2.6.28-rc7' into tracing/core 2008-12-04 09:07:19 +01:00
James Morris
ec98ce480a Merge branch 'master' into next
Conflicts:
	fs/nfsd/nfs4recover.c

Manually fixed above to use new creds API functions, e.g.
nfs4_save_creds().

Signed-off-by: James Morris <jmorris@namei.org>
2008-12-04 17:16:36 +11:00
Christoph Hellwig
5a8d0f3c7a move inode tracing out of xfs_vnode.
Move the inode tracing into xfs_iget.c / xfs_inode.h and kill xfs_vnode.c
now that it's empty.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-04 15:39:25 +11:00
Christoph Hellwig
25e41b3d52 move vn_iowait / vn_iowake into xfs_aops.c
The whole machinery to wait on I/O completion is related to the I/O path
and should be there instead of in xfs_vnode.c.  Also give the functions
more descriptive names.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-04 15:39:24 +11:00
Christoph Hellwig
583fa586f0 kill vn_ioerror
There's just one caller of this helper, and it's much cleaner to just merge
the xfs_do_force_shutdown call into it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-04 15:39:24 +11:00
Christoph Hellwig
f95099ba5a kill xfs_unmount_flush
There's almost nothing left in this function, instead remove the IRELE
on the real times inodes and the call to XFS_QM_UNMOUNT into xfs_unmountfs.

For the regular unmount case that means it now also happenes after dmapi
notification, but otherwise there is no difference in behaviour.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-04 15:39:24 +11:00
Christoph Hellwig
e57481dc26 no explicit xfs_iflush for special inodes during unmount
Currently we explicitly call xfs_iflush on the quota, real-time and root
inodes from xfs_unmount_flush.  But we just called xfs_sync_inodes with
SYNC_ATTR and do an XFS_bflush aka xfs_flush_buftarg to make sure all inodes
are on disk already, so there is no need for these special cases.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-04 15:39:23 +11:00
Christoph Hellwig
070c4616ec use xfs_trans_ijoin in xfs_trans_iget
Use xfs_trans_ijoin in xfs_trans_iget in case we need to join an inode into
a transaction instead of opencoding it.  Based on a discussion with and an
incomplete patch from Niv Sardi.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-04 15:39:23 +11:00
Christoph Hellwig
b56757becf remove leftovers of shared read-only support
We never supported shared read-only filesystems, so remove the dead
code left over from IRIX for it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-04 15:39:23 +11:00
Christoph Hellwig
e88f11abe0 remove unused m_inode_quiesce member from struct xfs_mount
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-04 15:39:22 +11:00
Christoph Hellwig
6bd16ff270 kill dead inode flags
There are a few inode flags around that aren't used anywhere, so remove
them.  Also update xfsidbg to display all used inode flags correctly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-04 15:39:22 +11:00
Christoph Hellwig
5efcbb853b cleanup xfs_sb.h feature flag helpers
The various inlines in xfs_sb.h that deal with the superblock version
and fature flags were converted from macros a while ago, and this
show by the odd coding style full of useless braces and backslashes
and the avoidance of conditionals.

Clean these up to look like normal C code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Donald Douwsma <donaldd@sgi.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-04 15:39:22 +11:00
Christoph Hellwig
df6771bde1 kill dead quota flags
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-04 15:39:22 +11:00
Christoph Hellwig
63ad2a5c4c remove dead code from sv_t implementation
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-04 15:39:21 +11:00
Christoph Hellwig
39e2defe73 reduce l_icloglock roundtrips
All but one caller of xlog_state_want_sync drop and re-acquire
l_icloglock around the call to it, just so that xlog_state_want_sync can
acquire and drop it.

Move all lock operation out of l_icloglock and assert that the lock is
held when it is called.

Note that it would make sense to extende this scheme to
xlog_state_release_iclog, but the locking in there is more complicated
and we'd like to keep the atomic_dec_and_lock optmization for those
callers not having l_icloglock yet.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-04 15:39:21 +11:00
Christoph Hellwig
d9424b3c4a stop using igrab in xfs_vn_link
->link is guranteed to get an already reference inode passed so we
can do a simple increment of i_count instead of using igrab and thus
avoid banging on the global inode_lock.  This is what most filesystems
already do.

Also move the increment after the call to xfs_link to simplify error
handling.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-04 15:39:21 +11:00
Christoph Hellwig
5d765b976c kill xfs_buf_iostart
xfs_buf_iostart is a "shared" helper for xfs_buf_read_flags,
xfs_bawrite, and xfs_bdwrite - except that there isn't much shared
code but rather special cases for each caller.

So remove this function and move the functionality to the caller.
xfs_bawrite and xfs_bdwrite are now big enough to be moved out of
line and the xfs_buf_read_flags is moved into a new helper called
_xfs_buf_read.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-04 15:39:20 +11:00
Christoph Hellwig
5cafdeb289 cleanup the inode reclaim path
Merge xfs_iextract and xfs_idestroy into xfs_ireclaim as they are never
called individually.  Also rewrite most comments in this area as they
were severly out of date.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-04 15:39:20 +11:00
Christoph Hellwig
ccd0be6cfc remove unused prototypes for xfs_ihash_init / xfs_ihash_free
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-04 15:39:20 +11:00
Christoph Hellwig
73e6335c14 remove unused behvavior cruft in xfs_super.h
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-04 15:39:19 +11:00
Christoph Hellwig
2234d54d3d remove useless mnt_want_write call in xfs_write
When mnt_want_write was introduced a call to it was added around
xfs_ichgtime, but there is no need for this because a file can't be open
read/write on a r/o mount, and a mount can't degrade r/o while we still
have files open for writing.  As the mnt_want_write changes were never
merged into the CVS tree this patch is for mainline only.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-04 15:39:19 +11:00
Christoph Hellwig
ddcd856d81 [XFS] fix compile on 32 bit systems
The recent compat patches make xfs_file.c include xfs_ioctl32.h unconditional,
which breaks the build on 32 bit systems which don't have the various compat
defintions.

Remove the include and move the defintion of xfs_file_compat_ioctl to
xfs_ioctl.h so that we can avoid including all the compat defintions in
xfs_file.c

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-12-04 13:07:29 +11:00
Linus Torvalds
2433c41789 Merge branch 'for-2.6.28' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.28' of git://linux-nfs.org/~bfields/linux:
  NLM: client-side nlm_lookup_host() should avoid matching on srcaddr
  nfsd: use of unitialized list head on error exit in nfs4recover.c
  Add a reference to sunrpc in svc_addsock
  nfsd: clean up grace period on early exit
2008-12-03 16:40:37 -08:00
Artem Bityutskiy
2ba5f7ae81 UBIFS: introduce LPT dump function
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-12-03 13:14:34 +02:00
Artem Bityutskiy
787845bdea UBIFS: dump stack in LPT check functions
It is useful to know how we got to the checking function when
hunting the bugs.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-12-03 13:14:34 +02:00
Artem Bityutskiy
45e12d901f UBIFS: run debugging checks only if they are enabled
Do not forget to check whether lpt debugging is enabled before
running the check functions. This commit also makes some spelling
fixes.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-12-03 13:14:34 +02:00
Artem Bityutskiy
552ff3179d UBIFS: add debugfs support
We need to have a possibility to see various UBIFS variables
and ask UBIFS to dump various information. Debugfs is what
we need.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-12-03 13:14:33 +02:00
Artem Bityutskiy
17c2f9f85c UBIFS: separate debugging fields out
Introduce a new data structure which contains all debugging
stuff inside. This is cleaner than having debugging stuff
directly in 'c'.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-12-03 13:14:33 +02:00
Kukkonen Mika
5dd7cbc083 UBIFS: avoid unnecessary checks
I have a habit of compiling kernel with
EXTRA_CFLAGS="-Wextra -Wno-unused -Wno-sign-compare -Wno-missing-field-initializers"
and so fs/ubifs/key.h give lots (~10) of these every time:

CC      fs/ubifs/tnc_misc.o
In file included from fs/ubifs/ubifs.h:1725,
from fs/ubifs/tnc_misc.c:30:
fs/ubifs/key.h: In function 'key_r5_hash':
fs/ubifs/key.h:64: warning: comparison of unsigned expression >= 0 is always true
fs/ubifs/key.h: In function 'key_test_hash':
fs/ubifs/key.h:81: warning: comparison of unsigned expression >= 0 is always true

This patch fixes the warnings.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-12-03 13:14:11 +02:00
Artem Bityutskiy
553dea4dd5 UBIFS: introduce compression mount options
It is very handy to be able to change default UBIFS compressor
via mount options. Introduce -o compr=<name> mount option support.
Currently only "none", "lzo" and "zlib" compressors are supported.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-12-03 13:14:05 +02:00
Artem Bityutskiy
a1dc080c27 UBIFS: use bit-fields to store compression type
Save a 4 bytes of RAM per 'struct inode' by stroring inode
compression type in bit-filed, instead of using 'int'.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-12-03 13:13:59 +02:00
Artem Bityutskiy
062e4fee44 UBIFS: slight compression optimization
If data does not compress, it is better to leave it uncompressed
because we'll read it faster then. So do not compress data if we
save less than 64 bytes.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-12-03 13:13:50 +02:00
David S. Miller
aa2ba5f108 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/net/ixgbe/ixgbe_main.c
	drivers/net/smc91x.c
2008-12-02 19:50:27 -08:00
Linus Torvalds
51eaaa6776 Merge branch 'linux-next' of git://git.infradead.org/ubifs-2.6
* 'linux-next' of git://git.infradead.org/ubifs-2.6:
  UBIFS: pre-allocate bulk-read buffer
  UBIFS: do not allocate too much
  UBIFS: do not print scary memory allocation warnings
  UBIFS: allow for gaps when dirtying the LPT
  UBIFS: fix compilation warnings
  MAINTAINERS: change UBI/UBIFS git tree URLs
  UBIFS: endian handling fixes and annotations
  UBIFS: remove printk
2008-12-02 15:56:55 -08:00
Harvey Harrison
5d9ec854bf fuse: clean up annotations of fc->lock
Makes the existing annotations match the more common one per line style
and adds a few missing annotations.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2008-12-02 14:49:42 +01:00
Miklos Szeredi
c9f0523d88 fuse: fix sparse warning in ioctl
Fix sparse warning:

  CHECK   fs/fuse/file.c
fs/fuse/file.c:1615:17: warning: incorrect type in assignment (different address spaces)
fs/fuse/file.c:1615:17:    expected void [noderef] <asn:1>*iov_base
fs/fuse/file.c:1615:17:    got void *<noident>

This was introduced by "fuse: implement ioctl support".

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2008-12-02 14:49:42 +01:00
sandeen@sandeen.net
e5d412f178 [XFS] Reorder xfs_ioctl32.c for some tidiness
Put things in IMHO a more readable order, now
that it's all done; add some comments.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-12-02 17:18:21 +11:00
sandeen@sandeen.net
710d62aaaf [XFS] Hook up compat XFS_IOC_FSSETDM_BY_HANDLE ioctl handler
Add a compat handler for XFS_IOC_FSSETDM_BY_HANDLE.

I haven't tested this, lacking dmapi tools to do so
(unless xfsqa magically gets this somehow?)

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-12-02 17:17:43 +11:00
sandeen@sandeen.net
28750975ac [XFS] Hook up compat XFS_IOC_ATTRMULTI_BY_HANDLE ioctl handler
Add a compat handler for XFS_IOC_ATTRMULTI_BY_HANDLE

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-12-02 17:17:07 +11:00
sandeen@sandeen.net
ebeecd2b04 [XFS] Hook up compat XFS_IOC_ATTRLIST_BY_HANDLE ioctl handler
Add a compat handler for XFS_IOC_ATTRLIST_BY_HANDLE

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-12-02 17:16:45 +11:00
sandeen@sandeen.net
af819d2763 [XFS] Fix compat XFS_IOC_FSBULKSTAT_SINGLE ioctl
The XFS_IOC_FSBULKSTAT_SINGLE ioctl passes in the
desired inode number, while XFS_IOC_FSBULKSTAT passes
in the previous/last-stat'd inode number.  The
compat handler wasn't differentiating these, so
when a XFS_IOC_FSBULKSTAT_SINGLE request for inode
128 was sent in, stat information for 131 was sent out.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-12-02 17:16:24 +11:00
sandeen@sandeen.net
65fbaf2489 [XFS] Fix xfs_bulkstat_one size checks & error handling
The 32-bit xfs_blkstat_one handler was failing because
a size check checked whether the remaining (32-bit)
user buffer was less than the (64-bit) bulkstat buffer,
and failed with ENOMEM if so.  Move this check
into the respective handlers so that they check the
correct sizes.

Also, the formatters were returning negative errors
or positive bytes copied; this was odd in the positive
error value world of xfs, and handled wrong by at least
some of the callers, which treated the bytes returned
as an error value.  Move the bytes-used assignment
into the formatters.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-12-02 17:16:03 +11:00
sandeen@sandeen.net
2ee4fa5cb7 [XFS] Make the bulkstat_one compat ioctl handling more sane
Currently the compat formatter was handled by passing
in "private_data" for the xfs_bulkstat_one formatter,
which was really just another formatter... IMHO this
got confusing.

Instead, just make a new xfs_bulkstat_one_compat
formatter for xfs_bulkstat, and call it via a wrapper.

Also, don't translate the ioctl nrs into their native
counterparts, that just clouds the issue; we're in a
compat handler anyway, just switch on the 32-bit cmds.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-12-02 17:15:36 +11:00
sandeen@sandeen.net
471d591031 [XFS] Add compat handlers for data & rt growfs ioctls
The args for XFS_IOC_FSGROWFSDATA and XFS_IOC_FSGROWFSRTA
have padding on the end on intel, so add arg copyin functions,
and then just call the growfs ioctl helpers.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-12-02 17:15:09 +11:00
sandeen@sandeen.net
e94fc4a43e [XFS] Add compat handlers for swapext ioctl
The big hitter here was the bstat field, which contains
different sized time_t on 32 vs. 64 bit.  Add a copyin
function to translate the 32-bit arg to 64-bit, and
call the swapext ioctl helper.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-12-02 17:10:04 +11:00
sandeen@sandeen.net
d5547f9fee [XFS] Clean up some existing compat ioctl calls
Create a new xfs_ioctl.h file which has prototypes for
ioctl helpers that may be called in compat mode.

Change several compat ioctl cases which are IOW to simply copy
in the userspace argument, then call the common ioctl helper.

This also fixes xfs_compat_ioc_fsgeometry_v1(), which had
it backwards before; it copied in an (empty) arg, then copied
out the native result, which probably corrupted userspace.  It
should be translating on the copyout.

Also, a bit of formatting cleanup for consistency, and conversion
of all error returns to use XFS_ERROR().

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-12-02 17:09:43 +11:00
sandeen@sandeen.net
ffae263a64 [XFS] Move compat ioctl structs & numbers into xfs_ioctl32.h
This makes the c file less cluttered and a bit more
readable.   Consistently name the ioctl number
macros with "_32" and the compatibility stuctures
with "_compat."  Rename the helpers which simply
copy in the arg with "_copyin" for easy identification.

Finally, for a few of the existing helpers, modify them
so that they directly call the native ioctl helper
after userspace argument fixup.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-12-02 17:08:44 +11:00
sandeen@sandeen.net
743bb4650d [XFS] Move copy_from_user calls out of ioctl helpers into ioctl switch.
Moving the copy_from_user out of some of the ioctl helpers will
make it easier for the compat ioctl switch to copy in the right
struct, then just pass to the underlying helper.

Also, move common access checks into the helpers themselves,
and out of the native ioctl switch code, to reduce code
duplication between native & compat ioctl callers.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-12-02 17:08:01 +11:00
Randy Dunlap
0380155363 ntfs: don't fool kernel-doc
kernel-doc handles macros now (it has for quite some time), so change the
ntfs_debug() macro's kernel-doc to be just before the macro instead of
before a phony function prototype.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Anton Altaparmakov <aia21@cantab.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-01 19:55:25 -08:00
Davide Libenzi
7ef9964e6d epoll: introduce resource usage limits
It has been thought that the per-user file descriptors limit would also
limit the resources that a normal user can request via the epoll
interface.  Vegard Nossum reported a very simple program (a modified
version attached) that can make a normal user to request a pretty large
amount of kernel memory, well within the its maximum number of fds.  To
solve such problem, default limits are now imposed, and /proc based
configuration has been introduced.  A new directory has been created,
named /proc/sys/fs/epoll/ and inside there, there are two configuration
points:

  max_user_instances = Maximum number of devices - per user

  max_user_watches   = Maximum number of "watched" fds - per user

The current default for "max_user_watches" limits the memory used by epoll
to store "watches", to 1/32 of the amount of the low RAM.  As example, a
256MB 32bit machine, will have "max_user_watches" set to roughly 90000.
That should be enough to not break existing heavy epoll users.  The
default value for "max_user_instances" is set to 128, that should be
enough too.

This also changes the userspace, because a new error code can now come out
from EPOLL_CTL_ADD (-ENOSPC).  The EMFILE from epoll_create() was already
listed, so that should be ok.

[akpm@linux-foundation.org: use get_current_user()]
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: <stable@kernel.org>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Reported-by: Vegard Nossum <vegardno@ifi.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-01 19:55:24 -08:00
Mark Fasheh
d6b58f89f7 ocfs2: fix regression in ocfs2_read_blocks_sync()
We're panicing in ocfs2_read_blocks_sync() if a jbd-managed buffer is seen.
At first glance, this seems ok but in reality it can happen. My test case
was to just run 'exorcist'. A struct inode is being pushed out of memory but
is then re-read at a later time, before the buffer has been checkpointed by
jbd. This causes a BUG to be hit in ocfs2_read_blocks_sync().

Reviewed-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-12-01 14:46:58 -08:00
Coly Li
07d9a3954a ocfs2: fix return value set in init_dlmfs_fs()
In init_dlmfs_fs(), if calling kmem_cache_create() failed, the code will use return value from
calling bdi_init(). The correct behavior should be set status as -ENOMEM before going to "bail:".

Signed-off-by: Coly Li <coyli@suse.de>
Acked-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-12-01 14:46:55 -08:00
David Teigland
07f9eebcdf ocfs2: fix wake_up in unlock_ast
In ocfs2_unlock_ast(), call wake_up() on lockres before releasing
the spin lock on it.  As soon as the spin lock is released, the
lockres can be freed.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-12-01 14:46:45 -08:00
David Teigland
66f502a416 ocfs2: initialize stack_user lvbptr
The locking_state dump, ocfs2_dlm_seq_show, reads the lvb on locks where it
has not yet been initialized by a lock call.

Signed-off-by: David Teigland <teigland@redhat.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-12-01 14:46:39 -08:00
Coly Li
3b5da0189c ocfs2: comments typo fix
This patch fixes two typos in comments of ocfs2.

Signed-off-by: Coly Li <coyli@suse.de>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-12-01 14:46:31 -08:00
Christoph Hellwig
0e446673a1 [XFS] fix error handling in xlog_recover_process_one_iunlink
If we fail after xfs_iget we have to drop the reference count, spotted
by Dave Chinner.  Also remove some useless asserts and stop trying to
deal with di_mode == 0 inodes because never gets those without passing
the IGET_CREATE flag to xfs_iget.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-01 11:38:22 +11:00
Christoph Hellwig
24f211bad0 [XFS] move inode allocation out xfs_iread
Allocate the inode in xfs_iget_cache_miss and pass it into xfs_iread.  This
simplifies the error handling and allows xfs_iread to be shared with userspace
which already uses these semantics.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-01 11:38:17 +11:00
Christoph Hellwig
b48d8d6437 [XFS] kill the XFS_IMAP_BULKSTAT flag
Just pass down the XFS_IGET_* flags all the way down to xfs_imap instead
of translating them mid-way.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-01 11:38:13 +11:00
Christoph Hellwig
92bfc6e7c4 [XFS] embededd struct xfs_imap into xfs_inode
Most uses of struct xfs_imap are to map and inode to a buffer.  To avoid
copying around the inode location information we should just embedd a
strcut xfs_imap into the xfs_inode.  To make sure it doesn't bloat an
inode the im_len is changed to a ushort, which is fine as that's what
the users exepect anyway.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-01 11:38:08 +11:00
Christoph Hellwig
94e1b69d1a [XFS] merge xfs_imap into xfs_dilocate
xfs_imap is the only caller of xfs_dilocate and doesn't add any significant
value.  Merge the two functions and document the various cases we have for
inode cluster lookup in the new xfs_imap.

Also remove the unused im_agblkno and im_ioffset fields from struct xfs_imap
while we're at it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-01 11:38:03 +11:00
Christoph Hellwig
a194189503 [XFS] remove dead code for old inode item recovery
We have removed the support for old-style inode items a while ago and
xlog_recover_do_inode_trans is now only called for XFS_LI_INODE items.
That means we can remove the call to xfs_imap there and with it the
XFS_IMAP_LOOKUP that is set by all other callers.  We can also mark
xfs_imap static now.

(First sent on October 21st)

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-01 11:37:58 +11:00
Christoph Hellwig
76d8b277f7 [XFS] stop using xfs_itobp in xfs_iread
The only caller of xfs_itobp that doesn't have i_blkno setup is now
the initial inode read.  It needs access to the whole xfs_imap so using
xfs_inotobp is not an option.  Instead opencode the buffer lookup in
xfs_iread and kill all the functionality for the initial map from
xfs_itobp.

(First sent on October 21st)

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-01 11:37:52 +11:00
Christoph Hellwig
23fac50f95 [XFS] split up xlog_recover_process_iunlinks
Split out the body of the main loop into a separate helper to make the
code readable.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-01 11:37:48 +11:00
Christoph Hellwig
51ce16d519 [XFS] kill XFS_DINODE_VERSION_ defines
These names don't add any value at all over just using the numerical
values.

(First sent on October 9th)

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-01 11:37:42 +11:00
Christoph Hellwig
81591fe2db [XFS] kill xfs_dinode_core_t
Now that we have a separate xfs_icdinode_t for the in-core inode which
gets logged there is no need anymore for the xfs_dinode vs xfs_dinode_core
split - the fact that part of the structure gets logged through the inode
log item and a small part not can better be described in a comment.

All sizeof operations on the dinode_core either really wanted the
icdinode and are switched to that one, or had already added the size
of the agi unlinked list pointer.  Later both will be replaced with
helpers once we get the larger CRC-enabled dinode.

Removing the data and attribute fork unions also has the advantage that
xfs_dinode.h doesn't need to pull in every header under the sun.

While we're at it also add some more comments describing the dinode
structure.

(First sent on October 7th)

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-01 11:37:35 +11:00
Christoph Hellwig
d42f08f61c [XFS] kill xfs_ialloc_log_di
xfs_ialloc_log_di is only used to log the full inode core + di_next_unlinked.
That means all the offset magic is not nessecary and we can simply use
xfs_trans_log_buf directly.  Also add a comment describing what we should do
here instead.

(First sent on October 7th)

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-01 11:37:31 +11:00
Christoph Hellwig
b28708d6a0 [XFS] sanitize xlog_in_core_t definition
Move all fields from xlog_iclog_fields_t into xlog_in_core_t instead of having
them in a substructure and the using #defines to make it look like they were
directly in xlog_in_core_t.  Also document that xlog_in_core_2_t is grossly
misnamed, and make all references to it typesafe.

(First sent on Semptember 15th)

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-01 11:37:25 +11:00
From: Christoph Hellwig
4805621a37 [XFS] factor out xfs_read_agf helper
Add a helper to read the AGF header and perform basic verification.
Based on hunks from a larger patch from Dave Chinner.

(First sent on Juli 23rd)

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-01 11:37:20 +11:00
Christoph Hellwig
5e1be0fb1a [XFS] factor out xfs_read_agi helper
Add a helper to read the AGI header and perform basic verification.
Based on hunks from a larger patch from Dave Chinner.

(First sent on Juli 23rd)

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-01 11:37:15 +11:00
Dave Chinner
26c5295135 [XFS] remove i_gen from incore inode
i_gen is incremented in directory operations when the
directory is changed. It is never read or otherwise used
so it should be removed to help reduce the size of the
struct xfs_inode.

The patch also removes a duplicate logging of the directory
inode core. We only need to do this once per transaction
so kill the one associated with the i_gen increment.

Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-01 11:37:10 +11:00
Christoph Hellwig
207fcfad58 [XFS] remove xfs_vfsops.h
The only thing left is xfs_do_force_shutdown which already has a defintion
in xfs_mount.h.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-01 11:37:06 +11:00
Christoph Hellwig
2b5decd09e [XFS] remove xfs_vfs.h
The only thing left are the forced shutdown flags and freeze macros which
fit into xfs_mount.h much better.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-01 11:36:59 +11:00
Christoph Hellwig
00dd4029e9 [XFS] remove bhv_statvfs_t typedef
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-01 11:36:46 +11:00
Eric Sandeen
f35642e2f8 [XFS] Hook up the fiemap ioctl.
This adds the fiemap inode_operation, which for us converts the
fiemap values & flags into a getbmapx structure which can be sent
to xfs_getbmap.  The formatter then copies the bmv array back into
the user's fiemap buffer via the fiemap helpers.

If we wanted to be more clever, we could also return mapping data
for in-inode attributes, but I'm not terribly motivated to do that
just yet.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-01 11:29:42 +11:00
Eric Sandeen
5af317c942 [XFS] Add new getbmap flags.
This adds a new output flag, BMV_OF_LAST to indicate if we've hit
the last extent in the inode.  This potentially saves an extra call
from userspace to see when the whole mapping is done.

It also adds BMV_IF_DELALLOC and BMV_OF_DELALLOC to request, and
indicate, delayed-allocation extents.  In this case bmv_block
is set to -2 (-1 was already taken for HOLESTARTBLOCK; unfortunately
these are the reverse of the in-kernel constants.)

These new flags facilitate addition of the new fiemap interface.

Rather than adding sh_delalloc, remove sh_unwritten & just test
the flags directly.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-01 11:29:28 +11:00
Eric Sandeen
8a7141a8b9 [XFS] convert xfs_getbmap to take formatter functions
Preliminary work to hook up fiemap, this allows us to pass in an
arbitrary formatter to copy extent data back to userspace.

The formatter takes info for 1 extent, a pointer to the user "thing*"
and a pointer to a "filled" variable to indicate whether a userspace
buffer did get filled in (for fiemap, hole "extents" are skipped).

I'm just using the getbmapx struct as a "common denominator" because
as far as I can see, it holds all info that any formatters will care
about.

("*thing" because fiemap doesn't pass the user pointer around, but rather
has a pointer to a fiemap info structure, and helpers associated with it)

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-01 11:29:00 +11:00
Dave Chinner
0924b585fc [XFS] fix uninitialised variable bug in dquot release.
gcc is warning about an uninitialised variable in xfs_growfs_rt().
This is a false positive. Fix it by changing the scope of the
transaction pointer to wholly within the internal loop inside
the function.

While there, preemptively change xfs_growfs_rt_alloc() in the
same way as it has exactly the same structure as xfs_growfs_rt()
but gcc is not warning about it. Yet.

Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-01 11:11:36 +11:00
Dave Chinner
2e6560929d [XFS] fix error inversion problems with data flushing
XFS gets the sign of the error wrong in several places when
gathering the error from generic linux functions. These functions
return negative error values, while the core XFS code returns
positive error values. Hence when XFS inverts the error to be
returned to the VFS, it can incorrectly invert a negative
error and this error will be ignored by the syscall return.

Fix all the problems related to calling filemap_* functions.

Problem initially identified by Nick Piggin in xfs_fsync().

Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-01 11:11:10 +11:00
Christoph Hellwig
65795910c1 [XFS] fix spurious gcc warnings
Some recent gcc warnings don't like passing string variables to
printf-like functions without using at least a "%s" format string.
Change the two occurances of that in xfs to please gcc.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-01 11:07:37 +11:00
Christoph Hellwig
6c31b93a14 [XFS] allow inode64 mount option on 32 bit systems
Now that we've stopped using the Linux inode cache when can trivally
support the inode64 mount option on 32bit architectures.  As far as the
kernel and most userspace is concerned this works perfectly, but
applications still using really old stat and readdir interfaces will get
an EOVERFLOW error when hitting an inode number not fitting into 32
bits (that problem of course also exists when using these applications
on a 64bit kernel).

Note that because inode64 is simply a mount option we can currently
mount a filesystem having > 32 bit inode numbers and cause a variety of
problems, all this is solved but this patch which enables XFS_BIG_INUMS,
even when inode64 is not used.

(First sent on October 18th)

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-01 11:07:20 +11:00
Christoph Hellwig
f999a5bf3f [XFS] wire up ->open for directories
Currently there's no ->open method set for directories on XFS.  That
means we don't perform any check for opening too large directories
without O_LARGEFILE, we don't check for shut down filesystems, and we
don't actually do the readahead for the first block in the directory.

Instead of just setting the directories open routine to xfs_file_open
we merge the shutdown check directly into xfs_file_open and create
a new xfs_dir_open that first calls xfs_file_open and then performs
the readahead for block 0.

(First sent on September 29th)

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-01 11:07:08 +11:00
Christoph Hellwig
bac8dca9f9 [XFS] fix NULL pointer dereference in xfs_log_force_umount
xfs_log_force_umount may be called very early during log recovery where

If we fail a buffer read in xlog_recover_do_inode_trans we abort the mount.
But at that point log recovery has started delayed writeback of inode
buffers.   As part of the aborted mount we try to flush out all delwri
buffers, but at that point we have already freed the superblock, and set
mp->m_sb_bp to NULL, and xfs_log_force_umount which gets called after
the inode buffer writeback trips over it.

Make xfs_log_force_umount a little more careful when accessing mp->m_sb_bp
to avoid this.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
2008-12-01 11:06:44 +11:00
Linus Torvalds
8e36a5d6ad Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
  [CIFS] fix regression in cifs_write_begin/cifs_write_end
2008-11-30 14:04:02 -08:00
Ingo Molnar
604094f461 vfs, seqfile: export mangle_path() generally
mangle_path() is trivial enough to make  export restrictions on it
pointless - so change the export from EXPORT_SYMBOL_GPL to EXPORT_SYMBOL.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
2008-11-28 18:07:10 +01:00
Lachlan McIlroy
b5a20aa265 Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 2008-11-28 15:23:52 +11:00
Jan Kara
52b19ac993 udf: Fix BUG_ON() in destroy_inode()
udf_clear_inode() can leave behind buffers on mapping's i_private list (when
we truncated preallocation). Call invalidate_inode_buffers() so that the list
is properly cleaned-up before we return from udf_clear_inode(). This is ugly
and suggest that we should cleanup preallocation earlier than in clear_inode()
but currently there's no such call available since drop_inode() is called under
inode lock and thus is unusable for disk operations.

Signed-off-by: Jan Kara <jack@suse.cz>
2008-11-27 17:38:28 +01:00
Jeff Layton
a98ee8c1c7 [CIFS] fix regression in cifs_write_begin/cifs_write_end
The conversion to write_begin/write_end interfaces had a bug where we
were passing a bad parameter to cifs_readpage_worker. Rather than
passing the page offset of the start of the write, we needed to pass the
offset of the beginning of the page. This was reliably showing up as
data corruption in the fsx-linux test from LTP.

It also became evident that this code was occasionally doing unnecessary
read calls. Optimize those away by using the PG_checked flag to indicate
that the unwritten part of the page has been initialized.

CC: Nick Piggin <npiggin@suse.de>
Acked-by: Dave Kleikamp <shaggy@us.ibm.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-11-26 19:32:33 +00:00
Ingo Molnar
0bfc24559d blktrace: port to tracepoints, update
Port to the new tracepoints API: split DEFINE_TRACE() and DECLARE_TRACE()
sites. Spread them out to the usage sites, as suggested by
Mathieu Desnoyers.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
2008-11-26 13:04:35 +01:00
Arnaldo Carvalho de Melo
5f3ea37c77 blktrace: port to tracepoints
This was a forward port of work done by Mathieu Desnoyers, I changed it to
encode the 'what' parameter on the tracepoint name, so that one can register
interest in specific events and not on classes of events to then check the
'what' parameter.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-26 12:13:34 +01:00
Tejun Heo
43901aabd7 fuse: add fuse_conn->release()
Add fuse_conn->release() so that fuse_conn can be embedded in other
structures.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2008-11-26 12:03:56 +01:00
Tejun Heo
0d179aa592 fuse: separate out fuse_conn_init() from new_conn()
Separate out fuse_conn_init() from new_conn() and while at it
initialize fuse_conn->entry during conn initialization.

This will be used by CUSE.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2008-11-26 12:03:55 +01:00
Tejun Heo
b93f858ab2 fuse: add fuse_ prefix to several functions
Add fuse_ prefix to request_send*() and get_root_inode() as some of
those functions will be exported for CUSE.  With or without CUSE
export, having the function names scoped is a good idea for
debuggability.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2008-11-26 12:03:55 +01:00
Tejun Heo
95668a69a4 fuse: implement poll support
Implement poll support.  Polled files are indexed using kh in a RB
tree rooted at fuse_conn->polled_files.

Client should send FUSE_NOTIFY_POLL notification once after processing
FUSE_POLL which has FUSE_POLL_SCHEDULE_NOTIFY set.  Sending
notification unconditionally after the latest poll or everytime file
content might have changed is inefficient but won't cause malfunction.

fuse_file_poll() can sleep and requires patches from the following
thread which allows f_op->poll() to sleep.

  http://thread.gmane.org/gmane.linux.kernel/726176

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2008-11-26 12:03:55 +01:00
Tejun Heo
8599396b50 fuse: implement unsolicited notification
Clients always used to write only in response to read requests.  To
implement poll efficiently, clients should be able to issue
unsolicited notifications.  This patch implements basic notification
support.

Zero fuse_out_header.unique is now accepted and considered unsolicited
notification and the error field contains notification code.  This
patch doesn't implement any actual notification.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2008-11-26 12:03:55 +01:00
Tejun Heo
acf99433d9 fuse: add file kernel handle
The file handle, fuse_file->fh, is opaque value supplied by userland
FUSE server and uniqueness is not guaranteed.  Add file kernel handle,
fuse_file->kh, which is allocated by the kernel on file allocation and
guaranteed to be unique.

This will be used by poll to match notification to the respective file
but can be used for other purposes where unique file handle is
necessary.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2008-11-26 12:03:55 +01:00
Tejun Heo
59efec7b90 fuse: implement ioctl support
Generic ioctl support is tricky to implement because only the ioctl
implementation itself knows which memory regions need to be read
and/or written.  To support this, fuse client can request retry of
ioctl specifying memory regions to read and write.  Deep copying
(nested pointers) can be implemented by retrying multiple times
resolving one depth of dereference at a time.

For security and cleanliness considerations, ioctl implementation has
restricted mode where the kernel determines data transfer directions
and sizes using the _IOC_*() macros on the ioctl command.  In this
mode, retry is not allowed.

For all FUSE servers, restricted mode is enforced.  Unrestricted ioctl
will be used by CUSE.

Plese read the comment on top of fs/fuse/file.c::fuse_file_do_ioctl()
for more information.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2008-11-26 12:03:55 +01:00
Tejun Heo
e9bb09dd6c fuse: don't let fuse_req->end() put the base reference
fuse_req->end() was supposed to be put the base reference but there's
no reason why it should.  It only makes things more complex.  Move it
out of ->end() and make it the responsibility of request_end().

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2008-11-26 12:03:54 +01:00
Miklos Szeredi
1729a16c2c fuse: style fixes
Fix coding style errors reported by checkpatch and others.  Uptdate
copyright date to 2008.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2008-11-26 12:03:54 +01:00
Ingo Molnar
180b65df7b fix warning in fs/dlm/netlink.c
this warning:

  fs/dlm/netlink.c: In function ‘dlm_timeout_warn’:
  fs/dlm/netlink.c:131: warning: ‘send_skb’ may be used uninitialized in this function

triggers because GCC does not recognize the (correct) error flow
between prepare_data() and send_skb.

Annotate it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-25 16:51:45 -08:00
Serge Hallyn
18b6e0414e User namespaces: set of cleanups (v2)
The user_ns is moved from nsproxy to user_struct, so that a struct
cred by itself is sufficient to determine access (which it otherwise
would not be).  Corresponding ecryptfs fixes (by David Howells) are
here as well.

Fix refcounting.  The following rules now apply:
        1. The task pins the user struct.
        2. The user struct pins its user namespace.
        3. The user namespace pins the struct user which created it.

User namespaces are cloned during copy_creds().  Unsharing a new user_ns
is no longer possible.  (We could re-add that, but it'll cause code
duplication and doesn't seem useful if PAM doesn't need to clone user
namespaces).

When a user namespace is created, its first user (uid 0) gets empty
keyrings and a clean group_info.

This incorporates a previous patch by David Howells.  Here
is his original patch description:

>I suggest adding the attached incremental patch.  It makes the following
>changes:
>
> (1) Provides a current_user_ns() macro to wrap accesses to current's user
>     namespace.
>
> (2) Fixes eCryptFS.
>
> (3) Renames create_new_userns() to create_user_ns() to be more consistent
>     with the other associated functions and because the 'new' in the name is
>     superfluous.
>
> (4) Moves the argument and permission checks made for CLONE_NEWUSER to the
>     beginning of do_fork() so that they're done prior to making any attempts
>     at allocation.
>
> (5) Calls create_user_ns() after prepare_creds(), and gives it the new creds
>     to fill in rather than have it return the new root user.  I don't imagine
>     the new root user being used for anything other than filling in a cred
>     struct.
>
>     This also permits me to get rid of a get_uid() and a free_uid(), as the
>     reference the creds were holding on the old user_struct can just be
>     transferred to the new namespace's creator pointer.
>
> (6) Makes create_user_ns() reset the UIDs and GIDs of the creds under
>     preparation rather than doing it in copy_creds().
>
>David

>Signed-off-by: David Howells <dhowells@redhat.com>

Changelog:
	Oct 20: integrate dhowells comments
		1. leave thread_keyring alone
		2. use current_user_ns() in set_user()

Signed-off-by: Serge Hallyn <serue@us.ibm.com>
2008-11-24 18:57:41 -05:00
Chuck Lever
a8d82d9b95 NLM: client-side nlm_lookup_host() should avoid matching on srcaddr
Since commit c98451bd, the loop in nlm_lookup_host() unconditionally
compares the host's h_srcaddr field to the incoming source address.
For client-side nlm_host entries, both are always AF_UNSPEC, so this
check is unnecessary.

Since commit 781b61a6, which added support for AF_INET6 addresses to
nlm_cmp_addr(), nlm_cmp_addr() now returns FALSE for AF_UNSPEC
addresses, which causes nlm_lookup_host() to create a fresh nlm_host
entry every time it is called on the client.

These extra entries will eventually expire once the server is
unmounted, so the impact of this regression, introduced with lockd
IPv6 support in 2.6.28, should be minor.

We could fix this by adding an arm in nlm_cmp_addr() for AF_UNSPEC
addresses, but really, nlm_lookup_host() shouldn't be matching on the
srcaddr field for client-side nlm_host lookups.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-11-24 13:29:07 -06:00
J. Bruce Fields
e4625eb826 nfsd: use of unitialized list head on error exit in nfs4recover.c
Thanks to Matthew Dodd for this bug report:

A file label issue while running SELinux in MLS mode provoked the
following bug, which is a result of use before init on a 'struct list_head'.

In nfsd4_list_rec_dir() if the call to dentry_open() fails the 'goto
out' skips INIT_LIST_HEAD() which results in the normally improbable
case where list_entry() returns NULL.

Trace follows.

NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
SELinux:  Context unconfined_t:object_r:var_lib_nfs_t:s0 is not valid
(left unmapped).
type=1400 audit(1227298063.609:282): avc:  denied  { read } for
pid=1890 comm="rpc.nfsd" name="v4recovery" dev=dm-0 ino=148726
scontext=system_u:system_r:nfsd_t:s0-s15:c0.c1023
tcontext=system_u:object_r:unlabeled_t:s15:c0.c1023 tclass=dir
BUG: unable to handle kernel NULL pointer dereference at 00000004
IP: [<c050894e>] list_del+0x6/0x60
*pde = 0d9ce067 *pte = 00000000
Oops: 0000 [#1] SMP
Modules linked in: nfsd lockd nfs_acl auth_rpcgss exportfs autofs4
sunrpc ipv6 dm_multipath scsi_dh ppdev parport_pc sg parport floppy
ata_piix pata_acpi ata_generic libata pcnet32 i2c_piix4 mii pcspkr
i2c_core dm_snapshot dm_zero dm_mirror dm_log dm_mod BusLogic sd_mod
scsi_mod crc_t10dif ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last
unloaded: microcode]

Pid: 1890, comm: rpc.nfsd Not tainted (2.6.27.5-37.fc9.i686 #1)
EIP: 0060:[<c050894e>] EFLAGS: 00010217 CPU: 0
EIP is at list_del+0x6/0x60
EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: cd99e480
ESI: cf9caed8 EDI: 00000000 EBP: cf9caebc ESP: cf9caeb8
  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process rpc.nfsd (pid: 1890, ti=cf9ca000 task=cf4de580 task.ti=cf9ca000)
Stack: 00000000 cf9caef0 d0a9f139 c0496d04 d0a9f217 fffffff3 00000000
00000000
        00000000 00000000 cf32b220 00000000 00000008 00000801 cf9caefc
d0a9f193
        00000000 cf9caf08 d0a9b6ea 00000000 cf9caf1c d0a874f2 cf9c3004
00000008
Call Trace:
  [<d0a9f139>] ? nfsd4_list_rec_dir+0xf3/0x13a [nfsd]
  [<c0496d04>] ? do_path_lookup+0x12d/0x175
  [<d0a9f217>] ? load_recdir+0x0/0x26 [nfsd]
  [<d0a9f193>] ? nfsd4_recdir_load+0x13/0x34 [nfsd]
  [<d0a9b6ea>] ? nfs4_state_start+0x2a/0xc5 [nfsd]
  [<d0a874f2>] ? nfsd_svc+0x51/0xff [nfsd]
  [<d0a87f2d>] ? write_svc+0x0/0x1e [nfsd]
  [<d0a87f48>] ? write_svc+0x1b/0x1e [nfsd]
  [<d0a87854>] ? nfsctl_transaction_write+0x3a/0x61 [nfsd]
  [<c04b6a4e>] ? sys_nfsservctl+0x116/0x154
  [<c04975c1>] ? putname+0x24/0x2f
  [<c04975c1>] ? putname+0x24/0x2f
  [<c048d49f>] ? do_sys_open+0xad/0xb7
  [<c048d337>] ? filp_close+0x50/0x5a
  [<c048d4eb>] ? sys_open+0x1e/0x26
  [<c0403cca>] ? syscall_call+0x7/0xb
  [<c064007b>] ? init_cyrix+0x185/0x490
  =======================
Code: 75 e1 8b 53 08 8d 4b 04 8d 46 04 e8 75 00 00 00 8b 53 10 8d 4b 0c
8d 46 0c e8 67 00 00 00 5b 5e 5f 5d c3 90 90 55 89 e5 53 89 c3 <8b> 40
04 8b 00 39 d8 74 16 50 53 68 3e d6 6f c0 6a 30 68 78 d6
EIP: [<c050894e>] list_del+0x6/0x60 SS:ESP 0068:cf9caeb8
---[ end trace a89c4ad091c4ad53 ]---

Cc: Matthew N. Dodd <Matthew.Dodd@spart.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-11-24 10:36:09 -06:00
J. Bruce Fields
2c5e76158f nfsd: clean up grace period on early exit
If nfsd was shut down before the grace period ended, we could end up
with a freed object still on grace_list.  Thanks to Jeff Moyer for
reporting the resulting list corruption warnings.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Tested-by: Jeff Moyer <jmoyer@redhat.com>
2008-11-24 10:12:48 -06:00
Török Edwin
958086d178 vfs, seqfile: fix comment style on mangle_path
Impact: use standard docbook tags

Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Török Edwin <edwintorok@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23 22:36:35 +01:00
Török Edwin
74e2f334f4 vfs, seqfile: make mangle_path() global
Impact: expose new VFS API

make mangle_path() available, as per the suggestions of Christoph Hellwig
and Al Viro:

  http://lkml.org/lkml/2008/11/4/338

Signed-off-by: Török Edwin <edwintorok@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23 09:45:39 +01:00
Artem Bityutskiy
3477d20465 UBIFS: pre-allocate bulk-read buffer
To avoid memory allocation failure during bulk-read, pre-allocate
a bulk-read buffer, so that if there is only one bulk-reader at
a time, it would just use the pre-allocated buffer and would not
do any memory allocation. However, if there are more than 1 bulk-
reader, then only one reader would use the pre-allocated buffer,
while the other reader would allocate the buffer for itself.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-11-21 18:59:33 +02:00
Artem Bityutskiy
6c0c42cdfd UBIFS: do not allocate too much
Bulk-read allocates 128KiB or more using kmalloc. The allocation
starts failing often when the memory gets fragmented. UBIFS still
works fine in this case, because it falls-back to standard
(non-optimized) read method, though. This patch teaches bulk-read
to allocate exactly the amount of memory it needs, instead of
allocating 128KiB every time.

This patch is also a preparation to the further fix where we'll
have a pre-allocated bulk-read buffer as well. For example, now
the @bu object is prepared in 'ubifs_bulk_read()', so we could
path either pre-allocated or allocated information to
'ubifs_do_bulk_read()' later. Or teaching 'ubifs_do_bulk_read()'
not to allocate 'bu->buf' if it is already there.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-11-21 18:59:25 +02:00
Artem Bityutskiy
39ce81ce71 UBIFS: do not print scary memory allocation warnings
Bulk-read allocates a lot of memory with 'kmalloc()', and when it
is/gets fragmented 'kmalloc()' fails with a scarry warning. But
because bulk-read is just an optimization, UBIFS keeps working fine.
Supress the warning by passing __GFP_NOWARN option to 'kmalloc()'.

This patch also introduces a macro for the magic 128KiB constant.
This is just neater.

Note, this is not really fixes the problem we had, but just hides
the warnings. The further patches fix the problem.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-11-21 18:59:16 +02:00
David S. Miller
6ab33d5171 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/net/ixgbe/ixgbe_main.c
	include/net/mac80211.h
	net/phonet/af_phonet.c
2008-11-20 16:44:00 -08:00
Linus Torvalds
0cb39aa0ac Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
  [CIFS] Do not attempt to close invalidated file handles
  [CIFS] fix check for dead tcon in smb_init
2008-11-20 13:14:16 -08:00
Steve French
ddb4cbfc53 [CIFS] Do not attempt to close invalidated file handles
If a connection with open file handles has gone down
and come back up and reconnected without reopening
the file handle yet, do not attempt to send an SMB close
request for this handle in cifs_close.  We were
checking for the connection being invalid in cifs_close
but since the connection may have been reconnected
we also need to check whether the file handle
was marked invalid (otherwise we could close the
wrong file handle by accident).

Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-11-20 20:14:13 +00:00
WANG Cong
ea7e743e49 hostfs: fix a duplicated global function name
fs/hostfs/hostfs_user.c defines do_readlink() as non-static, and so does
fs/xfs/linux-2.6/xfs_ioctl.c when CONFIG_XFS_DEBUG=y.  So rename
do_readlink() in hostfs to hostfs_do_readlink().

I think it's better if XFS guys will also rename their do_readlink(),
it's not necessary to use such a general name.

Signed-off-by: WANG Cong <wangcong@zeuux.org>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-19 18:50:00 -08:00
Hugh Dickins
f9454548e1 don't unlink an active swapfile
Peter Cordes is sorry that he rm'ed his swapfiles while they were in use,
he then had no pathname to swapoff.  It's a curious little oversight, but
not one worth a lot of hackery.  Kudos to Willy Tarreau for turning this
around from a discussion of synthetic pathnames to how to prevent unlink.
Mimic immutable: prohibit unlinking an active swapfile in may_delete()
(and don't worry my little head over the tiny race window).

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: Willy Tarreau <w@1wt.eu>
Acked-by: Christoph Hellwig <hch@infradead.org>
Cc: Peter Cordes <peter@cordes.ca>
Cc: Bodo Eggert <7eggert@gmx.de>
Cc: David Newall <davidn@davidnewall.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-19 18:49:59 -08:00
Michael Halcrow
ac97b9f9a2 eCryptfs: Allocate up to two scatterlists for crypto ops on keys
I have received some reports of out-of-memory errors on some older AMD
architectures.  These errors are what I would expect to see if
crypt_stat->key were split between two separate pages.  eCryptfs should
not assume that any of the memory sent through virt_to_scatterlist() is
all contained in a single page, and so this patch allocates two
scatterlist structs instead of one when processing keys.  I have received
confirmation from one person affected by this bug that this patch resolves
the issue for him, and so I am submitting it for inclusion in a future
stable release.

Note that virt_to_scatterlist() runs sg_init_table() on the scatterlist
structs passed to it, so the calls to sg_init_table() in
decrypt_passphrase_encrypted_session_key() are redundant.

Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com>
Reported-by: Paulo J. S. Silva <pjssilva@ime.usp.br>
Cc: "Leon Woestenberg" <leon.woestenberg@gmail.com>
Cc: Tim Gardner <tim.gardner@canonical.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-19 18:49:58 -08:00
Randy Dunlap
9789cfe22e nfsctl: add headers for credentials
Needs headers help for current_cred:

Adding only cred.h wasn't enough.

linux-next-20081023/fs/nfsctl.c:45: error: implicit declaration of function 'current_cred'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: David Howells <dhowells@redhat.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-20 08:11:55 +11:00
Randy Dunlap
7596b27dbd coda: fix creds reference
Needs a header file for credentials struct:

linux-next-20081023/fs/coda/file.c:177: error: dereferencing pointer to incomplete type

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Jan Harkes <jaharkes@cs.cmu.edu>
Cc: David Howells <dhowells@redhat.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-20 08:11:52 +11:00
David S. Miller
198d6ba4d7 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/isdn/i4l/isdn_net.c
	fs/cifs/connect.c
2008-11-18 23:38:23 -08:00
Steve French
bfb59820ee [CIFS] fix check for dead tcon in smb_init
This was recently changed to check for need_reconnect, but should
actually be a check for a tidStatus of CifsExiting.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-11-18 16:33:48 +00:00
Tejun Heo
55e8e30c38 block/md: fix md autodetection
Block ext devt conversion missed md_autodetect_dev() call in
rescan_partitions() leaving md autodetect unable to see partitions.
Fix it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-11-18 15:08:56 +01:00
Tejun Heo
ba32929a91 block: make add_partition() return pointer to hd_struct
Make add_partition() return pointer to the new hd_struct on success
and ERR_PTR() value on failure.  This change will be used to fix md
autodetection bug.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-11-18 15:08:56 +01:00
Tejun Heo
eb60fa1066 block: fix add_partition() error path
Partition stats structure was not freed on devt allocation failure
path.  Fix it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-11-18 15:08:55 +01:00
James Morris
f3a5c54701 Merge branch 'master' into next
Conflicts:
	fs/cifs/misc.c

Merge to resolve above, per the patch below.

Signed-off-by: James Morris <jmorris@namei.org>

diff --cc fs/cifs/misc.c
index ec36410,addd1dc..0000000
--- a/fs/cifs/misc.c
+++ b/fs/cifs/misc.c
@@@ -347,13 -338,13 +338,13 @@@ header_assemble(struct smb_hdr *buffer
  		/*  BB Add support for establishing new tCon and SMB Session  */
  		/*      with userid/password pairs found on the smb session   */
  		/*	for other target tcp/ip addresses 		BB    */
 -				if (current->fsuid != treeCon->ses->linux_uid) {
 +				if (current_fsuid() != treeCon->ses->linux_uid) {
  					cFYI(1, ("Multiuser mode and UID "
  						 "did not match tcon uid"));
- 					read_lock(&GlobalSMBSeslock);
- 					list_for_each(temp_item, &GlobalSMBSessionList) {
- 						ses = list_entry(temp_item, struct cifsSesInfo, cifsSessionList);
+ 					read_lock(&cifs_tcp_ses_lock);
+ 					list_for_each(temp_item, &treeCon->ses->server->smb_ses_list) {
+ 						ses = list_entry(temp_item, struct cifsSesInfo, smb_ses_list);
 -						if (ses->linux_uid == current->fsuid) {
 +						if (ses->linux_uid == current_fsuid()) {
  							if (ses->server == treeCon->ses->server) {
  								cFYI(1, ("found matching uid substitute right smb_uid"));
  								buffer->Uid = ses->Suid;
2008-11-18 18:52:37 +11:00
Linus Torvalds
4e14e833ac Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
  prevent cifs_writepages() from skipping unwritten pages
  Fixed parsing of mount options when doing DFS submount
  [CIFS] Fix check for tcon seal setting and fix oops on failed mount from earlier patch
  [CIFS] Fix build break
  cifs: reinstate sharing of tree connections
  [CIFS] minor cleanup to cifs_mount
  cifs: reinstate sharing of SMB sessions sans races
  cifs: disable sharing session and tcon and add new TCP sharing code
  [CIFS] clean up server protocol handling
  [CIFS] remove unused list, add new cifs sock list to prepare for mount/umount fix
  [CIFS] Fix cifs reconnection flags
  [CIFS] Can't rely on iov length and base when kernel_recvmsg returns error
2008-11-17 20:53:31 -08:00
Dave Kleikamp
b066a48c95 prevent cifs_writepages() from skipping unwritten pages
Fixes a data corruption under heavy stress in which pages could be left
dirty after all open instances of a inode have been closed.

In order to write contiguous pages whenever possible, cifs_writepages()
asks pagevec_lookup_tag() for more pages than it may write at one time.
Normally, it then resets index just past the last page written before calling
pagevec_lookup_tag() again.

If cifs_writepages() can't write the first page returned, it wasn't resetting
index, and the next call to pagevec_lookup_tag() resulted in skipping all of
the pages it previously returned, even though cifs_writepages() did nothing
with them.  This can result in data loss when the file descriptor is about
to be closed.

This patch ensures that index gets set back to the next returned page so
that none get skipped.

Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Cc: Shirish S Pargaonkar <shirishp@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-11-18 04:30:07 +00:00
Igor Mammedov
2c55608f28 Fixed parsing of mount options when doing DFS submount
Since these hit the same routines, and are relatively small, it is easier to review
them as one patch.

Fixed incorrect handling of the last option in some cases
Fixed prefixpath handling convert path_consumed into host depended string length (in bytes)
Use non default separator if it is provided in the original mount options

Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Igor Mammedov <niallain@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-11-18 04:29:06 +00:00
Steve French
ab3f992983 [CIFS] Fix check for tcon seal setting and fix oops on failed mount from earlier patch
set tcon->ses earlier

If the inital tree connect fails, we'll end up calling cifs_put_smb_ses
with a NULL pointer. Fix it by setting the tcon->ses earlier.

Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-11-17 16:03:00 +00:00
Dave Chinner
cc09c0dc57 [XFS] Fix double free of log tickets
When an I/O error occurs during an intermediate commit on a rolling
transaction, xfs_trans_commit() will free the transaction structure
and the related ticket. However, the duplicate transaction that
gets used as the transaction continues still contains a pointer
to the ticket. Hence when the duplicate transaction is cancelled
and freed, we free the ticket a second time.

Add reference counting to the ticket so that we hold an extra
reference to the ticket over the transaction commit. We drop the
extra reference once we have checked that the transaction commit
did not return an error, thus avoiding a double free on commit
error.

Credit to Nick Piggin for tripping over the problem.

SGI-PV: 989741

Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-11-17 17:37:10 +11:00
Steve French
c2b3382cd4 [CIFS] Fix build break
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-11-17 03:57:13 +00:00
Jeff Layton
f1987b44f6 cifs: reinstate sharing of tree connections
Use a similar approach to the SMB session sharing. Add a list of tcons
attached to each SMB session. Move the refcount to non-atomic. Protect
all of the above with the cifs_tcp_ses_lock. Add functions to
properly find and put references to the tcons.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-11-17 03:14:12 +00:00
Al Viro
5c06fe772d Fix broken ownership of /proc/sys/ files
D'oh...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reported-and-tested-by: Peter Palfrader <peter@palfrader.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-16 15:09:52 -08:00
Al Viro
8f7b0ba1c8 Fix inotify watch removal/umount races
Inotify watch removals suck violently.

To kick the watch out we need (in this order) inode->inotify_mutex and
ih->mutex.  That's fine if we have a hold on inode; however, for all
other cases we need to make damn sure we don't race with umount.  We can
*NOT* just grab a reference to a watch - inotify_unmount_inodes() will
happily sail past it and we'll end with reference to inode potentially
outliving its superblock.

Ideally we just want to grab an active reference to superblock if we
can; that will make sure we won't go into inotify_umount_inodes() until
we are done.  Cleanup is just deactivate_super().

However, that leaves a messy case - what if we *are* racing with
umount() and active references to superblock can't be acquired anymore?
We can bump ->s_count, grab ->s_umount, which will almost certainly wait
until the superblock is shut down and the watch in question is pining
for fjords.  That's fine, but there is a problem - we might have hit the
window between ->s_active getting to 0 / ->s_count - below S_BIAS (i.e.
the moment when superblock is past the point of no return and is heading
for shutdown) and the moment when deactivate_super() acquires
->s_umount.

We could just do drop_super() yield() and retry, but that's rather
antisocial and this stuff is luser-triggerable.  OTOH, having grabbed
->s_umount and having found that we'd got there first (i.e.  that
->s_root is non-NULL) we know that we won't race with
inotify_umount_inodes().

So we could grab a reference to watch and do the rest as above, just
with drop_super() instead of deactivate_super(), right? Wrong.  We had
to drop ih->mutex before we could grab ->s_umount.  So the watch
could've been gone already.

That still can be dealt with - we need to save watch->wd, do idr_find()
and compare its result with our pointer.  If they match, we either have
the damn thing still alive or we'd lost not one but two races at once,
the watch had been killed and a new one got created with the same ->wd
at the same address.  That couldn't have happened in inotify_destroy(),
but inotify_rm_wd() could run into that.  Still, "new one got created"
is not a problem - we have every right to kill it or leave it alone,
whatever's more convenient.

So we can use idr_find(...) == watch && watch->inode->i_sb == sb as
"grab it and kill it" check.  If it's been our original watch, we are
fine, if it's a newcomer - nevermind, just pretend that we'd won the
race and kill the fscker anyway; we are safe since we know that its
superblock won't be going away.

And yes, this is far beyond mere "not very pretty"; so's the entire
concept of inotify to start with.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Greg KH <greg@kroah.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-15 12:26:44 -08:00
Steve French
d82c2df54e [CIFS] minor cleanup to cifs_mount
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-11-15 00:07:26 +00:00
Jeff Layton
14fbf50d69 cifs: reinstate sharing of SMB sessions sans races
We do this by abandoning the global list of SMB sessions and instead
moving to a per-server list. This entails adding a new list head to the
TCP_Server_Info struct. The refcounting for the cifsSesInfo is moved to
a non-atomic variable. We have to protect it by a lock anyway, so there's
no benefit to making it an atomic. The list and refcount are protected
by the global cifs_tcp_ses_lock.

The patch also adds a new routines to find and put SMB sessions and
that properly take and put references under the lock.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-11-14 23:56:55 +00:00
Jeff Layton
e7ddee9037 cifs: disable sharing session and tcon and add new TCP sharing code
The code that allows these structs to be shared is extremely racy.
Disable the sharing of SMB and tcon structs for now until we can
come up with a way to do this that's race free.

We want to continue to share TCP sessions, however since they are
required for multiuser mounts. For that, implement a new (hopefully
race-free) scheme. Add a new global list of TCP sessions, and take
care to get a reference to it whenever we're dealing with one.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-11-14 23:42:32 +00:00
Steve French
3ec332ef7a [CIFS] clean up server protocol handling
We're currently declaring both a sockaddr_in and sockaddr6_in on the
stack, but we really only need storage for one of them. Declare a
sockaddr struct and cast it to the proper type. Also, eliminate the
protocolType field in the TCP_Server_Info struct. It's redundant since
we have a sa_family field in the sockaddr anyway.

We may need to revisit this if SCTP is ever implemented, but for now
this will simplify the code.

CIFS over IPv6 also has a number of problems currently. This fixes all
of them that I found. Eventually, it would be nice to move more of the
code to be protocol independent, but this is a start.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-11-14 03:35:10 +00:00
James Morris
2b82892565 Merge branch 'master' into next
Conflicts:
	security/keys/internal.h
	security/keys/process_keys.c
	security/keys/request_key.c

Fixed conflicts above by using the non 'tsk' versions.

Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 11:29:12 +11:00
David Howells
3b11a1dece CRED: Differentiate objective and effective subjective credentials on a task
Differentiate the objective and real subjective credentials from the effective
subjective credentials on a task by introducing a second credentials pointer
into the task_struct.

task_struct::real_cred then refers to the objective and apparent real
subjective credentials of a task, as perceived by the other tasks in the
system.

task_struct::cred then refers to the effective subjective credentials of a
task, as used by that task when it's actually running.  These are not visible
to the other tasks in the system.

__task_cred(task) then refers to the objective/real credentials of the task in
question.

current_cred() refers to the effective subjective credentials of the current
task.

prepare_creds() uses the objective creds as a base and commit_creds() changes
both pointers in the task_struct (indeed commit_creds() requires them to be the
same).

override_creds() and revert_creds() change the subjective creds pointer only,
and the former returns the old subjective creds.  These are used by NFSD,
faccessat() and do_coredump(), and will by used by CacheFiles.

In SELinux, current_has_perm() is provided as an alternative to
task_has_perm().  This uses the effective subjective context of current,
whereas task_has_perm() uses the objective/real context of the subject.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:39:26 +11:00
David Howells
d76b0d9b2d CRED: Use creds in file structs
Attach creds to file structs and discard f_uid/f_gid.

file_operations::open() methods (such as hppfs_open()) should use file->f_cred
rather than current_cred().  At the moment file->f_cred will be current_cred()
at this point.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:39:25 +11:00
David Howells
a6f76f23d2 CRED: Make execve() take advantage of copy-on-write credentials
Make execve() take advantage of copy-on-write credentials, allowing it to set
up the credentials in advance, and then commit the whole lot after the point
of no return.

This patch and the preceding patches have been tested with the LTP SELinux
testsuite.

This patch makes several logical sets of alteration:

 (1) execve().

     The credential bits from struct linux_binprm are, for the most part,
     replaced with a single credentials pointer (bprm->cred).  This means that
     all the creds can be calculated in advance and then applied at the point
     of no return with no possibility of failure.

     I would like to replace bprm->cap_effective with:

	cap_isclear(bprm->cap_effective)

     but this seems impossible due to special behaviour for processes of pid 1
     (they always retain their parent's capability masks where normally they'd
     be changed - see cap_bprm_set_creds()).

     The following sequence of events now happens:

     (a) At the start of do_execve, the current task's cred_exec_mutex is
     	 locked to prevent PTRACE_ATTACH from obsoleting the calculation of
     	 creds that we make.

     (a) prepare_exec_creds() is then called to make a copy of the current
     	 task's credentials and prepare it.  This copy is then assigned to
     	 bprm->cred.

  	 This renders security_bprm_alloc() and security_bprm_free()
     	 unnecessary, and so they've been removed.

     (b) The determination of unsafe execution is now performed immediately
     	 after (a) rather than later on in the code.  The result is stored in
     	 bprm->unsafe for future reference.

     (c) prepare_binprm() is called, possibly multiple times.

     	 (i) This applies the result of set[ug]id binaries to the new creds
     	     attached to bprm->cred.  Personality bit clearance is recorded,
     	     but now deferred on the basis that the exec procedure may yet
     	     fail.

         (ii) This then calls the new security_bprm_set_creds().  This should
	     calculate the new LSM and capability credentials into *bprm->cred.

	     This folds together security_bprm_set() and parts of
	     security_bprm_apply_creds() (these two have been removed).
	     Anything that might fail must be done at this point.

         (iii) bprm->cred_prepared is set to 1.

	     bprm->cred_prepared is 0 on the first pass of the security
	     calculations, and 1 on all subsequent passes.  This allows SELinux
	     in (ii) to base its calculations only on the initial script and
	     not on the interpreter.

     (d) flush_old_exec() is called to commit the task to execution.  This
     	 performs the following steps with regard to credentials:

	 (i) Clear pdeath_signal and set dumpable on certain circumstances that
	     may not be covered by commit_creds().

         (ii) Clear any bits in current->personality that were deferred from
             (c.i).

     (e) install_exec_creds() [compute_creds() as was] is called to install the
     	 new credentials.  This performs the following steps with regard to
     	 credentials:

         (i) Calls security_bprm_committing_creds() to apply any security
             requirements, such as flushing unauthorised files in SELinux, that
             must be done before the credentials are changed.

	     This is made up of bits of security_bprm_apply_creds() and
	     security_bprm_post_apply_creds(), both of which have been removed.
	     This function is not allowed to fail; anything that might fail
	     must have been done in (c.ii).

         (ii) Calls commit_creds() to apply the new credentials in a single
             assignment (more or less).  Possibly pdeath_signal and dumpable
             should be part of struct creds.

	 (iii) Unlocks the task's cred_replace_mutex, thus allowing
	     PTRACE_ATTACH to take place.

         (iv) Clears The bprm->cred pointer as the credentials it was holding
             are now immutable.

         (v) Calls security_bprm_committed_creds() to apply any security
             alterations that must be done after the creds have been changed.
             SELinux uses this to flush signals and signal handlers.

     (f) If an error occurs before (d.i), bprm_free() will call abort_creds()
     	 to destroy the proposed new credentials and will then unlock
     	 cred_replace_mutex.  No changes to the credentials will have been
     	 made.

 (2) LSM interface.

     A number of functions have been changed, added or removed:

     (*) security_bprm_alloc(), ->bprm_alloc_security()
     (*) security_bprm_free(), ->bprm_free_security()

     	 Removed in favour of preparing new credentials and modifying those.

     (*) security_bprm_apply_creds(), ->bprm_apply_creds()
     (*) security_bprm_post_apply_creds(), ->bprm_post_apply_creds()

     	 Removed; split between security_bprm_set_creds(),
     	 security_bprm_committing_creds() and security_bprm_committed_creds().

     (*) security_bprm_set(), ->bprm_set_security()

     	 Removed; folded into security_bprm_set_creds().

     (*) security_bprm_set_creds(), ->bprm_set_creds()

     	 New.  The new credentials in bprm->creds should be checked and set up
     	 as appropriate.  bprm->cred_prepared is 0 on the first call, 1 on the
     	 second and subsequent calls.

     (*) security_bprm_committing_creds(), ->bprm_committing_creds()
     (*) security_bprm_committed_creds(), ->bprm_committed_creds()

     	 New.  Apply the security effects of the new credentials.  This
     	 includes closing unauthorised files in SELinux.  This function may not
     	 fail.  When the former is called, the creds haven't yet been applied
     	 to the process; when the latter is called, they have.

 	 The former may access bprm->cred, the latter may not.

 (3) SELinux.

     SELinux has a number of changes, in addition to those to support the LSM
     interface changes mentioned above:

     (a) The bprm_security_struct struct has been removed in favour of using
     	 the credentials-under-construction approach.

     (c) flush_unauthorized_files() now takes a cred pointer and passes it on
     	 to inode_has_perm(), file_has_perm() and dentry_open().

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:39:24 +11:00
David Howells
d84f4f992c CRED: Inaugurate COW credentials
Inaugurate copy-on-write credentials management.  This uses RCU to manage the
credentials pointer in the task_struct with respect to accesses by other tasks.
A process may only modify its own credentials, and so does not need locking to
access or modify its own credentials.

A mutex (cred_replace_mutex) is added to the task_struct to control the effect
of PTRACE_ATTACHED on credential calculations, particularly with respect to
execve().

With this patch, the contents of an active credentials struct may not be
changed directly; rather a new set of credentials must be prepared, modified
and committed using something like the following sequence of events:

	struct cred *new = prepare_creds();
	int ret = blah(new);
	if (ret < 0) {
		abort_creds(new);
		return ret;
	}
	return commit_creds(new);

There are some exceptions to this rule: the keyrings pointed to by the active
credentials may be instantiated - keyrings violate the COW rule as managing
COW keyrings is tricky, given that it is possible for a task to directly alter
the keys in a keyring in use by another task.

To help enforce this, various pointers to sets of credentials, such as those in
the task_struct, are declared const.  The purpose of this is compile-time
discouragement of altering credentials through those pointers.  Once a set of
credentials has been made public through one of these pointers, it may not be
modified, except under special circumstances:

  (1) Its reference count may incremented and decremented.

  (2) The keyrings to which it points may be modified, but not replaced.

The only safe way to modify anything else is to create a replacement and commit
using the functions described in Documentation/credentials.txt (which will be
added by a later patch).

This patch and the preceding patches have been tested with the LTP SELinux
testsuite.

This patch makes several logical sets of alteration:

 (1) execve().

     This now prepares and commits credentials in various places in the
     security code rather than altering the current creds directly.

 (2) Temporary credential overrides.

     do_coredump() and sys_faccessat() now prepare their own credentials and
     temporarily override the ones currently on the acting thread, whilst
     preventing interference from other threads by holding cred_replace_mutex
     on the thread being dumped.

     This will be replaced in a future patch by something that hands down the
     credentials directly to the functions being called, rather than altering
     the task's objective credentials.

 (3) LSM interface.

     A number of functions have been changed, added or removed:

     (*) security_capset_check(), ->capset_check()
     (*) security_capset_set(), ->capset_set()

     	 Removed in favour of security_capset().

     (*) security_capset(), ->capset()

     	 New.  This is passed a pointer to the new creds, a pointer to the old
     	 creds and the proposed capability sets.  It should fill in the new
     	 creds or return an error.  All pointers, barring the pointer to the
     	 new creds, are now const.

     (*) security_bprm_apply_creds(), ->bprm_apply_creds()

     	 Changed; now returns a value, which will cause the process to be
     	 killed if it's an error.

     (*) security_task_alloc(), ->task_alloc_security()

     	 Removed in favour of security_prepare_creds().

     (*) security_cred_free(), ->cred_free()

     	 New.  Free security data attached to cred->security.

     (*) security_prepare_creds(), ->cred_prepare()

     	 New. Duplicate any security data attached to cred->security.

     (*) security_commit_creds(), ->cred_commit()

     	 New. Apply any security effects for the upcoming installation of new
     	 security by commit_creds().

     (*) security_task_post_setuid(), ->task_post_setuid()

     	 Removed in favour of security_task_fix_setuid().

     (*) security_task_fix_setuid(), ->task_fix_setuid()

     	 Fix up the proposed new credentials for setuid().  This is used by
     	 cap_set_fix_setuid() to implicitly adjust capabilities in line with
     	 setuid() changes.  Changes are made to the new credentials, rather
     	 than the task itself as in security_task_post_setuid().

     (*) security_task_reparent_to_init(), ->task_reparent_to_init()

     	 Removed.  Instead the task being reparented to init is referred
     	 directly to init's credentials.

	 NOTE!  This results in the loss of some state: SELinux's osid no
	 longer records the sid of the thread that forked it.

     (*) security_key_alloc(), ->key_alloc()
     (*) security_key_permission(), ->key_permission()

     	 Changed.  These now take cred pointers rather than task pointers to
     	 refer to the security context.

 (4) sys_capset().

     This has been simplified and uses less locking.  The LSM functions it
     calls have been merged.

 (5) reparent_to_kthreadd().

     This gives the current thread the same credentials as init by simply using
     commit_thread() to point that way.

 (6) __sigqueue_alloc() and switch_uid()

     __sigqueue_alloc() can't stop the target task from changing its creds
     beneath it, so this function gets a reference to the currently applicable
     user_struct which it then passes into the sigqueue struct it returns if
     successful.

     switch_uid() is now called from commit_creds(), and possibly should be
     folded into that.  commit_creds() should take care of protecting
     __sigqueue_alloc().

 (7) [sg]et[ug]id() and co and [sg]et_current_groups.

     The set functions now all use prepare_creds(), commit_creds() and
     abort_creds() to build and check a new set of credentials before applying
     it.

     security_task_set[ug]id() is called inside the prepared section.  This
     guarantees that nothing else will affect the creds until we've finished.

     The calling of set_dumpable() has been moved into commit_creds().

     Much of the functionality of set_user() has been moved into
     commit_creds().

     The get functions all simply access the data directly.

 (8) security_task_prctl() and cap_task_prctl().

     security_task_prctl() has been modified to return -ENOSYS if it doesn't
     want to handle a function, or otherwise return the return value directly
     rather than through an argument.

     Additionally, cap_task_prctl() now prepares a new set of credentials, even
     if it doesn't end up using it.

 (9) Keyrings.

     A number of changes have been made to the keyrings code:

     (a) switch_uid_keyring(), copy_keys(), exit_keys() and suid_keys() have
     	 all been dropped and built in to the credentials functions directly.
     	 They may want separating out again later.

     (b) key_alloc() and search_process_keyrings() now take a cred pointer
     	 rather than a task pointer to specify the security context.

     (c) copy_creds() gives a new thread within the same thread group a new
     	 thread keyring if its parent had one, otherwise it discards the thread
     	 keyring.

     (d) The authorisation key now points directly to the credentials to extend
     	 the search into rather pointing to the task that carries them.

     (e) Installing thread, process or session keyrings causes a new set of
     	 credentials to be created, even though it's not strictly necessary for
     	 process or session keyrings (they're shared).

(10) Usermode helper.

     The usermode helper code now carries a cred struct pointer in its
     subprocess_info struct instead of a new session keyring pointer.  This set
     of credentials is derived from init_cred and installed on the new process
     after it has been cloned.

     call_usermodehelper_setup() allocates the new credentials and
     call_usermodehelper_freeinfo() discards them if they haven't been used.  A
     special cred function (prepare_usermodeinfo_creds()) is provided
     specifically for call_usermodehelper_setup() to call.

     call_usermodehelper_setkeys() adjusts the credentials to sport the
     supplied keyring as the new session keyring.

(11) SELinux.

     SELinux has a number of changes, in addition to those to support the LSM
     interface changes mentioned above:

     (a) selinux_setprocattr() no longer does its check for whether the
     	 current ptracer can access processes with the new SID inside the lock
     	 that covers getting the ptracer's SID.  Whilst this lock ensures that
     	 the check is done with the ptracer pinned, the result is only valid
     	 until the lock is released, so there's no point doing it inside the
     	 lock.

(12) is_single_threaded().

     This function has been extracted from selinux_setprocattr() and put into
     a file of its own in the lib/ directory as join_session_keyring() now
     wants to use it too.

     The code in SELinux just checked to see whether a task shared mm_structs
     with other tasks (CLONE_VM), but that isn't good enough.  We really want
     to know if they're part of the same thread group (CLONE_THREAD).

(13) nfsd.

     The NFS server daemon now has to use the COW credentials to set the
     credentials it is going to use.  It really needs to pass the credentials
     down to the functions it calls, but it can't do that until other patches
     in this series have been applied.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:39:23 +11:00
David Howells
745ca2475a CRED: Pass credentials through dentry_open()
Pass credentials through dentry_open() so that the COW creds patch can have
SELinux's flush_unauthorized_files() pass the appropriate creds back to itself
when it opens its null chardev.

The security_dentry_open() call also now takes a creds pointer, as does the
dentry_open hook in struct security_operations.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:39:22 +11:00
David Howells
c69e8d9c01 CRED: Use RCU to access another task's creds and to release a task's own creds
Use RCU to access another task's creds and to release a task's own creds.
This means that it will be possible for the credentials of a task to be
replaced without another task (a) requiring a full lock to read them, and (b)
seeing deallocated memory.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:39:19 +11:00
David Howells
86a264abe5 CRED: Wrap current->cred and a few other accessors
Wrap current->cred and a few other accessors to hide their actual
implementation.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:39:18 +11:00
David Howells
b6dff3ec5e CRED: Separate task security context from task_struct
Separate the task security context from task_struct.  At this point, the
security data is temporarily embedded in the task_struct with two pointers
pointing to it.

Note that the Alpha arch is altered as it refers to (E)UID and (E)GID in
entry.S via asm-offsets.

With comment fixes Signed-off-by: Marc Dionne <marc.c.dionne@gmail.com>

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:39:16 +11:00
David Howells
1cdcbec1a3 CRED: Neuter sys_capset()
Take away the ability for sys_capset() to affect processes other than current.

This means that current will not need to lock its own credentials when reading
them against interference by other processes.

This has effectively been the case for a while anyway, since:

 (1) Without LSM enabled, sys_capset() is disallowed.

 (2) With file-based capabilities, sys_capset() is neutered.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Andrew G. Morgan <morgan@kernel.org>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:39:14 +11:00
David Howells
da9592edeb CRED: Wrap task credential accesses in the filesystem subsystem
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id().  In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:39:05 +11:00
David Howells
82ab8deda7 CRED: Wrap task credential accesses in the XFS filesystem
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id().  In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: xfs@oss.sgi.com
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:39:04 +11:00
David Howells
a5f773a659 CRED: Wrap task credential accesses in the UFS filesystem
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id().  In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:39:04 +11:00
David Howells
7706bb39ad CRED: Wrap task credential accesses in the UDF filesystem
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id().  In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:39:03 +11:00
David Howells
26bf1946e6 CRED: Wrap task credential accesses in the UBIFS filesystem
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id().  In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Artem Bityutskiy <dedekind@infradead.org>
Cc: Adrian Hunter <ext-adrian.hunter@nokia.com>
Cc: linux-mtd@lists.infradead.org
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:39:03 +11:00
David Howells
fc7333deb7 CRED: Wrap task credential accesses in the SYSV filesystem
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id().  In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:39:02 +11:00
David Howells
e2950b178e CRED: Wrap task credential accesses in the SMBFS filesystem
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id().  In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Steven French <sfrench@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:39:01 +11:00
David Howells
414cb209ea CRED: Wrap task credential accesses in the ReiserFS filesystem
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id().  In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: reiserfs-devel@vger.kernel.org
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:39:01 +11:00
David Howells
0785f4dad0 CRED: Wrap task credential accesses in the RAMFS filesystem
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id().  In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:39:00 +11:00
David Howells
c222d53eb3 CRED: Wrap task credential accesses in the OMFS filesystem
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id().  In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Bob Copeland <me@bobcopeland.com>
Cc: linux-karma-devel@lists.sourceforge.net
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:38:59 +11:00
David Howells
b19c2a3b83 CRED: Wrap task credential accesses in the OCFS2 filesystem
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id().  In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <joel.becker@oracle.com>
Cc: ocfs2-devel@oss.oracle.com
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:38:59 +11:00
David Howells
5cc0a84076 CRED: Wrap task credential accesses in the NFS daemon
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id().  In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: J. Bruce Fields <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: linux-nfs@vger.kernel.org
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:38:58 +11:00
David Howells
48937024c6 CRED: Wrap task credential accesses in the NCPFS filesystem
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id().  In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Petr Vandrovec <vandrove@vc.cvut.cz>
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:38:58 +11:00
David Howells
922c030f26 CRED: Wrap task credential accesses in the Minix filesystem
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id().  In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:38:57 +11:00
David Howells
8f659adfa2 CRED: Wrap task credential accesses in the JFS filesystem
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id().  In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: jfs-discussion@lists.sourceforge.net
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:38:56 +11:00
David Howells
77c70de15a CRED: Wrap task credential accesses in the hugetlbfs filesystem
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id().  In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: William Irwin <wli@holomorphy.com>
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:38:56 +11:00
David Howells
de395b8ac2 CRED: Wrap task credential accesses in the HPFS filesystem
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id().  In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:38:55 +11:00
David Howells
4ac8489a72 CRED: Wrap task credential accesses in the HFSplus filesystem
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id().  In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:38:54 +11:00
David Howells
94c9a5ee4c CRED: Wrap task credential accesses in the HFS filesystem
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id().  In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:38:54 +11:00
David Howells
3de7be3355 CRED: Wrap task credential accesses in the GFS2 filesystem
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id().  In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: cluster-devel@redhat.com
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:38:53 +11:00
David Howells
2186a71cbc CRED: Wrap task credential accesses in the FUSE filesystem
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id().  In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: fuse-devel@lists.sourceforge.net
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:38:53 +11:00
David Howells
f0ce7ee3a8 CRED: Wrap task credential accesses in the FAT filesystem
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id().  In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:38:52 +11:00
David Howells
4c9c544e49 CRED: Wrap task credential accesses in the Ext4 filesystem
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id().  In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: adilger@sun.com
Cc: linux-ext4@vger.kernel.org
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:38:51 +11:00
David Howells
6a2f90e9fa CRED: Wrap task credential accesses in the Ext3 filesystem
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id().  In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: adilger@sun.com
Cc: linux-ext4@vger.kernel.org
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:38:51 +11:00
David Howells
a8dd4d67bd CRED: Wrap task credential accesses in the Ext2 filesystem
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id().  In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: linux-ext4@vger.kernel.org
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:38:50 +11:00
David Howells
4eea03539d CRED: Wrap task credential accesses in the eCryptFS filesystem
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id().  In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Mike Halcrow <mhalcrow@us.ibm.com>
Cc: Phillip Hellewell <phillip@hellewell.homeip.net>
Cc: ecryptfs-devel@lists.sourceforge.net
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:38:49 +11:00
David Howells
ec4c2aacd1 CRED: Wrap task credential accesses in the devpts filesystem
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id().  In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:38:49 +11:00
David Howells
97b7702cd1 CRED: Wrap task credential accesses in the Coda filesystem
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id().  In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Jan Harkes <jaharkes@cs.cmu.edu>
Cc: codalist@coda.cs.cmu.edu
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:38:48 +11:00
David Howells
a001e5b558 CRED: Wrap task credential accesses in the CIFS filesystem
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id().  In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Steve French <sfrench@samba.org>
Cc: linux-cifs-client@lists.samba.org
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:38:47 +11:00
David Howells
1109b07b7d CRED: Wrap task credential accesses in the BFS filesystem
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id().  In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Tigran A. Aivazian <tigran@aivazian.fsnet.co.uk>
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:38:47 +11:00
David Howells
0eb790e3a2 CRED: Wrap task credential accesses in the autofs4 filesystem
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id().  In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Ian Kent <raven@themaw.net>
Cc: autofs@linux.kernel.org
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:38:46 +11:00
David Howells
73c646e4af CRED: Wrap task credential accesses in the autofs filesystem
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id().  In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: autofs@linux.kernel.org
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:38:45 +11:00
David Howells
215599815d CRED: Wrap task credential accesses in the AFFS filesystem
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id().  In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:38:45 +11:00
David Howells
f8b9d53a31 CRED: Wrap task credential accesses in 9P2000 filesystem
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id().  In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Reviewed-by: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Cc: v9fs-developer@lists.sourceforge.net
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:38:44 +11:00
Steve French
fb39601664 [CIFS] remove unused list, add new cifs sock list to prepare for mount/umount fix
Also adds two lines missing from the previous patch (for the need reconnect flag in the
/proc/fs/cifs/DebugData handling)

The new global_cifs_sock_list is added, and initialized in init_cifs but not used yet.
Jeff Layton will be adding code in to use that and to remove the GlobalTcon and GlobalSMBSession
lists.

CC: Jeff Layton <jlayton@redhat.com>
CC: Shirish Pargaonkar <shirishp@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-11-13 20:04:07 +00:00
Linus Torvalds
7b42365396 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm:
  dlm: fix shutdown cleanup
2008-11-13 11:56:05 -08:00
Steve French
3b79521093 [CIFS] Fix cifs reconnection flags
In preparation for Jeff's big umount/mount fixes to remove the possibility of
various races in cifs mount and linked list handling of sessions, sockets and
tree connections, this patch cleans up some repetitive code in cifs_mount,
and addresses a problem with ses->status and tcon->tidStatus in which we
were overloading the "need_reconnect" state with other status in that
field.  So the "need_reconnect" flag has been broken out from those
two state fields (need reconnect was not mutually exclusive from some of the
other possible tid and ses states).  In addition, a few exit cases in
cifs_mount were cleaned up, and a problem with a tcon flag (for lease support)
was not being set consistently for the 2nd mount of the same share

CC: Jeff Layton <jlayton@redhat.com>
CC: Shirish Pargaonkar <shirishp@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-11-13 19:45:32 +00:00
David Teigland
278afcbf4f dlm: fix shutdown cleanup
Fixes a regression from commit 0f8e0d9a31,
"dlm: allow multiple lockspace creates".

An extraneous 'else' slipped into a code fragment being moved from
release_lockspace() to dlm_release_lockspace().  The result of the
unwanted 'else' is that dlm threads and structures are not stopped
and cleaned up when the final dlm lockspace is removed.  Trying to
create a new lockspace again afterward will fail with
"kmem_cache_create: duplicate cache dlm_conn" because the cache
was not previously destroyed.

Signed-off-by: David Teigland <teigland@redhat.com>
2008-11-13 13:22:34 -06:00
Theodore Tso
6cdfcc275e ext3: Clean up outdated and incorrect comment for ext3_write_super()
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-12 17:17:17 -08:00
Eric W. Biederman
afef80b3d8 vfs: fix shrink_submounts
In the last refactoring of shrink_submounts a variable was not completely
renamed.  So finish the renaming of mnt to m now.

Without this if you attempt to mount an nfs mount that has both automatic
nfs sub mounts on it, and has normal mounts on it.  The unmount will
succeed when it should not.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-12 17:17:17 -08:00
Christoph Hellwig
5a6bb10393 fat: make sure to set d_ops in fat_get_parent
fat_get_parent needs to setup the dentry operations, otherwise we might
lose them when the NFS server needs to reconnect out of cache inodes.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
2008-11-12 08:51:22 +09:00
OGAWA Hirofumi
985eafcc54 fat: fix duplicate addition of ->llseek handler
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
2008-11-12 08:51:22 +09:00
OGAWA Hirofumi
ebeb0406f1 fat: drop negative dentry on rename() path
Drop the negative dentry on rename() path, in order to make sure to
use the case sensitive name which is specified by user if this is for
creation.

For it, this uses newly added LOOKUP_RENAME_TARGET like LOOKUP_CREATE.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
2008-11-12 08:51:22 +09:00
David S. Miller
7e452baf6b Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/message/fusion/mptlan.c
	drivers/net/sfc/ethtool.c
	net/mac80211/debugfs_sta.c
2008-11-11 15:43:02 -08:00
Linus Torvalds
04ca2c17e3 Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
  [XFS] XFS: Check for valid transaction headers in recovery
  [XFS] handle memory allocation failures during log initialisation
  [XFS] Account for allocated blocks when expanding directories
  [XFS] Wait for all I/O on truncate to zero file size
  [XFS] Fix use-after-free with log and quotas
2008-11-11 09:32:58 -08:00
Tiger Yang
6c1e183e12 ocfs2: Check search result in ocfs2_xattr_block_get()
ocfs2_xattr_block_get() calls ocfs2_xattr_search() to find an external
xattr, but doesn't check the search result that is passed back via struct
ocfs2_xattr_search. Add a check for search result, and pass back -ENODATA if
the xattr search failed. This avoids a later NULL pointer error.

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-11-10 09:51:47 -08:00
Mark Fasheh
de29c08528 ocfs2: fix printk related build warnings in xattr.c
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-11-10 09:51:47 -08:00
Dmitri Monakhov
c435400140 ocfs2: truncate outstanding block after direct io failure
Signed-off-by: Dmitri Monakhov <dmonakhov@openvz.org>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Joel Becker <Joel.Becker@oracle.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-11-10 09:51:47 -08:00
Tao Ma
80bcaf3469 ocfs2/xattr: Proper hash collision handle in bucket division
In ocfs2/xattr, we must make sure the xattrs which have the same hash value
exist in the same bucket so that the search schema can work. But in the old
implementation, when we want to extend a bucket, we just move half number of
xattrs to the new bucket. This works in most cases, but if we are lucky
enough we will move 2 xattrs into 2 different buckets. This means that an
xattr from the previous bucket cannot be found anymore. This patch fix this
problem by finding the right position during extending the bucket and extend
an empty bucket if needed.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Cc: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-11-10 09:51:47 -08:00
Tao Ma
4c1bbf1ba6 ocfs2: return 0 in page_mkwrite to let VFS retry.
In ocfs2_page_mkwrite, we return -EINVAL when we found the page mapping
isn't updated, and it will cause the user space program get SIGBUS and
exit. The reason is that during race writeable mmap, we will do
unmap_mapping_range in ocfs2_data_downconvert_worker. The good thing is
that if we reuturn 0 in page_mkwrite, VFS will retry fault and then
call page_mkwrite again, so it is safe to return 0 here.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-11-10 09:51:47 -08:00
Sunil Mushran
ae0dff6830 ocfs2: Set journal descriptor to NULL after journal shutdown
Patch sets journal descriptor to NULL after the journal is shutdown.
This ensures that jbd2_journal_release_jbd_inode(), which removes the
jbd2 inode from txn lists, can be called safely from ocfs2_clear_inode()
even after the journal has been shutdown.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-11-10 09:51:47 -08:00
Tao Ma
d32647993c ocfs2: Fix check of return value of ocfs2_start_trans() in xattr.c.
On failure, ocfs2_start_trans() returns values like ERR_PTR(-ENOMEM),
so we should check whether handle is NULL. Fix them to use IS_ERR().
Jan has made the patch for other part in ocfs2(thank Jan for it), so
this is just the fix for fs/ocfs2/xattr.c.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-11-10 09:51:47 -08:00
Jan Kara
b99835c168 ocfs2: Let inode be really deleted when ocfs2_mknod_locked() fails
We forgot to set i_nlink to 0 when returning due to error from ocfs2_mknod_locked()
and thus inode was not properly released via ocfs2_delete_inode() (e.g. claimed
space was not released). Fix it.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-11-10 09:51:46 -08:00
Jan Kara
87cfa00432 ocfs2: Fix checking of return value of new_inode()
new_inode() does not return ERR_PTR() but NULL in case of failure. Correct
checking of the return value.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-11-10 09:51:46 -08:00
Jan Kara
fa38e92cb3 ocfs2: Fix check of return value of ocfs2_start_trans()
On failure, ocfs2_start_trans() returns values like ERR_PTR(-ENOMEM).
Thus checks for !handle are wrong. Fix them to use IS_ERR().

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-11-10 09:51:46 -08:00
Tao Ma
8573f79d30 ocfs2: Fix some typos in xattr annotations.
Fix some typos in the xattr annotations.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Reported-by: Coly Li <coyli@suse.de>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-11-10 09:51:46 -08:00
Tao Ma
63fd775737 ocfs2: Remove unused ocfs2_restore_xattr_block().
Since now ocfs2 supports empty xattr buckets, we will never remove
the xattr index tree even if all the xattrs are removed, so this
function will never be called. So remove it.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-11-10 09:51:46 -08:00
Joel Becker
54f443f4e7 ocfs2: Don't repeat ocfs2_xattr_block_find()
ocfs2_xattr_block_get() looks up the xattr in a startlingly familiar
way; it's identical to the function ocfs2_xattr_block_find().  Let's just
use the later in the former.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-11-10 09:51:46 -08:00
Joel Becker
eb6ff2397d ocfs2: Specify appropriate journal access for new xattr buckets.
There are a couple places that get an xattr bucket that may be reading
an existing one or may be allocating a new one.  They should specify the
correct journal access mode depending.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-11-10 09:51:46 -08:00
Joel Becker
bd60bd37ad ocfs2: Check errors from ocfs2_xattr_update_xattr_search()
The ocfs2_xattr_update_xattr_search() function can return an error when
trying to read blocks off of disk.  The caller needs to check this error
before using those (possibly invalid) blocks.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-11-10 09:51:45 -08:00
Joel Becker
b37c4d84e9 ocfs2: Don't return -EFAULT from a corrupt xattr entry.
If the xattr disk structures are corrupt, return -EIO, not -EFAULT.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-11-10 09:51:45 -08:00
Joel Becker
f6087fb799 ocfs2: Check xattr block signatures properly.
The xattr.c code is currently memcmp()ing naking buffer pointers.
Create the OCFS2_IS_VALID_XATTR_BLOCK() macro to match its peers and use
that.

In addition, failed signature checks were returning -EFAULT, which is
completely wrong.  Return -EIO.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-11-10 09:51:44 -08:00
Tiger Yang
c988fd045f ocfs2: add handler_map array bounds checking
Make the handler_map array as large as the possible value range to avoid
a fencepost error.

[ Utilize alternate method -- Joel ]

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-11-10 09:51:44 -08:00
Tiger Yang
ceb1eba3dc ocfs2: remove duplicate definition in xattr
Include/linux/xattr.h already has the definition about xattr prefix,
so remove the duplicate definitions in xattr.c.

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-11-10 09:51:44 -08:00
Tiger Yang
0030e00150 ocfs2: fix function declaration and definition in xattr
Because we merged the xattr sources into one file, some functions
no longer belong in the header file.

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-11-10 09:51:44 -08:00
Tiger Yang
c3cb682735 ocfs2: fix license in xattr
This patch fixes the license in xattr.c and xattr.h.

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-11-10 09:51:43 -08:00
David Chinner
220ca310a5 [XFS] XFS: Check for valid transaction headers in recovery
When we are about to add a new item to a transaction in recovery, we need
to check that it is valid first. Currently we just assert that header
magic number matches, but in production systems that is not present and we
add a corrupted transaction to the list to be processed. This results in a
kernel oops later when processing the corrupted transaction.

Instead, if we detect a corrupted transaction, abort recovery and leave
the user to clean up the mess that has occurred.

SGI-PV: 988145

SGI-Modid: xfs-linux-melb:xfs-kern:32356a

Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-11-10 18:01:50 +11:00
Dave Chinner
8f330f5149 [XFS] handle memory allocation failures during log initialisation
When there is no memory left in the system, xfs_buf_get_noaddr()
can fail. If this happens at mount time during xlog_alloc_log()
we fail to catch the error and oops.

Catch the error from xfs_buf_get_noaddr(), and allow other memory
allocations to fail and catch those errors too. Report the error
to the console and fail the mount with ENOMEM.

Tested by manually injecting errors into xfs_buf_get_noaddr() and
xlog_alloc_log().

Version 2:
o remove unnecessary casts of the returned pointer from kmem_zalloc()

SGI-PV: 987246

Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-11-10 17:57:06 +11:00
David Chinner
6f9f51adb6 [XFS] Account for allocated blocks when expanding directories
When we create a directory, we reserve a number of blocks for the maximum
possible expansion of of the directory due to various btree splits,
freespace allocation, etc. Unfortunately, each allocation is not reflected
in the total number of blocks still available to the transaction, so the
maximal reservation is used over and over again.

This leads to problems where an allocation group has only enough blocks
for *some* of the allocations required for the directory modification.
After the first N allocations, the remaining blocks in the allocation
group drops below the total reservation, and subsequent allocations fail
because the allocator will not allow the allocation to proceed if the AG
does not have the enough blocks available for the entire allocation total.

This results in an ENOSPC occurring after an allocation has already
occurred. This results in aborting the directory operation (leaving the
directory in an inconsistent state) and cancelling a dirty transaction,
which results in a filesystem shutdown.

Avoid the problem by reflecting the number of blocks allocated in any
directory expansion in the total number of blocks available to the
modification in progress. This prevents a directory modification from
being aborted part way through with an ENOSPC.

SGI-PV: 988144

SGI-Modid: xfs-linux-melb:xfs-kern:32340a

Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-11-10 17:51:14 +11:00
Lachlan McIlroy
2cf7f0da3a [XFS] Wait for all I/O on truncate to zero file size
It's possible to have outstanding xfs_ioend_t's queued when the file size
is zero. This can happen in the direct I/O path when a direct I/O write
fails due to ENOSPC. In this case the xfs_ioend_t will still be queued (ie
xfs_end_io_direct() does not know that the I/O failed so can't force the
xfs_ioend_t to be flushed synchronously).

When we truncate a file on unlink we don't know to wait for these
xfs_ioend_ts and we can have a use-after-free situation if the inode is
reclaimed before the xfs_ioend_t is finally processed.

As was suggested by Dave Chinner lets wait for all I/Os to complete when
truncating the file size to zero.

SGI-PV: 981668

SGI-Modid: xfs-linux-melb:xfs-kern:32216a

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
2008-11-10 17:51:00 +11:00
Lachlan McIlroy
9ccbece546 [XFS] Fix use-after-free with log and quotas
Destroying the quota stuff on unmount can access the log - ie
XFS_QM_DONE() ends up in xfs_dqunlock() which calls
xfs_trans_unlocked_item() and then xfs_log_move_tail(). By this time the
log has already been destroyed. Just move the cleanup of the quota code
earlier in xfs_unmountfs() before the call to xfs_log_unmount(). Moving
XFS_QM_DONE() up near XFS_QM_DQPURGEALL() seems like a good spot.

SGI-PV: 987086

SGI-Modid: xfs-linux-melb:xfs-kern:32148a

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Peter Leckie <pleckie@sgi.com>
2008-11-10 17:43:23 +11:00
Dave Chinner
6307091fe6 [XFS] Avoid using inodes that haven't been completely initialised
The radix tree walks in xfs_sync_inodes_ag and xfs_qm_dqrele_all_inodes()
can find inodes that are still undergoing initialisation. Avoid
them by checking for the the XFS_INEW() flag once we have a reference
on the inode. This flag is cleared once the inode is properly initialised.

SGI-PV: 987246

Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-11-10 17:13:23 +11:00
Dave Chinner
cb4f0d1d42 [XFS] fix uninitialised variable bug in dquot release
gcc on ARM warns about an using an uninitialised variable
in xfs_qm_dqrele_all_inodes(). This is a real bug, but gcc
on x86_64 is not reporting this warning so it went unnoticed.

Fix the bug by bring the inode radix tree walk code up to
date with xfs_sync_inodes_ag().

SGI-PV: 987246

Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-11-10 17:11:18 +11:00
Stephen Rothwell
d44dab8d1c fs: xfs needs inode_wait to be exported
Since wait_on_inode() references it.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-11-10 17:06:05 +11:00
Dave Chinner
644c3567d1 [XFS] handle memory allocation failures during log initialisation
When there is no memory left in the system, xfs_buf_get_noaddr()
can fail. If this happens at mount time during xlog_alloc_log()
we fail to catch the error and oops.

Catch the error from xfs_buf_get_noaddr(), and allow other memory
allocations to fail and catch those errors too. Report the error
to the console and fail the mount with ENOMEM.

Tested by manually injecting errors into xfs_buf_get_noaddr() and
xlog_alloc_log().

Version 2:
o remove unnecessary casts of the returned pointer from kmem_zalloc()

SGI-PV: 987246

Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-11-10 16:50:24 +11:00
Linus Torvalds
8b805ef617 Merge branch 'for-2.6.28' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.28' of git://linux-nfs.org/~bfields/linux:
  Fix nfsd truncation of readdir results
2008-11-09 12:25:44 -08:00
Doug Nazar
b726e923ea Fix nfsd truncation of readdir results
Commit 8d7c4203 "nfsd: fix failure to set eof in readdir in some
situations" introduced a bug: on a directory in an exported ext3
filesystem with dir_index unset, a READDIR will only return about 250
entries, even if the directory was larger.

Bisected it back to this commit; reverting it fixes the problem.

It turns out that in this case ext3 reads a block at a time, then
returns from readdir, which means we can end up with buf.full==0 but
with more entries in the directory still to be read.  Before 8d7c4203
(but after c002a6c797 "Optimise NFS readdir hack slightly"), this would
cause us to return the READDIR result immediately, but with the eof bit
unset.  That could cause a performance regression (because the client
would need more roundtrips to the server to read the whole directory),
but no loss in correctness, since the cleared eof bit caused the client
to send another readdir.  After 8d7c4203, the setting of the eof bit
made this a correctness problem.

So, move nfserr_eof into the loop and remove the buf.full check so that
we loop until buf.used==0.  The following seems to do the right thing
and reduces the network traffic since we don't return a READDIR result
until the buffer is full.

Tested on an empty directory & large directory; eof is properly sent and
there are no more short buffers.

Signed-off-by: Doug Nazar <nazard@dragoninc.ca>
Cc: David Woodhouse <David.Woodhouse@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-11-09 15:15:50 -05:00
Linus Torvalds
1538a093f7 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: add checksum calculation when clearing UNINIT flag in ext4_new_inode
  ext4: Mark the buffer_heads as dirty and uptodate after prepare_write
  ext4: calculate journal credits correctly
  ext4: wait on all pending commits in ext4_sync_fs()
  ext4: Convert to host order before using the values.
  ext4: fix missing ext4_unlock_group in error path
  jbd2: deregister proc on failure in jbd2_journal_init_inode
  jbd2: don't give up looking for space so easily in __jbd2_log_wait_for_space
  jbd: don't give up looking for space so easily in __log_wait_for_space
2008-11-07 08:15:18 -08:00
Frederic Bohe
23712a9c28 ext4: add checksum calculation when clearing UNINIT flag in ext4_new_inode
When initializing an uninitialized block group in ext4_new_inode(),
its block group checksum must be re-calculated.  This fixes a race
when several threads try to allocate a new inode in an UNINIT'd group.

There is some question whether we need to be initializing the block
bitmap in ext4_new_inode() at all, but for now, if we are going to
init the block group, let's eliminate the race.

Signed-off-by: Frederic Bohe <frederic.bohe@bull.net>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-11-07 09:21:01 -05:00
Aneesh Kumar K.V
ed9b3e3379 ext4: Mark the buffer_heads as dirty and uptodate after prepare_write
We need to make sure we mark the buffer_heads as dirty and uptodate
so that block_write_full_page write them correctly.

This fixes mmap corruptions that can occur in low memory situations.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-11-07 09:06:45 -05:00
Adrian Hunter
7e2d9bfa4e UBIFS: allow for gaps when dirtying the LPT
The LPT may have gaps in it because initially empty LEBs
are not added by mkfs.ubifs - because it does not know how
many there are.  Then UBIFS allocates empty LEBs in the
reverse order that they are discovered i.e. they are
added to, and removed from, the front of a list.  That
creates a gap in the middle of the LPT.

The function dirtying the LPT tree (for the purpose of
small model garbage collection) assumed that a gap could
only occur at the very end of the LPT and stopped dirtying
prematurely, which in turn resulted in the LPT running
out of space - something that is designed to be impossible.

Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
2008-11-07 12:11:52 +02:00
David S. Miller
9eeda9abd1 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/net/wireless/ath5k/base.c
	net/8021q/vlan_core.c
2008-11-06 22:43:03 -08:00
Niv Sardi
dcd7b4e5c0 Merge branch 'master' of git://oss.sgi.com:8090/xfs/linux-2.6 2008-11-07 15:07:12 +11:00
Linus Torvalds
e252f4db18 Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  Block: use round_jiffies_up()
  Add round_jiffies_up and related routines
  block: fix __blkdev_get() for removable devices
  generic-ipi: fix the smp_mb() placement
  blk: move blk_delete_timer call in end_that_request_last
  block: add timer on blkdev_dequeue_request() not elv_next_request()
  bio: define __BIOVEC_PHYS_MERGEABLE
  block: remove unused ll_new_mergeable()
2008-11-06 15:53:47 -08:00
Linus Torvalds
c361948712 Merge git://git.infradead.org/mtd-2.6
* git://git.infradead.org/mtd-2.6:
  [JFFS2] fix race condition in jffs2_lzo_compress()
  [MTD] [NOR] Fix cfi_send_gen_cmd handling of x16 devices in x8 mode (v4)
  [JFFS2] Fix lack of locking in thread_should_wake()
  [JFFS2] Fix build failure with !CONFIG_JFFS2_FS_WRITEBUFFER
  [MTD] [NAND] OMAP2: remove duplicated #include
2008-11-06 15:43:13 -08:00
OGAWA Hirofumi
c3302931db fat: i_blocks warning fix
blkcnt_t type depends on CONFIG_LSF. Use unsigned long long always for
printk().  But lazy to type it, so add "llu" and use it.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-06 15:41:22 -08:00
OGAWA Hirofumi
9ca59f4c3d fat: ->i_pos race fix
i_pos is 64bits value, hence it's not atomic to update.

Important place is fat_write_inode() only, other places without lock
are just for printk().

This adds lock for "BITS_PER_LONG == 32" kernel.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-06 15:41:21 -08:00
OGAWA Hirofumi
2bdf67eb16 fat: mmu_private race fix
mmu_private is 64bits value, hence it's not atomic to update.

So, the access rule for mmu_private is we must hold ->i_mutex.  But,
fat_get_block() path doesn't follow the rule on non-allocation path.

This fixes by using i_size instead if non-allocation path.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-06 15:41:21 -08:00
OGAWA Hirofumi
0e75f5da06 fat: Add printf attribute to fat_fs_panic()
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-06 15:41:21 -08:00
OGAWA Hirofumi
fa93ca18a8 fat: Fix _fat_bmap() race
fat_get_cluster() assumes the requested blocknr isn't truncated during
read. _fat_bmap() doesn't follow this rule.

This protects it by ->i_mutex.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-06 15:41:21 -08:00
OGAWA Hirofumi
dfc209c006 fat: Fix ATTR_RO for directory
FAT has the ATTR_RO (read-only) attribute. But on Windows, the ATTR_RO
of the directory will be just ignored actually, and is used by only
applications as flag. E.g. it's setted for the customized folder by
Explorer.

http://msdn2.microsoft.com/en-us/library/aa969337.aspx

This adds "rodir" option. If user specified it, ATTR_RO is used as
read-only flag even if it's the directory. Otherwise, inode->i_mode
is not used to hold ATTR_RO (i.e. fat_mode_can_save_ro() returns 0).

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-06 15:41:21 -08:00
OGAWA Hirofumi
9183482f5d fat: Fix ATTR_RO in the case of (~umask & S_WUGO) == 0
If inode->i_mode doesn't have S_WUGO, current code assumes it means
ATTR_RO.  However, if (~[ufd]mask & S_WUGO) == 0, inode->i_mode can't
hold S_WUGO. Therefore the updated directory entry will always have
ATTR_RO.

This adds fat_mode_can_hold_ro() to check it. And if inode->i_mode
can't hold, uses -i_attrs to hold ATTR_RO instead.

With this, we don't set ATTR_RO unless users change it via ioctl() if
(~[ufd]mask & S_WUGO) == 0.

And on FAT_IOCTL_GET_ATTRIBUTES path, this adds ->i_mutex to it for
not returning the partially updated attributes by FAT_IOCTL_SET_ATTRIBUTES
to userland.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-06 15:41:21 -08:00
OGAWA Hirofumi
9c0aa1b87b fat: Cleanup FAT attribute stuff
This adds three helpers:

fat_make_attrs() - makes FAT attributes from inode.
fat_make_mode()  - makes mode_t from FAT attributes.
fat_save_attrs() - saves FAT attributes to inode.

Then this replaces: MSDOS_MKMODE() by fat_make_mode(), fat_attr() by
fat_make_attrs(), ->i_attrs = attr & ATTR_UNUSED by fat_save_attrs().
And for root inode, those is used with ATTR_DIR instead of bogus
ATTR_NONE.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-06 15:41:21 -08:00
OGAWA Hirofumi
45cfbe3547 fat: Cleanup msdos_lookup()
Use same style with vfat_lookup().

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-06 15:41:21 -08:00
OGAWA Hirofumi
1c13a243a4 fat: Kill d_invalidate() in vfat_lookup()
d_invalidate() for positive dentry doesn't work in some cases
(vfsmount, nfsd, and maybe others). shrink_dcache_parent() by
d_invalidate() is pointless for vfat usage at all.

So, this kills it, and intead of it uses d_move().

To save old behavior, this returns alias simply for directory (don't
change pwd, etc..). the directory lookup shouldn't be important for
performance.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-06 15:41:21 -08:00
OGAWA Hirofumi
1b52467243 fat: Fix/Cleanup dcache handling for vfat
- Add comments for handling dcache of vfat.

- Separate case-sensitive case and case-insensitive to
  vfat_revalidate() and vfat_ci_revalidate().

  vfat_revalidate() doesn't need to drop case-insensitive negative
  dentry on creation path.

- Current code is missing to set ->d_revalidate to the negative dentry
  created by unlink/etc..

  This sets ->d_revalidate always, and returns 1 for positive
  dentry. Now, we don't need to change ->d_op dynamically anymore,
  so this just uses sb->s_root->d_op to set ->d_op.

- d_find_alias() may return DCACHE_DISCONNECTED dentry. It's not
  the interesting dentry there. This checks it.

- Add missing LOOKUP_PARENT check. We don't need to drop the valid
  negative dentry for (LOOKUP_CREATE | LOOKUP_PARENT) lookup.

- For consistent filename on creation path, this drops negative dentry
  if we can't see intent.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-06 15:41:21 -08:00
OGAWA Hirofumi
068f5ae05c vfat: Fix vfat_find() error path in vfat_lookup()
Current vfat_lookup() creates negetive dentry blindly if vfat_find()
returned a error. It's wrong. If the error isn't -ENOENT, just return
error.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-06 15:41:21 -08:00
OGAWA Hirofumi
a993b542bb fat: use fat_detach() in fat_clear_inode()
Use fat_detach() instead of opencoding it.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-06 15:41:20 -08:00
OGAWA Hirofumi
5e35dd4651 fat: Fix fat_ent_update_ptr() for FAT12
This fixes the missing update for bhs/nr_bhs in case the caller
accessed from block boundary to first block of boundary.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-06 15:41:20 -08:00
OGAWA Hirofumi
d3dfa8228f fat: improve fat_hash()
fat_hash() is using the algorithm known as bad. Instead of it, this
uses hash_32(). The following is the summary of test.

old hash:
	hash func (1000 times): 33489 cycles
	total inodes in hash table: 70926
	largest bucket contains: 696
	smallest bucket contains: 54

new hash:
	hash func (1000 times): 33129 cycles
	total inodes in hash table: 70926
	largest bucket contains: 315
	smallest bucket contains: 236

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-06 15:41:20 -08:00
Darren Jenkins
52e9d9f4b3 fat: cleanup fat_parse_long() error handling
Coverity CID 2332 & 2333 RESOURCE_LEAK

In fat_search_long() if fat_parse_long() returns a -ve value we return
without first freeing unicode.  This patch free's them on this error path.

The above was false positive on current tree, but this change is more
clean, so apply as cleanup.

[hirofumi@mail.parknet.co.jp: fix coding style]
Signed-off-by: Darren Jenkins <darrenrjenkins@gmail.com>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-06 15:41:20 -08:00
OGAWA Hirofumi
53472bc8f8 fat: use generic_file_llseek() for directory
Since fat_dir_ioctl() was already fixed (i.e. called under ->i_mutex),
and __fat_readdir() doesn't take BKL anymore. So, BKL for ->llseek()
is pointless, and we have to use generic_file_llseek().

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-06 15:41:20 -08:00
OGAWA Hirofumi
7decd1cb03 fat: Fix and cleanup timestamp conversion
This cleans date_dos2unix()/fat_date_unix2dos() up. New code should be
much more readable.

And this fixes those old functions. Those doesn't handle 2100
correctly. 2100 isn't leap year, but old one handles it as leap year.
Also, with this, centi sec is handled and is fixed.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-06 15:41:20 -08:00
OGAWA Hirofumi
9e975dae29 fat: split include/msdos_fs.h
This splits __KERNEL__ stuff in include/msdos_fs.h into fs/fat/fat.h.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-06 15:41:20 -08:00
OGAWA Hirofumi
990e194e69 fat: move fs/vfat/* and fs/msdos/* to fs/fat
This just moves those files, but change link order from MSDOS, VFAT to
VFAT, MSDOS.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-06 15:41:20 -08:00
Arthur Jones
c87591b719 ext3: wait on all pending commits in ext3_sync_fs
In ext3_sync_fs, we only wait for a commit to finish if we started it, but
there may be one already in progress which will not be synced.

In the case of a data=ordered umount with pending long symlinks which are
delayed due to a long list of other I/O on the backing block device, this
causes the buffer associated with the long symlinks to not be moved to the
inode dirty list in the second phase of fsync_super.  Then, before they
can be dirtied again, kjournald exits, seeing the UMOUNT flag and the
dirty pages are never written to the backing block device, causing long
symlink corruption and exposing new or previously freed block data to
userspace.

This can be reproduced with a script created
by Eric Sandeen <sandeen@redhat.com>:

	#!/bin/bash

	umount /mnt/test2
	mount /dev/sdb4 /mnt/test2
	rm -f /mnt/test2/*
	dd if=/dev/zero of=/mnt/test2/bigfile bs=1M count=512
	touch
	/mnt/test2/thisisveryveryveryveryveryveryveryveryveryveryveryveryveryveryveryverylongfilename
	ln -s
	/mnt/test2/thisisveryveryveryveryveryveryveryveryveryveryveryveryveryveryveryverylongfilename
	/mnt/test2/link
	umount /mnt/test2
	mount /dev/sdb4 /mnt/test2
	ls /mnt/test2/
	umount /mnt/test2

To ensure all commits are synced, we flush all journal commits now when
sync_fs'ing ext3.

Signed-off-by: Arthur Jones <ajones@riverbed.com>
Cc: Eric Sandeen <sandeen@redhat.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: <linux-ext4@vger.kernel.org>
Cc: <stable@kernel.org>		[2.6.everything]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-06 15:41:19 -08:00
Ian Kent
96b0317906 autofs4: collect version check return
The function check_dev_ioctl_version() returns an error code upon fail but
it isn't captured and returned in validate_dev_ioctl() as it should be.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-06 15:41:17 -08:00
Ian Kent
bc9c406838 autofs4: correct offset mount expire check
When checking a directory tree in autofs_tree_busy() we can incorrectly
decide that the tree isn't busy.  This happens for the case of an active
offset mount as autofs4_follow_mount() follows past the active offset
mount, which has an open file handle used for expires, causing the file
handle not to count toward the busyness check.

Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-06 15:41:17 -08:00
Theodore Ts'o
ac51d83705 ext4: calculate journal credits correctly
This fixes a 2.6.27 regression which was introduced in commit a02908f1.

We weren't passing the chunk parameter down to the two subections,
ext4_indirect_trans_blocks() and ext4_ext_index_trans_blocks(), with
the result that massively overestimate the amount of credits needed by
ext4_da_writepages, especially in the non-extents case.  This causes
failures especially on /boot partitions, which tend to be small and
non-extent using since GRUB doesn't handle extents.

This patch fixes the bug reported by Joseph Fannin at:
http://bugzilla.kernel.org/show_bug.cgi?id=11964

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-11-06 16:49:36 -05:00
Artem Bityutskiy
e84461ad9c UBIFS: fix compilation warnings
We print 'ino_t' type using '%lu' printk() placeholder, but this
results in many warnings when compiling for Alpha platform. Fix
this by adding (unsingned long) casts.

Fixes these warnings:

fs/ubifs/journal.c:693: warning: format '%lu' expects type 'long unsigned int', but argument 4 has type 'ino_t'
fs/ubifs/journal.c:1131: warning: format '%lu' expects type 'long unsigned int', but argument 4 has type 'ino_t'
fs/ubifs/dir.c:163: warning: format '%lu' expects type 'long unsigned int', but argument 4 has type 'ino_t'
fs/ubifs/tnc.c:2680: warning: format '%lu' expects type 'long unsigned int', but argument 4 has type 'ino_t'
fs/ubifs/tnc.c:2700: warning: format '%lu' expects type 'long unsigned int', but argument 5 has type 'ino_t'
fs/ubifs/replay.c:1066: warning: format '%lu' expects type 'long unsigned int', but argument 7 has type 'ino_t'
fs/ubifs/orphan.c:108: warning: format '%lu' expects type 'long unsigned int', but argument 4 has type 'ino_t'
fs/ubifs/orphan.c:135: warning: format '%lu' expects type 'long unsigned int', but argument 4 has type 'ino_t'
fs/ubifs/orphan.c:142: warning: format '%lu' expects type 'long unsigned int', but argument 4 has type 'ino_t'
fs/ubifs/orphan.c:154: warning: format '%lu' expects type 'long unsigned int', but argument 4 has type 'ino_t'
fs/ubifs/orphan.c:159: warning: format '%lu' expects type 'long unsigned int', but argument 4 has type 'ino_t'
fs/ubifs/orphan.c:451: warning: format '%lu' expects type 'long unsigned int', but argument 4 has type 'ino_t'
fs/ubifs/orphan.c:539: warning: format '%lu' expects type 'long unsigned int', but argument 4 has type 'ino_t'
fs/ubifs/orphan.c:612: warning: format '%lu' expects type 'long unsigned int', but argument 4 has type 'ino_t'
fs/ubifs/orphan.c:843: warning: format '%lu' expects type 'long unsigned int', but argument 4 has type 'ino_t'
fs/ubifs/orphan.c:856: warning: format '%lu' expects type 'long unsigned int', but argument 4 has type 'ino_t'
fs/ubifs/recovery.c:1438: warning: format '%lu' expects type 'long unsigned int', but argument 4 has type 'ino_t'
fs/ubifs/recovery.c:1443: warning: format '%lu' expects type 'long unsigned int', but argument 4 has type 'ino_t'
fs/ubifs/recovery.c:1475: warning: format '%lu' expects type 'long unsigned int', but argument 4 has type 'ino_t'
fs/ubifs/recovery.c:1495: warning: format '%lu' expects type 'long unsigned int', but argument 4 has type 'ino_t'
fs/ubifs/debug.c:105: warning: format '%lu' expects type 'long unsigned int', but argument 3 has type 'ino_t'
fs/ubifs/debug.c:105: warning: format '%lu' expects type 'long unsigned int', but argument 3 has type 'ino_t'
fs/ubifs/debug.c:110: warning: format '%lu' expects type 'long unsigned int', but argument 3 has type 'ino_t'
fs/ubifs/debug.c:110: warning: format '%lu' expects type 'long unsigned int', but argument 3 has type 'ino_t'
fs/ubifs/debug.c:114: warning: format '%lu' expects type 'long unsigned int', but argument 3 has type 'ino_t'
fs/ubifs/debug.c:114: warning: format '%lu' expects type 'long unsigned int', but argument 3 has type 'ino_t'
fs/ubifs/debug.c:118: warning: format '%lu' expects type 'long unsigned int', but argument 3 has type 'ino_t'
fs/ubifs/debug.c:118: warning: format '%lu' expects type 'long unsigned int', but argument 3 has type 'ino_t'
fs/ubifs/debug.c:1591: warning: format '%lu' expects type 'long unsigned int', but argument 4 has type 'ino_t'
fs/ubifs/debug.c:1671: warning: format '%lu' expects type 'long unsigned int', but argument 4 has type 'ino_t'
fs/ubifs/debug.c:1674: warning: format '%lu' expects type 'long unsigned int', but argument 5 has type 'ino_t'
fs/ubifs/debug.c:1680: warning: format '%lu' expects type 'long unsigned int', but argument 4 has type 'ino_t'
fs/ubifs/debug.c:1699: warning: format '%lu' expects type 'long unsigned int', but argument 5 has type 'ino_t'
fs/ubifs/debug.c:1788: warning: format '%lu' expects type 'long unsigned int', but argument 5 has type 'ino_t'
fs/ubifs/debug.c:1821: warning: format '%lu' expects type 'long unsigned int', but argument 5 has type 'ino_t'
fs/ubifs/debug.c:1833: warning: format '%lu' expects type 'long unsigned int', but argument 5 has type 'ino_t'
fs/ubifs/debug.c:1924: warning: format '%lu' expects type 'long unsigned int', but argument 4 has type 'ino_t'
fs/ubifs/debug.c:1932: warning: format '%lu' expects type 'long unsigned int', but argument 4 has type 'ino_t'
fs/ubifs/debug.c:1938: warning: format '%lu' expects type 'long unsigned int', but argument 4 has type 'ino_t'
fs/ubifs/debug.c:1945: warning: format '%lu' expects type 'long unsigned int', but argument 4 has type 'ino_t'
fs/ubifs/debug.c:1953: warning: format '%lu' expects type 'long unsigned int', but argument 4 has type 'ino_t'
fs/ubifs/debug.c:1960: warning: format '%lu' expects type 'long unsigned int', but argument 4 has type 'ino_t'
fs/ubifs/debug.c:1967: warning: format '%lu' expects type 'long unsigned int', but argument 4 has type 'ino_t'
fs/ubifs/debug.c:1973: warning: format '%lu' expects type 'long unsigned int', but argument 4 has type 'ino_t'
fs/ubifs/debug.c:1988: warning: format '%lu' expects type 'long unsigned int', but argument 4 has type 'ino_t'
fs/ubifs/debug.c:1991: warning: format '%lu' expects type 'long unsigned int', but argument 5 has type 'ino_t'
fs/ubifs/debug.c:2009: warning: format '%lu' expects type 'long unsigned int', but argument 2 has type 'ino_t'

Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-11-06 11:06:31 +02:00
Harvey Harrison
0ecb9529a4 UBIFS: endian handling fixes and annotations
Noticed by sparse:
fs/ubifs/file.c:75:2: warning: restricted __le64 degrades to integer
fs/ubifs/file.c:629:4: warning: restricted __le64 degrades to integer
fs/ubifs/dir.c:431:3: warning: restricted __le64 degrades to integer

This should be checked to ensure the ubifs_assert is working as
intended, I've done the suggested annotation in this patch.

fs/ubifs/sb.c:298:6: warning: incorrect type in assignment (different base types)
fs/ubifs/sb.c:298:6:    expected int [signed] [assigned] tmp
fs/ubifs/sb.c:298:6:    got restricted __le64 [usertype] <noident>
fs/ubifs/sb.c:299:19: warning: incorrect type in assignment (different base types)
fs/ubifs/sb.c:299:19:    expected restricted __le64 [usertype] atime_sec
fs/ubifs/sb.c:299:19:    got int [signed] [assigned] tmp
fs/ubifs/sb.c:300:19: warning: incorrect type in assignment (different base types)
fs/ubifs/sb.c:300:19:    expected restricted __le64 [usertype] ctime_sec
fs/ubifs/sb.c:300:19:    got int [signed] [assigned] tmp
fs/ubifs/sb.c:301:19: warning: incorrect type in assignment (different base types)
fs/ubifs/sb.c:301:19:    expected restricted __le64 [usertype] mtime_sec
fs/ubifs/sb.c:301:19:    got int [signed] [assigned] tmp

This looks like a bugfix as your tmp was a u32 so there was truncation in
the atime, mtime, ctime value, probably not intentional, add a tmp_le64
and use it here.

fs/ubifs/key.h:348:9: warning: cast to restricted __le32
fs/ubifs/key.h:348:9: warning: cast to restricted __le32
fs/ubifs/key.h:419:9: warning: cast to restricted __le32

Read from the annotated union member instead.

fs/ubifs/recovery.c:175:13: warning: incorrect type in assignment (different base types)
fs/ubifs/recovery.c:175:13:    expected unsigned int [unsigned] [usertype] save_flags
fs/ubifs/recovery.c:175:13:    got restricted __le32 [usertype] flags
fs/ubifs/recovery.c:186:13: warning: incorrect type in assignment (different base types)
fs/ubifs/recovery.c:186:13:    expected restricted __le32 [usertype] flags
fs/ubifs/recovery.c:186:13:    got unsigned int [unsigned] [usertype] save_flags

Do byteshifting at compile time of the flag value.  Annotate the saved_flags
as le32.

fs/ubifs/debug.c:368:10: warning: cast to restricted __le32
fs/ubifs/debug.c:368:10: warning: cast from restricted __le64

Should be checked if the truncation was intentional, I've changed the
printk to print the full width.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-11-06 11:06:19 +02:00
Artem Bityutskiy
069782a1ee UBIFS: remove printk
Remove the "UBIFS background thread ubifs_bgd0_0 started" message.
We kill the background thread when we switch to R/O mode, and
start it again whan we switch to R/W mode. OLPC is doing this
many times during boot, and we see this message many times as
well, which is irritating. So just kill the message.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-11-06 11:06:14 +02:00
Tejun Heo
89f97496e8 block: fix __blkdev_get() for removable devices
Commit 0762b8bde9 moved disk_get_part()
in front of recursive get on the whole disk, which caused removable
devices to try disk_get_part() before rescanning after a new media is
inserted, which might fail legit open attempts or give the old
partition.

This patch fixes the problem by moving disk_get_part() after
__blkdev_get() on the whole disk.

This problem was spotted by Borislav Petkov.

Signed-off-by: Tejun Heo <tj@kernel.org>
Tested-by: Borislav Petkov <petkovbb@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-11-06 08:41:56 +01:00
Geert Uytterhoeven
dc8a0843a4 [JFFS2] fix race condition in jffs2_lzo_compress()
deflate_mutex protects the globals lzo_mem and lzo_compress_buf.  However,
jffs2_lzo_compress() unlocks deflate_mutex _before_ it has copied out the
compressed data from lzo_compress_buf.  Correct this by moving the mutex
unlock after the copy.

In addition, document what deflate_mutex actually protects.

Cc: stable@kernel.org
Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Acked-by: Richard Purdie <rpurdie@openedhand.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2008-11-05 23:22:02 +01:00
Theodore Ts'o
14ce0cb411 ext4: wait on all pending commits in ext4_sync_fs()
In ext4_sync_fs, we only wait for a commit to finish if we started it,
but there may be one already in progress which will not be synced.

In the case of a data=ordered umount with pending long symlinks which
are delayed due to a long list of other I/O on the backing block
device, this causes the buffer associated with the long symlinks to
not be moved to the inode dirty list in the second phase of
fsync_super.  Then, before they can be dirtied again, kjournald exits,
seeing the UMOUNT flag and the dirty pages are never written to the
backing block device, causing long symlink corruption and exposing new
or previously freed block data to userspace.

To ensure all commits are synced, we flush all journal commits now
when sync_fs'ing ext4.

Signed-off-by: Arthur Jones <ajones@riverbed.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Eric Sandeen <sandeen@redhat.com>
Cc: <linux-ext4@vger.kernel.org>
2008-11-03 18:10:55 -05:00
Aneesh Kumar K.V
d94e99a64c ext4: Convert to host order before using the values.
Use le16_to_cpu to read the s_reserved_gdt_blocks values
from super block.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-11-04 09:11:26 -05:00
Aneesh Kumar K.V
ae2d9fb18e ext4: fix missing ext4_unlock_group in error path
If we try to free a block which is already freed, the code was
returning without first unlocking the group.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-11-04 09:10:50 -05:00
Steve French
c527c8a7ff [CIFS] Can't rely on iov length and base when kernel_recvmsg returns error
When retrying kernel_recvmsg, reset iov_base and iov_len.

Note comment from Sridhar: "In the normal path, iov.iov_len is clearly set to 4. But i think you are
running into a case where kernel_recvmsg() is called via 'goto incomplete_rcv'
It happens if the previous call fails with EAGAIN.
If you want to call recvmsg() after EAGAIN failure, you need to reset iov."

Signed-off-by: Shirish Pargaonkar <shirishp@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-11-03 20:46:21 +00:00
Linus Torvalds
a75952b72a Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
  cifs: fix renaming one hardlink on top of another
  [CIFS] fix error in smb_send2
  [CIFS] Reduce number of socket retries in large write path
2008-11-03 11:43:59 -08:00
Jeff Layton
ae6884a9da cifs: fix renaming one hardlink on top of another
cifs: fix renaming one hardlink on top of another

POSIX says that renaming one hardlink on top of another to the same
inode is a no-op. We had the logic mostly right, but forgot to clear
the return code.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-11-03 18:31:05 +00:00
Linus Torvalds
c8126cc602 Merge branch 'proc-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/adobriyan/proc
* 'proc-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/adobriyan/proc:
  proc: revert /proc/uptime to ->read_proc hook
2008-11-03 09:59:01 -08:00
Sami Liedes
2423840ded jbd2: deregister proc on failure in jbd2_journal_init_inode
jbd2_journal_init_inode() does not call jbd2_stats_proc_exit() on all
failure paths after calling jbd2_stats_proc_init(). This leaves
dangling references to the fs in proc.

This patch fixes a bug reported by Sami Leides at:
http://bugzilla.kernel.org/show_bug.cgi?id=11493

Signed-off-by: Sami Liedes <sliedes@cc.hut.fi>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-11-02 19:23:30 -05:00
Theodore Ts'o
8c3f25d895 jbd2: don't give up looking for space so easily in __jbd2_log_wait_for_space
Commit 23f8b79e introducd a regression because it assumed that if
there were no transactions ready to be checkpointed, that no progress
could be made on making space available in the journal, and so the
journal should be aborted.  This assumption is false; it could be the
case that simply calling jbd2_cleanup_journal_tail() will recover the
necessary space, or, for small journals, the currently committing
transaction could be responsible for chewing up the required space in
the log, so we need to wait for the currently committing transaction
to finish before trying to force a checkpoint operation.

This patch fixes a bug reported by Mihai Harpau at:
https://bugzilla.redhat.com/show_bug.cgi?id=469582

This patch fixes a bug reported by François Valenduc at:
http://bugzilla.kernel.org/show_bug.cgi?id=11840

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Duane Griffin <duaneg@dghda.com>
Cc: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
2008-11-06 22:38:07 -05:00
Theodore Ts'o
e219cca082 jbd: don't give up looking for space so easily in __log_wait_for_space
Commit be07c4ed introducd a regression because it assumed that if
there were no transactions ready to be checkpointed, that no progress
could be made on making space available in the journal, and so the
journal should be aborted.  This assumption is false; it could be the
case that simply calling cleanup_journal_tail() will recover the
necessary space, or, for small journals, the currently committing
transaction could be responsible for chewing up the required space in
the log, so we need to wait for the currently committing transaction
to finish before trying to force a checkpoint operation.

This patch fixes the bug reported by Meelis Roos at:
http://bugzilla.kernel.org/show_bug.cgi?id=11937

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Duane Griffin <duaneg@dghda.com>
Cc: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
2008-11-06 22:37:59 -05:00
Al Viro
233e70f422 saner FASYNC handling on file close
As it is, all instances of ->release() for files that have ->fasync()
need to remember to evict file from fasync lists; forgetting that
creates a hole and we actually have a bunch that *does* forget.

So let's keep our lives simple - let __fput() check FASYNC in
file->f_flags and call ->fasync() there if it's been set.  And lose that
crap in ->release() instances - leaving it there is still valid, but we
don't have to bother anymore.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-01 09:49:46 -07:00
Linus Torvalds
e06f42d6c1 Merge branch 'for-2.6.28' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.28' of git://linux-nfs.org/~bfields/linux:
  NLM: Set address family before calling nlm_host_rebooted()
  nfsd: fix failure to set eof in readdir in some situations
2008-10-31 15:44:08 -07:00
David Woodhouse
b27cf88e95 [JFFS2] Fix lack of locking in thread_should_wake()
The thread_should_wake() function trawls through the list of 'very
dirty' eraseblocks, determining whether the background GC thread should
wake. Doing this without holding the appropriate locks is a bad idea.

OLPC Trac #8615

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: stable@kernel.org
2008-10-31 14:52:24 +00:00
Linus Torvalds
eff2502801 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  delay capable() check in ext4_has_free_blocks()
  merge ext4_claim_free_blocks & ext4_has_free_blocks
  jbd2: Call the commit callback before the transaction could get dropped
  ext4: fix a bug accessing freed memory in ext4_abort
  ext3: fix a bug accessing freed memory in ext3_abort
2008-10-31 07:52:12 -07:00
Harvey Harrison
be85940548 fs: replace NIPQUAD()
Using NIPQUAD() with NIPQUAD_FMT, %d.%d.%d.%d or %u.%u.%u.%u
can be replaced with %pI4

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-31 00:56:28 -07:00
David S. Miller
a1744d3bee Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/net/wireless/p54/p54common.c
2008-10-31 00:17:34 -07:00
David Howells
91b7771251 CRED: Wrap task credential accesses in the XFS filesystem
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id().  In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
2008-10-31 15:50:04 +11:00
Chuck Lever
d7dc61d0a7 NLM: Set address family before calling nlm_host_rebooted()
The nlm_host_rebooted() function uses nlm_cmp_addr() to find an
nsm_handle that matches the rebooted peer.  In order for this to work,
the passed-in address must have a proper address family.

This fixes a post-2.6.28 regression introduced by commit 781b61a6, which
added AF_INET6 support to nlm_cmp_addr().  Before that commit,
nlm_cmp_addr() didn't care about the address family; it compared only
the sin_addr.s_addr field for equality.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-10-30 17:19:30 -04:00
J. Bruce Fields
8d7c4203c6 nfsd: fix failure to set eof in readdir in some situations
Before 14f7dd6320 "[PATCH] Copy XFS
readdir hack into nfsd code", readdir_cd->err was reset to eof before
each call to vfs_readdir; afterwards, it is set only once.  Similarly,
c002a6c797 "[PATCH] Optimise NFS readdir
hack slightly", can cause us to exit without nfserr_eof set.  Fix this.

This ensures the "eof" bit is set when needed in readdir replies.  (The
particular case I saw was an nfsv4 readdir of an empty directory, which
returned with no entries (the protocol requires "." and ".." to be
filtered out), but with eof unset.)

Cc: David Woodhouse <David.Woodhouse@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-10-30 17:16:49 -04:00
Steve French
61de800d33 [CIFS] fix error in smb_send2
smb_send2 exit logic was strange, and with the previous change
could cause us to fail large
smb writes when all of the smb was not sent as one chunk.

Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-10-30 20:15:22 +00:00
Linus Torvalds
1b2d3d94ec Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  SUNRPC: Fix potential race in put_rpccred()
  SUNRPC: Fix rpcauth_prune_expired
  NFS: Convert nfs_attr_generation_counter into an atomic_long
  SUNRPC: Respond promptly to server TCP resets
2008-10-30 12:51:42 -07:00
Randy Dunlap
e74481e232 fs: remove excess kernel-doc
Delete excess kernel-doc notation in fs/ subdirectory:

Warning(linux-2.6.27-git10//fs/jbd/transaction.c:886): Excess function parameter or struct member 'credits' description in 'journal_get_undo_access'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-30 11:38:46 -07:00
Eric Sandeen
87b811c3f9 ecryptfs: fix memory corruption when storing crypto info in xattrs
When ecryptfs allocates space to write crypto headers into, before copying
it out to file headers or to xattrs, it looks at the value of
crypt_stat->num_header_bytes_at_front to determine how much space it
needs.  This is also used as the file offset to the actual encrypted data,
so for xattr-stored crypto info, the value was zero.

So, we kzalloc'd 0 bytes, and then ran off to write to that memory.
(Which returned as ZERO_SIZE_PTR, so we explode quickly).

The right answer is to always allocate a page to write into; the current
code won't ever write more than that (this is enforced by the
(PAGE_CACHE_SIZE - offset) length in the call to
ecryptfs_generate_key_packet_set).  To be explicit about this, we now send
in a "max" parameter, rather than magically using PAGE_CACHE_SIZE there.

Also, since the pointer we pass down the callchain eventually gets the
virt_to_page() treatment, we should be using a alloc_page variant, not
kzalloc (see also 7fcba05437)

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Acked-by: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-30 11:38:46 -07:00
Nick Piggin
4e02ed4b4a fs: remove prepare_write/commit_write
Nothing uses prepare_write or commit_write. Remove them from the tree
completely.

[akpm@linux-foundation.org: schedule simple_prepare_write() for unexporting]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-30 11:38:45 -07:00
David Chinner
6bfb3d065f [XFS] Fix race when looking up reclaimable inodes
If we get a race looking up a reclaimable inode, we can end up with the
winner proceeding to use the inode before it has been completely
re-initialised. This is a Bad Thing.

Fix the race by checking whether we are still initialising the inod eonce
we have a reference to it, and if so wait for the initialisation to
complete before continuing.

While there, fix a leaked reference count in the same code when
encountering an unlinked inode and we are not doing a lookup for a create
operation.

SGI-PV: 987246

SGI-Modid: xfs-linux-melb:xfs-kern:32429a

Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-10-30 18:32:43 +11:00
Tim Shimmin
e0b8e8b65d [XFS] remove restricted chown parameter from xfs linux
On Linux all filesystems are supposed to be operating under Posix'
restricted chown. Restricted chown means it restricts chown to the owner
unless you have CAP_FOWNER.

NOTE: that 2 files outside of fs/xfs have been modified too for this
change.

Reviewed-by: Dave Chinner <david@fromorbit.com>

SGI-PV: 988919

SGI-Modid: xfs-linux-melb:xfs-kern:32413a

Signed-off-by: Tim Shimmin <tes@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-10-30 18:30:48 +11:00
Christoph Hellwig
ea5a3dc835 [XFS] kill sys_cred
capable_cred has been unused for a while so we can kill it and sys_cred.
That also means the cred argument to xfs_setattr and xfs_change_file_space
can be removed now.

SGI-PV: 988918

SGI-Modid: xfs-linux-melb:xfs-kern:32412a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-10-30 18:27:48 +11:00
David Chinner
7ee49acfe5 [XFS] correctly select first log item to push
Under heavy metadata load we are seeing log hangs. The AIL has items in it
ready to be pushed, and they are within the push target window. However,
we are not pushing them when the last pushed LSN is less than the LSN of
the first log item on the AIL. This is a regression introduced by the AIL
push cursor modifications.

SGI-PV: 987246

SGI-Modid: xfs-linux-melb:xfs-kern:32409a

Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2008-10-30 18:26:51 +11:00
Christoph Hellwig
9ed0451ee0 [XFS] free partially initialized inodes using destroy_inode
To make sure we free the security data inodes need to be freed using the
proper VFS helper (which we also need to export for this). We mark these
inodes bad so we can skip the flush path for them.

SGI-PV: 987246

SGI-Modid: xfs-linux-melb:xfs-kern:32398a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
2008-10-30 18:26:04 +11:00
Christoph Hellwig
087e3b0460 Inode: export symbol destroy_inode
To make sure we free the security data inodes need to be freed using
the proper VFS helper (which we also need to export for this). We mark
these inodes bad so we can skip the flush path for them.

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
2008-10-30 18:24:37 +11:00
Christoph Hellwig
c679eef052 [XFS] stop using xfs_itobp in xfs_bulkstat
xfs_bulkstat only wants the dinode, offset and buffer from a given inode
number. Instead of using xfs_itobp on a fake inode which is complicated
and currently leads to leaks of the security data just use xfs_inotobp
which is designed to do exactly the kind of lookup xfs_bulkstat wants. The
only thing that's missing in xfs_inotobp is a flags paramter that let's us
pass down XFS_IMAP_BULKSTAT, but that can easily added.

SGI-PV: 987246

SGI-Modid: xfs-linux-melb:xfs-kern:32397a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
2008-10-30 18:04:13 +11:00
David Chinner
455486b9cc [XFS] avoid all reclaimable inodes in xfs_sync_inodes_ag
If we are syncing data in xfs_sync_inodes_ag(), the VFS inode must still
be referencable as the dirty data state is carried on the VFS inode. hence
if we can't get a reference via igrab(), the inode must be in reclaim
which implies that it has no dirty data attached.

Leave such inodes to the reclaim code to flush the dirty inode state to
disk and so avoid attempting to access the VFS inode when it may not exist
in xfs_sync_inodes_ag().

Version 4:
o don't reference linux inode until after igrab() succeeds

Version 3:
o converted unlock/rele to an xfs_iput() call.

Version 2:
o change igrab logic to be more linear
o remove initial reclaimable inode check now that we are using
  igrab() failure to find reclaimable inodes
o assert that igrab failure occurs only on reclaimable inodes
o clean up inode locking - only grab the iolock if we are doing
  a SYNC_DELWRI call and we have a dirty inode.

SGI-PV: 987246

SGI-Modid: xfs-linux-melb:xfs-kern:32391a

Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Peter Leckie <pleckie@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-10-30 18:03:14 +11:00
David Chinner
56e73ec47d [XFS] Can't lock inodes in radix tree preload region
When we are inside a radix tree preload region, we cannot sleep. Recently
we moved the inode locking inside the preload region for the inode radix
tree. Fix that, and fix a missed unlock in another error path in the same
code at the same time.

SGI-PV: 987246

SGI-Modid: xfs-linux-melb:xfs-kern:32385a

Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
2008-10-30 17:55:27 +11:00
Christoph Hellwig
2b7035fd74 [XFS] Trivial xfs_remove comment fixup
The dp to ip comment should be for the unconditional xfs_droplink call,
and the "." link obviously only exists for directories, so it should be in
the is_dir conditional.

SGI-PV: 987246

SGI-Modid: xfs-linux-melb:xfs-kern:32374a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Donald Douwsma <donaldd@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-10-30 17:55:18 +11:00
Christoph Hellwig
1ec7944beb [XFS] fix biosize option
iosizelog shouldn't be the same as iosize but the logarithm of it. Then
again the current biosize option doesn't make much sense to me as it
doesn't set the preferred I/O size as mentioned in the comment next to it
but rather the allocation size and thus is identical to the allocsize
option (except for the missing logarithm). It's also not documented in
Documentation/filesystems/xfs.txt or the mount manpage.

SGI-PV: 987246

SGI-Modid: xfs-linux-melb:xfs-kern:32373a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Donald Douwsma <donaldd@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-10-30 17:55:08 +11:00
Christoph Hellwig
469fc23d5d [XFS] fix the noquota mount option
Noquota should clear all mount options, and not just user and group quota.
Probably doesn't matter very much in real life.

SGI-PV: 987246

SGI-Modid: xfs-linux-melb:xfs-kern:32372a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Donald Douwsma <donaldd@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-10-30 17:54:57 +11:00
Christoph Hellwig
9d565ffa33 [XFS] kill struct xfs_mount_args
No need to parse the mount option into a structure before applying it to
struct xfs_mount.

The content of xfs_start_flags gets merged into xfs_parseargs. Calls
inbetween don't care and can use mount members instead of the args struct.

This patch uncovered that the mount option for shared filesystems wasn't
ever exposed on Linux. The code to handle it is #if 0'ed in this patch
pending a decision on this feature. I'll send a writeup about it to the
list soon.

SGI-PV: 987246

SGI-Modid: xfs-linux-melb:xfs-kern:32371a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Donald Douwsma <donaldd@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-10-30 17:53:24 +11:00
David Chinner
5a792c4579 [XFS] XFS: Check for valid transaction headers in recovery
When we are about to add a new item to a transaction in recovery, we need
to check that it is valid first. Currently we just assert that header
magic number matches, but in production systems that is not present and we
add a corrupted transaction to the list to be processed. This results in a
kernel oops later when processing the corrupted transaction.

Instead, if we detect a corrupted transaction, abort recovery and leave
the user to clean up the mess that has occurred.

SGI-PV: 988145

SGI-Modid: xfs-linux-melb:xfs-kern:32356a

Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-10-30 17:40:09 +11:00
David Chinner
783a2f656f [XFS] Finish removing the mount pointer from the AIL API
Change all the remaining AIL API functions that are passed struct
xfs_mount pointers to pass pointers directly to the struct xfs_ail being
used. With this conversion, all external access to the AIL is via the
struct xfs_ail. Hence the operation and referencing of the AIL is almost
entirely independent of the xfs_mount that is using it - it is now much
more tightly tied to the log and the items it is tracking in the log than
it is tied to the xfs_mount.

SGI-PV: 988143

SGI-Modid: xfs-linux-melb:xfs-kern:32353a

Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
2008-10-30 17:39:58 +11:00
David Chinner
fc1829f34d [XFS] Add ail pointer into log items
Add an xfs_ail pointer to log items so that the log items can reference
the AIL directly during callbacks without needed a struct xfs_mount.

SGI-PV: 988143

SGI-Modid: xfs-linux-melb:xfs-kern:32352a

Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
2008-10-30 17:39:46 +11:00
David Chinner
a9c21c1b9d [XFS] Given the log a pointer to the AIL
When we need to go from the log to the AIL, we have to go via the
xfs_mount. Add a xfs_ail pointer to the log so we can go directly to the
AIL associated with the log.

SGI-PV: 988143

SGI-Modid: xfs-linux-melb:xfs-kern:32351a

Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
2008-10-30 17:39:35 +11:00
David Chinner
c7e8f26827 [XFS] Move the AIL lock into the struct xfs_ail
Bring the ail lock inside the struct xfs_ail. This means the AIL can be
entirely manipulated via the struct xfs_ail rather than needing both the
struct xfs_mount and the struct xfs_ail.

SGI-PV: 988143

SGI-Modid: xfs-linux-melb:xfs-kern:32350a

Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
2008-10-30 17:39:23 +11:00
David Chinner
7b2e2a31f5 [XFS] Allow 64 bit machines to avoid the AIL lock during flushes
When copying lsn's from the log item to the inode or dquot flush lsn, we
currently grab the AIL lock. We do this because the LSN is a 64 bit
quantity and it needs to be read atomically. The lock is used to guarantee
atomicity for 32 bit platforms.

Make the LSN copying a small function, and make the function used
conditional on BITS_PER_LONG so that 64 bit machines don't need to take
the AIL lock in these places.

SGI-PV: 988143

SGI-Modid: xfs-linux-melb:xfs-kern:32349a

Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
2008-10-30 17:39:12 +11:00
David Chinner
5b00f14fbd [XFS] move the AIl traversal over to a consistent interface
With the new cursor interface, it makes sense to make all the traversing
code use the cursor interface and make the old one go away. This means
more of the AIL interfacing is done by passing struct xfs_ail pointers
around the place instead of struct xfs_mount pointers.

We can replace the use of xfs_trans_first_ail() in xfs_log_need_covered()
as it is only checking if the AIL is empty. We can do that with a call to
xfs_trans_ail_tail() instead, where a zero LSN returned indicates and
empty AIL...

SGI-PV: 988143

SGI-Modid: xfs-linux-melb:xfs-kern:32348a

Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
2008-10-30 17:39:00 +11:00
David Chinner
27d8d5fe0e [XFS] Use a cursor for AIL traversal.
To replace the current generation number ensuring sanity of the AIL
traversal, replace it with an external cursor that is linked to the AIL.

Basically, we store the next item in the cursor whenever we want to drop
the AIL lock to do something to the current item. When we regain the lock.
the current item may already be free, so we can't reference it, but the
next item in the traversal is already held in the cursor.

When we move or delete an object, we search all the active cursors and if
there is an item match we clear the cursor(s) that point to the object.
This forces the traversal to restart transparently.

We don't invalidate the cursor on insert because the cursor still points
to a valid item. If the intem is inserted between the current item and the
cursor it does not matter; the traversal is considered to be past the
insertion point so it will be picked up in the next traversal.

Hence traversal restarts pretty much disappear altogether with this method
of traversal, which should substantially reduce the overhead of pushing on
a busy AIL.

Version 2 o add restart logic o comment cursor interface o minor cleanups

SGI-PV: 988143

SGI-Modid: xfs-linux-melb:xfs-kern:32347a

Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
2008-10-30 17:38:39 +11:00
David Chinner
82fa901245 [XFS] Allocate the struct xfs_ail
Rather than embedding the struct xfs_ail in the struct xfs_mount, allocate
it during AIL initialisation. Add a back pointer to the struct xfs_ail so
that we can pass around the xfs_ail and still be able to access the
xfs_mount if need be. This is th first step involved in isolating the AIL
implementation from the surrounding filesystem code.

SGI-PV: 988143

SGI-Modid: xfs-linux-melb:xfs-kern:32346a

Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
2008-10-30 17:38:26 +11:00
David Chinner
a7444053fb [XFS] Account for allocated blocks when expanding directories
When we create a directory, we reserve a number of blocks for the maximum
possible expansion of of the directory due to various btree splits,
freespace allocation, etc. Unfortunately, each allocation is not reflected
in the total number of blocks still available to the transaction, so the
maximal reservation is used over and over again.

This leads to problems where an allocation group has only enough blocks
for *some* of the allocations required for the directory modification.
After the first N allocations, the remaining blocks in the allocation
group drops below the total reservation, and subsequent allocations fail
because the allocator will not allow the allocation to proceed if the AG
does not have the enough blocks available for the entire allocation total.

This results in an ENOSPC occurring after an allocation has already
occurred. This results in aborting the directory operation (leaving the
directory in an inconsistent state) and cancelling a dirty transaction,
which results in a filesystem shutdown.

Avoid the problem by reflecting the number of blocks allocated in any
directory expansion in the total number of blocks available to the
modification in progress. This prevents a directory modification from
being aborted part way through with an ENOSPC.

SGI-PV: 988144

SGI-Modid: xfs-linux-melb:xfs-kern:32340a

Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-10-30 17:38:12 +11:00
David Chinner
8c38ab0320 [XFS] Prevent looping in xfs_sync_inodes_ag
If the last block of the AG has inodes in it and the AG is an exactly
power-of-2 size then the last inode in the AG points to the last block in
the AG. If we try to find the next inode in the AG by adding one to the
inode number, we increment the inode number past the size of the AG. The
result is that the macro XFS_INO_TO_AGINO() will strip the AG portion of
the inode number and return an inode number of zero.

That is, instead of terminating the lookup loop because we hit the inode
number went outside the valid range for the AG, the search index returns
to zero and we start traversing the radix tree from the start again. This
results in an endless loop in xfs_sync_inodes_ag().

Fix it be detecting if the new search index decreases as a result of
incrementing the current inode number. That indicate an overflow and hence
that we have finished processing the AG so we can terminate the loop.

SGI-PV: 988142

SGI-Modid: xfs-linux-melb:xfs-kern:32335a

Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
2008-10-30 17:38:00 +11:00
David Chinner
116545130c [XFS] kill deleted inodes list
Now that the deleted inodes list is unused, kill it. This also removes the
i_reclaim list head from the xfs_inode, shrinking it by two pointers.

SGI-PV: 988142

SGI-Modid: xfs-linux-melb:xfs-kern:32334a

Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
2008-10-30 17:37:49 +11:00
David Chinner
7a3be02bae [XFS] use the inode radix tree for reclaiming inodes
Use the reclaim tag to walk the radix tree and find the inodes under
reclaim. This was the only user of the deleted inode list.

SGI-PV: 988142

SGI-Modid: xfs-linux-melb:xfs-kern:32333a

Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
2008-10-30 17:37:37 +11:00
David Chinner
396beb8531 [XFS] mark inodes for reclaim via a tag in the inode radix tree
Prepare for removing the deleted inode list by marking inodes for reclaim
in the inode radix trees so that we can use the radix trees to find
reclaimable inodes.

SGI-PV: 988142

SGI-Modid: xfs-linux-melb:xfs-kern:32331a

Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
2008-10-30 17:37:26 +11:00
David Chinner
1dc3318ae1 [XFS] rename inode reclaim functions
The function names xfs_finish_reclaim and xfs_finish_reclaim_all are not
very descriptive of what they are reclaiming. Rename to
xfs_reclaim_inode[s] to match the xfs_sync_inodes() function.

SGI-PV: 988142

SGI-Modid: xfs-linux-melb:xfs-kern:32330a

Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
2008-10-30 17:37:15 +11:00
David Chinner
fce08f2f3b [XFS] move inode reclaim functions to xfs_sync.c
Background inode reclaim is run by the xfssyncd. Move the reclaim worker
functions to be close to the sync code as the are very similar in
structure and are both run from the same background thread.

SGI-PV: 988142

SGI-Modid: xfs-linux-melb:xfs-kern:32329a

Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
2008-10-30 17:37:03 +11:00
Lachlan McIlroy
493dca6178 [XFS] Fix build warning - xfs_fs_alloc_inode() needs a return statement
SGI-PV: 988141

SGI-Modid: xfs-linux-melb:xfs-kern:32325a

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-10-30 17:36:52 +11:00
David Chinner
99fa8cb3c5 [XFS] Prevent use-after-free caused by synchronous inode reclaim
With the combined linux and XFS inode, we need to ensure that the combined
structure is not freed before the generic code is finished with the inode.
As it turns out, there is a case where the XFS inode is freed before the
linux inode - when xfs_reclaim() is called from ->clear_inode() on a clean
inode, the xfs inode is freed during that call. The generic code
references the inode after the ->clear_inode() call, so this is a use
after free situation.

Fix the problem by moving the xfs_reclaim() call to ->destroy_inode()
instead of in ->clear_inode(). This ensures the combined inode structure
is not freed until after the generic code has finished with it.

SGI-PV: 988141

SGI-Modid: xfs-linux-melb:xfs-kern:32324a

Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
2008-10-30 17:36:40 +11:00
David Chinner
bf904248a2 [XFS] Combine the XFS and Linux inodes
To avoid issues with different lifecycles of XFS and Linux inodes, embedd
the linux inode inside the XFS inode. This means that the linux inode has
the same lifecycle as the XFS inode, even when it has been released by the
OS. XFS inodes don't live much longer than this (a short stint in reclaim
at most), so there isn't significant memory usage penalties here.

Version 3 o kill xfs_icount()

Version 2 o remove unused commented out code from xfs_iget(). o kill
useless cast in VFS_I()

SGI-PV: 988141

SGI-Modid: xfs-linux-melb:xfs-kern:32323a

Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
2008-10-30 17:36:14 +11:00
David Chinner
8290c35f87 Inode: Allow external list initialisation
To allow XFS to combine the XFS and linux inodes into a single
structure, we need to drive inode lookup from the XFS inode cache,
not the generic inode cache. This means that we need initialise a
struct inode from a context outside alloc_inode() as it is no longer
used by XFS.

After inode allocation and initialisation, we need to add the inode
to the superblock list, the in-use list, hash it and do some
accounting. This all needs to be done with the inode_lock held and
there are already several places in fs/inode.c that do this list
manipulation.  Factor out the common code, add a locking wrapper and
export the function so ti can be called from XFS.

Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-10-30 17:35:24 +11:00
David Chinner
2cb1599f9b Inode: Allow external initialisers
To allow XFS to combine the XFS and linux inodes into a single
structure, we need to drive inode lookup from the XFS inode cache,
not the generic inode cache. This means that we need initialise a
struct inode from a context outside alloc_inode() as it is no longer
used by XFS.

Factor and export the struct inode initialisation code from
alloc_inode() to inode_init_always() as a counterpart to
inode_init_once().  i.e. we have to call this init function for each
inode instantiation (always), as opposed inode_init_once() which is
only called on slab object instantiation (once).

Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-10-30 17:32:23 +11:00
David Chinner
94b97e39b0 [XFS] Never call mark_inode_dirty_sync() directly
Once the Linux inode and the XFS inode are combined, we cannot rely on
just check if the linux inode exists as a method of determining if it is
valid or not. Hence we should always call xfs_mark_inode_dirty_sync()
instead as it does the correct checks to determine if the liinux inode is
in a valid state or not.

SGI-PV: 988141

SGI-Modid: xfs-linux-melb:xfs-kern:32318a

Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
2008-10-30 17:21:30 +11:00
David Chinner
6441e54915 [XFS] factor xfs_iget_core() into hit and miss cases
There are really two cases in xfs_iget_core(). The first is the cache hit
case, the second is the miss case. They share very little code, and hence
can easily be factored out into separate functions. This makes the code
much easier to understand and subsequently modify.

SGI-PV: 988141

SGI-Modid: xfs-linux-melb:xfs-kern:32317a

Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
2008-10-30 17:21:19 +11:00
Christoph Hellwig
3471394ba5 [XFS] fix instant oops with tracing enabled
We can only read inode->i_count if the inode is actually there and not a
NULL pointer. This was introduced in one of the recent sync patches.

SGI-PV: 988255

SGI-Modid: xfs-linux-melb:xfs-kern:32315a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-10-30 17:21:10 +11:00
David Chinner
76bf105cb1 [XFS] Move remaining quiesce code.
With all the other filesystem sync code it in xfs_sync.c including the
data quiesce code, it makes sense to move the remaining quiesce code to
the same place.

SGI-PV: 988140

SGI-Modid: xfs-linux-melb:xfs-kern:32312a

Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
2008-10-30 17:16:21 +11:00
David Chinner
a4e4c4f4a8 [XFS] Kill xfs_sync()
There are no more callers to xfs_sync() now, so remove the function
altogther.

SGI-PV: 988140

SGI-Modid: xfs-linux-melb:xfs-kern:32311a

Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
2008-10-30 17:16:11 +11:00
David Chinner
cb56a4b995 [XFS] Kill SYNC_CLOSE
SYNC_CLOSE is only ever used and checked in conjunction with SYNC_WAIT,
and this only done in one spot. The only thing this does is make
XFS_bflush() calls to the data buftargs.

This will happen very shortly afterwards the xfs_sync() call anyway in the
unmount path via the xfs_close_devices(), so this code is redundant and
can be removed. That only user of SYNC_CLOSE is now gone, so kill the flag
completely.

SGI-PV: 988140

SGI-Modid: xfs-linux-melb:xfs-kern:32310a

Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
2008-10-30 17:16:00 +11:00
David Chinner
e9f1c6ee12 [XFS] make SYNC_DELWRI no longer use xfs_sync
Continue to de-multiplex xfs_sync be replacing all SYNC_DELWRI callers
with direct calls functions that do the work. Isolate the data quiesce
case to a function in xfs_sync.c. Isolate the FSDATA case with explicit
calls to xfs_sync_fsdata().

Version 2: o Push delwri related log forces into xfs_sync_inodes().

SGI-PV: 988140

SGI-Modid: xfs-linux-melb:xfs-kern:32309a

Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
2008-10-30 17:15:50 +11:00
David Chinner
be97d9d557 [XFS] make SYNC_ATTR no longer use xfs_sync
Continue to de-multiplex xfs_sync be replacing all SYNC_ATTR callers with
direct calls xfs_sync_inodes(). Add an assert into xfs_sync() to ensure we
caught all the SYNC_ATTR callers.

SGI-PV: 988140

SGI-Modid: xfs-linux-melb:xfs-kern:32308a

Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
2008-10-30 17:15:38 +11:00
David Chinner
aacaa880bf [XFS] xfssyncd: don't call xfs_sync
Start de-multiplexing xfs_sync() by making xfs_sync_worker() call the
specific sync functions it needs. This is only a small, unique subset of
the entire xfs_sync() code so is easier to follow.

SGI-PV: 988140

SGI-Modid: xfs-linux-melb:xfs-kern:32307a

Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
2008-10-30 17:15:29 +11:00
David Chinner
dfd837a9eb [XFS] kill xfs_syncsub
Now that the only caller is xfs_sync(), merge the two together as it makes
no sense to keep them separate.

SGI-PV: 988140

SGI-Modid: xfs-linux-melb:xfs-kern:32306a

Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
2008-10-30 17:15:21 +11:00
David Chinner
2030b5aba8 [XFS] use xfs_sync_inodes rather than xfs_syncsub
Kill the unused arg in xfs_syncsub() and xfs_sync_inodes(). For callers of
xfs_syncsub() that only want to flush inodes, replace xfs_syncsub() with
direct calls to xfs_sync_inodes() as that is all that is being done with
the specific flags being passed in.

SGI-PV: 988140

SGI-Modid: xfs-linux-melb:xfs-kern:32305a

Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
2008-10-30 17:15:12 +11:00
David Chinner
bc60a99323 [XFS] Use struct inodes instead of vnodes to kill vn_grab
With the sync code relocated to the linux-2.6 directory we can use struct
inodes directly. If we do the same thing for the quota release code, we
can remove vn_grab altogether. While here, convert the VN_BAD() checks to
is_bad_inode() so we can remove vnodes entirely from this code.

SGI-PV: 988140

SGI-Modid: xfs-linux-melb:xfs-kern:32304a

Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
2008-10-30 17:15:03 +11:00
Christoph Hellwig
2af75df7be [XFS] split out two helpers from xfs_syncsub
Split out two helpers from xfs_syncsub for the dummy log commit and the
superblock writeout.

SGI-PV: 988140

SGI-Modid: xfs-linux-melb:xfs-kern:32303a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-10-30 17:14:53 +11:00
Christoph Hellwig
4e8938feba [XFS] Move XFS_BMAP_SANITY_CHECK out of line.
Move the XFS_BMAP_SANITY_CHECK macro out of line and make it a properly
typed function. Also pass the xfs_buf for the btree block instead of just
the btree block header, as we will need some additional information for it
to implement CRC checking of btree blocks.

SGI-PV: 988146

SGI-Modid: xfs-linux-melb:xfs-kern:32301a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Donald Douwsma <donaldd@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-10-30 17:14:43 +11:00
Christoph Hellwig
7cc95a821d [XFS] Always use struct xfs_btree_block instead of short / longform
structures.

Always use the generic xfs_btree_block type instead of the short / long
structures. Add XFS_BTREE_SBLOCK_LEN / XFS_BTREE_LBLOCK_LEN defines for
the length of a short / long form block. The rationale for this is that we
will grow more btree block header variants to support CRCs and other RAS
information, and always accessing them through the same datatype with
unions for the short / long form pointers makes implementing this much
easier.

SGI-PV: 988146

SGI-Modid: xfs-linux-melb:xfs-kern:32300a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Donald Douwsma <donaldd@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-10-30 17:14:34 +11:00
Christoph Hellwig
136341b41a [XFS] cleanup btree record / key / ptr addressing macros.
Replace the generic record / key / ptr addressing macros that use cpp
token pasting with simpler macros that do the job for just one given btree
type. The new macros lose the cur argument and thus can be used outside
the core btree code, but also gain an xfs_mount * argument to allow for
checking the CRC flag in the near future. Note that many of these macros
aren't actually used in the kernel code, but only in userspace (mostly in
xfs_repair).

SGI-PV: 988146

SGI-Modid: xfs-linux-melb:xfs-kern:32295a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Donald Douwsma <donaldd@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-10-30 17:11:40 +11:00
David Chinner
6c7699c047 [XFS] remove the mount inode list
Now we've removed all users of the mount inode list, we can kill it. This
reduces the size of the xfs_inode by 2 pointers.

SGI-PV: 988139

SGI-Modid: xfs-linux-melb:xfs-kern:32293a

Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
2008-10-30 17:11:29 +11:00
Christoph Hellwig
60197e8df3 [XFS] Cleanup maxrecs calculation.
Clean up the way the maximum and minimum records for the btree blocks are
calculated. For the alloc and inobt btrees all the values are
pre-calculated in xfs_mount_common, and we switch the current loop around
the ugly generic macros that use cpp token pasting to generate type names
to two small helpers in normal C code. For the bmbt and bmdr trees these
helpers also exist, but can be called during runtime, too. Here we also
kill various macros dealing with them and inline the logic into the
get_minrecs / get_maxrecs / get_dmaxrecs methods in xfs_bmap_btree.c.

Note that all these new helpers take an xfs_mount * argument which will be
needed to determine the size of a btree block once we add support for
extended btree blocks with CRCs and other RAS information.

SGI-PV: 988146

SGI-Modid: xfs-linux-melb:xfs-kern:32292a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Donald Douwsma <donaldd@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-10-30 17:11:19 +11:00
David Chinner
5b4d89ae0f [XFS] Traverse inode trees when releasing dquots
Make releasing all inode dquots traverse the per-ag inode radix trees
rather than the mount inode list. This removes another user of the mount
inode list.

Version 3 o fix comment relating to avoiding trying to release the

quota inodes and those in reclaim.

Version 2 o add comment explaining use of gang lookups for a single inode
o use IRELE, not VN_RELE o move check for ag initialisation to caller.

SGI-PV: 988139

SGI-Modid: xfs-linux-melb:xfs-kern:32291a

Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
2008-10-30 17:08:03 +11:00
David Chinner
683a897080 [XFS] Use the inode tree for finding dirty inodes
Update xfs_sync_inodes to walk the inode radix tree cache to find dirty
inodes. This removes a huge bunch of nasty, messy code for traversing the
mount inode list safely and removes another user of the mount inode list.

Version 3 o rediff against new linux-2.6/xfs_sync.c code

Version 2 o add comment explaining use of gang lookups for a single inode
o use IRELE, not VN_RELE o move check for ag initialisation to caller.

SGI-PV: 988139

SGI-Modid: xfs-linux-melb:xfs-kern:32290a

Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
2008-10-30 17:07:29 +11:00
David Chinner
2f8a3ce1c2 [XFS] don't block in xfs_qm_dqflush() during async writeback.
Normally dquots are written back via delayed write mechanisms. They are
flushed to their backing buffer by xfssyncd, which is then pushed out by
either AIL or xfsbufd flushing. The flush from the xfssyncd is supposed to
be non-blocking, but xfs_qm_dqflush() always waits for pinned duots, which
means that it will block for the length of time it takes to do a
synchronous log force. This causes unnecessary extra log I/O to be issued
whenever we try to flush a busy dquot.

Avoid the log forces and blocking xfssyncd by making xfs_qm_dqflush() pay
attention to what type of sync it is doing when it sees a pinned dquot and
not waiting when doing non-blocking flushes.

SGI-PV: 988147

SGI-Modid: xfs-linux-melb:xfs-kern:32287a

Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Peter Leckie <pleckie@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-10-30 17:07:20 +11:00
David Chinner
75c68f411b [XFS] Remove xfs_iflush_all and clean up xfs_finish_reclaim_all()
xfs_iflush_all() walks the m_inodes list to find inodes that need
reclaiming. We already have such a list - the m_del_inodes list. Replace
xfs_iflush_all() with a call to xfs_finish_reclaim_all() and clean up
xfs_finish_reclaim_all() to handle the different flush modes now needed.

Originally based on a patch from Christoph Hellwig.

Version 3 o rediff against new linux-2.6/xfs_sync.c code

Version 2 o revert xfs_syncsub() inode reclaim behaviour back to original

code o xfs_quiesce_fs() should use XFS_IFLUSH_DELWRI_ELSE_ASYNC, not

XFS_IFLUSH_ASYNC, to prevent change of behaviour.

SGI-PV: 988139

SGI-Modid: xfs-linux-melb:xfs-kern:32284a

Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
2008-10-30 17:06:28 +11:00
David Chinner
a167b17e89 [XFS] move xfssyncd code to xfs_sync.c
Move all the xfssyncd code to the new xfs_sync.c file. This places it
closer to the actual code that it interacts with, rather than just being
associated with high level VFS code.

SGI-PV: 988139

SGI-Modid: xfs-linux-melb:xfs-kern:32283a

Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
2008-10-30 17:06:18 +11:00
David Chinner
fe4fa4b8e4 [XFS] move sync code to its own file
The sync code in XFS is spread around several files. While it used to make
sense to have such a distribution, the code is about to be cleaned up and
so centralising it in one spot as the first step makes sense.

SGI-PV: 988139

SGI-Modid: xfs-linux-melb:xfs-kern:32282a

Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
2008-10-30 17:06:08 +11:00
Barry Naujok
34519daae6 [XFS] Show buffer address with debug hexdump on corruption
SGI-PV: 987246

SGI-Modid: xfs-linux-melb:xfs-kern:32233a

Signed-off-by: Barry Naujok <bnaujok@sgi.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-10-30 17:05:58 +11:00
Barry Naujok
89b2839319 [XFS] Check agf_btreeblks is valid when reading in the AGF
SGI-PV: 987683

SGI-Modid: xfs-linux-melb:xfs-kern:32232a

Signed-off-by: Barry Naujok <bnaujok@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-10-30 17:05:49 +11:00
Barry Naujok
847fff5ca8 [XFS] Sync up kernel and user-space headers
SGI-PV: 986558

SGI-Modid: xfs-linux-melb:xfs-kern:32231a

Signed-off-by: Barry Naujok <bnaujok@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-10-30 17:05:38 +11:00
Lachlan McIlroy
24ee0e49c9 [XFS] Make xfs_btree_check_ptr() debug-only code.
SGI-PV: 985583

SGI-Modid: xfs-linux-melb:xfs-kern:32224a

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
2008-10-30 17:05:26 +11:00
Peter Leckie
d1de802155 [XFS] Fix build brakage from patch "Clean up dquot pincount code"
This is a fix for patch " Clean up dquot pincount code" which introduced a
build breakage due to a missing & in xfs_qm_dquot_logitem_pin.

SGI-PV: 986789

SGI-Modid: xfs-linux-melb:xfs-kern:32221a

Signed-off-by: Peter Leckie <pleckie@sgi.com>
Signed-off-by: Donald Douwsma <donaldd@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-10-30 17:05:18 +11:00
Peter Leckie
bc3048e3cd [XFS] Clean up dquot pincount code.
This is a code cleanup and optimization that removes a per mount point
spinlock from the quota code and cleans up the code.

The patch changes the pincount from being an int protected by a spinlock
to an atomic_t allowing the pincount to be manipulated without holding the
spinlock.

This cleanup also protects against random wakup's of both the aild and
xfssyncd by reevaluating the pincount after been woken. Two latter patches
will address the Spurious wakeups.

SGI-PV: 986789

SGI-Modid: xfs-linux-melb:xfs-kern:32215a

Signed-off-by: Peter Leckie <pleckie@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Donald Douwsma <donaldd@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-10-30 17:05:04 +11:00
Lachlan McIlroy
d112f29845 [XFS] Wait for all I/O on truncate to zero file size
It's possible to have outstanding xfs_ioend_t's queued when the file size
is zero. This can happen in the direct I/O path when a direct I/O write
fails due to ENOSPC. In this case the xfs_ioend_t will still be queued (ie
xfs_end_io_direct() does not know that the I/O failed so can't force the
xfs_ioend_t to be flushed synchronously).

When we truncate a file on unlink we don't know to wait for these
xfs_ioend_ts and we can have a use-after-free situation if the inode is
reclaimed before the xfs_ioend_t is finally processed.

As was suggested by Dave Chinner lets wait for all I/Os to complete when
truncating the file size to zero.

SGI-PV: 981668

SGI-Modid: xfs-linux-melb:xfs-kern:32216a

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
2008-10-30 16:59:06 +11:00
Christoph Hellwig
7f7c39ccb6 [XFS] make btree tracing generic
Make the existing bmap btree tracing generic so that it applies to all
btree types.

Some fragments lifted from a patch by Dave Chinner.

This adds two files that were missed from the previous btree tracing
checkin.

SGI-PV: 985583

SGI-Modid: xfs-linux-melb:xfs-kern:32210a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Bill O'Donnell <billodo@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
2008-10-30 16:58:50 +11:00
Christoph Hellwig
3cc7524c84 [XFS] mark various functions in xfs_btree.c static
Lots of functionality in xfs_btree.c isn't needed by callers outside of
this file anymore, so mark these functions static.

SGI-PV: 985583

SGI-Modid: xfs-linux-melb:xfs-kern:32209a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Bill O'Donnell <billodo@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
2008-10-30 16:58:41 +11:00
Christoph Hellwig
4a26e66e77 [XFS] add keys_inorder and recs_inorder btree methods
Add methods to check whether two keys/records are in the righ order. This
replaces the xfs_btree_check_key and xfs_btree_check_rec methods. For the
callers from xfs_bmap.c just opencode the bmbt-specific asserts.

SGI-PV: 985583

SGI-Modid: xfs-linux-melb:xfs-kern:32208a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Bill O'Donnell <billodo@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
2008-10-30 16:58:32 +11:00
Christoph Hellwig
fd6bcc5b63 [XFS] kill xfs_bmbt_log_block and xfs_bmbt_log_recs
These are equivalent to the xfs_btree_* versions, and the only remaining
caller can be switched to the generic one after they are exported. Also
remove some now dead infrastructure in xfs_bmap_btree.c.

SGI-PV: 985583

SGI-Modid: xfs-linux-melb:xfs-kern:32207a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Bill O'Donnell <billodo@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
2008-10-30 16:58:21 +11:00
Christoph Hellwig
8cc938fe42 [XFS] implement generic xfs_btree_get_rec
Not really much reason to make it generic given that it's so small, but
this is the last non-method in xfs_alloc_btree.c and xfs_ialloc_btree.c,
so it makes the whole btree implementation more structured.

SGI-PV: 985583

SGI-Modid: xfs-linux-melb:xfs-kern:32206a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Bill O'Donnell <billodo@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
2008-10-30 16:58:11 +11:00
Christoph Hellwig
91cca5df9b [XFS] implement generic xfs_btree_delete/delrec
Make the btree delete code generic. Based on a patch from David Chinner
with lots of changes to follow the original btree implementations more
closely. While this loses some of the generic helper routines for
inserting/moving/removing records it also solves some of the one off bugs
in the original code and makes it easier to verify.

SGI-PV: 985583

SGI-Modid: xfs-linux-melb:xfs-kern:32205a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Bill O'Donnell <billodo@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
2008-10-30 16:58:01 +11:00
Christoph Hellwig
d4b3a4b7dd [XFS] move xfs_bmbt_killroot to common code
xfs_bmbt_killroot is a mostly generic implementation of moving from a real
block based root to an inode based root. So move it to xfs_btree.c where
it can use all the nice infrastructure there and make it pointer size
agnostic

The new name for it is xfs_btree_kill_iroot, following the old naming but
making it clear we're dealing with the root in inode case here, and to
avoid confusion with xfs_btree_new_root which is used for the not inode
rooted case. I've also added a comment describing what it does and why
it's named the way it is.

SGI-PV: 985583

SGI-Modid: xfs-linux-melb:xfs-kern:32203a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Bill O'Donnell <billodo@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
2008-10-30 16:57:51 +11:00
Christoph Hellwig
4b22a57188 [XFS] implement generic xfs_btree_insert/insrec
Make the btree insert code generic. Based on a patch from David Chinner
with lots of changes to follow the original btree implementations more
closely. While this loses some of the generic helper routines for
inserting/moving/removing records it also solves some of the one off bugs
in the original code and makes it easier to verify.

SGI-PV: 985583

SGI-Modid: xfs-linux-melb:xfs-kern:32202a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Bill O'Donnell <billodo@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
2008-10-30 16:57:40 +11:00
Christoph Hellwig
ea77b0a66e [XFS] move xfs_bmbt_newroot to common code
xfs_bmbt_newroot is a mostly generic implementation of moving from an
inode root to a real block based root. So move it to xfs_btree.c where it
can use all the nice infrastructure there and make it pointer size
agnostic

The new name for it is xfs_btree_new_iroot, following the old naming but
making it clear we're dealing with the root in inode case here, and to
avoid confusion with xfs_btree_new_root which is used for the not inode
rooted case.

SGI-PV: 985583

SGI-Modid: xfs-linux-melb:xfs-kern:32201a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Bill O'Donnell <billodo@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
2008-10-30 16:57:28 +11:00
Christoph Hellwig
344207ce84 [XFS] implement semi-generic xfs_btree_new_root
From: Dave Chinner <dgc@sgi.com>

Add a xfs_btree_new_root helper for the alloc and ialloc btrees. The bmap
btree needs it's own version and is not converted.

[hch: split out from bigger patch and minor adaptions]

SGI-PV: 985583

SGI-Modid: xfs-linux-melb:xfs-kern:32200a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Bill O'Donnell <billodo@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
2008-10-30 16:57:16 +11:00
Christoph Hellwig
f5eb8e7ca5 [XFS] implement generic xfs_btree_split
Make the btree split code generic. Based on a patch from David Chinner
with lots of changes to follow the original btree implementations more
closely. While this loses some of the generic helper routines for
inserting/moving/removing records it also solves some of the one off bugs
in the original code and makes it easier to verify.

SGI-PV: 985583

SGI-Modid: xfs-linux-melb:xfs-kern:32198a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Bill O'Donnell <billodo@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
2008-10-30 16:57:03 +11:00
Christoph Hellwig
687b890a18 [XFS] implement generic xfs_btree_lshift
Make the btree left shift code generic. Based on a patch from David
Chinner with lots of changes to follow the original btree implementations
more closely. While this loses some of the generic helper routines for
inserting/moving/removing records it also solves some of the one off bugs
in the original code and makes it easier to verify.

SGI-PV: 985583

SGI-Modid: xfs-linux-melb:xfs-kern:32197a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Bill O'Donnell <billodo@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
2008-10-30 16:56:53 +11:00
Christoph Hellwig
9eaead51be [XFS] implement generic xfs_btree_rshift
Make the btree right shift code generic. Based on a patch from David
Chinner with lots of changes to follow the original btree implementations
more closely. While this loses some of the generic helper routines for
inserting/moving/removing records it also solves some of the one off bugs
in the original code and makes it easier to verify.

SGI-PV: 985583

SGI-Modid: xfs-linux-melb:xfs-kern:32196a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Bill O'Donnell <billodo@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
2008-10-30 16:56:43 +11:00
Christoph Hellwig
278d0ca14e [XFS] implement generic xfs_btree_update
From: Dave Chinner <dgc@sgi.com>

The most complicated part here is the lastrec tracking for the alloc
btree. Most logic is in the update_lastrec method which has to do some
hopefully good enough dirty magic to maintain it.

[hch: split out from bigger patch and a rework of the lastrec

logic]

SGI-PV: 985583

SGI-Modid: xfs-linux-melb:xfs-kern:32194a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Bill O'Donnell <billodo@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
2008-10-30 16:56:32 +11:00
Christoph Hellwig
38bb74237d [XFS] implement generic xfs_btree_updkey
From: Dave Chinner <dgc@sgi.com>

Note that there are many > 80 char lines introduced due to the
xfs_btree_key casts. But the places where this happens is throw-away code
once the whole btree code gets merged into a common implementation.

The same is true for the temporary xfs_alloc_log_keys define to the new
name. All old users will be gone after a few patches.

[hch: split out from bigger patch and minor adaptions]

SGI-PV: 985583

SGI-Modid: xfs-linux-melb:xfs-kern:32193a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Bill O'Donnell <billodo@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
2008-10-30 16:56:22 +11:00
Christoph Hellwig
fe033cc848 [XFS] implement generic xfs_btree_lookup
From: Dave Chinner <dgc@sgi.com>

[hch: split out from bigger patch and minor adaptions]

SGI-PV: 985583

SGI-Modid: xfs-linux-melb:xfs-kern:32192a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Bill O'Donnell <billodo@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
2008-10-30 16:56:09 +11:00
Christoph Hellwig
8df4da4a0a [XFS] implement generic xfs_btree_decrement
From: Dave Chinner <dgc@sgi.com>

[hch: split out from bigger patch and minor adaptions]

SGI-PV: 985583

SGI-Modid: xfs-linux-melb:xfs-kern:32191a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Bill O'Donnell <billodo@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
2008-10-30 16:55:58 +11:00
Christoph Hellwig
637aa50f46 [XFS] implement generic xfs_btree_increment
From: Dave Chinner <dgc@sgi.com>

Because this is the first major generic btree routine this patch includes
some infrastrucure, first a few routines to deal with a btree block that
can be either in short or long form, second xfs_btree_read_buf_block,
which is the new central routine to read a btree block given a cursor, and
third the new xfs_btree_ptr_addr routine to calculate the address for a
given btree pointer record.

[hch: split out from bigger patch and minor adaptions]

SGI-PV: 985583

SGI-Modid: xfs-linux-melb:xfs-kern:32190a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Bill O'Donnell <billodo@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
2008-10-30 16:55:45 +11:00
Christoph Hellwig
65f1eaeac0 [XFS] add helpers for addressing entities inside a btree block
Add new helpers in xfs_btree.c to find the record, key and block pointer
entries inside a btree block. To implement this genericly the
->get_maxrecs methods and two new xfs_btree_ops entries for the key and
record sizes are used. Also add a big comment describing how the
addressing inside a btree block works.

Note that these helpers are unused until users are introduced in the next
patches and this patch will thus cause some harmless compiler warnings.

SGI-PV: 985583

SGI-Modid: xfs-linux-melb:xfs-kern:32189a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Bill O'Donnell <billodo@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
2008-10-30 16:55:34 +11:00
Christoph Hellwig
ce5e42db42 [XFS] add get_maxrecs btree operation
Factor xfs_btree_maxrecs into a per-btree operation.

The get_maxrecs method is based on a patch from Dave Chinner.

SGI-PV: 985583

SGI-Modid: xfs-linux-melb:xfs-kern:32188a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Bill O'Donnell <billodo@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
2008-10-30 16:55:23 +11:00
Christoph Hellwig
8c4ed633e6 [XFS] make btree tracing generic
Make the existing bmap btree tracing generic so that it applies to all
btree types.

Some fragments lifted from a patch by Dave Chinner.

SGI-PV: 985583

SGI-Modid: xfs-linux-melb:xfs-kern:32187a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Bill O'Donnell <billodo@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
2008-10-30 16:55:13 +11:00
David Chinner
854929f058 [XFS] add new btree statistics
From: Dave Chinner <dgc@sgi.com>

Introduce statistics coverage of all the btrees and cover all the btree
operations, not just some.

Invaluable for determining test code coverage of all the btree
operations....

SGI-PV: 985583

SGI-Modid: xfs-linux-melb:xfs-kern:32184a

Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Bill O'Donnell <billodo@sgi.com>
2008-10-30 16:55:03 +11:00
Christoph Hellwig
a23f6ef8ce [XFS] refactor btree validation helpers
Move the various btree validation helpers around in xfs_btree.c so that
they are close to each other and in common #ifdef DEBUG sections.

Also add a new xfs_btree_check_ptr helper to check a btree ptr that can be
either long or short form.

Split out from a bigger patch from Dave Chinner with various small changes
applied by me.

SGI-PV: 985583

SGI-Modid: xfs-linux-melb:xfs-kern:32183a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Bill O'Donnell <billodo@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
2008-10-30 16:54:53 +11:00
Christoph Hellwig
b524bfeee2 [XFS] refactor xfs_btree_readahead
From: Dave Chinner <dgc@sgi.com>

Refactor xfs_btree_readahead to make it more readable:

(a) remove the inline xfs_btree_readahead wrapper and move all checks out

of line into the main routine.

(b) factor out helpers for short/long form btrees

(c) move check for root in inodes from the callers into
xfs_btree_readahead

[hch: split out from a big patch and minor cleanups]

SGI-PV: 985583

SGI-Modid: xfs-linux-melb:xfs-kern:32182a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Bill O'Donnell <billodo@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
2008-10-30 16:54:43 +11:00
Christoph Hellwig
e99ab90d6a [XFS] add a long pointers flag to xfs_btree_cur
Add a flag to the xfs btree cursor when using long (64bit) block pointers
instead of checking btnum == XFS_BTNUM_BMAP.

SGI-PV: 985583

SGI-Modid: xfs-linux-melb:xfs-kern:32181a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Bill O'Donnell <billodo@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
2008-10-30 16:54:33 +11:00
Christoph Hellwig
8186e517fa [XFS] make btree root in inode support generic
The bmap btree is rooted in the inode and not in a disk block. Make the
support for this feature more generic by adding a btree flag to for this
feature instead of relying on the XFS_BTNUM_BMAP btnum check.

Also clean up xfs_btree_get_block where this new flag is used.

Based upon a patch from Dave Chinner.

SGI-PV: 985583

SGI-Modid: xfs-linux-melb:xfs-kern:32180a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Bill O'Donnell <billodo@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
2008-10-30 16:54:22 +11:00
Christoph Hellwig
de227dd960 [XFS] add generic btree types
Add generic union types for btree pointers, keys and records. The generic
btree pointer contains either a 32 and 64bit big endian scalar for short
and long form btrees, and the key and record contain the relevant type for
each possible btree.

Split out from a bigger patch from Dave Chinner and simplified a little
further.

SGI-PV: 985583

SGI-Modid: xfs-linux-melb:xfs-kern:32178a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Bill O'Donnell <billodo@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
2008-10-30 16:54:12 +11:00
Christoph Hellwig
561f7d1739 [XFS] split up xfs_btree_init_cursor
xfs_btree_init_cursor contains close to little shared code for the
different btrees and will get even more non-common code in the future.
Split it up into one routine per btree type.

Because xfs_btree_dup_cursor needs to call the init routine for a generic
btree cursor add a new btree operation vector that contains a dup_cursor
method that initializes a new cursor based on an existing one.

The btree operations vector is based on an idea and code from Dave Chinner
and will grow more entries later during this series.

SGI-PV: 985583

SGI-Modid: xfs-linux-melb:xfs-kern:32176a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Bill O'Donnell <billodo@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
2008-10-30 16:53:59 +11:00
Christoph Hellwig
f2277f06e6 [XFS] kill struct xfs_btree_hdr
This type is only embedded in struct xfs_btree_block and never used
directly. By moving the fields directly into struct xfs_btree_block a lot
of the macros for struct xfs_btree_sblock and struct xfs_btree_lblock can
be used for struct xfs_btree_block too now which helps greatly with some
of the migrations during implementing the generic btree code.

SGI-PV: 985583

SGI-Modid: xfs-linux-melb:xfs-kern:32174a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Bill O'Donnell <billodo@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
2008-10-30 16:53:47 +11:00
Lachlan McIlroy
f338f90364 [XFS] Unlock inode before calling xfs_idestroy()
Lock debugging reported the ilock was being destroyed without being
unlocked. We don't need to lock the inode until we are going to insert it
into the radix tree.

SGI-PV: 987246

SGI-Modid: xfs-linux-melb:xfs-kern:32159a

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
2008-10-30 16:53:38 +11:00
Lachlan McIlroy
a357a12156 [XFS] Fix use-after-free with log and quotas
Destroying the quota stuff on unmount can access the log - ie
XFS_QM_DONE() ends up in xfs_dqunlock() which calls
xfs_trans_unlocked_item() and then xfs_log_move_tail(). By this time the
log has already been destroyed. Just move the cleanup of the quota code
earlier in xfs_unmountfs() before the call to xfs_log_unmount(). Moving
XFS_QM_DONE() up near XFS_QM_DQPURGEALL() seems like a good spot.

SGI-PV: 987086

SGI-Modid: xfs-linux-melb:xfs-kern:32148a

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Peter Leckie <pleckie@sgi.com>
2008-10-30 16:53:25 +11:00
Barry Naujok
46039928c9 [XFS] Remove final remnants of dirv1 macros and other stuff
SGI-PV: 981498

SGI-Modid: xfs-linux-melb:xfs-kern:32002a

Signed-off-by: Barry Naujok <bnaujok@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-10-30 16:52:35 +11:00
Lachlan McIlroy
d07c60e54f [XFS] Use xfs_idestroy() to cleanup an inode.
SGI-PV: 981498

SGI-Modid: xfs-linux-melb:xfs-kern:31927a

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
2008-10-30 16:50:35 +11:00
Lachlan McIlroy
be8b78a626 [XFS] Remove kmem_zone_t argument from xfs_inode_init_once()
kmem cache constructor no longer takes a kmem_zone_t argument.

SGI-PV: 957103

SGI-Modid: xfs-linux-melb:xfs-kern:32254a

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-10-30 16:42:34 +11:00
David Chinner
07c8f67587 [XFS] Make use of the init-once slab optimisation.
To avoid having to initialise some fields of the XFS inode on every
allocation, we can use the slab init-once feature to initialise them. All
we have to guarantee is that when we free the inode, all it's entries are
in the initial state. Add asserts where possible to ensure debug kernels
check this initial state before freeing and after allocation.

SGI-PV: 981498

SGI-Modid: xfs-linux-melb:xfs-kern:31925a

Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
2008-10-30 16:11:59 +11:00
Harvey Harrison
5b095d9892 net: replace %p6 with %pI6
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-29 12:52:50 -07:00
Harvey Harrison
4b7a4274ca net: replace %#p6 format specifier with %pi6
gcc warns when using the # modifier with the %p format specifier,
so we can't use this to omit the colons when needed, introduces
%pi6 instead.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-29 12:50:24 -07:00
Steve French
edf1ae4038 [CIFS] Reduce number of socket retries in large write path
CIFS in some heavy stress conditions cifs could get EAGAIN
repeatedly in smb_send2 which led to repeated retries and eventually
failure of large writes which could lead to data corruption.

There are three changes that were suggested by various network
developers:

1) convert cifs from non-blocking to blocking tcp sendmsg
(we left in the retry on failure)
2) change cifs to not set sendbuf and rcvbuf size for the socket
(let tcp autotune the buffer sizes since that works much better
in the TCP stack now)
3) if we have a partial frame sent in smb_send2, mark the tcp
session as invalid (close the socket and reconnect) so we do
not corrupt the remaining part of the SMB with the beginning
of the next SMB.

This does not appear to hurt performance measurably and has
been run in various scenarios, but it definately removes
a corruption that we were seeing in some high stress
test cases.

Acked-by: Shirish Pargaonkar <shirishp@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-10-29 00:47:57 +00:00
Harvey Harrison
1afa67f5e7 misc: replace NIP6_FMT with %p6 format specifier
The iscsi_ibft.c changes are almost certainly a bugfix as the
pointer 'ip' is a u8 *, so they never print the last 8 bytes
of the IPv6 address, and the eight bytes they do print have
a zero byte with them in each 16-bit word.

Other than that, this should cause no difference in functionality.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-28 16:06:44 -07:00
Harvey Harrison
b071195deb net: replace all current users of NIP6_SEQFMT with %#p6
The define in kernel.h can be done away with at a later time.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-28 16:05:40 -07:00
Trond Myklebust
ae05f26940 NFS: Convert nfs_attr_generation_counter into an atomic_long
The most important property we need from nfs_attr_generation_counter is
monotonicity, which is not guaranteed by the current system of smp memory
barriers. We should convert it to an atomic_long_t, and drop the memory
barriers.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-28 15:21:40 -04:00
Eric Sandeen
a996031c87 delay capable() check in ext4_has_free_blocks()
As reported by Eric Paris, the capable() check in ext4_has_free_blocks()
sometimes causes SELinux denials.

We can rearrange the logic so that we only try to use the root-reserved
blocks when necessary, and even then we can move the capable() test
to last, to avoid the check most of the time.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-10-28 00:08:17 -04:00
Eric Sandeen
8c3bf8a01c merge ext4_claim_free_blocks & ext4_has_free_blocks
Mingming pointed out that ext4_claim_free_blocks & ext4_has_free_blocks
are largely cut & pasted; they can be collapsed/merged as follows.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-10-28 00:08:12 -04:00
Theodore Ts'o
6c20ec8503 jbd2: Call the commit callback before the transaction could get dropped
The transaction can potentially get dropped if there are no buffers
that need to be written.  Make sure we call the commit callback before
potentially deciding to drop the transaction.  Also avoid
dereferencing the commit_transaction pointer in the marker for the
same reason.

This patch fixes the bug reported by Eric Paris at:
http://bugzilla.kernel.org/show_bug.cgi?id=11838

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Acked-by: Eric Sandeen <sandeen@redhat.com>
Tested-by: Eric Paris <eparis@redhat.com>
2008-10-28 21:08:20 -04:00
Hidehiro Kawai
ef2cabf7c6 ext4: fix a bug accessing freed memory in ext4_abort
Vegard Nossum reported a bug which accesses freed memory (found via
kmemcheck).  When journal has been aborted, ext4_put_super() calls
ext4_abort() after freeing the journal_t object, and then ext4_abort()
accesses it.  This patch fix it.

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-10-27 22:53:05 -04:00
Hidehiro Kawai
44d6f78756 ext3: fix a bug accessing freed memory in ext3_abort
Vegard Nossum reported a bug which accesses freed memory (found via
kmemcheck).  When journal has been aborted, ext3_put_super() calls
ext3_abort() after freeing the journal_t object, and then ext3_abort()
accesses it.  This patch fix it.

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-10-27 22:51:46 -04:00
Alexey Dobriyan
6c87df37dc proc: revert /proc/uptime to ->read_proc hook
Turned out some VMware userspace does pread(2) on /proc/uptime, but
seqfiles currently don't allow pread() resulting in -ESPIPE.

Seqfiles in theory can do pread(), but this can be a long story,
so revert to ->read_proc until then.

http://bugzilla.kernel.org/show_bug.cgi?id=11856

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2008-10-27 22:56:56 +03:00
Alan Cox
526719ba51 Switch to a valid email address...
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-27 08:40:17 -07:00
Davide Libenzi
9ce209d64d epoll: avoid double-inserts in case of EFAULT
In commit f337b9c583 ("epoll: drop
unnecessary test") Thomas found that there is an unnecessary (always
true) test in ep_send_events().  The callback never inserts into
->rdllink while the send loop is performed, and also does the
~EP_PRIVATE_BITS test.  Given we're holding the mutex during this time,
the conditions tested inside the loop are always true.

HOWEVER.

The test "!ep_is_linked(&epi->rdllink)" wasn't there because we insert
into ->rdllink, but because the send-events loop might terminate before
the whole list is scanned (-EFAULT).

In such cases, when the loop terminates early, and when a (leftover)
file received an event while we're performing the lockless loop, we need
such test to avoid to double insert the epoll items.  The list_splice()
done a few steps below, will correctly re-insert the ones that were left
on "txlist".

This should fix the kenrel.org bugzilla entry 11831.

Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-26 12:09:49 -07:00
Arjan van de Ven
4d36a9e65d select: deal with math overflow from borderline valid userland data
Some userland apps seem to pass in a "0" for the seconds, and several
seconds worth of usecs to select().  The old kernels accepted this just
fine, so the new kernels must too.

However, due to the upscaling of the microseconds to nanoseconds we had
some cases where we got math overflow, and depending on the GCC version
(due to inlining decisions) that actually resulted in an -EINVAL return.

This patch fixes this by adding the excess microseconds to the seconds
field.

Also with thanks to Marcin Slusarz for spotting some implementation bugs
in the diagnostics patches.

Reported-by: Carlos R. Mafra <crmafra2@gmail.com>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-26 11:22:08 -07:00
Theodore Ts'o
3c37fc86d2 ext4: Fix duplicate entries returned from getdents() system call
Fix a regression caused by commit d0156417, "ext4: fix ext4_dx_readdir
hash collision handling", where deleting files in a large directory
(requiring more than one getdents system call), results in some
filenames being returned twice.  This was caused by a failure to
update info->curr_hash and info->curr_minor_hash, so that if the
directory had gotten modified since the last getdents() system call
(as would be the case if the user is running "rm -r" or "git clean"),
a directory entry would get returned twice to the userspace.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

This patch fixes the bug reported by Markus Trippelsdorf at:
http://bugzilla.kernel.org/show_bug.cgi?id=11844

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
2008-10-25 22:37:55 -04:00
Theodore Ts'o
8c9fa93d51 ext3: Fix duplicate entries returned from getdents() system call
Fix a regression caused by commit 6a897cf4, "ext3: fix ext3_dx_readdir
hash collision handling", where deleting files in a large directory
(requiring more than one getdents system call), results in some
filenames being returned twice.  This was caused by a failure to
update info->curr_hash and info->curr_minor_hash, so that if the
directory had gotten modified since the last getdents() system call
(as would be the case if the user is running "rm -r" or "git clean"),
a directory entry would get returned twice to the userspace.

This patch fixes the bug reported by Markus Trippelsdorf at:
http://bugzilla.kernel.org/show_bug.cgi?id=11844

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
2008-10-25 22:37:44 -04:00
Linus Torvalds
88ed86fee6 Merge branch 'proc' of git://git.kernel.org/pub/scm/linux/kernel/git/adobriyan/proc
* 'proc' of git://git.kernel.org/pub/scm/linux/kernel/git/adobriyan/proc: (35 commits)
  proc: remove fs/proc/proc_misc.c
  proc: move /proc/vmcore creation to fs/proc/vmcore.c
  proc: move pagecount stuff to fs/proc/page.c
  proc: move all /proc/kcore stuff to fs/proc/kcore.c
  proc: move /proc/schedstat boilerplate to kernel/sched_stats.h
  proc: move /proc/modules boilerplate to kernel/module.c
  proc: move /proc/diskstats boilerplate to block/genhd.c
  proc: move /proc/zoneinfo boilerplate to mm/vmstat.c
  proc: move /proc/vmstat boilerplate to mm/vmstat.c
  proc: move /proc/pagetypeinfo boilerplate to mm/vmstat.c
  proc: move /proc/buddyinfo boilerplate to mm/vmstat.c
  proc: move /proc/vmallocinfo to mm/vmalloc.c
  proc: move /proc/slabinfo boilerplate to mm/slub.c, mm/slab.c
  proc: move /proc/slab_allocators boilerplate to mm/slab.c
  proc: move /proc/interrupts boilerplate code to fs/proc/interrupts.c
  proc: move /proc/stat to fs/proc/stat.c
  proc: move rest of /proc/partitions code to block/genhd.c
  proc: move /proc/cpuinfo code to fs/proc/cpuinfo.c
  proc: move /proc/devices code to fs/proc/devices.c
  proc: move rest of /proc/locks to fs/locks.c
  ...
2008-10-23 12:04:37 -07:00
Christoph Hellwig
3856d30ded ext4: remove unused variable in ext4_get_parent
Signed-off-by: Christoph Hellwig <hch@lst.de>
[ All users removed in "switch all filesystems over to d_obtain_alias",
  aka commit 440037287c ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-23 12:03:23 -07:00
Linus Torvalds
12e1ec9ff3 ext3 quota support: fix compile failure
This one was due to a merge error: we added a use of nd.path in commit
2d7c820e56 ("ext3: add checks for errors
from jbd"), and concurrently we got rid of 'nd' and used a naked 'path'
in commit 8264613def ("[PATCH] switch
quota_on-related stuff to kern_path()").

That all merged cleanly, but it didn't actually _work_.  This should fix
it.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-23 11:48:56 -07:00
Linus Torvalds
1f6d6e8ebe Merge branch 'v28-range-hrtimers-for-linus-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'v28-range-hrtimers-for-linus-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (37 commits)
  hrtimers: add missing docbook comments to struct hrtimer
  hrtimers: simplify hrtimer_peek_ahead_timers()
  hrtimers: fix docbook comments
  DECLARE_PER_CPU needs linux/percpu.h
  hrtimers: fix typo
  rangetimers: fix the bug reported by Ingo for real
  rangetimer: fix BUG_ON reported by Ingo
  rangetimer: fix x86 build failure for the !HRTIMERS case
  select: fix alpha OSF wrapper
  select: fix alpha OSF wrapper
  hrtimer: peek at the timer queue just before going idle
  hrtimer: make the futex() system call use the per process slack value
  hrtimer: make the nanosleep() syscall use the per process slack
  hrtimer: fix signed/unsigned bug in slack estimator
  hrtimer: show the timer ranges in /proc/timer_list
  hrtimer: incorporate feedback from Peter Zijlstra
  hrtimer: add a hrtimer_start_range() function
  hrtimer: another build fix
  hrtimer: fix build bug found by Ingo
  hrtimer: make select() and poll() use the hrtimer range feature
  ...
2008-10-23 10:53:02 -07:00
Linus Torvalds
db563fc2e8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
  cifs: handle the TCP_Server_Info->tsk field more carefully
  cifs: fix unlinking of rename target when server doesn't support open file renames
  [CIFS] improve setlease handling
  [CIFS] fix saving of resume key before CIFSFindNext
  cifs: make cifs_rename handle -EACCES errors
  [CIFS] fix build error
  [CIFS] undo changes in cifs_rename_pending_delete if it errors out
  cifs: track DeletePending flag in cifsInodeInfo
  cifs: don't use CREATE_DELETE_ON_CLOSE in cifs_rename_pending_delete
  [CIFS] eliminate usage of kthread_stop for cifsd
  [CIFS] Add nodfs mount option
2008-10-23 10:43:36 -07:00
Linus Torvalds
2248485640 Merge git://git.kernel.org/pub/scm/linux/kernel/git/viro/bdev
* git://git.kernel.org/pub/scm/linux/kernel/git/viro/bdev: (66 commits)
  [PATCH] kill the rest of struct file propagation in block ioctls
  [PATCH] get rid of struct file use in blkdev_ioctl() BLKBSZSET
  [PATCH] get rid of blkdev_locked_ioctl()
  [PATCH] get rid of blkdev_driver_ioctl()
  [PATCH] sanitize blkdev_get() and friends
  [PATCH] remember mode of reiserfs journal
  [PATCH] propagate mode through swsusp_close()
  [PATCH] propagate mode through open_bdev_excl/close_bdev_excl
  [PATCH] pass fmode_t to blkdev_put()
  [PATCH] kill the unused bsize on the send side of /dev/loop
  [PATCH] trim file propagation in block/compat_ioctl.c
  [PATCH] end of methods switch: remove the old ones
  [PATCH] switch sr
  [PATCH] switch sd
  [PATCH] switch ide-scsi
  [PATCH] switch tape_block
  [PATCH] switch dcssblk
  [PATCH] switch dasd
  [PATCH] switch mtd_blkdevs
  [PATCH] switch mmc
  ...
2008-10-23 10:23:07 -07:00
Linus Torvalds
5ed487bc2c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (46 commits)
  [PATCH] fs: add a sanity check in d_free
  [PATCH] i_version: remount support
  [patch] vfs: make security_inode_setattr() calling consistent
  [patch 1/3] FS_MBCACHE: don't needlessly make it built-in
  [PATCH] move executable checking into ->permission()
  [PATCH] fs/dcache.c: update comment of d_validate()
  [RFC PATCH] touch_mnt_namespace when the mount flags change
  [PATCH] reiserfs: add missing llseek method
  [PATCH] fix ->llseek for more directories
  [PATCH vfs-2.6 6/6] vfs: add LOOKUP_RENAME_TARGET intent
  [PATCH vfs-2.6 5/6] vfs: remove LOOKUP_PARENT from non LOOKUP_PARENT lookup
  [PATCH vfs-2.6 4/6] vfs: remove unnecessary fsnotify_d_instantiate()
  [PATCH vfs-2.6 3/6] vfs: add __d_instantiate() helper
  [PATCH vfs-2.6 2/6] vfs: add d_ancestor()
  [PATCH vfs-2.6 1/6] vfs: replace parent == dentry->d_parent by IS_ROOT()
  [PATCH] get rid of on-stack dentry in udf
  [PATCH 2/2] anondev: switch to IDA
  [PATCH 1/2] anondev: init IDR statically
  [JFFS2] Use d_splice_alias() not d_add() in jffs2_lookup()
  [PATCH] Optimise NFS readdir hack slightly.
  ...
2008-10-23 10:22:40 -07:00
Linus Torvalds
b4d0b08a4c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  9p: fix sparse warnings
  9p: rdma: RDMA Transport Support for 9P
  9p: fix format warning
  9p: fix debug build error
2008-10-23 10:14:45 -07:00
Linus Torvalds
33217379be Merge branch 'for-2.6.28' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.28' of git://linux-nfs.org/~bfields/linux:
  nfsd: clean up expkey_parse error cases
  nfsd: Drop reference in expkey_parse error cases
  nfsd: Fix memory leak in nfsd_getxattr
  NFSD: Fix BUG during NFSD shutdown processing
2008-10-23 10:14:23 -07:00
Duane Griffin
be07c4ed40 jbd: abort instead of waiting for nonexistent transactions
The __log_wait_for_space function sits in a loop checkpointing
transactions until there is sufficient space free in the journal.
However, if there are no transactions to be processed (e.g.  because the
free space calculation is wrong due to a corrupted filesystem) it will
never progress.

Check for space being required when no transactions are outstanding and
abort the journal instead of endlessly looping.

This patch fixes the bug reported by Sami Liedes at:
http://bugzilla.kernel.org/show_bug.cgi?id=10976

Signed-off-by: Duane Griffin <duaneg@dghda.com>
Tested-by: Sami Liedes <sliedes@cc.hut.fi>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-23 08:55:02 -07:00
Hidehiro Kawai
9f818b4ac0 jbd: test BH_Write_EIO to detect errors on metadata buffers
__try_to_free_cp_buf(), __process_buffer(), and __wait_cp_io() test
BH_Uptodate flag to detect write I/O errors on metadata buffers.  But by
commit 95450f5a7e "ext3: don't read inode
block if the buffer has a write error"(*), BH_Uptodate flag can be set to
inode buffers with BH_Write_EIO in order to avoid reading old inode data.
So now, we have to test BH_Write_EIO flag of checkpointing inode buffers
instead of BH_Uptodate.  This patch does it.

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-23 08:55:02 -07:00
Hidehiro Kawai
2d7c820e56 ext3: add checks for errors from jbd
If the journal has aborted due to a checkpointing failure, we have to
keep the contents of the journal space.  Otherwise, the filesystem will
lose uncheckpointed metadata completely and become inconsistent.  To
avoid this, we need to keep needs_recovery flag if checkpoint has
failed.

With this patch, ext3_put_super() detects a checkpointing failure from
the return value of journal_destroy(), then it invokes ext3_abort() to
make the filesystem read only and keep needs_recovery flag.  Errors
from journal_flush() are also handled by this patch in some places.

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Jan Kara <jack@ucw.cz>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-23 08:55:01 -07:00
Hidehiro Kawai
4afe978530 jbd: fix error handling for checkpoint io
When a checkpointing IO fails, current JBD code doesn't check the error
and continue journaling.  This means latest metadata can be lost from both
the journal and filesystem.

This patch leaves the failed metadata blocks in the journal space and
aborts journaling in the case of log_do_checkpoint().  To achieve this, we
need to do:

1. don't remove the failed buffer from the checkpoint list where in
   the case of __try_to_free_cp_buf() because it may be released or
   overwritten by a later transaction
2. log_do_checkpoint() is the last chance, remove the failed buffer
   from the checkpoint list and abort the journal
3. when checkpointing fails, don't update the journal super block to
   prevent the journaled contents from being cleaned.  For safety,
   don't update j_tail and j_tail_sequence either
4. when checkpointing fails, notify this error to the ext3 layer so
   that ext3 don't clear the needs_recovery flag, otherwise the
   journaled contents are ignored and cleaned in the recovery phase
5. if the recovery fails, keep the needs_recovery flag
6. prevent cleanup_journal_tail() from being called between
   __journal_drop_transaction() and journal_abort() (a race issue
   between journal_flush() and __log_wait_for_space()

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Acked-by: Jan Kara <jack@suse.cz>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-23 08:55:01 -07:00
Alexey Dobriyan
59c7572e82 proc: remove fs/proc/proc_misc.c
Now that everything was moved to their more or less expected places,
apply rm(1).

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2008-10-23 18:54:05 +04:00
Alexey Dobriyan
5aa140c2de proc: move /proc/vmcore creation to fs/proc/vmcore.c
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2008-10-23 18:51:22 +04:00
Alexey Dobriyan
6d80e53f00 proc: move pagecount stuff to fs/proc/page.c
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2008-10-23 18:48:31 +04:00
Alexey Dobriyan
97ce5d6dcb proc: move all /proc/kcore stuff to fs/proc/kcore.c
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2008-10-23 18:32:38 +04:00
Alexey Dobriyan
b5aadf7f14 proc: move /proc/schedstat boilerplate to kernel/sched_stats.h
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2008-10-23 18:06:12 +04:00
Alexey Dobriyan
3b5d5c6b0c proc: move /proc/modules boilerplate to kernel/module.c
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2008-10-23 18:03:13 +04:00
Alexey Dobriyan
31d85ab28e proc: move /proc/diskstats boilerplate to block/genhd.c
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
2008-10-23 17:57:37 +04:00
Alexey Dobriyan
5c9fe6281b proc: move /proc/zoneinfo boilerplate to mm/vmstat.c
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
2008-10-23 17:35:04 +04:00
Alexey Dobriyan
b6aa44ab69 proc: move /proc/vmstat boilerplate to mm/vmstat.c
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
2008-10-23 17:12:51 +04:00
Alexey Dobriyan
74e2e8e8ce proc: move /proc/pagetypeinfo boilerplate to mm/vmstat.c
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2008-10-23 16:33:29 +04:00
Alexey Dobriyan
8f32f7e5ac proc: move /proc/buddyinfo boilerplate to mm/vmstat.c
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2008-10-23 16:12:04 +04:00
Alexey Dobriyan
5f6a6a9c4e proc: move /proc/vmallocinfo to mm/vmalloc.c
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
2008-10-23 15:48:28 +04:00
Alexey Dobriyan
7b3c3a50a3 proc: move /proc/slabinfo boilerplate to mm/slub.c, mm/slab.c
Lose dummy ->write hook in case of SLUB, it's possible now.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
2008-10-23 15:20:06 +04:00
Alexey Dobriyan
a0ec95a8e6 proc: move /proc/slab_allocators boilerplate to mm/slab.c
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
2008-10-23 15:17:27 +04:00
Alexey Dobriyan
d6917e19f3 proc: move /proc/interrupts boilerplate code to fs/proc/interrupts.c
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2008-10-23 15:15:46 +04:00
Alexey Dobriyan
df8106dbb5 proc: move /proc/stat to fs/proc/stat.c
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2008-10-23 15:14:05 +04:00
Alexey Dobriyan
f500975a3f proc: move rest of /proc/partitions code to block/genhd.c
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
2008-10-23 15:07:31 +04:00
Alexey Dobriyan
8591cf4322 proc: move /proc/cpuinfo code to fs/proc/cpuinfo.c
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2008-10-23 15:05:11 +04:00
Alexey Dobriyan
fe2510426a proc: move /proc/devices code to fs/proc/devices.c
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2008-10-23 15:02:18 +04:00
Alexey Dobriyan
d8ba7a3633 proc: move rest of /proc/locks to fs/locks.c
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2008-10-23 14:37:00 +04:00
Alexey Dobriyan
ae048112c0 proc: move /proc/kmsg creation to fs/proc/kmsg.c
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2008-10-23 14:35:08 +04:00
Alexey Dobriyan
659689280a proc: remove remnants of ->read_proc in proc_misc.c
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2008-10-23 14:32:59 +04:00
Alexey Dobriyan
6e62775ece proc: move /proc/execdomains to kernel/exec_domain.c
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2008-10-23 14:30:41 +04:00
Alexey Dobriyan
cf9887f102 proc: switch /proc/cmdline to seq_file
and move it to fs/proc/cmdline.c while I'm at it.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2008-10-23 14:29:04 +04:00
Alexey Dobriyan
6827400713 proc: move /proc/filesystems to fs/filesystems.c
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2008-10-23 14:27:09 +04:00
Alexey Dobriyan
4c150f6c30 proc: move /proc/stram to m68k-specific code
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2008-10-23 14:25:35 +04:00
Alexey Dobriyan
813dcf7a6e proc: move /proc/hardware to m68k-specific code
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2008-10-23 14:24:03 +04:00
Alexey Dobriyan
b457d15161 proc: switch /proc/version to seq_file
and move it to fs/proc/version.c while I'm at it.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2008-10-23 14:19:58 +04:00
Alexey Dobriyan
e1759c215b proc: switch /proc/meminfo to seq_file
and move it to fs/proc/meminfo.c while I'm at it.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2008-10-23 13:52:40 +04:00
Alexey Dobriyan
9617760287 proc: switch /proc/uptime to seq_file
and move it to fs/proc/uptime.c while I'm at it.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2008-10-23 13:48:01 +04:00
Alexey Dobriyan
5b3acc8de8 proc: switch /proc/loadavg to seq_file
and move it to fs/proc/loadavg.c while I'm at it.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2008-10-23 13:45:28 +04:00
Arjan van de Ven
6c2f91e077 proc: use WARN() rather than printk+backtrace
Use WARN() rather than a printk() + backtrace();
this gives a more standard format message as well as complete
information (including line numbers etc) that will be collected
by kerneloops.org

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2008-10-23 13:34:38 +04:00
Alexey Dobriyan
1e0edd3f67 proc: spread __init
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2008-10-23 13:32:31 +04:00
Alexey Dobriyan
5bcd7ff9e1 proc: proc_init_inodecache() can't fail
kmem_cache creation code will panic, don't return anything.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2008-10-23 13:27:50 +04:00
Joe Korty
7c88db0cb5 proc: fix vma display mismatch between /proc/pid/{maps,smaps}
Commit 4752c36978 aka
"maps4: simplify interdependence of maps and smaps" broke /proc/pid/smaps,
causing it to display some vmas twice and other vmas not at all.  For example:

    grep .- /proc/1/smaps >/tmp/smaps; diff /proc/1/maps /tmp/smaps

    1  25d24
    2  < 7fd7e23aa000-7fd7e23ac000 rw-p 7fd7e23aa000 00:00 0
    3  28a28
    4  > ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0  [vsyscall]

The bug has something to do with setting m->version before all the
seq_printf's have been performed.  show_map was doing this correctly,
but show_smap was doing this in the middle of its seq_printf sequence.
This patch arranges things so that the setting of m->version in show_smap
is also done at the end of its seq_printf sequence.

Testing: in addition to the above grep test, for each process I summed
up the 'Rss' fields of /proc/pid/smaps and compared that to the 'VmRSS'
field of /proc/pid/status.  All matched except for Xorg (which has a
/dev/mem mapping which Rss accounts for but VmRSS does not).  This result
gives us some confidence that neither /proc/pid/maps nor /proc/pid/smaps
are any longer skipping or double-counting vmas.

Signed-off-by: Joe Korty <joe.korty@ccur.com>
Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2008-10-23 13:21:29 +04:00
Arjan van de Ven
fd217f4d70 [PATCH] fs: add a sanity check in d_free
Hi Al,

remember that debug session we did at KS? You suggested this patch back
then....

From 7751eaf30474b8cbfaea64795805a17eab05ac53 Mon Sep 17 00:00:00 2001
From: Arjan van de Ven <arjan@linux.intel.com>
Date: Tue, 16 Sep 2008 16:51:17 -0700
Subject: [PATCH] fs: add a sanity check in d_free

we're seeing some corruption in the dentry->d_alias list that
appears like a free of an entry still on the list; this patch
adds a WARN_ON() to catch this scenario, as suggested by Al Viro

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
2008-10-23 05:17:12 -04:00
Miklos Szeredi
a77b72da24 [patch] vfs: make security_inode_setattr() calling consistent
Call security_inode_setattr() consistetly before inode_change_ok().
It doesn't make sense to try to "optimize" the i_op->setattr == NULL
case, as most filesystem do define their own setattr function.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2008-10-23 05:13:27 -04:00
Adrian Bunk
2c512397ca [patch 1/3] FS_MBCACHE: don't needlessly make it built-in
Assume you have:
- one or more of ext2/3/4 statically built into your kernel
- none of these with extended attributes enabled and
- want to add onother one of ext2/3/4 modular and with
  extended attributes enabled

then you currently have to reboot to use it since this results in
CONFIG_FS_MBCACHE=y.

That's not a common issue, but I just ran into it and since there's no
reason to get a built-in mbcache in this case this patch fixes it.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: Andreas Gruenbacher <agruen@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2008-10-23 05:13:26 -04:00
Miklos Szeredi
f696a3659f [PATCH] move executable checking into ->permission()
For execute permission on a regular files we need to check if file has
any execute bits at all, regardless of capabilites.

This check is normally performed by generic_permission() but was also
added to the case when the filesystem defines its own ->permission()
method.  In the latter case the filesystem should be responsible for
performing this check.

Move the check from inode_permission() inside filesystems which are
not calling generic_permission().

Create a helper function execute_ok() that returns true if the inode
is a directory or if any execute bits are present in i_mode.

Also fix up the following code:

 - coda control file is never executable
 - sysctl files are never executable
 - hfs_permission seems broken on MAY_EXEC, remove
 - hfsplus_permission is eqivalent to generic_permission(), remove

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2008-10-23 05:13:25 -04:00
Qinghuang Feng
5cec56deb6 [PATCH] fs/dcache.c: update comment of d_validate()
Parameters @hash and @len have been removed since 2.4.3,
now just to delete them.

Signed-off-by: Qinghuang Feng <qhfeng.kernel@gmail.com>
2008-10-23 05:13:24 -04:00
Dan Williams
0e55a7cca4 [RFC PATCH] touch_mnt_namespace when the mount flags change
Daemons that need to be launched while the rootfs is read-only can now
poll /proc/mounts to be notified when their O_RDWR requests may no
longer end in EROFS.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2008-10-23 05:13:23 -04:00
Christoph Hellwig
91efc167d0 [PATCH] reiserfs: add missing llseek method
Reiserfs currently doesn't set a llseek method for regular files, which
means it will fall back to default_llseek.  This means no one can seek
beyond 2 Gigabytes on reiserfs, and that there's not protection vs
the i_size updates from writers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2008-10-23 05:13:22 -04:00
Christoph Hellwig
3222a3e55f [PATCH] fix ->llseek for more directories
With this patch all directory fops instances that have a readdir
that doesn't take the BKL are switched to generic_file_llseek.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2008-10-23 05:13:21 -04:00
OGAWA Hirofumi
4e9ed2f85a [PATCH vfs-2.6 6/6] vfs: add LOOKUP_RENAME_TARGET intent
This adds LOOKUP_RENAME_TARGET intent for lookup of rename destination.

LOOKUP_RENAME_TARGET is going to be used like LOOKUP_CREATE. But since
the destination of rename() can be existing directory entry, so it has a
difference. Although that difference doesn't matter in my usage, this
tells it to user of this intent.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
2008-10-23 05:13:20 -04:00
OGAWA Hirofumi
0612d9fb27 [PATCH vfs-2.6 5/6] vfs: remove LOOKUP_PARENT from non LOOKUP_PARENT lookup
lookup_hash() with LOOKUP_PARENT is bogus. And this prepares to add
new intent on those path.

The user of LOOKUP_PARENT intent is nfs only, and it checks whether
nd->flags has LOOKUP_CREATE or LOOKUP_OPEN, so the result is same.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
2008-10-23 05:13:19 -04:00
OGAWA Hirofumi
8f3dfaa5ba [PATCH vfs-2.6 4/6] vfs: remove unnecessary fsnotify_d_instantiate()
This calls d_move(), so fsnotify_d_instantiate() is unnecessary like
rename path.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
2008-10-23 05:13:18 -04:00
OGAWA Hirofumi
360da90029 [PATCH vfs-2.6 3/6] vfs: add __d_instantiate() helper
This adds __d_instantiate() for users which is already taking
dcache_lock, and replace with it.

The part of d_add_ci() isn't equivalent. But it should be needed
fsnotify_d_instantiate() actually, because the path is to add the
inode to negative dentry.  fsnotify_d_instantiate() should be called
after change from negative to positive.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
2008-10-23 05:13:17 -04:00
OGAWA Hirofumi
e2761a1167 [PATCH vfs-2.6 2/6] vfs: add d_ancestor()
This adds d_ancestor() instead of d_isparent(), then use it.

If new_dentry == old_dentry, is_subdir() returns 1, looks strange.
"new_dentry == old_dentry" is not subdir obviously. But I'm not
checking callers for now, so this keeps current behavior.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
2008-10-23 05:13:16 -04:00
OGAWA Hirofumi
871c0067d5 [PATCH vfs-2.6 1/6] vfs: replace parent == dentry->d_parent by IS_ROOT()
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
2008-10-23 05:13:16 -04:00
Al Viro
9fbb76ce0f [PATCH] get rid of on-stack dentry in udf
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-10-23 05:13:15 -04:00
Alexey Dobriyan
ad76cbc63b [PATCH 2/2] anondev: switch to IDA
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2008-10-23 05:13:14 -04:00
Alexey Dobriyan
6de24f0ed0 [PATCH 1/2] anondev: init IDR statically
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2008-10-23 05:13:13 -04:00
David Woodhouse
8966c5e0fc [JFFS2] Use d_splice_alias() not d_add() in jffs2_lookup()
Now that JFFS2 can be exported by NFS, we need to get this right.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-10-23 05:13:12 -04:00
David Woodhouse
c002a6c797 [PATCH] Optimise NFS readdir hack slightly.
Avoid calling the underlying ->readdir() again when we reached the end
already; keep going round the loop only if we stopped due to our own
buffer being full.

[AV: tidy the things up a bit, while we are there]

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-10-23 05:13:11 -04:00
Al Viro
53c9c5c0e3 [PATCH] prepare vfs_readdir() callers to returning filldir result
It's not the final state, but it allows moving ->readdir() instances
to passing filldir return value to caller of vfs_readdir().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-10-23 05:13:10 -04:00