Commit Graph

23952 Commits

Author SHA1 Message Date
Linus Torvalds
f6a975c50a Merge git://git.kernel.org/pub/scm/linux/kernel/git/hirofumi/fatfs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/hirofumi/fatfs-2.6:
  fat: fat16 support maximum 4GB file/vol size as WinXP or 7.
  fat: fix utf8 iocharset warning message
  fat: fix build warning
2011-08-18 14:16:13 -07:00
Timo Warns
338d0f0a6f befs: Validate length of long symbolic links.
Signed-off-by: Timo Warns <warns@pre-sense.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-17 13:31:24 -07:00
Namjae Jeon
710d4403a4 fat: fat16 support maximum 4GB file/vol size as WinXP or 7.
FAT16 support maximum 4GB vol/file size with 64KB cluster size.

Win NT/XP/7 increased the maximum cluster size to 64KB, and file/vol
size increased 4GB also.  Although increasing, the file size of linux
FAT is still limited at 2GB.

I found that it is limited by sb->maxbytes(0x7fffffff) when partition
is formatted by FAT16.  sb->s_maxbytes in fill_super should be set to
0xffffffff like fat32.

Signed-off-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
2011-08-17 19:35:00 +09:00
Mihai Moldovan
186b53701c fat: fix utf8 iocharset warning message
The fat_msg function already formats the given message and appends
a newline to it - we don't need to do this in the passed message
string as well, or will end up with a blank line printed in the
kernel log ring buffer.

Also change the loglevel from error to warning.

Signed-off-by: Mihai Moldovan <ionic@ionic.de>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
2011-08-17 19:34:59 +09:00
Jonas Aberg
8c320c079c fat: fix build warning
This fixes a compile warning (unititialized variable) in
the fat filesystem code.

Signed-off-by: Jonas Aberg <jonas.aberg@stericsson.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
2011-08-17 19:34:58 +09:00
Linus Torvalds
a0b3447fb1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6:
  jfs: flush journal completely before releasing metadata inodes
2011-08-15 08:40:24 -07:00
Linus Torvalds
aa2b1cf5ca Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
  cifs: Do not set cifs/ntfs acl using a file handle (try #4)
  [CIFS] Cleanup use of CONFIG_CIFS_STATS2 ifdef to make transport routines more readable
2011-08-15 08:36:30 -07:00
Linus Torvalds
e68ff9cd15 Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
  xfs: replace xfs_buf_geterror() with bp->b_error
  xfs: Check the return value of xfs_buf_read() for NULL
  "xfs: fix error handling for synchronous writes" revisited
  xfs: set cursor in xfs_ail_splice() even when AIL was empty
  xfs: Remove the macro XFS_BUFTARG_NAME
  xfs: Remove the macro XFS_BUF_TARGET
  xfs: Remove the macro XFS_BUF_SET_TARGET
  Replace the macro XFS_BUF_ISPINNED with helper xfs_buf_ispinned
  xfs: Remove the macro XFS_BUF_SET_PTR
  xfs: Remove the macro XFS_BUF_PTR
  xfs: Remove macro XFS_BUF_SET_START
  xfs: Remove macro XFS_BUF_HOLD
  xfs: Remove macro XFS_BUF_BUSY and family
  xfs: Remove the macro XFS_BUF_ERROR and family
  xfs: Remove the macro XFS_BUF_BFLAGS
2011-08-12 20:43:01 -07:00
Chandra Seetharaman
e570280521 xfs: replace xfs_buf_geterror() with bp->b_error
Since we just checked bp for NULL, it is ok to replace
xfs_buf_geterror() with bp->b_error in these places.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-08-12 13:39:40 -05:00
Chandra Seetharaman
ac4d6888b2 xfs: Check the return value of xfs_buf_read() for NULL
Check the return value of xfs_buf_read() for NULL and return ENOMEM
if it is NULL.  This is necessary in a few spots to avoid subsequent
code blindly dereferencing the null buffer pointer.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-08-12 13:39:29 -05:00
Linus Torvalds
ce8a84ef1e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits)
  e1000e: increase driver version number
  e1000e: alternate MAC address update
  e1000e: do not disable receiver on 82574/82583
  e1000e: alternate MAC address does not work on device id 0x1060
  PCnet: Fix section mismatch
  bnx2x: disable dcb on 578xx since not supported yet
  bnx2x: properly clean indirect addresses
  bnx2x: prevent race between undi_unload and load flows
  bnx2x: fix select_queue when FCoE is disabled
  bnx2x: init FCOE FP only once
  ipv4: some rt_iif -> rt_route_iif conversions
  net/bridge/netfilter/ebtables.c: use available error handling code
  net/netlabel/netlabel_kapi.c: add missing cleanup code
  net/irda: sh_sir: tidyup compile warning
  net/irda: sh_sir: add missing header
  net/irda: sh_irda: add missing header
  slcan: ldisc generated skbs are received in softirq context
  scm: Capture the full credentials of the scm sender
  tcp: initialize variable ecn_ok in syncookies path
  drivers/net/wireless/wl1251: add missing kfree
  ...
2011-08-12 06:43:53 -07:00
Boaz Harrosh
8cf1fb2163 pnfs: Automatically select blocks & objects layouts
Just like files-layout, blocks & objects layouts are part of the
NFS 4.1 protocol and should be automatically selected if NFS_4_1
is selected. The small problem is that these depend on other
Kernel support being present, while files only depends on NFS
itself.

This patch removes from the user choice the presence of objects
and blocks layout. But makes sure these are selected only if
the depended subsystems are present in the Kernel.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Acked-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-11 17:51:27 -07:00
Eric Sandeen
8c20871998 ext4: Properly count journal credits for long symlinks
Commit df5e622340 ("ext4: fix deadlock in ext4_symlink() in ENOSPC
conditions") recalculated the number of credits needed for a long
symlink, in the process of splitting it into two transactions.  However,
the first credit calculation under-counted because if selinux is
enabled, credits are needed to create the selinux xattr as well.

Overrunning the reservation will result in an OOPS in
jbd2_journal_dirty_metadata() due to this assert:

  J_ASSERT_JH(jh, handle->h_buffer_credits > 0);

Fix this by increasing the reservation size.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-11 17:23:40 -07:00
Eric Sandeen
d2db60df1e ext3: Properly count journal credits for long symlinks
Commit ae54870a1d ("ext3: Fix lock inversion in ext3_symlink()")
recalculated the number of credits needed for a long symlink, in the
process of splitting it into two transactions.  However, the first
credit calculation under-counted because if selinux is enabled, credits
are needed to create the selinux xattr as well.

Overrunning the reservation will result in an OOPS in
journal_dirty_metadata() due to this assert:

  J_ASSERT_JH(jh, handle->h_buffer_credits > 0);

Fix this by increasing the reservation size.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-11 17:23:40 -07:00
Vasiliy Kulikov
72fa59970f move RLIMIT_NPROC check from set_user() to do_execve_common()
The patch http://lkml.org/lkml/2003/7/13/226 introduced an RLIMIT_NPROC
check in set_user() to check for NPROC exceeding via setuid() and
similar functions.

Before the check there was a possibility to greatly exceed the allowed
number of processes by an unprivileged user if the program relied on
rlimit only.  But the check created new security threat: many poorly
written programs simply don't check setuid() return code and believe it
cannot fail if executed with root privileges.  So, the check is removed
in this patch because of too often privilege escalations related to
buggy programs.

The NPROC can still be enforced in the common code flow of daemons
spawning user processes.  Most of daemons do fork()+setuid()+execve().
The check introduced in execve() (1) enforces the same limit as in
setuid() and (2) doesn't create similar security issues.

Neil Brown suggested to track what specific process has exceeded the
limit by setting PF_NPROC_EXCEEDED process flag.  With the change only
this process would fail on execve(), and other processes' execve()
behaviour is not changed.

Solar Designer suggested to re-check whether NPROC limit is still
exceeded at the moment of execve().  If the process was sleeping for
days between set*uid() and execve(), and the NPROC counter step down
under the limit, the defered execve() failure because NPROC limit was
exceeded days ago would be unexpected.  If the limit is not exceeded
anymore, we clear the flag on successful calls to execve() and fork().

The flag is also cleared on successful calls to set_user() as the limit
was exceeded for the previous user, not the current one.

Similar check was introduced in -ow patches (without the process flag).

v3 - clear PF_NPROC_EXCEEDED on successful calls to set_user().

Reviewed-by: James Morris <jmorris@namei.org>
Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-11 11:24:42 -07:00
Shirish Pargaonkar
e22906c564 cifs: Do not set cifs/ntfs acl using a file handle (try #4)
Set security descriptor using path name instead of a file handle.
We can't be sure that the file handle has adequate permission to
set a security descriptor (to modify DACL).

Function set_cifs_acl_by_fid() has been removed since we can't be
sure how a file was opened for writing, a valid request can fail
if the file was not opened with two above mentioned permissions.
We could have opted to add on WRITE_DAC and WRITE_OWNER permissions
to file opens and then use that file handle but adding addtional
permissions such as WRITE_DAC and WRITE_OWNER could cause an
any open to fail.

And it was incorrect to look for read file handle to set a
security descriptor anyway.

Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-08-11 18:23:45 +00:00
Steve French
789e666123 [CIFS] Cleanup use of CONFIG_CIFS_STATS2 ifdef to make transport routines more readable
Christoph had requested that the stats related code (in
CONFIG_CIFS_STATS2) be moved into helpers to make code flow more
readable.   This patch should help.   For example the following
section from transport.c

                       spin_unlock(&GlobalMid_Lock);
                       atomic_inc(&ses->server->num_waiters);
                       wait_event(ses->server->request_q,
                                  atomic_read(&ses->server->inFlight)
                                    < cifs_max_pending);
                       atomic_dec(&ses->server->num_waiters);
                       spin_lock(&GlobalMid_Lock);

becomes simpler (with the patch below):
                       spin_unlock(&GlobalMid_Lock);
                       cifs_num_waiters_inc(server);
                       wait_event(server->request_q,
                                  atomic_read(&server->inFlight)
                                    < cifs_max_pending);
                       cifs_num_waiters_dec(server);
                       spin_lock(&GlobalMid_Lock);

Reviewed-by: Jeff Layton <jlayton@redhat.com>
CC: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru>
2011-08-11 18:23:45 +00:00
Peng Tao
54a33b190a NFS41: make PNFS_BLOCK selectable
PNFS_BLOCK needs BLK_DEV_DM/MD, which is not a dependency for other
pnfs layout drivers. Seperate it out so others can still build when
BLK_DEV_DM/MD is not enabled.

Also change select to depends on to avoid build failures.

Reported-and-tested-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Peng Tao <peng_tao@emc.com>
Acked-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-11 08:58:02 -07:00
Ajeet Yadav
9e978d8f7d "xfs: fix error handling for synchronous writes" revisited
xfs: fix for hang during synchronous buffer write error

If removed storage while synchronous buffer write underway,
"xfslogd" hangs.

Detailed log http://oss.sgi.com/archives/xfs/2011-07/msg00740.html

Related work bfc60177f8
"xfs: fix error handling for synchronous writes"

Given that xfs_bwrite actually does the shutdown already after
waiting for the b_iodone completion and given that we actually
found that calling xfs_force_shutdown from inside
xfs_buf_iodone_callbacks was a major contributor the problem
it better to drop this call.

Signed-off-by: Ajeet Yadav <ajeet.yadav.77@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-08-10 17:00:21 -05:00
Linus Torvalds
d55140ce3a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6:
  Ecryptfs: Add mount option to check uid of device being mounted = expect uid
  eCryptfs: Fix payload_len unitialized variable warning
  eCryptfs: fix compile error
  eCryptfs: Return error when lower file pointer is NULL
2011-08-10 11:08:06 -07:00
John Johansen
764355487e Ecryptfs: Add mount option to check uid of device being mounted = expect uid
Close a TOCTOU race for mounts done via ecryptfs-mount-private.  The mount
source (device) can be raced when the ownership test is done in userspace.
Provide Ecryptfs a means to force the uid check at mount time.

Signed-off-by: John Johansen <john.johansen@canonical.com>
Cc: <stable@kernel.org>
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2011-08-09 23:29:01 -05:00
Alex Elder
e44f4112a4 xfs: set cursor in xfs_ail_splice() even when AIL was empty
In xfs_ail_splice(), if a cursor is provided it is updated to
point to the last item on the list being spliced into the AIL.
But if the AIL was found to be empty, the cursor (if provided)
is just initialized instead.

There is no reason the empty AIL case needs to be treated any
differently.  And treating it the same way allows this code
to be rearranged a bit, with a somewhat tidier result.

Signed-off-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2011-08-09 15:30:43 -05:00
Tyler Hicks
99b373ff2d eCryptfs: Fix payload_len unitialized variable warning
fs/ecryptfs/keystore.c: In function ‘ecryptfs_generate_key_packet_set’:
fs/ecryptfs/keystore.c:1991:28: warning: ‘payload_len’ may be used uninitialized in this function [-Wuninitialized]
fs/ecryptfs/keystore.c:1976:9: note: ‘payload_len’ was declared here

Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2011-08-09 13:42:46 -05:00
Roberto Sassu
4b6fee17b1 eCryptfs: fix compile error
This patch fixes the compile error reported at the address:

https://bugzilla.kernel.org/show_bug.cgi?id=40292

The problem arises when compiling eCryptfs as built-in and the 'encrypted'
key type as a module. The patch prevents this combination from being set in
the kernel configuration, by fixing the eCryptfs dependencies.

Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Reported-by: David Hill <hilld@binarystorm.net>
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2011-08-09 13:42:46 -05:00
Tyler Hicks
f61500e000 eCryptfs: Return error when lower file pointer is NULL
When an eCryptfs inode's lower file has been closed, and the pointer has
been set to NULL, return an error when trying to do a lower read or
write rather than calling BUG().

https://bugzilla.kernel.org/show_bug.cgi?id=37292

Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
Cc: <stable@kernel.org>
2011-08-09 13:42:45 -05:00
Linus Torvalds
2f84dd7091 autofs4: fix debug printk warning uncovered by cleanup
The previous comit made the autofs4 debug printouts check types against
the printout format, and uncovered this bug:

  fs/autofs4/waitq.c:106:2: warning: format ‘%08lx’ expects type ‘long unsigned int’, but argument 4 has type ‘autofs_wqt_t’

which is due to the insane type for wait_queue_token.  That thing should
be some fixed well-defined size (preferably just 'unsigned int' or
'u32') but for unexplained reasons it is randomly either 'unsigned long'
or 'unsigned int' depending on the architecture.

For now, cast it to 'unsigned long' for printing, the way we do
elsewhere.  Somebody else can try to explain the typedef mess.

(There's a reason we don't support excessive use of typedefs in the
kernel: it's usually just a good way of confusing yourself).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-08 12:02:43 -07:00
Linus Torvalds
c3ad996246 autofs4: clean up uaotfs use of debug/info/warning printouts
Use 'pr_debug()' for DPRINTK, which will do the proper type checking on
the arguments (without generating code) even when DEBUG isn't #defined.

Also, use the standard __VA_ARGS__ for the macros, and stop the
pointless abuse of 'do { xyz } while (0)' when the macro is already a
perfectly well-formed single statement.

Reported-by: David Howells <dhowells@redhat.com>
Suggested-by: Joe Perches <joe@perches.com>
Cc: Ian Kent <raven@themaw.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-08 11:35:17 -07:00
Alex Elder
2ddb4e9406 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux 2011-08-08 07:06:24 -05:00
Florian Westphal
8bab6f1408 compat_ioctl: add compat handler for PPPIOCGL2TPSTATS
fixes following error seen on x86_64 kernel:
ioctl32(openl2tpd:7480): Unknown cmd fd(14) cmd(80487436){t:'t';sz:72} arg(ffa7e6c0) on socket:[105094]

The argument (struct pppol2tp_ioc_stats) uses "aligned_u64" and thus doesn't need
fixups.

Cc: James Chapman <jchapman@katalix.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-07 22:24:41 -07:00
Linus Torvalds
7813b94a54 vfs: rename 'do_follow_link' to 'should_follow_link'
Al points out that the do_follow_link() helper function really is
misnamed - it's about whether we should try to follow a symlink or not,
not about actually doing the following.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-07 13:42:25 -07:00
Ari Savolainen
206b1d09a5 Fix POSIX ACL permission check
After commit 3567866bf2: "RCUify freeing acls, let check_acl() go ahead in
RCU mode if acl is cached" posix_acl_permission is being called with an
unsupported flag and the permission check fails. This patch fixes the issue.

Signed-off-by: Ari Savolainen <ari.m.savolainen@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-08-07 04:52:23 -04:00
Linus Torvalds
c2f340a69c Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd
* 'for-linus' of git://git.open-osd.org/linux-open-osd:
  ore: Make ore its own module
  exofs: Rename raid engine from exofs/ios.c => ore
  exofs: ios: Move to a per inode components & device-table
  exofs: Move exofs specific osd operations out of ios.c
  exofs: Add offset/length to exofs_get_io_state
  exofs: Fix truncate for the raid-groups case
  exofs: Small cleanup of exofs_fill_super
  exofs: BUG: Avoid sbi realloc
  exofs: Remove pnfs-osd private definitions
  nfs_xdr: Move nfs4_string definition out of #ifdef CONFIG_NFS_V4
2011-08-06 22:56:03 -07:00
Linus Torvalds
3ddcd0569c vfs: optimize inode cache access patterns
The inode structure layout is largely random, and some of the vfs paths
really do care.  The path lookup in particular is already quite D$
intensive, and profiles show that accessing the 'inode->i_op->xyz'
fields is quite costly.

We already optimized the dcache to not unnecessarily load the d_op
structure for members that are often NULL using the DCACHE_OP_xyz bits
in dentry->d_flags, and this does something very similar for the inode
ops that are used during pathname lookup.

It also re-orders the fields so that the fields accessed by 'stat' are
together at the beginning of the inode structure, and roughly in the
order accessed.

The effect of this seems to be in the 1-2% range for an empty kernel
"make -j" run (which is fairly kernel-intensive, mostly in filename
lookup), so it's visible.  The numbers are fairly noisy, though, and
likely depend a lot on exact microarchitecture.  So there's more tuning
to be done.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-06 22:53:23 -07:00
Linus Torvalds
830c0f0edc vfs: renumber DCACHE_xyz flags, remove some stale ones
Gcc tends to generate better code with small integers, including the
DCACHE_xyz flag tests - so move the common ones to be first in the list.
Also just remove the unused DCACHE_INOTIFY_PARENT_WATCHED and
DCACHE_AUTOFS_PENDING values, their users no longer exists in the source
tree.

And add a "unlikely()" to the DCACHE_OP_COMPARE test, since we want the
common case to be a nice straight-line fall-through.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-06 22:52:40 -07:00
Boaz Harrosh
cf283ade08 ore: Make ore its own module
Export everything from ore need exporting. Change Kbuild and Kconfig
to build ore.ko as an independent module. Import ore from exofs

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2011-08-06 19:36:19 -07:00
Boaz Harrosh
8ff660ab85 exofs: Rename raid engine from exofs/ios.c => ore
ORE stands for "Objects Raid Engine"

This patch is a mechanical rename of everything that was in ios.c
and its API declaration to an ore.c and an osd_ore.h header. The ore
engine will later be used by the pnfs objects layout driver.

* File ios.c => ore.c

* Declaration of types and API are moved from exofs.h to a new
  osd_ore.h

* All used types are prefixed by ore_ from their exofs_ name.

* Shift includes from exofs.h to osd_ore.h so osd_ore.h is
  independent, include it from exofs.h.

Other than a pure rename there are no other changes. Next patch
will move the ore into it's own module and will export the API
to be used by exofs and later the layout driver

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2011-08-06 19:36:18 -07:00
Boaz Harrosh
9e9db45649 exofs: ios: Move to a per inode components & device-table
Exofs raid engine was saving on memory space by having a single layout-info,
single pid, and a single device-table, global to the filesystem. Then passing
a credential and object_id info at the io_state level, private for each
inode. It would also devise this contraption of rotating the device table
view for each inode->ino to spread out the device usage.

This is not compatible with the pnfs-objects standard, demanding that
each inode can have it's own layout-info, device-table, and each object
component it's own pid, oid and creds.

So: Bring exofs raid engine to be usable for generic pnfs-objects use by:

* Define an exofs_comp structure that holds obj_id and credential info.

* Break up exofs_layout struct to an exofs_components structure that holds a
  possible array of exofs_comp and the array of devices + the size of the
  arrays.

* Add a "comps" parameter to get_io_state() that specifies the ids creds
  and device array to use for each IO.

  This enables to keep the layout global, but the device-table view, creds
  and IDs at the inode level. It only adds two 64bit to each inode, since
  some of these members already existed in another form.

* ios raid engine now access layout-info and comps-info through the passed
  pointers. Everything is pre-prepared by caller for generic access of
  these structures and arrays.

At the exofs Level:

* Super block holds an exofs_components struct that holds the device
  array, previously in layout. The devices there are in device-table
  order. The device-array is twice bigger and repeats the device-table
  twice so now each inode's device array can point to a random device
  and have a round-robin view of the table, making it compatible to
  previous exofs versions.

* Each inode has an exofs_components struct that is initialized at
  load time, with it's own view of the device table IDs and creds.
  When doing IO this gets passed to the io_state together with the
  layout.

While preforming this change. Bugs where found where credentials with the
wrong IDs where used to access the different SB objects (super.c). As well
as some dead code. It was never noticed because the target we use does not
check the credentials.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2011-08-06 19:35:32 -07:00
Boaz Harrosh
85e44df474 exofs: Move exofs specific osd operations out of ios.c
ios.c will be moving to an external library, for use by the
objects-layout-driver. Remove from it some exofs specific functions.

Also g_attr_logical_length is used both by inode.c and ios.c
move definition to the later, to keep it independent

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2011-08-06 19:35:31 -07:00
Boaz Harrosh
e1042ba099 exofs: Add offset/length to exofs_get_io_state
In future raid code we will need to know the IO offset/length
and if it's a read or write to determine some of the array
sizes we'll need.

So add a new exofs_get_rw_state() API for use when
writeing/reading. All other simple cases are left using the
old way.

The major change to this is that now we need to call
exofs_get_io_state later at inode.c::read_exec and
inode.c::write_exec when we actually know these things. So this
patch is kept separate so I can test things apart from other
changes.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2011-08-06 19:35:31 -07:00
Linus Torvalds
1957e7fdef Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
  cifs: cope with negative dentries in cifs_get_root
  cifs: convert prefixpath delimiters in cifs_build_path_to_root
  CIFS: Fix missing a decrement of inFlight value
  cifs: demote DFS referral lookup errors to cFYI
  Revert "cifs: advertise the right receive buffer size to the server"
2011-08-06 13:54:36 -07:00
Linus Torvalds
1117f72ea0 vfs: show O_CLOEXE bit properly in /proc/<pid>/fdinfo/<fd> files
The CLOEXE bit is magical, and for performance (and semantic) reasons we
don't actually maintain it in the file descriptor itself, but in a
separate bit array.  Which means that when we show f_flags, the CLOEXE
status is shown incorrectly: we show the status not as it is now, but as
it was when the file was opened.

Fix that by looking up the bit properly in the 'fdt->close_on_exec' bit
array.

Uli needs this in order to re-implement the pfiles program:

  "For normal file descriptors (not sockets) this was the last piece of
   information which wasn't available.  This is all part of my 'give
   Solaris users no reason to not switch' effort.  I intend to offer the
   code to the util-linux-ng maintainers."

Requested-by: Ulrich Drepper <drepper@akkadia.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-06 11:51:33 -07:00
Linus Torvalds
c21427043d oom_ajd: don't use WARN_ONCE, just use printk_once
WARN_ONCE() is very annoying, in that it shows the stack trace that we
don't care about at all, and also triggers various user-level "kernel
oopsed" logic that we really don't care about.  And it's not like the
user can do anything about the applications (sshd) in question, it's a
distro issue.

Requested-by: Andi Kleen <andi@firstfloor.org> (and many others)
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-06 11:43:08 -07:00
Jeff Layton
80975d21aa cifs: cope with negative dentries in cifs_get_root
The loop around lookup_one_len doesn't handle the case where it might
return a negative dentry, which can cause an oops on the next pass
through the loop. Check for that and break out of the loop with an
error of -ENOENT if there is one.

Fixes the panic reported here:

    https://bugzilla.redhat.com/show_bug.cgi?id=727927

Reported-by: TR Bentley <home@trarbentley.net>
Reported-by: Iain Arnell <iarnell@gmail.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: stable@kernel.org
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-08-05 15:03:09 +00:00
Jeff Layton
f9e8c45002 cifs: convert prefixpath delimiters in cifs_build_path_to_root
Regression from 2.6.39...

The delimiters in the prefixpath are not being converted based on
whether posix paths are in effect. Fixes:

    https://bugzilla.redhat.com/show_bug.cgi?id=727834

Reported-and-Tested-by: Iain Arnell <iarnell@gmail.com>
Reported-by: Patrick Oltmann <patrick.oltmann@gmx.net>
Cc: Pavel Shilovsky <piastryyy@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-08-05 14:55:15 +00:00
Linus Torvalds
24f0eed266 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  RCUify freeing acls, let check_acl() go ahead in RCU mode if acl is cached
  get rid of boilerplate switches in posix_acl.h
  fix block device fallout from ->fsync() changes
2011-08-04 16:44:40 -10:00
Boaz Harrosh
16f75bb35d exofs: Fix truncate for the raid-groups case
In the general raid-group case the truncate was wrong in that
it did not also fix the object length of the neighboring groups.

There are two bad cases in the old code:
1. Space that should be freed was not.
2. If a file That was big is truncated small, then made bigger
   again, the holes would not contain zeros but could expose old data.
   (If the growing of the file expands to more than a full
    groups cycle + group size (> S + T))

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2011-08-04 12:35:25 -07:00
Boaz Harrosh
9ce730475e exofs: Small cleanup of exofs_fill_super
Small cleanup that unifies duplicated code used in both the
error and success cases

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2011-08-04 12:35:23 -07:00
Boaz Harrosh
6d4073e881 exofs: BUG: Avoid sbi realloc
Since the beginning we realloced the sbi structure when a bigger
then one device table was specified. (I know that was really stupid).

Then much later when "register bdi" was added (By Jens) it was
registering the pointer to sbi->bdi before the realloc.

We never saw this problem because up till now the realloc did not
do anything since the device table was small enough to fit in the
original allocation. But once we starting testing with large device
tables (Bigger then 28) we noticed the crash of writeback operating
on a deallocated pointer.

* Avoid the all mess by allocating the device-table as a second array
  and get rid of the variable-sized structure and the rest of this
  mess.
* Take the chance to clean near by structures and comments.
* Add a needed dprint on startup to indicate the loaded layout.
* Also move the bdi registration to the very end because it will
  only fail in a low memory, which will probably fail before hand.
  There are many more likely causes to not load before that. This
  way the error handling is made simpler. (Just doing this would be
  enough to fix the BUG)

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2011-08-04 12:35:20 -07:00
Boaz Harrosh
26ae93c2dc exofs: Remove pnfs-osd private definitions
Now that pnfs-osd has hit mainline we can remove exofs's
private header. (And the FIXME comment)

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2011-08-04 12:35:18 -07:00
Linus Torvalds
14595f708e Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: use kzalloc in ext4_kzalloc()
2011-08-03 15:09:10 -10:00