Commit Graph

43518 Commits

Author SHA1 Message Date
Linus Torvalds
12f1d7e493 Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
 "Various small CIFS/SMB3 fixes for stable:

  Fixes address oops that can occur when accessing Macs with SMB3, and
  another problem found to Samba when read responses queued (e.g. with
  gluster under Samba)"

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
  CIFS: Fix duplicate line introduced by clone_file_range patch
  Fix cifs_uniqueid_to_ino_t() function for s390x
  CIFS: Fix SMB2+ interim response processing for read requests
  cifs: fix out-of-bounds access in lease parsing
2016-03-02 09:15:21 -08:00
Linus Torvalds
39680f50ae userfaultfd: don't block on the last VM updates at exit time
The exit path will do some final updates to the VM of an exiting process
to inform others of the fact that the process is going away.

That happens, for example, for robust futex state cleanup, but also if
the parent has asked for a TID update when the process exits (we clear
the child tid field in user space).

However, at the time we do those final VM accesses, we've already
stopped accepting signals, so the usual "stop waiting for userfaults on
signal" code in fs/userfaultfd.c no longer works, and the process can
become an unkillable zombie waiting for something that will never
happen.

To solve this, just make handle_userfault() abort any user fault
handling if we're already in the exit path past the signal handling
state being dead (marked by PF_EXITING).

This VM special case is pretty ugly, and it is possible that we should
look at finalizing signals later (or move the VM final accesses
earlier).  But in the meantime this is a fairly minimally intrusive fix.

Reported-and-tested-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-02 09:03:18 -08:00
Linus Torvalds
f691b77b1f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull d_inode/d_flags race fix from Al Viro.

I love this fix.  Not only does it fix the race in the dentry type
handling, it entirely gets rid of the nasty and subtle memory ordering
rules for d_type and d_inode, and replaces them with the basic dentry
locking rules (sequence numbers under RCU, d_lock elsewhere).

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  use ->d_seq to get coherency between ->d_inode and ->d_flags
2016-03-01 15:30:45 -08:00
Steve French
9589995e46 CIFS: Fix duplicate line introduced by clone_file_range patch
Commit 04b38d6012 ("vfs: pull btrfs clone API to vfs layer")
added a duplicated line (in cifsfs.c) which causes a sparse compile
warning.

Signed-off-by: Steve French <steve.french@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-03-01 09:38:00 -06:00
Al Viro
a528aca7f3 use ->d_seq to get coherency between ->d_inode and ->d_flags
Games with ordering and barriers are way too brittle.  Just
bump ->d_seq before and after updating ->d_inode and ->d_flags
type bits, so that verifying ->d_seq would guarantee they are
coherent.

Cc: stable@vger.kernel.org # v3.13+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-02-29 12:16:43 -05:00
Yadan Fan
1ee9f4bd1a Fix cifs_uniqueid_to_ino_t() function for s390x
This issue is caused by commit 02323db17e ("cifs: fix
cifs_uniqueid_to_ino_t not to ever return 0"), when BITS_PER_LONG
is 64 on s390x, the corresponding cifs_uniqueid_to_ino_t()
function will cast 64-bit fileid to 32-bit by using (ino_t)fileid,
because ino_t (typdefed __kernel_ino_t) is int type.

It's defined in arch/s390/include/uapi/asm/posix_types.h

    #ifndef __s390x__

    typedef unsigned long   __kernel_ino_t;
    ...
    #else /* __s390x__ */

    typedef unsigned int    __kernel_ino_t;

So the #ifdef condition is wrong for s390x, we can just still use
one cifs_uniqueid_to_ino_t() function with comparing sizeof(ino_t)
and sizeof(u64) to choose the correct execution accordingly.

Signed-off-by: Yadan Fan <ydfan@suse.com>
CC: stable <stable@vger.kernel.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2016-02-29 00:46:55 -06:00
Pavel Shilovsky
6cc3b24235 CIFS: Fix SMB2+ interim response processing for read requests
For interim responses we only need to parse a header and update
a number credits. Now it is done for all SMB2+ command except
SMB2_READ which is wrong. Fix this by adding such processing.

Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Tested-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2016-02-29 00:21:36 -06:00
Justin Maggard
deb7deff2f cifs: fix out-of-bounds access in lease parsing
When opening a file, SMB2_open() attempts to parse the lease state from the
SMB2 CREATE Response.  However, the parsing code was not careful to ensure
that the create contexts are not empty or invalid, which can lead to out-
of-bounds memory access.  This can be seen easily by trying
to read a file from a OSX 10.11 SMB3 server.  Here is sample crash output:

BUG: unable to handle kernel paging request at ffff8800a1a77cc6
IP: [<ffffffff8828a734>] SMB2_open+0x804/0x960
PGD 8f77067 PUD 0
Oops: 0000 [#1] SMP
Modules linked in:
CPU: 3 PID: 2876 Comm: cp Not tainted 4.5.0-rc3.x86_64.1+ #14
Hardware name: NETGEAR ReadyNAS 314          /ReadyNAS 314          , BIOS 4.6.5 10/11/2012
task: ffff880073cdc080 ti: ffff88005b31c000 task.ti: ffff88005b31c000
RIP: 0010:[<ffffffff8828a734>]  [<ffffffff8828a734>] SMB2_open+0x804/0x960
RSP: 0018:ffff88005b31fa08  EFLAGS: 00010282
RAX: 0000000000000015 RBX: 0000000000000000 RCX: 0000000000000006
RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff88007eb8c8b0
RBP: ffff88005b31fad8 R08: 666666203d206363 R09: 6131613030383866
R10: 3030383866666666 R11: 00000000000002b0 R12: ffff8800660fd800
R13: ffff8800a1a77cc2 R14: 00000000424d53fe R15: ffff88005f5a28c0
FS:  00007f7c8a2897c0(0000) GS:ffff88007eb80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffff8800a1a77cc6 CR3: 000000005b281000 CR4: 00000000000006e0
Stack:
 ffff88005b31fa70 ffffffff88278789 00000000000001d3 ffff88005f5a2a80
 ffffffff00000003 ffff88005d029d00 ffff88006fde05a0 0000000000000000
 ffff88005b31fc78 ffff88006fde0780 ffff88005b31fb2f 0000000100000fe0
Call Trace:
 [<ffffffff88278789>] ? cifsConvertToUTF16+0x159/0x2d0
 [<ffffffff8828cf68>] smb2_open_file+0x98/0x210
 [<ffffffff8811e80c>] ? __kmalloc+0x1c/0xe0
 [<ffffffff882685f4>] cifs_open+0x2a4/0x720
 [<ffffffff88122cef>] do_dentry_open+0x1ff/0x310
 [<ffffffff88268350>] ? cifsFileInfo_get+0x30/0x30
 [<ffffffff88123d92>] vfs_open+0x52/0x60
 [<ffffffff88131dd0>] path_openat+0x170/0xf70
 [<ffffffff88097d48>] ? remove_wait_queue+0x48/0x50
 [<ffffffff88133a29>] do_filp_open+0x79/0xd0
 [<ffffffff8813f2ca>] ? __alloc_fd+0x3a/0x170
 [<ffffffff881240c4>] do_sys_open+0x114/0x1e0
 [<ffffffff881241a9>] SyS_open+0x19/0x20
 [<ffffffff8896e257>] entry_SYSCALL_64_fastpath+0x12/0x6a
Code: 4d 8d 6c 07 04 31 c0 4c 89 ee e8 47 6f e5 ff 31 c9 41 89 ce 44 89 f1 48 c7 c7 28 b1 bd 88 31 c0 49 01 cd 4c 89 ee e8 2b 6f e5 ff <45> 0f b7 75 04 48 c7 c7 31 b1 bd 88 31 c0 4d 01 ee 4c 89 f6 e8
RIP  [<ffffffff8828a734>] SMB2_open+0x804/0x960
 RSP <ffff88005b31fa08>
CR2: ffff8800a1a77cc6
---[ end trace d9f69ba64feee469 ]---

Signed-off-by: Justin Maggard <jmaggard@netgear.com>
Signed-off-by: Steve French <smfrench@gmail.com>
CC: Stable <stable@vger.kernel.org>
2016-02-29 00:21:31 -06:00
Linus Torvalds
12b9fa6a97 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  do_last(): ELOOP failure exit should be done after leaving RCU mode
  should_follow_link(): validate ->d_seq after having decided to follow
  namei: ->d_inode of a pinned dentry is stable only for positives
  do_last(): don't let a bogus return value from ->open() et.al. to confuse us
  fs: return -EOPNOTSUPP if clone is not supported
  hpfs: don't truncate the file when delete fails
2016-02-27 17:10:32 -08:00
Al Viro
5129fa482b do_last(): ELOOP failure exit should be done after leaving RCU mode
... or we risk seeing a bogus value of d_is_symlink() there.

Cc: stable@vger.kernel.org # v4.2+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-02-27 19:37:37 -05:00
Al Viro
a7f775428b should_follow_link(): validate ->d_seq after having decided to follow
... otherwise d_is_symlink() above might have nothing to do with
the inode value we've got.

Cc: stable@vger.kernel.org # v4.2+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-02-27 19:31:01 -05:00
Al Viro
d4565649b6 namei: ->d_inode of a pinned dentry is stable only for positives
both do_last() and walk_component() risk picking a NULL inode out
of dentry about to become positive, *then* checking its flags and
seeing that it's not negative anymore and using (already stale by
then) value they'd fetched earlier.  Usually ends up oopsing soon
after that...

Cc: stable@vger.kernel.org # v3.13+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-02-27 19:23:16 -05:00
Al Viro
c80567c82a do_last(): don't let a bogus return value from ->open() et.al. to confuse us
... into returning a positive to path_openat(), which would interpret that
as "symlink had been encountered" and proceed to corrupt memory, etc.
It can only happen due to a bug in some ->open() instance or in some LSM
hook, etc., so we report any such event *and* make sure it doesn't trick
us into further unpleasantness.

Cc: stable@vger.kernel.org # v3.6+, at least
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-02-27 19:17:33 -05:00
Christoph Hellwig
0fcbf996d8 fs: return -EOPNOTSUPP if clone is not supported
-EBADF is a rather confusing error if an operations is not supported,
and nfsd gets rather upset about it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-02-27 19:15:51 -05:00
Mikulas Patocka
b6853f78e7 hpfs: don't truncate the file when delete fails
The delete opration can allocate additional space on the HPFS filesystem
due to btree split. The HPFS driver checks in advance if there is
available space, so that it won't corrupt the btree if we run out of space
during splitting.

If there is not enough available space, the HPFS driver attempted to
truncate the file, but this results in a deadlock since the commit
7dd29d8d86 ("HPFS: Introduce a global mutex
and lock it on every callback from VFS").

This patch removes the code that tries to truncate the file and -ENOSPC is
returned instead. If the user hits -ENOSPC on delete, he should try to
delete other files (that are stored in a leaf btree node), so that the
delete operation will make some space for deleting the file stored in
non-leaf btree node.

Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Cc: stable@vger.kernel.org	# 2.6.39+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-02-27 19:15:51 -05:00
Linus Torvalds
691429e13d Merge branch 'akpm' (patches from Andrew)
Merge fixes from Andrew Morton:
 "10 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  dax: move writeback calls into the filesystems
  dax: give DAX clearing code correct bdev
  ext4: online defrag not supported with DAX
  ext2, ext4: only set S_DAX for regular inodes
  block: disable block device DAX by default
  ocfs2: unlock inode if deleting inode from orphan fails
  mm: ASLR: use get_random_long()
  drivers: char: random: add get_random_long()
  mm: numa: quickly fail allocations for NUMA balancing on full nodes
  mm: thp: fix SMP race condition between THP page fault and MADV_DONTNEED
2016-02-27 12:46:16 -08:00
Linus Torvalds
1c271479b5 This fixes a file system corruption bug with DAX
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJW0fNWAAoJEPL5WVaVDYGjbroIAJKZBmlMh5DADLWIUNqeo6y+
 U8O0hpPUyoYb/j0wTVBe5z4cyfWjl1idrA4ZIb2VgMB28F8pPxuLifTVMx0kLeO9
 B1rcqn7CTzwmU9nj6yjcBkYp/spR8lBzaHq2REm3lE9Jwf6NdD4uwhzPiNmL3+xR
 dcg7lFzS6PSmLYD3mhb/lD5/3D3sYDlZ4nmX7uEl5WgxYaB1j5zsBVzYDU2Q0jZZ
 s+r/kj1eL8i9EnZ4zgZ4Bvtjm0jy5iVhO2YvLNQUZDEgmvJpNbVSBv/wAWoe9N3U
 rnm65s8F5hRbc2c8w4M43074uuEA4p0zZwR2z1E6RZvFZsl4Z5kk0/YEmxF7N6g=
 =IALP
 -----END PGP SIGNATURE-----

Merge tag 'tags/ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext2/4 DAX fix from Ted Ts'o:
 "This fixes a file system corruption bug with DAX"

* tag 'tags/ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext2, ext4: fix issue with missing journal entry in ext4_dax_mkwrite()
2016-02-27 12:40:49 -08:00
Ross Zwisler
1e9d180ba3 ext2, ext4: fix issue with missing journal entry in ext4_dax_mkwrite()
As it is currently written ext4_dax_mkwrite() assumes that the call into
__dax_mkwrite() will not have to do a block allocation so it doesn't create
a journal entry.  For a read that creates a zero page to cover a hole
followed by a write that actually allocates storage this is incorrect.  The
ext4_dax_mkwrite() -> __dax_mkwrite() -> __dax_fault() path calls
get_blocks() to allocate storage.

Fix this by having the ->page_mkwrite fault handler call ext4_dax_fault()
as this function already has all the logic needed to allocate a journal
entry and call __dax_fault().

Also update the ext2 fault handlers in this same way to remove duplicate
code and keep the logic between ext2 and ext4 the same.

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-02-27 14:01:16 -05:00
Ross Zwisler
7f6d5b529b dax: move writeback calls into the filesystems
Previously calls to dax_writeback_mapping_range() for all DAX filesystems
(ext2, ext4 & xfs) were centralized in filemap_write_and_wait_range().

dax_writeback_mapping_range() needs a struct block_device, and it used
to get that from inode->i_sb->s_bdev.  This is correct for normal inodes
mounted on ext2, ext4 and XFS filesystems, but is incorrect for DAX raw
block devices and for XFS real-time files.

Instead, call dax_writeback_mapping_range() directly from the filesystem
->writepages function so that it can supply us with a valid block
device.  This also fixes DAX code to properly flush caches in response
to sync(2).

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-27 10:28:52 -08:00
Ross Zwisler
20a90f5899 dax: give DAX clearing code correct bdev
dax_clear_blocks() needs a valid struct block_device and previously it
was using inode->i_sb->s_bdev in all cases.  This is correct for normal
inodes on mounted ext2, ext4 and XFS filesystems, but is incorrect for
DAX raw block devices and for XFS real-time devices.

Instead, rename dax_clear_blocks() to dax_clear_sectors(), and change
its arguments to take a bdev and a sector instead of an inode and a
block.  This better reflects what the function does, and it allows the
filesystem and raw block device code to pass in an appropriate struct
block_device.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-27 10:28:52 -08:00
Ross Zwisler
73f34a5e2c ext4: online defrag not supported with DAX
Online defrag operations for ext4 are hard coded to use the page cache.
See ext4_ioctl() -> ext4_move_extents() -> move_extent_per_page()

When combined with DAX I/O, which circumvents the page cache, this can
result in data corruption.  This was observed with xfstests ext4/307 and
ext4/308.

Fix this by only allowing online defrag for non-DAX files.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-27 10:28:52 -08:00
Ross Zwisler
0a6cf9137d ext2, ext4: only set S_DAX for regular inodes
When S_DAX is set on an inode we assume that if there are pages attached
to the mapping (mapping->nrpages != 0), those pages are clean zero pages
that were used to service reads from holes.  Any dirty data associated
with the inode should be in the form of DAX exceptional entries
(mapping->nrexceptional) that is written back via
dax_writeback_mapping_range().

With the current code, though, this isn't always true.  For example,
ext2 and ext4 directory inodes can have S_DAX set, but have their dirty
data stored as dirty page cache entries.  For these types of inodes,
having S_DAX set doesn't really make sense since their I/O doesn't
actually happen through the DAX code path.

Instead, only allow S_DAX to be set for regular inodes for ext2 and
ext4.  This allows us to have strict DAX vs non-DAX paths in the
writeback code.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-27 10:28:52 -08:00
Dan Williams
03cdadb040 block: disable block device DAX by default
The recent *sync enabling discovered that we are inserting into the
block_device pagecache counter to the expectations of the dirty data
tracking for dax mappings.  This can lead to data corruption.

We want to support DAX for block devices eventually, but it requires
wider changes to properly manage the pagecache.

   dump_stack+0x85/0xc2
   dax_writeback_mapping_range+0x60/0xe0
   blkdev_writepages+0x3f/0x50
   do_writepages+0x21/0x30
   __filemap_fdatawrite_range+0xc6/0x100
   filemap_write_and_wait+0x4a/0xa0
   set_blocksize+0x70/0xd0
   sb_set_blocksize+0x1d/0x50
   ext4_fill_super+0x75b/0x3360
   mount_bdev+0x180/0x1b0
   ext4_mount+0x15/0x20
   mount_fs+0x38/0x170

Mark the support broken so its disabled by default, but otherwise still
available for testing.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Suggested-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@fb.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-27 10:28:52 -08:00
Guozhonghua
a4a8481ff6 ocfs2: unlock inode if deleting inode from orphan fails
When doing append direct io cleanup, if deleting inode fails, it goes
out without unlocking inode, which will cause the inode deadlock.

This issue was introduced by commit cf1776a9e8 ("ocfs2: fix a tiny
race when truncate dio orohaned entry").

Signed-off-by: Guozhonghua <guozhonghua@h3c.com>
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Gang He <ghe@suse.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: <stable@vger.kernel.org>	[4.2+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-27 10:28:52 -08:00
Daniel Cashman
5ef11c35ce mm: ASLR: use get_random_long()
Replace calls to get_random_int() followed by a cast to (unsigned long)
with calls to get_random_long().  Also address shifting bug which, in
case of x86 removed entropy mask for mmap_rnd_bits values > 31 bits.

Signed-off-by: Daniel Cashman <dcashman@android.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: David S. Miller <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Nick Kralevich <nnk@google.com>
Cc: Jeff Vander Stoep <jeffv@google.com>
Cc: Mark Salyzyn <salyzyn@android.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-27 10:28:52 -08:00
Mike Marshall
9f08cfe944 Orangefs: update orangefs.txt
Al Viro has cleaned up the way ops are processed and waited for,
now orangefs.txt has an overview of how it works. Several recent
related commits have added to the comments in the code as well.

Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-26 14:39:08 -05:00
Mike Marshall
ca9f518ead Orangefs: code sanitation.
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-26 10:21:12 -05:00
Arnd Bergmann
401898eed7 orangefs: remove unused 'diff' function
orangefs contains a helper function to calculate the difference
between two timeval structures. We are trying to remove all
instances of timespec from the kernel, and this one is not
used at all, so let's remove it now.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-26 10:18:43 -05:00
Arnd Bergmann
be81ce48b2 orangefs: avoid time conversion function
The new orangefs code uses a helper function to read a time field to
its private structures from struct iattr. This will conflict with the
move to 64-bit timestamps in the kernel and is generally not necessary.

This replaces the conversion with a simple cast to time64_t that shows
what is going on. As the orangefs-internal representation already uses
64-bit timestamps, there should be no ambiguity to negative values,
and the cast ensures that we treat them as times before 1970 on both
32-bit and 64-bit architectures, rather than times after 2038. This
patch keeps that behavior.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-26 10:18:39 -05:00
David Woodhouse
be629c62a6 Fix directory hardlinks from deleted directories
When a directory is deleted, we don't take too much care about killing off
all the dirents that belong to it — on the basis that on remount, the scan
will conclude that the directory is dead anyway.

This doesn't work though, when the deleted directory contained a child
directory which was moved *out*. In the early stages of the fs build
we can then end up with an apparent hard link, with the child directory
appearing both in its true location, and as a child of the original
directory which are this stage of the mount process we don't *yet* know
is defunct.

To resolve this, take out the early special-casing of the "directories
shall not have hard links" rule in jffs2_build_inode_pass1(), and let the
normal nlink processing happen for directories as well as other inodes.

Then later in the build process we can set ic->pino_nlink to the parent
inode#, as is required for directories during normal operaton, instead
of the nlink. And complain only *then* about hard links which are still
in evidence even after killing off all the unreachable paths.

Reported-by: Liu Song <liu.song11@zte.com.cn>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: stable@vger.kernel.org
2016-02-25 11:11:28 +00:00
David Woodhouse
49e91e7079 jffs2: Fix page lock / f->sem deadlock
With this fix, all code paths should now be obtaining the page lock before
f->sem.

Reported-by: Szabó Tamás <sztomi89@gmail.com>
Tested-by: Thomas Betker <thomas.betker@rohde-schwarz.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: stable@vger.kernel.org
2016-02-25 11:11:26 +00:00
Thomas Betker
157078f64b Revert "jffs2: Fix lock acquisition order bug in jffs2_write_begin"
This reverts commit 5ffd3412ae
("jffs2: Fix lock acquisition order bug in jffs2_write_begin").

The commit modified jffs2_write_begin() to remove a deadlock with
jffs2_garbage_collect_live(), but this introduced new deadlocks found
by multiple users. page_lock() actually has to be called before
mutex_lock(&c->alloc_sem) or mutex_lock(&f->sem) because
jffs2_write_end() and jffs2_readpage() are called with the page locked,
and they acquire c->alloc_sem and f->sem, resp.

In other words, the lock order in jffs2_write_begin() was correct, and
it is the jffs2_garbage_collect_live() path that has to be changed.

Revert the commit to get rid of the new deadlocks, and to clear the way
for a better fix of the original deadlock.

Reported-by: Deng Chao <deng.chao1@zte.com.cn>
Reported-by: Ming Liu <liu.ming50@gmail.com>
Reported-by: wangzaiwei <wangzaiwei@top-vision.cn>
Signed-off-by: Thomas Betker <thomas.betker@rohde-schwarz.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: stable@vger.kernel.org
2016-02-25 11:11:25 +00:00
Martin Brandenburg
69a23de2f3 orangefs: clean up fill_default_sys_attrs
Size and type are read-only and not in the mask. The times were left
unset despite being in the mask.

We zero-fill the times since the server will fill them in and we will
get the correct time when we fill the inode with getattr.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-24 17:07:51 -05:00
Martin Brandenburg
6ceaf7818f orangefs: we never lookup with sym_follow set
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-24 17:07:51 -05:00
Martin Brandenburg
9c2bcf288e orangefs: remove vestigial async io code
I have verified that there is nothing in the userspace daemon version we
are implementing this protocol against that ever looks at this field.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-24 17:07:50 -05:00
Martin Brandenburg
47b4948fdb orangefs: use ORANGEFS_NAME_LEN everywhere; remove ORANGEFS_NAME_MAX
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-24 17:07:50 -05:00
Martin Brandenburg
ee70fca0bc orangefs: don't d_drop in d_revalidate since the caller will
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-24 17:07:50 -05:00
Martin Brandenburg
ee3b8d377c orangefs: free readdir buffer index before the dir_emit loop
We only need it while the service operation is actually in progress
since it is only used to co-ordinate the client-core's memory use. The
kernel allocates its own space.

Also clean up some comments which mislead the reader into thinking
the readdir buffers are shared memory.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-24 17:07:50 -05:00
Linus Torvalds
aa263c43fe Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
 "Assorted fixes - xattr one from this cycle, the rest - stable fodder"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fs/pnode.c: treat zero mnt_group_id-s as unequal
  affs_do_readpage_ofs(): just use kmap_atomic() around memcpy()
  xattr handlers: plug a lock leak in simple_xattr_list
  fs: allow no_seek_end_llseek to actually seek
2016-02-24 14:00:26 -08:00
Mike Marshall
adcf34a289 Orangefs: code sanitation
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-24 16:54:27 -05:00
Mike Marshall
d37c0f307a Orangefs: clean up orangefs_kernel_op_s comments.
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-24 13:24:14 -05:00
Linus Torvalds
420eb6d7ef NFS client bugfixes for Linux 4.5
Stable bugfixes:
 - Fix nfs_size_to_loff_t
 - NFSv4: Fix a dentry leak on alias use
 
 Other bugfixes:
 - Don't schedule a layoutreturn if the layout segment can be freed immediately.
 - Always set NFS_LAYOUT_RETURN_REQUESTED with lo->plh_return_iomode
 - rpcrdma_bc_receive_call() should init rq_private_buf.len
 - fix stateid handling for the NFS v4.2 operations
 - pnfs/blocklayout: fix a memeory leak when using,vmalloc_to_page
 - fix panic in gss_pipe_downcall() in fips mode
 - Fix a race between layoutget and pnfs_destroy_layout
 - Fix a race between layoutget and bulk recalls
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWzKShAAoJEGcL54qWCgDyN0QQALiX8v2wvn07vE5ZeXB5uONq
 +mfx8avhEoc3NVrpG6F4Kj+yJmHeAbkgIygnhZn4tcM/2YRxGDwlVLHb++yUTHO9
 8zEi+tiKx9f5pK2PxRQ0PjavVxO/xOyO0/QNrUdnj8hSNR9ow+YOVjEYUulbuhIg
 VAI3oSy5qIKgtDyW7w5PuPpTXLo74hPmyqHaa+ZIr2et//nJMSsw++vAmSg3oqXq
 6QkLWPHt/8yvDRRn2hKkbD9gOrFCVfaZIGLM6Q0zRWAcGTzJi94ELzPdm8cVpD1o
 eXKcufgLXPt3GOeAmxZ9kwQeebR6IFcvkYom5dsPhtMBuzXu1wpanU8PGgYIQ0VA
 88b2YNl+TZpiVbRzxSEellZq5b+zapH/VVVnYptZiq9wUTACc7jK6W2heqe5PzaT
 iepTGCAE21tV5JewcITMQHDZiOjRNdtbBzgixI7pNfMN8whU6e5NHYj6psZqT7cf
 xEEZzL+RBJuCFKhXSPbBefccA4HCRkDEpT+2QgrMbS4KKfWOg36UNbJ2kgbvcRVi
 HTqoRONR6zMzYBhyMlLaUuJ1co8nSHgEsL81Q3MwWSY6gucSW7jeJ2stR20KJIo1
 7qgod9Ac/BAIozjzywi0LtmxouPyPU8cqaboMhSRVPDKfFlqZBNBkFLNWwgoYXMa
 r1afZQwNeRRbZUR3RulE
 =/WDS
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-4.5-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 "Stable bugfixes:
   - Fix nfs_size_to_loff_t
   - NFSv4: Fix a dentry leak on alias use

  Other bugfixes:
   - Don't schedule a layoutreturn if the layout segment can be freed
     immediately.
   - Always set NFS_LAYOUT_RETURN_REQUESTED with lo->plh_return_iomode
   - rpcrdma_bc_receive_call() should init rq_private_buf.len
   - fix stateid handling for the NFS v4.2 operations
   - pnfs/blocklayout: fix a memeory leak when using,vmalloc_to_page
   - fix panic in gss_pipe_downcall() in fips mode
   - Fix a race between layoutget and pnfs_destroy_layout
   - Fix a race between layoutget and bulk recalls"

* tag 'nfs-for-4.5-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFSv4.x/pnfs: Fix a race between layoutget and bulk recalls
  NFSv4.x/pnfs: Fix a race between layoutget and pnfs_destroy_layout
  auth_gss: fix panic in gss_pipe_downcall() in fips mode
  pnfs/blocklayout: fix a memeory leak when using,vmalloc_to_page
  nfs4: fix stateid handling for the NFS v4.2 operations
  NFSv4: Fix a dentry leak on alias use
  xprtrdma: rpcrdma_bc_receive_call() should init rq_private_buf.len
  pNFS: Always set NFS_LAYOUT_RETURN_REQUESTED with lo->plh_return_iomode
  pNFS: Fix pnfs_mark_matching_lsegs_return()
  nfs: fix nfs_size_to_loff_t
2016-02-23 16:39:21 -08:00
Trond Myklebust
9fd4b9fc76 NFSv4.x/pnfs: Fix a race between layoutget and bulk recalls
Replace another case where the layout 'plh_block_lgets' can trigger
infinite loops in send_layoutget().

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-02-22 17:46:34 -05:00
Trond Myklebust
2454dfea0a NFSv4.x/pnfs: Fix a race between layoutget and pnfs_destroy_layout
If the server reboots while there is a layoutget outstanding, then
the call to pnfs_choose_layoutget_stateid() will fail with an EAGAIN
error, which causes an infinite loop in send_layoutget(). The reason
why we never break out of the loop is that the layout 'plh_block_lgets'
field is never cleared.

Fix is to replace plh_block_lgets with NFS_LAYOUT_INVALID_STID, which
can be reset after a new layoutget.

Fixes: ab7d763e47 ("pNFS: Ensure nfs4_layoutget_prepare returns...")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-02-22 17:34:59 -05:00
Linus Torvalds
0389075ecf Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "This is unusually large, partly due to the EFI fixes that prevent
  accidental deletion of EFI variables through efivarfs that may brick
  machines.  These fixes are somewhat involved to maintain compatibility
  with existing install methods and other usage modes, while trying to
  turn off the 'rm -rf' bricking vector.

  Other fixes are for large page ioremap()s and for non-temporal
  user-memcpy()s"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Fix vmalloc_fault() to handle large pages properly
  hpet: Drop stale URLs
  x86/uaccess/64: Handle the caching of 4-byte nocache copies properly in __copy_user_nocache()
  x86/uaccess/64: Make the __copy_user_nocache() assembly code more readable
  lib/ucs2_string: Correct ucs2 -> utf8 conversion
  efi: Add pstore variables to the deletion whitelist
  efi: Make efivarfs entries immutable by default
  efi: Make our variable validation list include the guid
  efi: Do variable name validation tests in utf8
  efi: Use ucs2_as_utf8 in efivarfs instead of open coding a bad version
  lib/ucs2_string: Add ucs2 -> utf8 helper functions
2016-02-20 09:32:40 -08:00
Maxim Patlasov
7ae8fd0351 fs/pnode.c: treat zero mnt_group_id-s as unequal
propagate_one(m) calculates "type" argument for copy_tree() like this:

>    if (m->mnt_group_id == last_dest->mnt_group_id) {
>        type = CL_MAKE_SHARED;
>    } else {
>        type = CL_SLAVE;
>        if (IS_MNT_SHARED(m))
>           type |= CL_MAKE_SHARED;
>   }

The "type" argument then governs clone_mnt() behavior with respect to flags
and mnt_master of new mount. When we iterate through a slave group, it is
possible that both current "m" and "last_dest" are not shared (although,
both are slaves, i.e. have non-NULL mnt_master-s). Then the comparison
above erroneously makes new mount shared and sets its mnt_master to
last_source->mnt_master. The patch fixes the problem by handling zero
mnt_group_id-s as though they are unequal.

The similar problem exists in the implementation of "else" clause above
when we have to ascend upward in the master/slave tree by calling:

>    last_source = last_source->mnt_master;
>    last_dest = last_source->mnt_parent;

proper number of times. The last step is governed by
"n->mnt_group_id != last_dest->mnt_group_id" condition that may lie if
both are zero. The patch fixes this case in the same way as the former one.

[AV: don't open-code an obvious helper...]

Signed-off-by: Maxim Patlasov <mpatlasov@virtuozzo.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-02-20 00:15:52 -05:00
Al Viro
0bacbe528e affs_do_readpage_ofs(): just use kmap_atomic() around memcpy()
It forgets kunmap() on a failure exit, but there's really no point keeping
the page kmapped at all - after all, what we are doing is a bunch of memcpy()
into the parts of page, so kmap_atomic()/kunmap_atomic() just around those
memcpy() is enough.

Spotted-by: Insu Yun <wuninsu@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-02-20 00:15:51 -05:00
Mateusz Guzik
0e9a7da51b xattr handlers: plug a lock leak in simple_xattr_list
The code could leak xattrs->lock on error.

Problem introduced with 786534b92f "tmpfs: listxattr should
include POSIX ACL xattrs".

Signed-off-by: Mateusz Guzik <mguzik@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-02-20 00:15:51 -05:00
Wouter van Kesteren
2feb55f890 fs: allow no_seek_end_llseek to actually seek
The user-visible impact of the issue is for example that without this
patch sensors-detect breaks when trying to seek in /dev/cpu/0/cpuid.

'~0ULL' is a 'unsigned long long' that when converted to a loff_t,
which is signed, gets turned into -1. later in vfs_setpos we have
'if (offset > maxsize)', which makes it always return EINVAL.

Fixes: b25472f9b9 ("new helpers: no_seek_end_llseek{,_size}()")
Signed-off-by: Wouter van Kesteren <woutershep@gmail.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-02-20 00:15:50 -05:00
Linus Torvalds
020ecbba05 Miscellaneous ext4 bug fixes for v4.5
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJWx2ZiAAoJEPL5WVaVDYGjrbcH/2EqCUDmW+FqVqR7PkpQsNiV
 WxBTNkxnVXf1Jin5beIUN/Ehq0GSuqcSujMwdbFUa0i7YJNVEe++hTw28JmFILYV
 5nZtTYmYIq7dZb/tnc3tj0SsDpgEE1h31VyWAu4W2q4wSQMDc8AqGM90VktgrerJ
 H9k/WDDL6KC8uXagBsQC0d5xaQglJNZC+S6pSBbMegBAFNJqAL5N78oWAoEFN3OH
 LN3B3eccxBx98rGWx8DBiugY8ZDRHB4Cre+fXu8wmAuMb/+Y7Mwj4RzI+fz5Vpiw
 vMS5RqZ7PvCaMhdyUWt9bI8j10bBXcaxOHL2UQND5A1zundJ1ZNOY/ZPvHVUS4s=
 =AXFu
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 bugfixes from Ted Ts'o:
 "Miscellaneous ext4 bug fixes for v4.5"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix crashes in dioread_nolock mode
  ext4: fix bh->b_state corruption
  ext4: fix memleak in ext4_readdir()
  ext4: remove unused parameter "newblock" in convert_initialized_extent()
  ext4: don't read blocks from disk after extents being swapped
  ext4: fix potential integer overflow
  ext4: add a line break for proc mb_groups display
  ext4: ioctl: fix erroneous return value
  ext4: fix scheduling in atomic on group checksum failure
  ext4 crypto: move context consistency check to ext4_file_open()
  ext4 crypto: revalidate dentry after adding or removing the key
2016-02-19 13:44:12 -08:00
Linus Torvalds
ce6b71432d Merge branch 'for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fix from Chris Mason:
 "My for-linus-4.5 branch has a btrfs DIO error passing fix.

  I know how much you love DIO, so I'm going to suggest against reading
  it.  We'll follow up with a patch to drop the error arg from
  dio_end_io in the next merge window."

* 'for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: fix direct IO requests not reporting IO error to user space
2016-02-19 13:40:42 -08:00
Linus Torvalds
87d9ac712b Merge branch 'akpm' (patches from Andrew)
Merge fixes from Andrew Morton:
 "10 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm: slab: free kmem_cache_node after destroy sysfs file
  ipc/shm: handle removed segments gracefully in shm_mmap()
  MAINTAINERS: update Kselftest Framework mailing list
  devm_memremap_release(): fix memremap'd addr handling
  mm/hugetlb.c: fix incorrect proc nr_hugepages value
  mm, x86: fix pte_page() crash in gup_pte_range()
  fsnotify: turn fsnotify reaper thread into a workqueue job
  Revert "fsnotify: destroy marks with call_srcu instead of dedicated thread"
  mm: fix regression in remap_file_pages() emulation
  thp, dax: do not try to withdraw pgtable from non-anon VMA
2016-02-19 13:36:00 -08:00
Al Viro
c1223ca48b orangefs: get rid of op refcounts
not needed anymore

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-19 13:45:56 -05:00
Al Viro
05a50a5be8 orangefs: have ..._clean_interrupted_...() wait for copy to/from daemon
* turn all those list_del(&op->list) into list_del_init()
* don't pick ops that are already given up in control device
  ->read()/->write_iter().
* have orangefs_clean_interrupted_operation() notice if op is currently
  being copied to/from daemon (by said ->read()/->write_iter()) and
  wait for that to finish.
* when we are done copying to/from daemon and find that it had been
  given up while we were doing that, wake the waiting ..._clean_interrupted_...

As the result, we are guaranteed that orangefs_clean_interrupted_operation(op)
doesn't return until nobody else can see op.  Moreover, we don't need to play
with op refcounts anymore.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-19 13:45:56 -05:00
Al Viro
5964c1b839 orangefs: set correct ->downcall.status on failing to copy reply from daemon
... and clean the end of control device ->write_iter() while we are at it

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-19 13:45:55 -05:00
Mike Marshall
ddb84da38d Orangefs: remove vestigial ASYNC code
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-19 13:45:55 -05:00
Mike Marshall
5253487e04 Orangefs: make some gossip statements more helpful.
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-19 13:45:55 -05:00
Al Viro
897c5df6cf orangefs: get rid of op->done
shouldn't be needed now

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-19 13:45:55 -05:00
Al Viro
82d37f19ff orangefs_readdir_index_put(): get rid of bufmap argument
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-19 13:45:54 -05:00
Al Viro
ea2c9c9f65 orangefs: bufmap rewrite
new waiting-for-slot logics:
	* make request for slot wait for bufmap to be set up if it
comes before it's installed *OR* while it's running down
	* make closing control device wait for all slots to be freed
	* waiting itself rewritten to (open-coded) analogues of wait_event_...
primitives - we would need wait_event_locked() and, pardon an obscenely
long name, wait_event_interruptible_exclusive_timeout_locked().
	* we never wait for more than slot_timeout_secs in total and,
if during the wait the daemon goes away, we only allow
ORANGEFS_BUFMAP_WAIT_TIMEOUT_SECS for it to come back.
	* (cosmetical) bitmap is used instead of an array of zeroes and ones
	* old (and only reached if we are about to corrupt memory) waiting
for daemon restart in service_operation() removed.

[Martin's fixes folded]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-19 13:45:54 -05:00
Al Viro
178041848a orangefs_bufmap_..._query(): don't bother with refcounts
... just hold the spinlock while fetching the field in question.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-19 13:45:54 -05:00
Al Viro
05b39a8b5c orangefs: lift handling of timeouts and attempts count to service_operation()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-19 13:45:54 -05:00
Al Viro
c72f15b7d9 service_operation(): don't block signals, just use ..._killable
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-19 13:45:53 -05:00
Al Viro
98815ade9e orangefs: sanitize handling of request list
* checking that daemon is running (to decide whether we want to limit
the timeout) should be done *after* the damn thing is included into
the list; doing that before means that if the daemon gets shut down
in between, we'll end up waiting indefinitely (== up to kill -9).

* cancels should go into the head of the queue - the sooner they
are picked, the less work daemon has to do and the sooner we get to
free the slot held by aborted operation.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-19 13:45:53 -05:00
Al Viro
d2d87a3b6d orangefs: get rid of loop in wait_for_matching_downcall()
turn op->waitq into struct completion...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-19 13:45:53 -05:00
Martin Brandenburg
cf22644a0e orangefs: use S_ISREG(mode) and friends instead of mode & S_IFREG.
Suggestion from Dan Carpenter.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-19 13:45:53 -05:00
Al Viro
78699e29fd orangefs: delay freeing slot until cancel completes
Make cancels reuse the aborted read/write op, to make sure they do not
fail on lack of memory.

Don't issue a cancel unless the daemon has seen our read/write, has not
replied and isn't being shut down.

If cancel *is* issued, don't wait for it to complete; stash the slot
in there and just have it freed when cancel is finally replied to or
purged (and delay dropping the reference until then, obviously).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-19 13:45:53 -05:00
Jan Kara
74dae42785 ext4: fix crashes in dioread_nolock mode
Competing overwrite DIO in dioread_nolock mode will just overwrite
pointer to io_end in the inode. This may result in data corruption or
extent conversion happening from IO completion interrupt because we
don't properly set buffer_defer_completion() when unlocked DIO races
with locked DIO to unwritten extent.

Since unlocked DIO doesn't need io_end for anything, just avoid
allocating it and corrupting pointer from inode for locked DIO.
A cleaner fix would be to avoid these games with io_end pointer from the
inode but that requires more intrusive changes so we leave that for
later.

Cc: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-02-19 00:33:21 -05:00
Jan Kara
ed8ad83808 ext4: fix bh->b_state corruption
ext4 can update bh->b_state non-atomically in _ext4_get_block() and
ext4_da_get_block_prep(). Usually this is fine since bh is just a
temporary storage for mapping information on stack but in some cases it
can be fully living bh attached to a page. In such case non-atomic
update of bh->b_state can race with an atomic update which then gets
lost. Usually when we are mapping bh and thus updating bh->b_state
non-atomically, nobody else touches the bh and so things work out fine
but there is one case to especially worry about: ext4_finish_bio() uses
BH_Uptodate_Lock on the first bh in the page to synchronize handling of
PageWriteback state. So when blocksize < pagesize, we can be atomically
modifying bh->b_state of a buffer that actually isn't under IO and thus
can race e.g. with delalloc trying to map that buffer. The result is
that we can mistakenly set / clear BH_Uptodate_Lock bit resulting in the
corruption of PageWriteback state or missed unlock of BH_Uptodate_Lock.

Fix the problem by always updating bh->b_state bits atomically.

CC: stable@vger.kernel.org
Reported-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-02-19 00:18:25 -05:00
Jeff Layton
0918f1c309 fsnotify: turn fsnotify reaper thread into a workqueue job
We don't require a dedicated thread for fsnotify cleanup.  Switch it
over to a workqueue job instead that runs on the system_unbound_wq.

In the interest of not thrashing the queued job too often when there are
a lot of marks being removed, we delay the reaper job slightly when
queueing it, to allow several to gather on the list.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Tested-by: Eryu Guan <guaneryu@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Eric Paris <eparis@parisplace.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-18 16:23:24 -08:00
Jeff Layton
13d34ac6e5 Revert "fsnotify: destroy marks with call_srcu instead of dedicated thread"
This reverts commit c510eff6be ("fsnotify: destroy marks with
call_srcu instead of dedicated thread").

Eryu reported that he was seeing some OOM kills kick in when running a
testcase that adds and removes inotify marks on a file in a tight loop.

The above commit changed the code to use call_srcu to clean up the
marks.  While that does (in principle) work, the srcu callback job is
limited to cleaning up entries in small batches and only once per jiffy.
It's easily possible to overwhelm that machinery with too many call_srcu
callbacks, and Eryu's reproduer did just that.

There's also another potential problem with using call_srcu here.  While
you can obviously sleep while holding the srcu_read_lock, the callbacks
run under local_bh_disable, so you can't sleep there.

It's possible when putting the last reference to the fsnotify_mark that
we'll end up putting a chain of references including the fsnotify_group,
uid, and associated keys.  While I don't see any obvious ways that that
could occurs, it's probably still best to avoid using call_srcu here
after all.

This patch reverts the above patch.  A later patch will take a different
approach to eliminated the dedicated thread here.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Reported-by: Eryu Guan <guaneryu@gmail.com>
Tested-by: Eryu Guan <guaneryu@gmail.com>
Cc: Jan Kara <jack@suse.com>
Cc: Eric Paris <eparis@parisplace.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-18 16:23:24 -08:00
Linus Torvalds
2850713576 Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "A collection of fixes from the past few weeks that should go into 4.5.
  This contains:

   - Overflow fix for sysfs discard show function from Alan.

   - A stacking limit init fix for max_dev_sectors, so we don't end up
     artificially capping some use cases.  From Keith.

   - Have blk-mq proper end unstarted requests on a dying queue, instead
     of pushing that to the driver.  From Keith.

   - NVMe:
        - Update to Kconfig description for NVME_SCSI, since it was
          vague and having it on is important for some SUSE distros.
          From Christoph.
        - Set of fixes from Keith, around surprise removal. Also kills
          the no-merge flag, so it supports merging.

   - Set of fixes for lightnvm from Matias, Javier, and Wenwei.

   - Fix null_blk oops when asked for lightnvm, but not available.  From
     Matias.

   - Copy-to-user EINTR fix from Hannes, fixing a case where SG_IO fails
     if interrupted by a signal.

   - Two floppy fixes from Jiri, fixing signal handling and blocking
     open.

   - A use-after-free fix for O_DIRECT, from Mike Krinkin.

   - A block module ref count fix from Roman Pen.

   - An fs IO wait accounting fix for O_DSYNC from Stephane Gasparini.

   - Smaller reallo fix for xen-blkfront from Bob Liu.

   - Removal of an unused struct member in the deadline IO scheduler,
     from Tahsin.

   - Also from Tahsin, properly initialize inode struct members
     associated with cgroup writeback, if enabled.

   - From Tejun, ensure that we keep the superblock pinned during cgroup
     writeback"

* 'for-linus' of git://git.kernel.dk/linux-block: (25 commits)
  blk: fix overflow in queue_discard_max_hw_show
  writeback: initialize inode members that track writeback history
  writeback: keep superblock pinned during cgroup writeback association switches
  bio: return EINTR if copying to user space got interrupted
  NVMe: Rate limit nvme IO warnings
  NVMe: Poll device while still active during remove
  NVMe: Requeue requests on suspended queues
  NVMe: Allow request merges
  NVMe: Fix io incapable return values
  blk-mq: End unstarted requests on dying queue
  block: Initialize max_dev_sectors to 0
  null_blk: oops when initializing without lightnvm
  block: fix module reference leak on put_disk() call for cgroups throttle
  nvme: fix Kconfig description for BLK_DEV_NVME_SCSI
  kernel/fs: fix I/O wait not accounted for RW O_DSYNC
  floppy: refactor open() flags handling
  lightnvm: allow to force mm initialization
  lightnvm: check overflow and correct mlc pairs
  lightnvm: fix request intersection locking in rrpc
  lightnvm: warn if irqs are disabled in lock laddr
  ...
2016-02-17 11:59:23 -08:00
Kinglong Mee
c89757061a pnfs/blocklayout: fix a memeory leak when using,vmalloc_to_page
unreferenced object 0xffffc90000abf000 (size 16900):
  comm "fsync02", pid 15765, jiffies 4297431627 (age 423.772s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 a0 c2 19 00 88 ff ff  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff8174d54e>] kmemleak_alloc+0x4e/0xb0
    [<ffffffff811b9b91>] __vmalloc_node_range+0x231/0x280
    [<ffffffff811b9c2a>] __vmalloc+0x4a/0x50
    [<ffffffffa02c9ec1>] ext_tree_prepare_commit+0x231/0x2e0 [blocklayoutdriver]
    [<ffffffffa02c700e>] bl_prepare_layoutcommit+0xe/0x10 [blocklayoutdriver]
    [<ffffffffa0596a6c>] pnfs_layoutcommit_inode+0x29c/0x330 [nfsv4]
    [<ffffffffa0596b13>] pnfs_generic_sync+0x13/0x20 [nfsv4]
    [<ffffffffa0585188>] nfs4_file_fsync+0x58/0x150 [nfsv4]
    [<ffffffff81228e5b>] vfs_fsync_range+0x4b/0xb0
    [<ffffffff81228f1d>] do_fsync+0x3d/0x70
    [<ffffffff812291d0>] SyS_fsync+0x10/0x20
    [<ffffffff81757def>] entry_SYSCALL_64_fastpath+0x12/0x76
    [<ffffffffffffffff>] 0xffffffffffffffff

v2, add missing include header

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-02-17 11:44:45 -05:00
Christoph Hellwig
4bdf87ebda nfs4: fix stateid handling for the NFS v4.2 operations
The newly added NFS v4.2 operations (ALLOCATE, DEALLOCATE, SEEK and CLONE)
use a helper called nfs42_set_rw_stateid to select a stateid that is sent
to the server.  But they don't set the inode and state fields in the
nfs4_exception structure, and this don't partake in the stateid recovery
protocol.  Because of this they will simply return errors insted of trying
to recover a stateid when the server return a BAD_STATEID error.

Additionally CLONE has the problem that it operates on two files and thus
two stateids, and thus needs to call the exception handler twice to
recover stateids.

While we're at it stop grabbing an addititional reference to the open
context in all these operations - having the file open guarantees that
the open context won't go away.

All this can be produces with the generic/168 and generic/170 tests in
xfstests which stress the CLONE stateid handling.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-02-17 11:38:07 -05:00
Benjamin Coddington
d9dfd8d741 NFSv4: Fix a dentry leak on alias use
In the case where d_add_unique() finds an appropriate alias to use it will
have already incremented the reference count.  An additional dget() to swap
the open context's dentry is unnecessary and will leak a reference.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Fixes: 275bb30786 ("NFSv4: Move dentry instantiation into the NFSv4-...")
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-02-17 11:35:25 -05:00
Tahsin Erdogan
3d65ae4634 writeback: initialize inode members that track writeback history
inode struct members that track cgroup writeback information
should be reinitialized when inode gets allocated from
kmem_cache. Otherwise, their values remain and get used by the
new inode.

Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Fixes: d10c809552 ("writeback: implement foreign cgroup inode bdi_writeback switching")
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-02-16 14:57:21 -07:00
Linus Torvalds
65c23c65be Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
 "A small set of cifs fixes.

  I am still reviewing some more, recently submitted SMB3 fixes, but
  these three are small and safe and ready now"

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: fix erroneous return value
  cifs: fix potential overflow in cifs_compose_mount_options
  cifs: remove redundant check for null string pointer
2016-02-16 10:52:59 -08:00
Tejun Heo
5ff8eaac16 writeback: keep superblock pinned during cgroup writeback association switches
If cgroup writeback is in use, an inode is associated with a cgroup
for writeback.  If the inode's main dirtier changes to another cgroup,
the association gets updated asynchronously.  Nothing was pinning the
superblock while such switches are in progress and superblock could go
away while async switching is pending or in progress leading to
crashes like the following.

 kernel BUG at fs/jbd2/transaction.c:319!
 invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
 CPU: 1 PID: 29158 Comm: kworker/1:10 Not tainted 4.5.0-rc3 #51
 Hardware name: Google Google, BIOS Google 01/01/2011
 Workqueue: events inode_switch_wbs_work_fn
 task: ffff880213dbbd40 ti: ffff880209264000 task.ti: ffff880209264000
 RIP: 0010:[<ffffffff803e6922>]  [<ffffffff803e6922>] start_this_handle+0x382/0x3e0
 RSP: 0018:ffff880209267c30  EFLAGS: 00010202
 ...
 Call Trace:
  [<ffffffff803e6be4>] jbd2__journal_start+0xf4/0x190
  [<ffffffff803cfc7e>] __ext4_journal_start_sb+0x4e/0x70
  [<ffffffff803b31ec>] ext4_evict_inode+0x12c/0x3d0
  [<ffffffff8035338b>] evict+0xbb/0x190
  [<ffffffff80354190>] iput+0x130/0x190
  [<ffffffff80360223>] inode_switch_wbs_work_fn+0x343/0x4c0
  [<ffffffff80279819>] process_one_work+0x129/0x300
  [<ffffffff80279b16>] worker_thread+0x126/0x480
  [<ffffffff8027ed14>] kthread+0xc4/0xe0
  [<ffffffff809771df>] ret_from_fork+0x3f/0x70

Fix it by bumping s_active while cgroup association switching is in
flight.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Tahsin Erdogan <tahsin@google.com>
Link: http://lkml.kernel.org/g/CAAeU0aNCq7LGODvVGRU-oU_o-6enii5ey0p1c26D1ZzYwkDc5A@mail.gmail.com
Fixes: d10c809552 ("writeback: implement foreign cgroup inode bdi_writeback switching")
Cc: stable@vger.kernel.org #v4.5+
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-02-16 11:34:07 -07:00
Ingo Molnar
4682c211a8 * Prevent accidental deletion of EFI variables through efivarfs that
may brick machines. We use a whitelist of known-safe variables to
    allow things like installing distributions to work out of the box, and
    instead restrict vendor-specific variable deletion by making
    non-whitelist variables immutable - Peter Jones
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJWvbEYAAoJEC84WcCNIz1VatYP/1kkly4lIuSYmaQrvF9V/L75
 lYNHjEURT55EDq4VAHH/wey3SbDkwy3wBsmfkkJTV1zhA+SHSAG2k097xGyLP6Xr
 X+htIj//HH7U3SRWk66UiwkY/866sXCqVRN2vvjBxvP9Z/rTDKe7zRQdVVdCt80P
 88H/1Nxy1S8eDExMGCvq8TbtWCSKV6P8197rUqUMf37Sbqr7yBM/sYDitdwOiGTW
 gzLwJjWJgDsKw+BWaj5NNZzVAb1Dgof5oEL5WGCU7gJSis08i4cHoRiwutYk2g8f
 ZbMnKvlFmiHGbjriowyNPm+pgRVDbS8JvJtORA1qXuVJFPtqV7Wdvdh+jJpdYXLp
 bO8EB/yfc7PTH8ScbNbIcgmCknsItRh2SDNXxM/BY/dzaSkzVI/Wr6GauWKInQJ6
 IypOMijITmnJ5Sij0V4aMUTZWS5btZt15iqAg3xUqWT9DJ61bIER+eGEhV6hx+7S
 pSydQylbaVFpyswdCpJRsfxHfW5j0G9BxnKZGTh+LHeb6dXaughUq2EIdUNHWyEZ
 3geJPC3Mh50MngO8phIq+DzjA4K84JZ9j6M3O27+x3bfLAqiktZS6HiaTSmSGNyM
 95swhpyHREeLQqYUUTOWiz1rlQ9cW+Bmkhy7Wn3RBZ033YNtmpyoZup0432mwkMm
 Wur3Jxd0GFz7zUkqvN3O
 =F4YT
 -----END PGP SIGNATURE-----

Merge tag 'efi-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgent

Pull EFI fixes from Matt Fleming:

 * Prevent accidental deletion of EFI variables through efivarfs that
   may brick machines. We use a whitelist of known-safe variables to
   allow things like installing distributions to work out of the box, and
   instead restrict vendor-specific variable deletion by making
   non-whitelist variables immutable (Peter Jones)

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-16 13:14:57 +01:00
Kirill Tkhai
c906f38e88 ext4: fix memleak in ext4_readdir()
When ext4_bread() fails, fname_crypto_str remains
allocated after return. Fix that.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
CC: Dmitry Monakhov <dmonakhov@virtuozzo.com>
2016-02-16 00:20:19 -05:00
Filipe Manana
1636d1d77e Btrfs: fix direct IO requests not reporting IO error to user space
If a bio for a direct IO request fails, we were not setting the error in
the parent bio (the main DIO bio), making us not return the error to
user space in btrfs_direct_IO(), that is, it made __blockdev_direct_IO()
return the number of bytes issued for IO and not the error a bio created
and submitted by btrfs_submit_direct() got from the block layer.
This essentially happens because when we call:

   dio_end_io(dio_bio, bio->bi_error);

It does not set dio_bio->bi_error to the value of the second argument.
So just add this missing assignment in endio callbacks, just as we do in
the error path at btrfs_submit_direct() when we fail to clone the dio bio
or allocate its private object. This follows the convention of what is
done with other similar APIs such as bio_endio() where the caller is
responsible for setting the bi_error field in the bio it passes as an
argument to bio_endio().

This was detected by the new generic test cases in xfstests: 271, 272,
276 and 278. Which essentially setup a dm error target, then load the
error table, do a direct IO write and unload the error table. They
expect the write to fail with -EIO, which was not getting reported
when testing against btrfs.

Cc: stable@vger.kernel.org  # 4.3+
Fixes: 4246a0b63b ("block: add a bi_error field to struct bio")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
2016-02-16 03:41:26 +00:00
Trond Myklebust
e0fa0d0189 pNFS: Always set NFS_LAYOUT_RETURN_REQUESTED with lo->plh_return_iomode
When setting the layout return mode, we must always also set the
NFS_LAYOUT_RETURN_REQUESTED flag to ensure that we send a layoutreturn.
Otherwise pnfs_error_mark_layout_for_return() could set the mode, but
fail to send the layoutreturn because another is already in flight.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-02-15 13:03:30 -05:00
Trond Myklebust
2f21596882 pNFS: Fix pnfs_mark_matching_lsegs_return()
We don't need to schedule a layoutreturn if the layout segment can
be freed immediately.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-02-15 12:56:01 -05:00
Linus Torvalds
779ee19da7 tty/serial fixes for 4.5-rc4
Here are a number of small tty and serial driver fixes for 4.5-rc4 that
 resolve some reported issues.
 
 One of them got reverted as it wasn't correct based on testing, and all
 have been in linux-next for a while.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iEYEABECAAYFAlbAzkEACgkQMUfUDdst+ylE4QCfXW10ziXSblRUIJubEm45Qhn2
 WJAAoLFMd/eER2TFkBl4E2Y3I7HUaL5d
 =V2Vb
 -----END PGP SIGNATURE-----

Merge tag 'tty-4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial fixes from Greg KH:
 "Here are a number of small tty and serial driver fixes for 4.5-rc4
  that resolve some reported issues.

  One of them got reverted as it wasn't correct based on testing, and
  all have been in linux-next for a while"

* tag 'tty-4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  Revert "8250: uniphier: allow modular build with 8250 console"
  pty: make sure super_block is still valid in final /dev/tty close
  pty: fix possible use after free of tty->driver_data
  tty: Add support for PCIe WCH382 2S multi-IO card
  serial/omap: mark wait_for_xmitr as __maybe_unused
  serial: omap: Prevent DoS using unprivileged ioctl(TIOCSRS485)
  8250: uniphier: allow modular build with 8250 console
  tty: Drop krefs for interrupted tty lock
2016-02-14 12:29:59 -08:00
Al Viro
1357d06d49 get rid of bufmap argument of orangefs_bufmap_put()
it's always equal to __orangefs_bufmap and the latter can't change
until we are done

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-12 15:05:33 -05:00
Al Viro
c0eae8cd77 orangefs: get rid of handle_io_error()
the second caller never needs to cancel, actually

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-12 15:05:32 -05:00
Al Viro
7b9761af86 orangefs: wait_for_direct_io(): restore the position in iter when restarting
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-12 15:05:32 -05:00
Al Viro
e17be9fd4d orangefs: avoid freeing a slot twice in wait_for_direct_io()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-12 15:05:32 -05:00
Linus Torvalds
27c9d772e5 Merge branch 'for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "This has a few fixes from Filipe, along with a readdir fix from Dave
  that we've been testing for some time"

* 'for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  btrfs: properly set the termination value of ctx->pos in readdir
  Btrfs: fix hang on extent buffer lock caused by the inode_paths ioctl
  Btrfs: remove no longer used function extent_read_full_page_nolock()
  Btrfs: fix page reading in extent_same ioctl leading to csum errors
  Btrfs: fix invalid page accesses in extent_same (dedup) ioctl
2016-02-12 09:21:28 -08:00
Linus Torvalds
dfc852864d xfs: updates for 4.5-rc4
Contains:
 o fix for endian conversion issue in new CRC validation in
   log recovery that was discovered on a ppc64 platform.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJWvVMvAAoJEK3oKUf0dfodgugP/ioNOzuHYWUabT9Eltq+5ca4
 r4/vp8OcQfxQDepWacPhO3K5sR8yI7eM5KG0+NmrIO06HypLjQpLT9h6RB/oj7G2
 bIpSzYwWC/D2fCt3HNMLfHmbZRCAOSP8szeKaWp+G3VfF9cZKVpiqm0QJdQc6xi2
 52ePFw/Irx2AaThW41U6UP+B9Efw7bBQkSd4ogHXMdilG1tqsWFxTAee7V7+Z3oQ
 Pj5g6DpOpV5we3EoYGgdVu4+b66AS5I5srhm8to2wHGu22IE3z4rn1B151gRNVx7
 qrtDGdPQcgJkg5g9Ez3LLX1ikN3cBn67ZNGNXpkDveABdS17trgQ5IWxNB6Z57zq
 +nnSQnwhmr/CCZoP9F2OzVE4WUtqr5fk07NEx6eOyHcVMQeeW2fzwjzL/BXZze4/
 55GwrQU86VYHjMXeD8876qRz108v5/tlBqBrGJ6qxAMEPdKizxJFbQEE06LMz1Mi
 v9hga7ssJilo4BDIsrDZ/nkdT01rY8HZiIMUd117TWdhwafvH8wyp9a/wtdbRvu7
 tcea+n2gTGkkGcfTmgWQBvtqKU/gpDDLtpIO5zZzNE1kCiITcwnANKus9Ha0/DNj
 WBdO/a+bN+i32G0+SI9qnYMCeGMgG6FBLGVk9ZytBvRLr4Fkq2eVFmK1bWpvTMTE
 vn8QunELgutZxR2PnOyc
 =ti5p
 -----END PGP SIGNATURE-----

Merge tag 'xfs-fixes-for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs

Pull xfs fix from Dve Chinner:
 "This contains a fix for an endian conversion issue in new CRC
  validation in log recovery that was discovered on a ppc64 platform"

* tag 'xfs-fixes-for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs:
  xfs: fix endianness error when checking log block crc on big endian platforms
2016-02-12 09:17:03 -08:00
Eryu Guan
56263b4ceb ext4: remove unused parameter "newblock" in convert_initialized_extent()
The "newblock" parameter is not used in convert_initialized_extent(),
remove it.

Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-02-12 01:23:00 -05:00
Eryu Guan
bcff24887d ext4: don't read blocks from disk after extents being swapped
I notice ext4/307 fails occasionally on ppc64 host, reporting md5
checksum mismatch after moving data from original file to donor file.

The reason is that move_extent_per_page() calls __block_write_begin()
and block_commit_write() to write saved data from original inode blocks
to donor inode blocks, but __block_write_begin() not only maps buffer
heads but also reads block content from disk if the size is not block
size aligned.  At this time the physical block number in mapped buffer
head is pointing to the donor file not the original file, and that
results in reading wrong data to page, which get written to disk in
following block_commit_write call.

This also can be reproduced by the following script on 1k block size ext4
on x86_64 host:

    mnt=/mnt/ext4
    donorfile=$mnt/donor
    testfile=$mnt/testfile
    e4compact=~/xfstests/src/e4compact

    rm -f $donorfile $testfile

    # reserve space for donor file, written by 0xaa and sync to disk to
    # avoid EBUSY on EXT4_IOC_MOVE_EXT
    xfs_io -fc "pwrite -S 0xaa 0 1m" -c "fsync" $donorfile

    # create test file written by 0xbb
    xfs_io -fc "pwrite -S 0xbb 0 1023" -c "fsync" $testfile

    # compute initial md5sum
    md5sum $testfile | tee md5sum.txt
    # drop cache, force e4compact to read data from disk
    echo 3 > /proc/sys/vm/drop_caches

    # test defrag
    echo "$testfile" | $e4compact -i -v -f $donorfile
    # check md5sum
    md5sum -c md5sum.txt

Fix it by creating & mapping buffer heads only but not reading blocks
from disk, because all the data in page is guaranteed to be up-to-date
in mext_page_mkuptodate().

Cc: stable@vger.kernel.org
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-02-12 01:20:43 -05:00
Insu Yun
46901760b4 ext4: fix potential integer overflow
Since sizeof(ext_new_group_data) > sizeof(ext_new_flex_group_data),
integer overflow could be happened.
Therefore, need to fix integer overflow sanitization.

Cc: stable@vger.kernel.org
Signed-off-by: Insu Yun <wuninsu@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-02-12 01:15:59 -05:00
Huaitong Han
802cf1f9f5 ext4: add a line break for proc mb_groups display
This patch adds a line break for proc mb_groups display.

Signed-off-by: Huaitong Han <huaitong.han@intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
2016-02-12 00:17:16 -05:00
Anton Protopopov
fdde368e7c ext4: ioctl: fix erroneous return value
The ext4_ioctl_setflags() function which is used in the ioctls
EXT4_IOC_SETFLAGS and EXT4_IOC_FSSETXATTR may return the positive value
EPERM instead of -EPERM in case of error. This bug was introduced by a
recent commit 9b7365fc.

The following program can be used to illustrate the wrong behavior:

    #include <sys/types.h>
    #include <sys/ioctl.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <err.h>

    #define FS_IOC_GETFLAGS _IOR('f', 1, long)
    #define FS_IOC_SETFLAGS _IOW('f', 2, long)
    #define FS_IMMUTABLE_FL 0x00000010

    int main(void)
    {
        int fd;
        long flags;

        fd = open("file", O_RDWR|O_CREAT, 0600);
        if (fd < 0)
            err(1, "open");

        if (ioctl(fd, FS_IOC_GETFLAGS, &flags) < 0)
            err(1, "ioctl: FS_IOC_GETFLAGS");

        flags |= FS_IMMUTABLE_FL;

        if (ioctl(fd, FS_IOC_SETFLAGS, &flags) < 0)
            err(1, "ioctl: FS_IOC_SETFLAGS");

        warnx("ioctl returned no error");

        return 0;
    }

Running it gives the following result:

    $ strace -e ioctl ./test
    ioctl(3, FS_IOC_GETFLAGS, 0x7ffdbd8bfd38) = 0
    ioctl(3, FS_IOC_SETFLAGS, 0x7ffdbd8bfd38) = 1
    test: ioctl returned no error
    +++ exited with 0 +++

Running the program on a kernel with the bug fixed gives the proper result:

    $ strace -e ioctl ./test
    ioctl(3, FS_IOC_GETFLAGS, 0x7ffdd2768258) = 0
    ioctl(3, FS_IOC_SETFLAGS, 0x7ffdd2768258) = -1 EPERM (Operation not permitted)
    test: ioctl: FS_IOC_SETFLAGS: Operation not permitted
    +++ exited with 1 +++

Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-02-11 23:57:21 -05:00
Jan Kara
05145bd799 ext4: fix scheduling in atomic on group checksum failure
When block group checksum is wrong, we call ext4_error() while holding
group spinlock from ext4_init_block_bitmap() or
ext4_init_inode_bitmap() which results in scheduling while in atomic.
Fix the issue by calling ext4_error() later after dropping the spinlock.

CC: stable@vger.kernel.org
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
2016-02-11 23:15:12 -05:00
David Sterba
bc4ef7592f btrfs: properly set the termination value of ctx->pos in readdir
The value of ctx->pos in the last readdir call is supposed to be set to
INT_MAX due to 32bit compatibility, unless 'pos' is intentially set to a
larger value, then it's LLONG_MAX.

There's a report from PaX SIZE_OVERFLOW plugin that "ctx->pos++"
overflows (https://forums.grsecurity.net/viewtopic.php?f=1&t=4284), on a
64bit arch, where the value is 0x7fffffffffffffff ie. LLONG_MAX before
the increment.

We can get to that situation like that:

* emit all regular readdir entries
* still in the same call to readdir, bump the last pos to INT_MAX
* next call to readdir will not emit any entries, but will reach the
  bump code again, finds pos to be INT_MAX and sets it to LLONG_MAX

Normally this is not a problem, but if we call readdir again, we'll find
'pos' set to LLONG_MAX and the unconditional increment will overflow.

The report from Victor at
(http://thread.gmane.org/gmane.comp.file-systems.btrfs/49500) with debugging
print shows that pattern:

 Overflow: e
 Overflow: 7fffffff
 Overflow: 7fffffffffffffff
 PAX: size overflow detected in function btrfs_real_readdir
   fs/btrfs/inode.c:5760 cicus.935_282 max, count: 9, decl: pos; num: 0;
   context: dir_context;
 CPU: 0 PID: 2630 Comm: polkitd Not tainted 4.2.3-grsec #1
 Hardware name: Gigabyte Technology Co., Ltd. H81ND2H/H81ND2H, BIOS F3 08/11/2015
  ffffffff81901608 0000000000000000 ffffffff819015e6 ffffc90004973d48
  ffffffff81742f0f 0000000000000007 ffffffff81901608 ffffc90004973d78
  ffffffff811cb706 0000000000000000 ffff8800d47359e0 ffffc90004973ed8
 Call Trace:
  [<ffffffff81742f0f>] dump_stack+0x4c/0x7f
  [<ffffffff811cb706>] report_size_overflow+0x36/0x40
  [<ffffffff812ef0bc>] btrfs_real_readdir+0x69c/0x6d0
  [<ffffffff811dafc8>] iterate_dir+0xa8/0x150
  [<ffffffff811e6d8d>] ? __fget_light+0x2d/0x70
  [<ffffffff811dba3a>] SyS_getdents+0xba/0x1c0
 Overflow: 1a
  [<ffffffff811db070>] ? iterate_dir+0x150/0x150
  [<ffffffff81749b69>] entry_SYSCALL_64_fastpath+0x12/0x83

The jump from 7fffffff to 7fffffffffffffff happens when new dir entries
are not yet synced and are processed from the delayed list. Then the code
could go to the bump section again even though it might not emit any new
dir entries from the delayed list.

The fix avoids entering the "bump" section again once we've finished
emitting the entries, both for synced and delayed entries.

References: https://forums.grsecurity.net/viewtopic.php?f=1&t=4284
Reported-by: Victor <services@swwu.com>
CC: stable@vger.kernel.org
Signed-off-by: David Sterba <dsterba@suse.com>
Tested-by: Holger Hoffstätte <holger.hoffstaette@googlemail.com>
Signed-off-by: Chris Mason <clm@fb.com>
2016-02-11 07:01:59 -08:00
Anton Protopopov
4b550af519 cifs: fix erroneous return value
The setup_ntlmv2_rsp() function may return positive value ENOMEM instead
of -ENOMEM in case of kmalloc failure.

Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2016-02-10 18:23:31 -06:00
Insu Yun
f34d69c3e5 cifs: fix potential overflow in cifs_compose_mount_options
In worst case, "ip=" + sb_mountdata + ipv6 can be copied into mountdata.
Therefore, for safe, it is better to add more size when allocating memory.

Signed-off-by: Insu Yun <wuninsu@gmail.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2016-02-10 18:04:56 -06:00
Colin Ian King
997152f627 cifs: remove redundant check for null string pointer
server_RFC1001_name is declared as a RFC1001_NAME_LEN_WITH_NULL sized
char array in struct TCP_Server_Info so the null pointer check on
server_RFC1001_name is redundant and can be removed.  Detected with
smatch:

fs/cifs/connect.c:2982 ip_rfc1001_connect() warn: this array is probably
  non-NULL. 'server->server_RFC1001_name'

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2016-02-10 18:04:53 -06:00