Commit Graph

40483 Commits

Author SHA1 Message Date
Miklos Szeredi
77cd9d488b fuse: add req flag for private list
When an unlocked request is aborted, it is moved from fpq->io to a private
list.  Then, after unlocking fpq->lock, the private list is processed and
the requests are finished off.

To protect the private list, we need to mark the request with a flag, so if
in the meantime the request is unlocked the list is not corrupted.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
2015-07-01 16:26:06 +02:00
Miklos Szeredi
45a91cb1a4 fuse: pqueue locking
Add a fpq->lock for protecting members of struct fuse_pqueue and FR_LOCKED
request flag.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
2015-07-01 16:26:06 +02:00
Miklos Szeredi
24b4d33d46 fuse: abort: group pqueue accesses
Rearrange fuse_abort_conn() so that processing queue accesses are grouped
together.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
2015-07-01 16:26:05 +02:00
Miklos Szeredi
82cbdcd320 fuse: cleanup fuse_dev_do_read()
- locked list_add() + list_del_init() cancel out

 - common handling of case when request is ended here in the read phase

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
2015-07-01 16:26:05 +02:00
Miklos Szeredi
f377cb799e fuse: move list_del_init() from request_end() into callers
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2015-07-01 16:26:04 +02:00
Miklos Szeredi
e96edd94d0 fuse: duplicate ->connected in pqueue
This will allow checking ->connected just with the processing queue lock.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
2015-07-01 16:26:04 +02:00
Miklos Szeredi
3a2b5b9cd9 fuse: separate out processing queue
This is just two fields: fc->io and fc->processing.

This patch just rearranges the fields, no functional change.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
2015-07-01 16:26:04 +02:00
Miklos Szeredi
5250921bb0 fuse: simplify request_wait()
wait_event_interruptible_exclusive_locked() will do everything
request_wait() does, so replace it.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
2015-07-01 16:26:03 +02:00
Miklos Szeredi
fd22d62ed0 fuse: no fc->lock for iqueue parts
Remove fc->lock protection from input queue members, now protected by
fiq->waitq.lock.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
2015-07-01 16:26:03 +02:00
Miklos Szeredi
8f7bb368db fuse: allow interrupt queuing without fc->lock
Interrupt is only queued after the request has been sent to userspace.
This is either done in request_wait_answer() or fuse_dev_do_read()
depending on which state the request is in at the time of the interrupt.
If it's not yet sent, then queuing the interrupt is postponed until the
request is read.  Otherwise (the request has already been read and is
waiting for an answer) the interrupt is queued immedidately.

We want to call queue_interrupt() without fc->lock protection, in which
case there can be a race between the two functions:

 - neither of them queue the interrupt (thinking the other one has already
   done it).

 - both of them queue the interrupt

The first one is prevented by adding memory barriers, the second is
prevented by checking (under fiq->waitq.lock) if the interrupt has already
been queued.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2015-07-01 16:26:03 +02:00
Miklos Szeredi
4ce6081260 fuse: iqueue locking
Use fiq->waitq.lock for protecting members of struct fuse_iqueue and
FR_PENDING request flag, previously protected by fc->lock.

Following patches will remove fc->lock protection from these members.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
2015-07-01 16:26:02 +02:00
Miklos Szeredi
ef75925886 fuse: dev read: split list_move
Different lists will need different locks.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
2015-07-01 16:26:02 +02:00
Miklos Szeredi
8c91189a2a fuse: abort: group iqueue accesses
Rearrange fuse_abort_conn() so that input queue accesses are grouped
together.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
2015-07-01 16:26:02 +02:00
Miklos Szeredi
e16714d875 fuse: duplicate ->connected in iqueue
This will allow checking ->connected just with the input queue lock.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
2015-07-01 16:26:01 +02:00
Miklos Szeredi
f88996a933 fuse: separate out input queue
The input queue contains normal requests (fc->pending), forgets
(fc->forget_*) and interrupts (fc->interrupts).  There's also fc->waitq and
fc->fasync for waking up the readers of the fuse device when a request is
available.

The fc->reqctr is also moved to the input queue (assigned to the request
when the request is added to the input queue.

This patch just rearranges the fields, no functional change.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
2015-07-01 16:26:01 +02:00
Miklos Szeredi
33e14b4dfd fuse: req state use flags
Use flags for representing the state in fuse_req.  This is needed since
req->list will be protected by different locks in different states, hence
we'll want the state itself to be split into distinct bits, each protected
with the relevant lock in that state.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2015-07-01 16:26:01 +02:00
Miklos Szeredi
7a3b2c7547 fuse: simplify req states
FUSE_REQ_INIT is actually the same state as FUSE_REQ_PENDING and
FUSE_REQ_READING and FUSE_REQ_WRITING can be merged into a common
FUSE_REQ_IO state.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
2015-07-01 16:26:00 +02:00
Miklos Szeredi
c47752673a fuse: don't hold lock over request_wait_answer()
Only hold fc->lock over sections of request_wait_answer() that actually
need it.  If wait_event_interruptible() returns zero, it means that the
request finished.  Need to add memory barriers, though, to make sure that
all relevant data in the request is synchronized.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2015-07-01 16:26:00 +02:00
Miklos Szeredi
7d2e0a099c fuse: simplify unique ctr
Since it's a 64bit counter, it's never gonna wrap around.  Remove code
dealing with that possibility.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
2015-07-01 16:26:00 +02:00
Miklos Szeredi
41f982747e fuse: rework abort
Splice fc->pending and fc->processing lists into a common kill list while
holding fc->lock.

By the time we release fc->lock, pending and processing lists are empty and
the io list contains only locked requests.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
2015-07-01 16:25:59 +02:00
Miklos Szeredi
b716d42538 fuse: fold helpers into abort
Fold end_io_requests() and end_queued_requests() into fuse_abort_conn().

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
2015-07-01 16:25:59 +02:00
Miklos Szeredi
dc00809a53 fuse: use per req lock for lock/unlock_request()
Reuse req->waitq.lock for protecting FR_ABORTED and FR_LOCKED flags.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
2015-07-01 16:25:58 +02:00
Miklos Szeredi
825d6d3395 fuse: req use bitops
Finer grained locking will mean there's no single lock to protect
modification of bitfileds in fuse_req.

So move to using bitops.  Can use the non-atomic variants for those which
happen while the request definitely has only one reference.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
2015-07-01 16:25:58 +02:00
Miklos Szeredi
0d8e84b043 fuse: simplify request abort
- don't end the request while req->locked is true

 - make unlock_request() return an error if the connection was aborted

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
2015-07-01 16:25:58 +02:00
Miklos Szeredi
ccd0a0bd16 fuse: call fuse_abort_conn() in dev release
fuse_abort_conn() does all the work done by fuse_dev_release() and more.
"More" consists of:

	end_io_requests(fc);
	wake_up_all(&fc->waitq);
	kill_fasync(&fc->fasync, SIGIO, POLL_IN);

All of which should be no-op (WARN_ON's added).

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
2015-07-01 16:25:57 +02:00
Miklos Szeredi
f0139aa819 fuse: fold fuse_request_send_nowait() into single caller
And the same with fuse_request_send_nowait_locked().

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
2015-07-01 16:25:57 +02:00
Miklos Szeredi
de15522646 fuse: check conn_error earlier
fc->conn_error is set once in FUSE_INIT reply and never cleared.  Check it
in request allocation, there's no sense in doing all the preparation if
sending will surely fail.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
2015-07-01 16:25:57 +02:00
Miklos Szeredi
5437f24172 fuse: account as waiting before queuing for background
Move accounting of fc->num_waiting to the point where the request actually
starts waiting.  This is earlier than the current queue_request() for
background requests, since they might be waiting on the fc->bg_queue before
being queued on fc->pending.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
2015-07-01 16:25:56 +02:00
Miklos Szeredi
73e0e73844 fuse: reset waiting
Reset req->waiting in fuse_put_request().  This is needed for correct
accounting in fc->num_waiting for reserved requests.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2015-07-01 16:25:56 +02:00
Miklos Szeredi
42dc6211c5 fuse: fix background request if not connected
request_end() expects fc->num_background and fc->active_background to have
been incremented, which is not the case in fuse_request_send_nowait()
failure path.  So instead just call the ->end() callback (which is actually
set by all callers).

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
2015-07-01 16:25:56 +02:00
Miklos Szeredi
0ad0b3255a fuse: initialize fc->release before calling it
fc->release is called from fuse_conn_put() which was used in the error
cleanup before fc->release was initialized.

[Jeremiah Mahler <jmmahler@gmail.com>: assign fc->release after calling
fuse_conn_init(fc) instead of before.]

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Fixes: a325f9b922 ("fuse: update fuse_conn_init() and separate out fuse_conn_kill()")
Cc: <stable@vger.kernel.org> #v2.6.31+
2015-07-01 16:25:55 +02:00
Sasha Levin
161f873b89 vfs: read file_handle only once in handle_to_path
We used to read file_handle twice.  Once to get the amount of extra
bytes, and once to fetch the entire structure.

This may be problematic since we do size verifications only after the
first read, so if the number of extra bytes changes in userspace between
the first and second calls, we'll have an incoherent view of
file_handle.

Instead, read the constant size once, and copy that over to the final
structure without having to re-read it again.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-02 10:29:07 -07:00
Linus Torvalds
8ba64dc338 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fix from Al Viro:
 "Off-by-one in d_walk()/__dentry_kill() race fix.

  It's very hard to hit; possible in the same conditions as the original
  bug, except that you need the skipped branch to contain all the
  remaining evictables, so that the d_walk()-calling loop in
  d_invalidate() decides there's nothing more to do and doesn't go for
  another pass - otherwise that next pass will sweep the sucker.

  So it's not too urgent, but seeing that the fix is obvious and the
  original commit has spread into all -stable branches..."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  d_walk() might skip too much
2015-05-31 16:00:34 -07:00
Linus Torvalds
1be44e234b xfs: update for 4.1-rc6
Changes in this update:
 o regression fix for new rename whiteout code
 o regression fixes for new superblock generic per-cpu counter code
 o fix for incorrect error return sign introduced in 3.17
 o metadata corruption fixes that need to go back to -stable kernels
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJVaO0JAAoJEK3oKUf0dfodd4UQANRdfXnUrpyQGhVS7HFoFoVt
 FIQ52pPGbMu72+DqHc+Q41uvgAPe65LFB2VUL6CUGCMExstF72F5+QonzppMgkMo
 unPER3eB8ya03SY+Kp+803ZGgzI2Nl2M6w8Kof730/RUk56PTGYIx4eLXd6iZSli
 RsYjw8JDbeue5OQo5FPmLCSQ/Kr5ZJXbgWVPyWkKg9aCcXLN5YSJIV3xcMTK9Q2I
 LqqODkyatnGc6YxGAKddS7Xzt1ntlZgbe5mndQw04a2g0Lf6emPH5r8b0UJXIu96
 advOBX0pEbad4FeFS6Mo5D+nNCaaNP4WzN7wgdb+BYNVw3ss4Ebam7+yY6Gexg6y
 bzZOEkk9saL4YeBDgyYICNu7kG4BRVKRQiiX220G6SFXM3nqbl7qBPb3kVFyDpcI
 RRuFJ0ZV0kFJ+3IQ4xVnIh6nootceRk/mvZaK5HhLhQLzklpZ8fj4HF3oBDUAnvN
 wNd+7GoZy7zldjCkbF4BP3GjUeW+b9ngrCNc+bFXi5cUbdECXAa2krjxyY+MlQF2
 veNVVcsoRdfeM0VjJh2/piGJxMWIlRqXdKzPKsfMWnlIaJ6YyslfbSq+2K7LxgGR
 Ho3Sjt0oUuPMZ9F/Mjj+XDqwmzgooUHXNyDBxhGXBNBPjApcRLcb2vQ2SrWEmeGJ
 vZmC2R1ZoGdBJg8a55BT
 =w5SP
 -----END PGP SIGNATURE-----

Merge tag 'xfs-for-linus-4.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs

Pull xfs fixes from Dave Chinner:
 "This is a little larger than I'd like late in the release cycle, but
  all the fixes are for regressions introduced in the 4.1-rc1 merge, or
  are needed back in -stable kernels fairly quickly as they are
  filesystem corruption or userspace visible correctness issues.

  Changes in this update:

   - regression fix for new rename whiteout code

   - regression fixes for new superblock generic per-cpu counter code

   - fix for incorrect error return sign introduced in 3.17

   - metadata corruption fixes that need to go back to -stable kernels"

* tag 'xfs-for-linus-4.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs:
  xfs: fix broken i_nlink accounting for whiteout tmpfile inode
  xfs: xfs_iozero can return positive errno
  xfs: xfs_attr_inactive leaves inconsistent attr fork state behind
  xfs: extent size hints can round up extents past MAXEXTLEN
  xfs: inode and free block counters need to use __percpu_counter_compare
  percpu_counter: batch size aware __percpu_counter_compare()
  xfs: use percpu_counter_read_positive for mp->m_icount
2015-05-29 16:45:45 -07:00
Al Viro
2159184ea0 d_walk() might skip too much
when we find that a child has died while we'd been trying to ascend,
we should go into the first live sibling itself, rather than its sibling.

Off-by-one in question had been introduced in "deal with deadlock in
d_walk()" and the fix needs to be backported to all branches this one
has been backported to.

Cc: stable@vger.kernel.org # 3.2 and later
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-28 23:45:30 -04:00
Bob Copeland
5a6b2b36a8 omfs: fix potential integer overflow in allocator
Both 'i' and 'bits_per_entry' are signed integers but the result is a
u64 block number.  Cast i to u64 to avoid truncation on 32-bit targets.

Found by Coverity (CID 200679).

Signed-off-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-05-28 18:25:19 -07:00
Bob Copeland
c0345ee57d omfs: fix sign confusion for bitmap loop counter
The count variable is used to iterate down to (below) zero from the size
of the bitmap and handle the one-filling the remainder of the last
partial bitmap block.  The loop conditional expects count to be signed
in order to detect when the final block is processed, after which count
goes negative.

Unfortunately, a recent change made this unsigned along with some other
related fields.  The result of is this is that during mount,
omfs_get_imap will overrun the bitmap array and corrupt memory unless
number of blocks happens to be a multiple of 8 * blocksize.

Fix by changing count back to signed: it is guaranteed to fit in an s32
without overflow due to an enforced limit on the number of blocks in the
filesystem.

Signed-off-by: Bob Copeland <me@bobcopeland.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-05-28 18:25:19 -07:00
Bob Copeland
3a281f9466 omfs: set error return when d_make_root() fails
A static checker found the following issue in the error path for
omfs_fill_super:

    fs/omfs/inode.c:552 omfs_fill_super()
    warn: missing error code here? 'd_make_root()' failed. 'ret' = '0'

Fix by returning -ENOMEM in this case.

Signed-off-by: Bob Copeland <me@bobcopeland.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-05-28 18:25:18 -07:00
Sasha Levin
dcbff39da3 fs, omfs: add NULL terminator in the end up the token list
match_token() expects a NULL terminator at the end of the token list so
that it would know where to stop.  Not having one causes it to overrun
to invalid memory.

In practice, passing a mount option that omfs didn't recognize would
sometimes panic the system.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-05-28 18:25:18 -07:00
Andrew Morton
2b1d3ae940 fs/binfmt_elf.c:load_elf_binary(): return -EINVAL on zero-length mappings
load_elf_binary() returns `retval', not `error'.

Fixes: a87938b2e2 ("fs/binfmt_elf.c: fix bug in loading of PIE binaries")
Reported-by: James Hogan <james.hogan@imgtec.com>
Cc: Michael Davidson <md@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-05-28 18:25:18 -07:00
Brian Foster
22419ac9fe xfs: fix broken i_nlink accounting for whiteout tmpfile inode
XFS uses the internal tmpfile() infrastructure for the whiteout inode
used for RENAME_WHITEOUT operations. For tmpfile inodes, XFS allocates
the inode, drops di_nlink, adds the inode to the agi unlinked list,
calls d_tmpfile() which correspondingly drops i_nlink of the vfs inode,
and then finishes the common inode setup (e.g., clear I_NEW and unlock).

The d_tmpfile() call was originally made inxfs_create_tmpfile(), but was
pulled up out of that function as part of the following commit to
resolve a deadlock issue:

	330033d6 xfs: fix tmpfile/selinux deadlock and initialize security

As a result, callers of xfs_create_tmpfile() are responsible for either
calling d_tmpfile() or fixing up i_nlink appropriately. The whiteout
tmpfile allocation helper does neither. As a result, the vfs ->i_nlink
becomes inconsistent with the on-disk ->di_nlink once xfs_rename() links
it back into the source dentry and calls xfs_bumplink().

Update the assert in xfs_rename() to help detect this problem in the
future and update xfs_rename_alloc_whiteout() to decrement the link
count as part of the manual tmpfile inode setup.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-05-29 08:14:55 +10:00
Dave Chinner
cddc116228 xfs: xfs_iozero can return positive errno
It was missed when we converted everything in XFs to use negative error
numbers, so fix it now. Bug introduced in 3.17 by commit 2451337 ("xfs: global
error sign conversion"), and should go back to stable kernels.

Thanks to Brian Foster for noticing it.

cc: <stable@vger.kernel.org> # 3.17, 3.18, 3.19, 4.0
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-05-29 07:40:32 +10:00
Dave Chinner
6dfe5a049f xfs: xfs_attr_inactive leaves inconsistent attr fork state behind
xfs_attr_inactive() is supposed to clean up the attribute fork when
the inode is being freed. While it removes attribute fork extents,
it completely ignores attributes in local format, which means that
there can still be active attributes on the inode after
xfs_attr_inactive() has run.

This leads to problems with concurrent inode writeback - the in-core
inode attribute fork is removed without locking on the assumption
that nothing will be attempting to access the attribute fork after a
call to xfs_attr_inactive() because it isn't supposed to exist on
disk any more.

To fix this, make xfs_attr_inactive() completely remove all traces
of the attribute fork from the inode, regardless of it's state.
Further, also remove the in-core attribute fork structure safely so
that there is nothing further that needs to be done by callers to
clean up the attribute fork. This means we can remove the in-core
and on-disk attribute forks atomically.

Also, on error simply remove the in-memory attribute fork. There's
nothing that can be done with it once we have failed to remove the
on-disk attribute fork, so we may as well just blow it away here
anyway.

cc: <stable@vger.kernel.org> # 3.12 to 4.0
Reported-by: Waiman Long <waiman.long@hp.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-05-29 07:40:08 +10:00
Dave Chinner
6dea405eee xfs: extent size hints can round up extents past MAXEXTLEN
This results in BMBT corruption, as seen by this test:

# mkfs.xfs -f -d size=40051712b,agcount=4 /dev/vdc
....
# mount /dev/vdc /mnt/scratch
# xfs_io -ft -c "extsize 16m" -c "falloc 0 30g" -c "bmap -vp" /mnt/scratch/foo

which results in this failure on a debug kernel:

XFS: Assertion failed: (blockcount & xfs_mask64hi(64-BMBT_BLOCKCOUNT_BITLEN)) == 0, file: fs/xfs/libxfs/xfs_bmap_btree.c, line: 211
....
Call Trace:
 [<ffffffff814cf0ff>] xfs_bmbt_set_allf+0x8f/0x100
 [<ffffffff814cf18d>] xfs_bmbt_set_all+0x1d/0x20
 [<ffffffff814f2efe>] xfs_iext_insert+0x9e/0x120
 [<ffffffff814c7956>] ? xfs_bmap_add_extent_hole_real+0x1c6/0xc70
 [<ffffffff814c7956>] xfs_bmap_add_extent_hole_real+0x1c6/0xc70
 [<ffffffff814caaab>] xfs_bmapi_write+0x72b/0xed0
 [<ffffffff811c72ac>] ? kmem_cache_alloc+0x15c/0x170
 [<ffffffff814fe070>] xfs_alloc_file_space+0x160/0x400
 [<ffffffff81ddcc29>] ? down_write+0x29/0x60
 [<ffffffff815063eb>] xfs_file_fallocate+0x29b/0x310
 [<ffffffff811d2bc8>] ? __sb_start_write+0x58/0x120
 [<ffffffff811e3e18>] ? do_vfs_ioctl+0x318/0x570
 [<ffffffff811cd680>] vfs_fallocate+0x140/0x260
 [<ffffffff811ce6f8>] SyS_fallocate+0x48/0x80
 [<ffffffff81ddec09>] system_call_fastpath+0x12/0x17

The tracepoint that indicates the extent that triggered the assert
failure is:

xfs_iext_insert:   idx 0 offset 0 block 16777224 count 2097152 flag 1

Clearly indicating that the extent length is greater than MAXEXTLEN,
which is 2097151. A prior trace point shows the allocation was an
exact size match and that a length greater than MAXEXTLEN was asked
for:

xfs_alloc_size_done:  agno 1 agbno 8 minlen 2097152 maxlen 2097152
					    ^^^^^^^        ^^^^^^^

We don't see this problem with extent size hints through the IO path
because we can't do single IOs large enough to trigger MAXEXTLEN
allocation. fallocate(), OTOH, is not limited in it's allocation
sizes and so needs help here.

The issue is that the extent size hint alignment is rounding up the
extent size past MAXEXTLEN, because xfs_bmapi_write() is not taking
into account extent size hints when calculating the maximum extent
length to allocate. xfs_bmapi_reserve_delalloc() is already doing
this, but direct extent allocation is not.

Unfortunately, the calculation in xfs_bmapi_reserve_delalloc() is
wrong, and it works only because delayed allocation extents are not
limited in size to MAXEXTLEN in the in-core extent tree. hence this
calculation does not work for direct allocation, and the delalloc
code needs fixing. This may, in fact be the underlying bug that
occassionally causes transaction overruns in delayed allocation
extent conversion, so now we know it's wrong we should fix it, too.
Many thanks to Brian Foster for finding this problem during review
of this patch.

Hence the fix, after much code reading, is to allow
xfs_bmap_extsize_align() to align partial extents when full
alignment would extend the alignment past MAXEXTLEN. We can safely
do this because all callers have higher layer allocation loops that
already handle short allocations, and so will simply run another
allocation to cover the remainder of the requested allocation range
that we ignored during alignment. The advantage of this approach is
that it also removes the need for callers to do anything other than
limit their requests to MAXEXTLEN - they don't really need to be
aware of extent size hints at all.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-05-29 07:40:06 +10:00
Dave Chinner
8c1903d308 xfs: inode and free block counters need to use __percpu_counter_compare
Because the counters use a custom batch size, the comparison
functions need to be aware of that batch size otherwise the
comparison does not work correctly. This leads to ASSERT failures
on generic/027 like this:

 XFS: Assertion failed: 0, file: fs/xfs/xfs_mount.c, line: 1099
 ------------[ cut here ]------------
....
 Call Trace:
  [<ffffffff81522a39>] xfs_mod_icount+0x99/0xc0
  [<ffffffff815285cb>] xfs_trans_unreserve_and_mod_sb+0x28b/0x5b0
  [<ffffffff8152f941>] xfs_log_commit_cil+0x321/0x580
  [<ffffffff81528e17>] xfs_trans_commit+0xb7/0x260
  [<ffffffff81503d4d>] xfs_bmap_finish+0xcd/0x1b0
  [<ffffffff8151da41>] xfs_inactive_ifree+0x1e1/0x250
  [<ffffffff8151dbe0>] xfs_inactive+0x130/0x200
  [<ffffffff81523a21>] xfs_fs_evict_inode+0x91/0xf0
  [<ffffffff811f3958>] evict+0xb8/0x190
  [<ffffffff811f433b>] iput+0x18b/0x1f0
  [<ffffffff811e8853>] do_unlinkat+0x1f3/0x320
  [<ffffffff811d548a>] ? filp_close+0x5a/0x80
  [<ffffffff811e999b>] SyS_unlinkat+0x1b/0x40
  [<ffffffff81e0892e>] system_call_fastpath+0x12/0x71

This is a regression introduced by commit 501ab32 ("xfs: use generic
percpu counters for inode counter").

This patch fixes the same problem for both the inode counter and the
free block counter in the superblocks.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-05-29 07:39:34 +10:00
George Wang
74f9ce1cf2 xfs: use percpu_counter_read_positive for mp->m_icount
Function percpu_counter_read just return the current counter, which can be
negative. This will cause the checking of "allocated inode
counts <= m_maxicount" false positive. Use percpu_counter_read_positive can
solve this problem, and be consistent with the purpose to introduce percpu
mechanism to xfs.

Signed-off-by: George Wang <xuw2015@gmail.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-05-29 07:39:34 +10:00
Linus Torvalds
de182468d1 Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
 "Back from SambaXP - now have 8 small CIFS bug fixes to merge"

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
  CIFS: Fix race condition on RFC1002_NEGATIVE_SESSION_RESPONSE
  Fix to convert SURROGATE PAIR
  cifs: potential missing check for posix_lock_file_wait
  Fix to check Unique id and FileType when client refer file directly.
  CIFS: remove an unneeded NULL check
  [cifs] fix null pointer check
  Fix that several functions handle incorrect value of mapchars
  cifs: Don't replace dentries for dfs mounts
2015-05-27 14:09:16 -07:00
Linus Torvalds
3cfd4ba7d3 Merge branch 'overlayfs-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull two overlayfs fixes from Miklos Szeredi:
 "Overlayfs rmdir() failed to check for emptiness in one case; this was
  introduced in 4.0.  The other bug was there since day one: failure to
  mount if upper fs is full, which bit some OpenWRT folks"

* 'overlayfs-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: mount read-only if workdir can't be created
  ovl: don't remove non-empty opaque directory
2015-05-27 09:47:57 -07:00
Linus Torvalds
7ce14f6ff2 Merge branch 'for-linus-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "I fixed up a regression from 4.0 where conversion between different
  raid levels would sometimes bail out without converting.

  Filipe tracked down a race where it was possible to double allocate
  chunks on the drive.

  Mark has a fix for fiemap.  All three will get bundled off for stable
  as well"

* 'for-linus-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: fix regression in raid level conversion
  Btrfs: fix racy system chunk allocation when setting block group ro
  btrfs: clear 'ret' in btrfs_check_shared() loop
2015-05-23 11:14:10 -07:00
Federico Sauter
4afe260bab CIFS: Fix race condition on RFC1002_NEGATIVE_SESSION_RESPONSE
This patch fixes a race condition that occurs when connecting
to a NT 3.51 host without specifying a NetBIOS name.
In that case a RFC1002_NEGATIVE_SESSION_RESPONSE is received
and the SMB negotiation is reattempted, but under some conditions
it leads SendReceive() to hang forever while waiting for srv_mutex.
This, in turn, sets the calling process to an uninterruptible sleep
state and makes it unkillable.

The solution is to unlock the srv_mutex acquired in the demux
thread *before* going to sleep (after the reconnect error) and
before reattempting the connection.
2015-05-20 13:25:55 -05:00