This prints out eb->bflags since it contains some useful information,
e.g. whether eb is dirty.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Merge misc fixes from Andrew Morton:
"13 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm, thp: do not cause memcg oom for thp
mm/vmscan: wake up flushers for legacy cgroups too
Revert "mm: page_alloc: skip over regions of invalid pfns where possible"
mm/shmem: do not wait for lock_page() in shmem_unused_huge_shrink()
mm/thp: do not wait for lock_page() in deferred_split_scan()
mm/khugepaged.c: convert VM_BUG_ON() to collapse fail
x86/mm: implement free pmd/pte page interfaces
mm/vmalloc: add interfaces to free unmapped page table
h8300: remove extraneous __BIG_ENDIAN definition
hugetlbfs: check for pgoff value overflow
lockdep: fix fs_reclaim warning
MAINTAINERS: update Mark Fasheh's e-mail
mm/mempolicy.c: avoid use uninitialized preferred_node
A vma with vm_pgoff large enough to overflow a loff_t type when
converted to a byte offset can be passed via the remap_file_pages system
call. The hugetlbfs mmap routine uses the byte offset to calculate
reservations and file size.
A sequence such as:
mmap(0x20a00000, 0x600000, 0, 0x66033, -1, 0);
remap_file_pages(0x20a00000, 0x600000, 0, 0x20000000000000, 0);
will result in the following when task exits/file closed,
kernel BUG at mm/hugetlb.c:749!
Call Trace:
hugetlbfs_evict_inode+0x2f/0x40
evict+0xcb/0x190
__dentry_kill+0xcb/0x150
__fput+0x164/0x1e0
task_work_run+0x84/0xa0
exit_to_usermode_loop+0x7d/0x80
do_syscall_64+0x18b/0x190
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
The overflowed pgoff value causes hugetlbfs to try to set up a mapping
with a negative range (end < start) that leaves invalid state which
causes the BUG.
The previous overflow fix to this code was incomplete and did not take
the remap_file_pages system call into account.
[mike.kravetz@oracle.com: v3]
Link: http://lkml.kernel.org/r/20180309002726.7248-1-mike.kravetz@oracle.com
[akpm@linux-foundation.org: include mmdebug.h]
[akpm@linux-foundation.org: fix -ve left shift count on sh]
Link: http://lkml.kernel.org/r/20180308210502.15952-1-mike.kravetz@oracle.com
Fixes: 045c7a3f53 ("hugetlbfs: fix offset overflow in hugetlbfs mmap")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Nic Losby <blurbdust@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Yisheng Xie <xieyisheng1@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull networking fixes from David Miller:
1) Always validate XFRM esn replay attribute, from Florian Westphal.
2) Fix RCU read lock imbalance in xfrm_get_tos(), from Xin Long.
3) Don't try to get firmware dump if not loaded in iwlwifi, from Shaul
Triebitz.
4) Fix BPF helpers to deal with SCTP GSO SKBs properly, from Daniel
Axtens.
5) Fix some interrupt handling issues in e1000e driver, from Benjamin
Poitier.
6) Use strlcpy() in several ethtool get_strings methods, from Florian
Fainelli.
7) Fix rhlist dup insertion, from Paul Blakey.
8) Fix SKB leak in netem packet scheduler, from Alexey Kodanev.
9) Fix driver unload crash when link is up in smsc911x, from Jeremy
Linton.
10) Purge out invalid socket types in l2tp_tunnel_create(), from Eric
Dumazet.
11) Need to purge the write queue when TCP connections are aborted,
otherwise userspace using MSG_ZEROCOPY can't close the fd. From
Soheil Hassas Yeganeh.
12) Fix double free in error path of team driver, from Arkadi
Sharshevsky.
13) Filter fixes for hv_netvsc driver, from Stephen Hemminger.
14) Fix non-linear packet access in ipv6 ndisc code, from Lorenzo
Bianconi.
15) Properly filter out unsupported feature flags in macvlan driver,
from Shannon Nelson.
16) Don't request loading the diag module for a protocol if the protocol
itself is not even registered. From Xin Long.
17) If datagram connect fails in ipv6, make sure the socket state is
consistent afterwards. From Paolo Abeni.
18) Use after free in qed driver, from Dan Carpenter.
19) If received ipv4 PMTU is less than the min pmtu, lock the mtu in the
entry. From Sabrina Dubroca.
20) Fix sleep in atomic in tg3 driver, from Jonathan Toppins.
21) Fix vlan in vlan untagging in some situations, from Toshiaki Makita.
22) Fix double SKB free in genlmsg_mcast(). From Nicolas Dichtel.
23) Fix NULL derefs in error paths of tcf_*_init(), from Davide Caratti.
24) Unbalanced PM runtime calls in FEC driver, from Florian Fainelli.
25) Memory leak in gemini driver, from Igor Pylypiv.
26) IDR leaks in error paths of tcf_*_init() functions, from Davide
Caratti.
27) Need to use GFP_ATOMIC in seg6_build_state(), from David Lebrun.
28) Missing dev_put() in error path of macsec_newlink(), from Dan
Carpenter.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (201 commits)
macsec: missing dev_put() on error in macsec_newlink()
net: dsa: Fix functional dsa-loop dependency on FIXED_PHY
hv_netvsc: common detach logic
hv_netvsc: change GPAD teardown order on older versions
hv_netvsc: use RCU to fix concurrent rx and queue changes
hv_netvsc: disable NAPI before channel close
net/ipv6: Handle onlink flag with multipath routes
ppp: avoid loop in xmit recursion detection code
ipv6: sr: fix NULL pointer dereference when setting encap source address
ipv6: sr: fix scheduling in RCU when creating seg6 lwtunnel state
net: aquantia: driver version bump
net: aquantia: Implement pci shutdown callback
net: aquantia: Allow live mac address changes
net: aquantia: Add tx clean budget and valid budget handling logic
net: aquantia: Change inefficient wait loop on fw data reads
net: aquantia: Fix a regression with reset on old firmware
net: aquantia: Fix hardware reset when SPI may rarely hangup
s390/qeth: on channel error, reject further cmd requests
s390/qeth: lock read device while queueing next buffer
s390/qeth: when thread completes, wake up all waiters
...
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJasXu4AAoJECebzXlCjuG+fNcP/3QYZwLdIYe0pZSIn/9V6sW1
PwmxhTkBOEkeP4RTArI/4spZk3mrTqv7SC+ACYKbkD910uDC8cY3ElcwKxi82NL7
9M4gNGnscKJtUye2LhoamZZLOLDIzr302pEv0IIh3v11sAu0rK1K0bqHSfI1ga/q
r3S1sRtad+UktVH+jJ8MNE4NBfXC/xTqPz6Z+wd30CPmJKaA7Oqle7Wd3479zj0c
A7l+EtleFo/bctHwY7yp+lVs1Yqr3ShS+qu+YAknBTnGco9JJFJsZM6x33DFMZC4
EvFPO0gh4PuN/SkgT4zBvBlSXQjuqUWTU98ohqP7ugTtd0dXEIpR4DLLwWScA2KV
z+YIJIKUh8Ryh8KI+J50Ed9WseVlmACHOBTJjLW7KnSYtey3+mgh5kp6Vcd9IxHc
Gju7YgDK5FOV2JykaHQ/qCaSyL6ao1firapSrtZK4N27LpJ05FreK6qnLJtHmiwU
pqVHyfrH4kfho1T0Nr/bw4bnugBKZVXBbBQLKp2twgJhYkUkP+vxV8QMX3h531hB
jsQV9OIVPkdTjr5L0BmQMJ4GugM1Zr4mIunY6Mw0z5ax1JeBUelWeQccqL1YCSGL
TZr+0N9yOuzQHX+e+mZ8vFYqfUTRF/Xfkn1+ZV3ZQU/yY8oBkDbQddaLoo1hjLUx
2dpR3xIhzyDkP59jrK4M
=oGAf
-----END PGP SIGNATURE-----
Merge tag 'nfsd-4.16-1' of git://linux-nfs.org/~bfields/linux
Pull nfsd fix from Bruce Fields:
"Just one fix for an occasional panic from Jeff Layton"
* tag 'nfsd-4.16-1' of git://linux-nfs.org/~bfields/linux:
nfsd: remove blocked locks on client teardown
The sysfs_create_link_nowarn() is going to be used in phylib framework in
subsequent patch which can be built as module. Hence, export
sysfs_create_link_nowarn() to avoid build errors.
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Andrew Lunn <andrew@lunn.ch>
Fixes: a399546049 ("net: phy: Relax error checking on sysfs_create_link()")
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We had some reports of panics in nfsd4_lm_notify, and that showed a
nfs4_lockowner that had outlived its so_client.
Ensure that we walk any leftover lockowners after tearing down all of
the stateids, and remove any blocked locks that they hold.
With this change, we also don't need to walk the nbl_lru on nfsd_net
shutdown, as that will happen naturally when we tear down the clients.
Fixes: 76d348fadf (nfsd: have nfsd4_lock use blocking locks for v4.1+ locks)
Reported-by: Frank Sorenson <fsorenso@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: stable@vger.kernel.org # 4.9
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAlqrzY4ACgkQxWXV+ddt
WDtV4BAAiiv5XECNB3lIvcoFjst8E7xIJNeKKzFZh8HNm+L94zP00uqrpHkwQSUv
tk9TGSYgbJJZMhw6I1rwYSKsoTkvP6NVMQZjpkJktWto9NXObKPvJzjKMdB+GYQR
TWGLBaHsMhruMTOewhdVsxA5od5+LravygHaLmTDQmho8jZSSEe0qqpqlakRI9sQ
+VZv1PDEI68IDMKMjOT7de7Ssq9c99BpGNXxvi5fy5kijyfgrs/BUfHBNV5r66MS
y0Yw1Ab+nlLC7cE5Ql24/snDojcLoSl7f5eYt7ib+DAmsJucYQzTfUetFcrYf8IC
EAmFs5Br6pbopi5M1IOxGZABNbr0qMS+zLvoF52cyka32nrmK/IMWl1s8bW0xfva
gxkjouzJDOTMq28asSdyu73i9yFyLB+tEHpojECljo+K5ASzvDad34xqVRlxY6vN
UJm4d76OqTfAyTRo7sQRWfLz/Cb1W+/v0mPotPWHWVyF0bUNsBgX5YKJhma3pU/z
rHwhI/fqg0NTXBcNUMCosEwS2u5vnjIfiOHkDTl1Dv747S/jM0AaNEvpjDo1cOuk
lsl4xKoDkQ6RpBCeFxRCW0wk1Nl/Su6RQ217xTtND+aIa4c6G7A9X8QAjj2XNAuK
CktJJD7wHYSBTu1sQ+u2R/c2QEgkAxABr4NpmWTPCBNlIKEjd4A=
=bXFe
-----END PGP SIGNATURE-----
Merge tag 'for-4.16-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"There's an important revert in this pull request that needs to go to
stable as it causes a corruption on big endian machines.
The other fix is for FIEMAP incorrectly reporting shared extents
before a sync and one fix for a crash in raid56.
So far we got only one report about the BE corruption, the stable
kernels were out for like a week, so hopefully the scope of the damage
is low"
* tag 'for-4.16-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
Revert "btrfs: use proper endianness accessors for super_copy"
btrfs: add missing initialization in btrfs_check_shared
btrfs: Fix NULL pointer exception in find_bio_stripe
This reverts commit 3c181c12c4.
The offending patch was merged in 4.16-rc4 and was promptly applied to
stable kernels 4.14.25 and 4.15.8.
The patch causes a corruption in several superblock items on big-endian
machines because of messed up endianity conversions. The damage is
manually repairable. A filesystem cannot be mounted again after it has
been unmounted once.
We do a full revert and not a fixup so stable can pick that patch ASAP.
Fixes: 3c181c12c4 ("btrfs: use proper endianness accessors for super_copy")
Link: https://lkml.kernel.org/r/1521139304@msgid.manchmal.in-ulm.de
CC: stable@vger.kernel.org # 4.14+
Reported-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Signed-off-by: David Sterba <dsterba@suse.com>
Pull vfs fixes from Al Viro:
- backport-friendly part of lock_parent() race fix
- a fix for an assumption in the heurisic used by path_connected() that
is not true on NFS
- livelock fixes for d_alloc_parallel()
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: Teach path_connected to handle nfs filesystems with multiple roots.
fs: dcache: Use READ_ONCE when accessing i_dir_seq
fs: dcache: Avoid livelock between d_alloc_parallel and __d_add
lock_parent() needs to recheck if dentry got __dentry_kill'ed under it
On nfsv2 and nfsv3 the nfs server can export subsets of the same
filesystem and report the same filesystem identifier, so that the nfs
client can know they are the same filesystem. The subsets can be from
disjoint directory trees. The nfsv2 and nfsv3 filesystems provides no
way to find the common root of all directory trees exported form the
server with the same filesystem identifier.
The practical result is that in struct super s_root for nfs s_root is
not necessarily the root of the filesystem. The nfs mount code sets
s_root to the root of the first subset of the nfs filesystem that the
kernel mounts.
This effects the dcache invalidation code in generic_shutdown_super
currently called shrunk_dcache_for_umount and that code for years
has gone through an additional list of dentries that might be dentry
trees that need to be freed to accomodate nfs.
When I wrote path_connected I did not realize nfs was so special, and
it's hueristic for avoiding calling is_subdir can fail.
The practical case where this fails is when there is a move of a
directory from the subtree exposed by one nfs mount to the subtree
exposed by another nfs mount. This move can happen either locally or
remotely. With the remote case requiring that the move directory be cached
before the move and that after the move someone walks the path
to where the move directory now exists and in so doing causes the
already cached directory to be moved in the dcache through the magic
of d_splice_alias.
If someone whose working directory is in the move directory or a
subdirectory and now starts calling .. from the initial mount of nfs
(where s_root == mnt_root), then path_connected as a heuristic will
not bother with the is_subdir check. As s_root really is not the root
of the nfs filesystem this heuristic is wrong, and the path may
actually not be connected and path_connected can fail.
The is_subdir function might be cheap enough that we can call it
unconditionally. Verifying that will take some benchmarking and
the result may not be the same on all kernels this fix needs
to be backported to. So I am avoiding that for now.
Filesystems with snapshots such as nilfs and btrfs do something
similar. But as the directory tree of the snapshots are disjoint
from one another and from the main directory tree rename won't move
things between them and this problem will not occur.
Cc: stable@vger.kernel.org
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Fixes: 397d425dc2 ("vfs: Test for and handle paths that are unreachable from their mnt_root")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This patch addresses an issue that causes fiemap to falsely
report a shared extent. The test case is as follows:
xfs_io -f -d -c "pwrite -b 16k 0 64k" -c "fiemap -v" /media/scratch/file5
sync
xfs_io -c "fiemap -v" /media/scratch/file5
which gives the resulting output:
wrote 65536/65536 bytes at offset 0
64 KiB, 4 ops; 0.0000 sec (121.359 MiB/sec and 7766.9903 ops/sec)
/media/scratch/file5:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..127]: 24576..24703 128 0x2001
/media/scratch/file5:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..127]: 24576..24703 128 0x1
This is because btrfs_check_shared calls find_parent_nodes
repeatedly in a loop, passing a share_check struct to report
the count of shared extent. But btrfs_check_shared does not
re-initialize the count value to zero for subsequent calls
from the loop, resulting in a false share count value. This
is a regressive behavior from 4.13.
With proper re-initialization the test result is as follows:
wrote 65536/65536 bytes at offset 0
64 KiB, 4 ops; 0.0000 sec (110.035 MiB/sec and 7042.2535 ops/sec)
/media/scratch/file5:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..127]: 24576..24703 128 0x1
/media/scratch/file5:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..127]: 24576..24703 128 0x1
which corrects the regression.
Fixes: 3ec4d3238a ("btrfs: allow backref search checks for shared extents")
Signed-off-by: Edmund Nadolski <enadolski@suse.com>
[ add text from cover letter to changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
While converting ioctx index from a list to a table, db446a08c2
("aio: convert the ioctx list to table lookup v3") missed tagging
kioctx_table->table[] as an array of RCU pointers and using the
appropriate RCU accessors. This introduces a small window in the
lookup path where init and access may race.
Mark kioctx_table->table[] with __rcu and use the approriate RCU
accessors when using the field.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Jann Horn <jannh@google.com>
Fixes: db446a08c2 ("aio: convert the ioctx list to table lookup v3")
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: stable@vger.kernel.org # v3.12+
While fixing refcounting, e34ecee2ae ("aio: Fix a trinity splat")
incorrectly removed explicit RCU grace period before freeing kioctx.
The intention seems to be depending on the internal RCU grace periods
of percpu_ref; however, percpu_ref uses a different flavor of RCU,
sched-RCU. This can lead to kioctx being freed while RCU read
protected dereferences are still in progress.
Fix it by updating free_ioctx() to go through call_rcu() explicitly.
v2: Comment added to explain double bouncing.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Jann Horn <jannh@google.com>
Fixes: e34ecee2ae ("aio: Fix a trinity splat")
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: stable@vger.kernel.org # v3.13+
Hightlights include the following stable fixes:
- NFS: Fix an incorrect type in struct nfs_direct_req
- pNFS: Prevent the layout header refcount going to zero in pnfs_roc()
- NFS: Fix unstable write completion
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJaprb9AAoJEGcL54qWCgDysK0P/Rwyc7fhcs4CvxETQT0LhbbB
P6XDAb5pBeU+ca8ZWzqlpe1uKc0z5ykuTLuxLt2QvSb9+6h+8T1gxOa5SibehnwY
6UizLDgHMuY2V4wqm4XLHiZX7J0Evdo4rgNCp7qq2/TinBPYKQsqIz34+s9BZDti
6LG6WiVhOtkhs6s/sRbAn+FfMbSiQn54iQIgFPeO+3zEzaaLzUZqlVHO5yjxVnxh
6oYGkEPVzP09URUO/HJ1VTUZJnso1/axEcH9qhuJzZ+pANCpuBjSMoFzzE6oI2mD
dfD8UJZjwqLb7AsFhgSQUrDXzaI0Xg7ccrImtCp4QLlTjja3TlOYMYbK9kDRHC1/
kT93cSFSIeSGpqLVSdNyiz7pCh2Cps2U0JTbhwE0R1NEskWQPzSwY73jEwiA/bex
JOZBxtjVZt8TgOKHxPO65rvQwaq3T31KrQSDLqv3JReI0epJeqng0h+pwiZluNAs
TFjE1aP0l567A3qQnPrHrGZ4IIs8XYobZ+MBxf9x5PaRWnGoPUkhHXMP2j8JCnwa
ofPBR4Mx5WhnsB7Z5vsYetHtmD99QbJ0+WH9ObilUFt6gUgXoXCrvexIUq+LD3DP
HIIqfbSorAIlQxtCwEz7m85B7VEKByaCWj8OiFt5VVAreEhiazmsncsyh8R5fa88
wJDxRe3QmNu7BEJ9NJKi
=IQGX
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-4.16-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
"Hightlights include the following stable fixes:
- NFS: Fix an incorrect type in struct nfs_direct_req
- pNFS: Prevent the layout header refcount going to zero in
pnfs_roc()
- NFS: Fix unstable write completion"
* tag 'nfs-for-4.16-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS: Fix unstable write completion
pNFS: Prevent the layout header refcount going to zero in pnfs_roc()
NFS: Fix an incorrect type in struct nfs_direct_req
Pull overlayfs fixes from Miklos Szeredi:
"This fixes a corner case for NFS exporting (introduced in this cycle)
as well as fixing miscellaneous bugs"
* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: update Kconfig texts
ovl: redirect_dir=nofollow should not follow redirect for opaque lower
ovl: fix ptr_ret.cocci warnings
ovl: check ERR_PTR() return value from ovl_lookup_real()
ovl: check lower ancestry on encode of lower dir file handle
ovl: hash non-dir by lower inode for fsnotify
- Fix some iomap locking problems
- Don't allocate cow blocks when we're zeroing file data
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCgAGBQJamePeAAoJEPh/dxk0SrTryPgP/0sh8xcAFUHR0ad7D74nSp3o
1gjBi6Yl+i+hn5SZ/gkKPa5YFjTMaefcTV8rpZm3uPYbTf75TTscjKDDczFNEqoh
CDKOeyfOchVSjhRI5ExElBh5ToxDgh+HMzzlmxk+olfWA+BXjJj2J2iLh871v9Ym
hQl7xOYUDPu1xZz+jLhDLKG2Gd/R97ez2KZM43gKMAUsQ7eUbz6CtV73cFT+tNfP
jjIe3xp6ohbJYJalMThbNcI2mOOssnCM8BHUDBJib7N4CXucJYTMpDAcGvNoFKul
+K2Icyip1/6CS4LkWDpP29ayiH+sTbZk8/ipioe27hXXHGqKG84HQOBefRZmo4o0
CC64SoomuxQX6V7qL6hf0Sz82QQbl+ToSliZ97xXP5o84/OxhggZGqVdykTR2+DB
4POaJzRPmYJKPO4Q0yT2worB4oepAUrridJnaQE0y9F80xnLTvJxpD53atLCHulj
Ht9NlvnvqndDYaYjQwvucOmdR9vCos3OjZCaXnYgL+EpbgZUvMeQoxuyBUBTdQeQ
CNGq3EDssI75gD67sWeyFc+gOST259XofANw3mvlu3KS2/huviORJkDYdtzi2/LL
gkCN9OW9CmXOaAiylE8dSAu98SAR+c83wmahMeBA2mq5vYrQU4rIDGYZi/Mulne4
tD7eW5nlvps9Fsvf58fd
=T9nM
-----END PGP SIGNATURE-----
Merge tag 'xfs-4.16-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Darrick Wong:
- Fix some iomap locking problems
- Don't allocate cow blocks when we're zeroing file data
* tag 'xfs-4.16-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: don't block on the ilock for RWF_NOWAIT
xfs: don't start out with the exclusive ilock for direct I/O
xfs: don't allocate COW blocks for zeroing holes or unwritten extents
We do want to respect the FLUSH_SYNC argument to nfs_commit_inode() to
ensure that all outstanding COMMIT requests to the inode in question are
complete. Currently we may exit early from both nfs_commit_inode() and
nfs_write_inode() even if there are COMMIT requests in flight, or unstable
writes on the commit list.
In order to get the right semantics w.r.t. sync_inode(), we don't need
to have nfs_commit_inode() reset the inode dirty flags when called from
nfs_wb_page() and/or nfs_wb_all(). We just need to ensure that
nfs_write_inode() leaves them in the right state if there are outstanding
commits, or stable pages.
Reported-by: Scott Mayhew <smayhew@redhat.com>
Fixes: dc4fd9ab01 ("nfs: don't wait on commit in nfs_commit_inode()...")
Cc: stable@vger.kernel.org # v4.14+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Ensure that we hold a reference to the layout header when processing
the pNFS return-on-close so that the refcount value does not inadvertently
go to zero.
Reported-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org # v4.10+
Tested-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
The start offset needs to be of type loff_t.
Fixed: 5fadeb47dc ("nfs: count DIO good bytes correctly with mirroring")
Cc: stable@vger.kernel.org # v4.0+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
It turns out that commit 3229c18c0d6b2 'Fixes to "Implement iomap for
block_map"' introduced another bug in gfs2_iomap_begin that can cause
gfs2_block_map to set bh->b_size of an actual buffer to 0. This can
lead to arbitrary incorrect behavior including crashes or disk
corruption. Revert the incorrect part of that commit.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Add some hints about overlayfs kernel config options.
Enabling NFS export by default is especially recommended against, as it
incurs a performance penalty even if the filesystem is not actually
exported.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAlqbuK8ACgkQxWXV+ddt
WDsiVQ//eE8Axfw4qWOyHHjozoAIu9kifvOFQIhvKviLiXHJNrD/vBI6YwD1hyD1
rbbLilMsEl1OD1Sq3AeOUMNSSl5qUFEB+CA8vg9GznFTNRobTkL+p96Zt5xlRDu3
lsFFV93tED+dK4D/eSGP+xYbknA8hIk/2gWPkd6hpYKyh2QdsPBYDqCnaEXvd79P
DIP/cAjIfzqQn0FTiZ9wbaES+LPO+NoZgQRC2w9McYQ5CEMc+oAChEmPJRwpPoKy
YdhuwoULniRNHVnVOIVfi4w9jkSPSz7YIgLeRFli/WGBYGcKeHTMFkMa12KdpuUC
JUclOogJ5ZMbFV3C0W8XEJ7Vb9ltIevrH8MgfKP/3BScuZzbJZ+n5KkH2Gf7vcpe
w5cZGOsKDz+35fDCdTmsFpDK9kpGzHq48JlRifOjARbdyqNwVq4emxOeQlO1ygzq
Y9H5UeMpp/FDAm6g/bV8ezCXzwuwUDV9CwAJBD+WCiZhD2nX85FfIp1kfF2zLcUg
Y8irqV6A/J/0BFkF7Iu9AuzTxOdo6zMLkcHEKb+sL7yGxZ2248v5gZxDoyV5iNWR
WY4Y2GaMemZGu6+NyyLFAJzlFyACD5fSy8vT7KD6upCzwmlAtaJ3VULfTrLyi/uM
SNZAKf3WzKBVe5h52GNGRCi5OLTteH6Yqz5m5/h8r74mBO00IrA=
=bF96
-----END PGP SIGNATURE-----
Merge tag 'for-4.16-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- when NR_CPUS is large, a SRCU structure can significantly inflate
size of the main filesystem structure that would not be possible to
allocate by kmalloc, so the kvalloc fallback is used
- improved error handling
- fix endiannes when printing some filesystem attributes via sysfs,
this is could happen when a filesystem is moved between different
endianity hosts
- send fixes: the NO_HOLE mode should not send a write operation for a
file hole
- fix log replay for for special files followed by file hardlinks
- fix log replay failure after unlink and link combination
- fix max chunk size calculation for DUP allocation
* tag 'for-4.16-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
Btrfs: fix log replay failure after unlink and link combination
Btrfs: fix log replay failure after linking special file and fsync
Btrfs: send, fix issuing write op when processing hole in no data mode
btrfs: use proper endianness accessors for super_copy
btrfs: alloc_chunk: fix DUP stripe size handling
btrfs: Handle btrfs_set_extent_delalloc failure in relocate_file_extent_cluster
btrfs: handle failure of add_pending_csums
btrfs: use kvzalloc to allocate btrfs_fs_info
delayed and three error path memory leak fixups from Chengguang.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABCAAGBQJamV3qAAoJEEp/3jgCEfOLv6kIAI0RXShR20dz9M3GnL2bDws7
KFkC/aNwbviXMAih3q7Q7ZiwXv6BI/uUK0YIYG/tWZoNHucYGS8ZOFDikdiIMcJR
2VP8Yas1/NvnLlh0ut45s2imvbXinkHcjWOLqgJJmJ2cEP9Ar/mbwz//TgfVMdNm
TSHEm2LPRge9jK3Ab/ZiatExGH6dx5hRJvssSauF+f2iN+q4nKj2aBqle+FKaNWT
Iv0ONi2f7aXnaXRvum4Y1gnKWXzKSFd8Y2Cr5R3tfTrKBv8nmQz54eRsMnnuxK/7
uV6Ujch4LUXBI1etXLqTwRdZQIcw3Li5PYxGrO8fl8mQfJau9yv5REw/rbgB5FU=
=f8gA
-----END PGP SIGNATURE-----
Merge tag 'ceph-for-4.16-rc4' of git://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"A cap handling fix from Zhi that ensures that metadata writeback isn't
delayed and three error path memory leak fixups from Chengguang"
* tag 'ceph-for-4.16-rc4' of git://github.com/ceph/ceph-client:
ceph: fix potential memory leak in init_caches()
ceph: fix dentry leak when failing to init debugfs
libceph, ceph: avoid memory leak when specifying same option several times
ceph: flush dirty caps of unlinked inode ASAP
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABCAAGBQJamXImAAoJEPfTWPspceCmEP4P/3kkm0JIXtbNZFMb1JZtsjwE
t4OUVEDj4jjRfmZfUVkajPnczM4MSPiXm43PbcOi4NF53mv8k76jyIPhlZREzYzq
MBknibvpqyiWpbii9tBRrR6FGDR/N51//ya9vdPaYBcBssTg6Aqtt4BE5oPfo011
PleGROe1jtrBUNBy2dMy4sHb/MvZ0vRuNPxMsD8Agy5UiVeItAelY/lDn1Hw41BY
O+muE5bw6+yKqB9vGXhV3O4WRh8BofJi1YdzbwbbIzH40ZZK5VTDQc5o19/CFEZ/
uZ8BStOFEWA0LNuarME5fknWcogiedEtszweyiWBbVZo4VqCsfxPoaRCibY/Wg5F
a0UNJ4iSzglhfSMoHJlhvlCAMCyubFSeMSdJjrrpIcyBrziJXpcEXcUnWI43yi4P
FoM8zUni22XnfLWxIdTjVkMRytjtqTLcXOHXdP5N/ESa80jBq3Q76TLmzIKW+kK5
sAre+hgr52NdgovP/NSxsdvsckAolWNe40JI8wLbwNo+lMHr0ckzOG+sAdz1iPRK
iVL0CAlby4A94Wcu+OHCwfY7B9lBrMuMfHsesEM6x1cxgAhd3YNfEJ8g2QolCUEV
KmZizXbV9nnmJfegVC06SgM+D7AR26dwsBG2aoibShuvdxX6jMdUHygyu5DCJdg/
JS+q71jmxb/r1TWe/62r
=AMhV
-----END PGP SIGNATURE-----
Merge tag 'for-linus-20180302' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"A collection of fixes for this series. This is a little larger than
usual at this time, but that's mainly because I was out on vacation
last week. Nothing in here is major in any way, it's just two weeks of
fixes. This contains:
- NVMe pull from Keith, with a set of fixes from the usual suspects.
- mq-deadline zone unlock fix from Damien, fixing an issue with the
SMR zone locking added for 4.16.
- two bcache fixes sent in by Michael, with changes from Coly and
Tang.
- comment typo fix from Eric for blktrace.
- return-value error handling fix for nbd, from Gustavo.
- fix a direct-io case where we don't defer to a completion handler,
making us sleep from IRQ device completion. From Jan.
- a small series from Jan fixing up holes around handling of bdev
references.
- small set of regression fixes from Jiufei, mostly fixing problems
around the gendisk pointer -> partition index change.
- regression fix from Ming, fixing a boundary issue with the discard
page cache invalidation.
- two-patch series from Ming, fixing both a core blk-mq-sched and
kyber issue around token freeing on a requeue condition"
* tag 'for-linus-20180302' of git://git.kernel.dk/linux-block: (24 commits)
block: fix a typo
block: display the correct diskname for bio
block: fix the count of PGPGOUT for WRITE_SAME
mq-deadline: Make sure to always unlock zones
nvmet: fix PSDT field check in command format
nvme-multipath: fix sysfs dangerously created links
nbd: fix return value in error handling path
bcache: fix kcrashes with fio in RAID5 backend dev
bcache: correct flash only vols (check all uuids)
blktrace_api.h: fix comment for struct blk_user_trace_setup
blockdev: Avoid two active bdev inodes for one device
genhd: Fix BUG in blkdev_open()
genhd: Fix use after free in __blkdev_get()
genhd: Add helper put_disk_and_module()
genhd: Rename get_disk() to get_disk_and_module()
genhd: Fix leaked module reference for NVME devices
direct-io: Fix sleep in atomic due to sync AIO
nvme-pci: Fix nvme queue cleanup if IRQ setup fails
block: kyber: fix domain token leak during requeue
blk-mq: don't call io sched's .requeue_request when requeueing rq to ->dispatch
...
Fix xfs_file_iomap_begin to trylock the ilock if IOMAP_NOWAIT is passed,
so that we don't block io_submit callers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
There is no reason to take the ilock exclusively at the start of
xfs_file_iomap_begin for direct I/O, given that it will be demoted
just before calling xfs_iomap_write_direct anyway.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
The iomap zeroing interface is smart enough to skip zeroing holes or
unwritten extents. Don't subvert this logic for reflink files.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
There is lack of cache destroy operation for ceph_file_cachep
when failing from fscache register.
Signed-off-by: Chengguang Xu <cgxu519@icloud.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
If we have a file with 2 (or more) hard links in the same directory,
remove one of the hard links, create a new file (or link an existing file)
in the same directory with the name of the removed hard link, and then
finally fsync the new file, we end up with a log that fails to replay,
causing a mount failure.
Example:
$ mkfs.btrfs -f /dev/sdb
$ mount /dev/sdb /mnt
$ mkdir /mnt/testdir
$ touch /mnt/testdir/foo
$ ln /mnt/testdir/foo /mnt/testdir/bar
$ sync
$ unlink /mnt/testdir/bar
$ touch /mnt/testdir/bar
$ xfs_io -c "fsync" /mnt/testdir/bar
<power failure>
$ mount /dev/sdb /mnt
mount: mount(2) failed: /mnt: No such file or directory
When replaying the log, for that example, we also see the following in
dmesg/syslog:
[71813.671307] BTRFS info (device dm-0): failed to delete reference to bar, inode 258 parent 257
[71813.674204] ------------[ cut here ]------------
[71813.675694] BTRFS: Transaction aborted (error -2)
[71813.677236] WARNING: CPU: 1 PID: 13231 at fs/btrfs/inode.c:4128 __btrfs_unlink_inode+0x17b/0x355 [btrfs]
[71813.679669] Modules linked in: btrfs xfs f2fs dm_flakey dm_mod dax ghash_clmulni_intel ppdev pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper evdev psmouse i2c_piix4 parport_pc i2c_core pcspkr sg serio_raw parport button sunrpc loop autofs4 ext4 crc16 mbcache jbd2 zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod ata_generic sd_mod virtio_scsi ata_piix libata virtio_pci virtio_ring crc32c_intel floppy virtio e1000 scsi_mod [last unloaded: btrfs]
[71813.679669] CPU: 1 PID: 13231 Comm: mount Tainted: G W 4.15.0-rc9-btrfs-next-56+ #1
[71813.679669] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
[71813.679669] RIP: 0010:__btrfs_unlink_inode+0x17b/0x355 [btrfs]
[71813.679669] RSP: 0018:ffffc90001cef738 EFLAGS: 00010286
[71813.679669] RAX: 0000000000000025 RBX: ffff880217ce4708 RCX: 0000000000000001
[71813.679669] RDX: 0000000000000000 RSI: ffffffff81c14bae RDI: 00000000ffffffff
[71813.679669] RBP: ffffc90001cef7c0 R08: 0000000000000001 R09: 0000000000000001
[71813.679669] R10: ffffc90001cef5e0 R11: ffffffff8343f007 R12: ffff880217d474c8
[71813.679669] R13: 00000000fffffffe R14: ffff88021ccf1548 R15: 0000000000000101
[71813.679669] FS: 00007f7cee84c480(0000) GS:ffff88023fc80000(0000) knlGS:0000000000000000
[71813.679669] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[71813.679669] CR2: 00007f7cedc1abf9 CR3: 00000002354b4003 CR4: 00000000001606e0
[71813.679669] Call Trace:
[71813.679669] btrfs_unlink_inode+0x17/0x41 [btrfs]
[71813.679669] drop_one_dir_item+0xfa/0x131 [btrfs]
[71813.679669] add_inode_ref+0x71e/0x851 [btrfs]
[71813.679669] ? __lock_is_held+0x39/0x71
[71813.679669] ? replay_one_buffer+0x53/0x53a [btrfs]
[71813.679669] replay_one_buffer+0x4a4/0x53a [btrfs]
[71813.679669] ? rcu_read_unlock+0x3a/0x57
[71813.679669] ? __lock_is_held+0x39/0x71
[71813.679669] walk_up_log_tree+0x101/0x1d2 [btrfs]
[71813.679669] walk_log_tree+0xad/0x188 [btrfs]
[71813.679669] btrfs_recover_log_trees+0x1fa/0x31e [btrfs]
[71813.679669] ? replay_one_extent+0x544/0x544 [btrfs]
[71813.679669] open_ctree+0x1cf6/0x2209 [btrfs]
[71813.679669] btrfs_mount_root+0x368/0x482 [btrfs]
[71813.679669] ? trace_hardirqs_on_caller+0x14c/0x1a6
[71813.679669] ? __lockdep_init_map+0x176/0x1c2
[71813.679669] ? mount_fs+0x64/0x10b
[71813.679669] mount_fs+0x64/0x10b
[71813.679669] vfs_kern_mount+0x68/0xce
[71813.679669] btrfs_mount+0x13e/0x772 [btrfs]
[71813.679669] ? trace_hardirqs_on_caller+0x14c/0x1a6
[71813.679669] ? __lockdep_init_map+0x176/0x1c2
[71813.679669] ? mount_fs+0x64/0x10b
[71813.679669] mount_fs+0x64/0x10b
[71813.679669] vfs_kern_mount+0x68/0xce
[71813.679669] do_mount+0x6e5/0x973
[71813.679669] ? memdup_user+0x3e/0x5c
[71813.679669] SyS_mount+0x72/0x98
[71813.679669] entry_SYSCALL_64_fastpath+0x1e/0x8b
[71813.679669] RIP: 0033:0x7f7cedf150ba
[71813.679669] RSP: 002b:00007ffca71da688 EFLAGS: 00000206
[71813.679669] Code: 7f a0 e8 51 0c fd ff 48 8b 43 50 f0 0f ba a8 30 2c 00 00 02 72 17 41 83 fd fb 74 11 44 89 ee 48 c7 c7 7d 11 7f a0 e8 38 f5 8d e0 <0f> ff 44 89 e9 ba 20 10 00 00 eb 4d 48 8b 4d b0 48 8b 75 88 4c
[71813.679669] ---[ end trace 83bd473fc5b4663b ]---
[71813.854764] BTRFS: error (device dm-0) in __btrfs_unlink_inode:4128: errno=-2 No such entry
[71813.886994] BTRFS: error (device dm-0) in btrfs_replay_log:2307: errno=-2 No such entry (Failed to recover log tree)
[71813.903357] BTRFS error (device dm-0): cleaner transaction attach returned -30
[71814.128078] BTRFS error (device dm-0): open_ctree failed
This happens because the log has inode reference items for both inode 258
(the first file we created) and inode 259 (the second file created), and
when processing the reference item for inode 258, we replace the
corresponding item in the subvolume tree (which has two names, "foo" and
"bar") witht he one in the log (which only has one name, "foo") without
removing the corresponding dir index keys from the parent directory.
Later, when processing the inode reference item for inode 259, which has
a name of "bar" associated to it, we notice that dir index entries exist
for that name and for a different inode, so we attempt to unlink that
name, which fails because the inode reference item for inode 258 no longer
has the name "bar" associated to it, making a call to btrfs_unlink_inode()
fail with a -ENOENT error.
Fix this by unlinking all the names in an inode reference item from a
subvolume tree that are not present in the inode reference item found in
the log tree, before overwriting it with the item from the log tree.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If in the same transaction we rename a special file (fifo, character/block
device or symbolic link), create a hard link for it having its old name
then sync the log, we will end up with a log that can not be replayed and
at when attempting to replay it, an EEXIST error is returned and mounting
the filesystem fails. Example scenario:
$ mkfs.btrfs -f /dev/sdc
$ mount /dev/sdc /mnt
$ mkdir /mnt/testdir
$ mkfifo /mnt/testdir/foo
# Make sure everything done so far is durably persisted.
$ sync
# Create some unrelated file and fsync it, this is just to create a log
# tree. The file must be in the same directory as our special file.
$ touch /mnt/testdir/f1
$ xfs_io -c "fsync" /mnt/testdir/f1
# Rename our special file and then create a hard link with its old name.
$ mv /mnt/testdir/foo /mnt/testdir/bar
$ ln /mnt/testdir/bar /mnt/testdir/foo
# Create some other unrelated file and fsync it, this is just to persist
# the log tree which was modified by the previous rename and link
# operations. Alternatively we could have modified file f1 and fsync it.
$ touch /mnt/f2
$ xfs_io -c "fsync" /mnt/f2
<power failure>
$ mount /dev/sdc /mnt
mount: mount /dev/sdc on /mnt failed: File exists
This happens because when both the log tree and the subvolume's tree have
an entry in the directory "testdir" with the same name, that is, there
is one key (258 INODE_REF 257) in the subvolume tree and another one in
the log tree (where 258 is the inode number of our special file and 257
is the inode for directory "testdir"). Only the data of those two keys
differs, in the subvolume tree the index field for inode reference has
a value of 3 while the log tree it has a value of 5. Because the same key
exists in both trees, but have different index, the log replay fails with
an -EEXIST error when attempting to replay the inode reference from the
log tree.
Fix this by setting the last_unlink_trans field of the inode (our special
file) to the current transaction id when a hard link is created, as this
forces logging the parent directory inode, solving the conflict at log
replay time.
A new generic test case for fstests was also submitted.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When doing an incremental send of a filesystem with the no-holes feature
enabled, we end up issuing a write operation when using the no data mode
send flag, instead of issuing an update extent operation. Fix this by
issuing the update extent operation instead.
Trivial reproducer:
$ mkfs.btrfs -f -O no-holes /dev/sdc
$ mkfs.btrfs -f /dev/sdd
$ mount /dev/sdc /mnt/sdc
$ mount /dev/sdd /mnt/sdd
$ xfs_io -f -c "pwrite -S 0xab 0 32K" /mnt/sdc/foobar
$ btrfs subvolume snapshot -r /mnt/sdc /mnt/sdc/snap1
$ xfs_io -c "fpunch 8K 8K" /mnt/sdc/foobar
$ btrfs subvolume snapshot -r /mnt/sdc /mnt/sdc/snap2
$ btrfs send /mnt/sdc/snap1 | btrfs receive /mnt/sdd
$ btrfs send --no-data -p /mnt/sdc/snap1 /mnt/sdc/snap2 \
| btrfs receive -vv /mnt/sdd
Before this change the output of the second receive command is:
receiving snapshot snap2 uuid=f6922049-8c22-e544-9ff9-fc6755918447...
utimes
write foobar, offset 8192, len 8192
utimes foobar
BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=f6922049-8c22-e544-9ff9-...
After this change it is:
receiving snapshot snap2 uuid=564d36a3-ebc8-7343-aec9-bf6fda278e64...
utimes
update_extent foobar: offset=8192, len=8192
utimes foobar
BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=564d36a3-ebc8-7343-aec9-bf6fda278e64...
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The fs_info::super_copy is a byte copy of the on-disk structure and all
members must use the accessor macros/functions to obtain the right
value. This was missing in update_super_roots and in sysfs readers.
Moving between opposite endianness hosts will report bogus numbers in
sysfs, and mount may fail as the root will not be restored correctly. If
the filesystem is always used on a same endian host, this will not be a
problem.
Fix this by using the btrfs_set_super...() functions to set
fs_info::super_copy values, and for the sysfs, use the cached
fs_info::nodesize/sectorsize values.
CC: stable@vger.kernel.org
Fixes: df93589a17 ("btrfs: export more from FS_INFO to sysfs")
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
In case of using DUP, we search for enough unallocated disk space on a
device to hold two stripes.
The devices_info[ndevs-1].max_avail that holds the amount of unallocated
space found is directly assigned to stripe_size, while it's actually
twice the stripe size.
Later on in the code, an unconditional division of stripe_size by
dev_stripes corrects the value, but in the meantime there's a check to
see if the stripe_size does not exceed max_chunk_size. Since during this
check stripe_size is twice the amount as intended, the check will reduce
the stripe_size to max_chunk_size if the actual correct to be used
stripe_size is more than half the amount of max_chunk_size.
The unconditional division later tries to correct stripe_size, but will
actually make sure we can't allocate more than half the max_chunk_size.
Fix this by moving the division by dev_stripes before the max chunk size
check, so it always contains the right value, instead of putting a duct
tape division in further on to get it fixed again.
Since in all other cases than DUP, dev_stripes is 1, this change only
affects DUP.
Other attempts in the past were made to fix this:
* 37db63a400 "Btrfs: fix max chunk size check in chunk allocator" tried
to fix the same problem, but still resulted in part of the code acting
on a wrongly doubled stripe_size value.
* 86db25785a "Btrfs: fix max chunk size on raid5/6" unintentionally
broke this fix again.
The real problem was already introduced with the rest of the code in
73c5de0051.
The user visible result however will be that the max chunk size for DUP
will suddenly double, while it's actually acting according to the limits
in the code again like it was 5 years ago.
Reported-by: Naohiro Aota <naohiro.aota@wdc.com>
Link: https://www.spinics.net/lists/linux-btrfs/msg69752.html
Fixes: 73c5de0051 ("btrfs: quasi-round-robin for chunk allocation")
Fixes: 86db25785a ("Btrfs: fix max chunk size on raid5/6")
Signed-off-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update comment ]
Signed-off-by: David Sterba <dsterba@suse.com>
Essentially duplicate the error handling from the above block which
handles the !PageUptodate(page) case and additionally clear
EXTENT_BOUNDARY.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
add_pending_csums was added as part of the new data=ordered
implementation in e6dcd2dc9c ("Btrfs: New data=ordered
implementation"). Even back then it called the btrfs_csum_file_blocks
which can fail but it never bothered handling the failure. In ENOMEM
situation this could lead to the filesystem failing to write the
checksums for a particular extent and not detect this. On read this
could lead to the filesystem erroring out due to crc mismatch. Fix it by
propagating failure from add_pending_csums and handling them.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The srcu_struct in btrfs_fs_info scales in size with NR_CPUS. On
kernels built with NR_CPUS=8192, this can result in kmalloc failures
that prevent mounting.
There is work in progress to try to resolve this for every user of
srcu_struct but using kvzalloc will work around the failures until
that is complete.
As an example with NR_CPUS=512 on x86_64: the overall size of
subvol_srcu is 3460 bytes, fs_info is 6496.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
- Fix some compiler warnings
- Fix block rservations for transactions created during log recovery
- Fix resource leaks when respecifying mount options
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCgAGBQJalEvWAAoJEPh/dxk0SrTrdDcQAKgYcpg8Ip6uKNqc38hm79l+
RXjHQFbpgKiojBzjgI8+lPNsVBEhSSh65rXB6nEVSS/TFS0ONScRbNKrcH9Selkn
cH1RsuhKk1NvKWaLMMFJWMTZK5Z6cHLtJU2szqnPdCsv/2EqdS6NylyYSFhljtSl
xbD8vffjnJ1HU9ijZsSoZkiu0DO1yoXYu7EUQkWKPRSb/el+qYIMKSwFC8qQdxrp
KGPyYH1CENm2jarKXAgTgqmUmrdJ9ikLHLT0sXQPZ9AbOjOoGlZ9Bn0KmvggQFmq
TyE0je7L2EWrDRJ6R+lcFKC0fHDgo7ec8Iz/CJlOiExSebbNZgtsn0bbPMfq2Rnz
8IMYPAV+NBY8RQWumgBN2aOjyjV9EUd+TkeJh5aIubyFOE2GtEmHjlR4p0bG9os9
yOZJv+5JDF09oN1dLDf/xpEwXsHho6KHDYtVqbKhBWfQiw84sAlW9/NwfQOEugJS
6RXN3LaExSvpFSc9qcGqsrdGvEuMcNLo+XtTwz9g8DNR0Ztp1bFE64xh4KYlNsx5
QQj256Hx56R7vZ2/DC73MiT/hOSgfPpqnwOZP+Yc7I3DeO65DLwdCt9m2c0f1Odn
6xi3ZPXuq+QutJEp3iAfj5XXVmwSVgcub8EmtVmyK2fttiK3+SXbZdvo6JqyuEMX
ZxwBaZkNuYWikHhj5iAp
=A2Bo
-----END PGP SIGNATURE-----
Merge tag 'xfs-4.16-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Darrick Wong:
- fix some compiler warnings
- fix block reservations for transactions created during log recovery
- fix resource leaks when respecifying mount options
* tag 'xfs-4.16-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: fix potential memory leak in mount option parsing
xfs: reserve blocks for refcount / rmap log item recovery
xfs: use memset to initialize xfs_scrub_agfl_info
When specifying string type mount option (e.g., logdev)
several times in a mount, current option parsing may
cause memory leak. Hence, call kfree for previous one
in this case.
Signed-off-by: Chengguang Xu <cgxu519@icloud.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
When blkdev_open() races with device removal and creation it can happen
that unhashed bdev inode gets associated with newly created gendisk
like:
CPU0 CPU1
blkdev_open()
bdev = bd_acquire()
del_gendisk()
bdev_unhash_inode(bdev);
remove device
create new device with the same number
__blkdev_get()
disk = get_gendisk()
- gets reference to gendisk of the new device
Now another blkdev_open() will not find original 'bdev' as it got
unhashed, create a new one and associate it with the same 'disk' at
which point problems start as we have two independent page caches for
one device.
Fix the problem by verifying that the bdev inode didn't get unhashed
before we acquired gendisk reference. That way we make sure gendisk can
get associated only with visible bdev inodes.
Tested-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When two blkdev_open() calls race with device removal and recreation,
__blkdev_get() can use looked up gendisk after it is freed:
CPU0 CPU1 CPU2
del_gendisk(disk);
bdev_unhash_inode(inode);
blkdev_open() blkdev_open()
bdev = bd_acquire(inode);
- creates and returns new inode
bdev = bd_acquire(inode);
- returns the same inode
__blkdev_get(devt) __blkdev_get(devt)
disk = get_gendisk(devt);
- got structure of device going away
<finish device removal>
<new device gets
created under the same
device number>
disk = get_gendisk(devt);
- got new device structure
if (!bdev->bd_openers) {
does the first open
}
if (!bdev->bd_openers)
- false
} else {
put_disk_and_module(disk)
- remember this was old device - this was last ref and disk is
now freed
}
disk_unblock_events(disk); -> oops
Fix the problem by making sure we drop reference to disk in
__blkdev_get() only after we are really done with it.
Reported-by: Hou Tao <houtao1@huawei.com>
Tested-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add a proper counterpart to get_disk_and_module() -
put_disk_and_module(). Currently it is opencoded in several places.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Commit e864f39569 "fs: add RWF_DSYNC aand RWF_SYNC" added additional
way for direct IO to become synchronous and thus trigger fsync from the
IO completion handler. Then commit 9830f4be15 "fs: Use RWF_* flags for
AIO operations" allowed these flags to be set for AIO as well. However
that commit forgot to update the condition checking whether the IO
completion handling should be defered to a workqueue and thus AIO DIO
with RWF_[D]SYNC set will call fsync() from IRQ context resulting in
sleep in atomic.
Fix the problem by checking directly iocb flags (the same way as it is
done in dio_complete()) instead of checking all conditions that could
lead to IO being synchronous.
CC: Christoph Hellwig <hch@lst.de>
CC: Goldwyn Rodrigues <rgoldwyn@suse.com>
CC: stable@vger.kernel.org
Reported-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Fixes: 9830f4be15
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
redirect_dir=nofollow should not follow a redirect. But in a specific
configuration it can still follow it. For example try this.
$ mkdir -p lower0 lower1/foo upper work merged
$ touch lower1/foo/lower-file.txt
$ setfattr -n "trusted.overlay.opaque" -v "y" lower1/foo
$ mount -t overlay -o lowerdir=lower1:lower0,workdir=work,upperdir=upper,redirect_dir=on none merged
$ cd merged
$ mv foo foo-renamed
$ umount merged
# mount again. This time with redirect_dir=nofollow
$ mount -t overlay -o lowerdir=lower1:lower0,workdir=work,upperdir=upper,redirect_dir=nofollow none merged
$ ls merged/foo-renamed/
# This lists lower-file.txt, while it should not have.
Basically, we are doing redirect check after we check for d.stop. And
if this is not last lower, and we find an opaque lower, d.stop will be
set.
ovl_lookup_single()
if (!d->last && ovl_is_opaquedir(this)) {
d->stop = d->opaque = true;
goto out;
}
To fix this, first check redirect is allowed. And after that check if
d.stop has been set or not.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Fixes: 438c84c2f0 ("ovl: don't follow redirects if redirect_dir=off")
Cc: <stable@vger.kernel.org> #v4.15
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
When failing from ceph_fs_debugfs_init() in ceph_real_mount(),
there is lack of dput of root_dentry and it causes slab errors,
so change the calling order of ceph_fs_debugfs_init() and
open_root_dentry() and do some cleanups to avoid this issue.
Signed-off-by: Chengguang Xu <cgxu519@icloud.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
When parsing string option, in order to avoid memory leak we need to
carefully free it first in case of specifying same option several times.
Signed-off-by: Chengguang Xu <cgxu519@icloud.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
fs/overlayfs/export.c:459:10-16: WARNING: PTR_ERR_OR_ZERO can be used
Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR
Generated by: scripts/coccinelle/api/ptr_ret.cocci
Fixes: 4b91c30a5a ("ovl: lookup connected ancestor of dir in inode cache")
CC: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Hightlights include:
- Fix a broken cast in nfs4_callback_recallany()
- Fix an Oops during NFSv4 migration events
- make struct nlmclnt_fl_close_lock_ops static
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJakuNIAAoJEGcL54qWCgDykYYQAITHRrWP7tQ6aSpZxW5+Un5z
6K3RRbfxFjHWVyaePCBzRMOtPTA/puqO0ggx9H+2D4+u2GeXhFl7FMdIuLueGKrc
rh0wzB6+KiHvqK8NT3g4c2VzZbGJ8IWB6jlNaA3ZyHRJcO+Oi3rQhYBNZpVqP6ny
M1C3yXQTUtA13aOLjeThoAKIJyknwdZcsiMTptJslvSsQ9PL0w6m6jZKrVHu6Rc+
Hg12FFptaKien/gj2IUYJb6Z2Mz3arJu1Y7cm1P/zH/NBs37ynMUsrb9AvPbzvRm
PvPRT4ugNOlTgDTaIT2JHwP2bhlp2JF+Tdzq7WYE3ek2CEPUD49jv07MlgTHxy/w
+tcp/322ZCxvKLjHTeEWqGn4T0TZ1TdPrd4dIJsjox9Ffy72Z3rOjvLKt5UrRV0i
8IhiE3/ruHFpB75Yfi7ABEIH8aEwwmchQTf5bth0ZKoZdaEPHmy3xnJkxa+wlD1V
Hp6KoqMNlDXEFx2Ih/SD6j50MFKszq6+cjUk2D8iclLnelXhu9iddFQ+PFNfsxVZ
WSo4AWZoPbtbLnn9Ez9dkdsJILKv86LbEvYxLX6/LnxLzzX70E34tbRQa+drVeR3
Na6czRpld85juKWgiFkzNx+zD4TaBAxUMbQH2Gbwngiz5RoT6SnkkdkAK3EW7nhR
pIJgZGiq/NG3NLiCxIgs
=l2jv
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-4.16-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
- fix a broken cast in nfs4_callback_recallany()
- fix an Oops during NFSv4 migration events
- make struct nlmclnt_fl_close_lock_ops static
* tag 'nfs-for-4.16-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS: make struct nlmclnt_fl_close_lock_ops static
nfs: system crashes after NFS4ERR_MOVED recovery
NFSv4: Fix broken cast in nfs4_callback_recallany()
i_dir_seq is subject to concurrent modification by a cmpxchg or
store-release operation, so ensure that the relaxed access in
d_alloc_parallel uses READ_ONCE.
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
If d_alloc_parallel runs concurrently with __d_add, it is possible for
d_alloc_parallel to continuously retry whilst i_dir_seq has been
incremented to an odd value by __d_add:
CPU0:
__d_add
n = start_dir_add(dir);
cmpxchg(&dir->i_dir_seq, n, n + 1) == n
CPU1:
d_alloc_parallel
retry:
seq = smp_load_acquire(&parent->d_inode->i_dir_seq) & ~1;
hlist_bl_lock(b);
bit_spin_lock(0, (unsigned long *)b); // Always succeeds
CPU0:
__d_lookup_done(dentry)
hlist_bl_lock
bit_spin_lock(0, (unsigned long *)b); // Never succeeds
CPU1:
if (unlikely(parent->d_inode->i_dir_seq != seq)) {
hlist_bl_unlock(b);
goto retry;
}
Since the simple bit_spin_lock used to implement hlist_bl_lock does not
provide any fairness guarantees, then CPU1 can starve CPU0 of the lock
and prevent it from reaching end_dir_add(dir), therefore CPU1 cannot
exit its retry loop because the sequence number always has the bottom
bit set.
This patch resolves the livelock by not taking hlist_bl_lock in
d_alloc_parallel if the sequence counter is odd, since any subsequent
masked comparison with i_dir_seq will fail anyway.
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Reported-by: Naresh Madhusudana <naresh.madhusudana@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
In case when dentry passed to lock_parent() is protected from freeing only
by the fact that it's on a shrink list and trylock of parent fails, we
could get hit by __dentry_kill() (and subsequent dentry_kill(parent))
between unlocking dentry and locking presumed parent. We need to recheck
that dentry is alive once we lock both it and parent *and* postpone
rcu_read_unlock() until after that point. Otherwise we could return
a pointer to struct dentry that already is rcu-scheduled for freeing, with
->d_lock held on it; caller's subsequent attempt to unlock it can end
up with memory corruption.
Cc: stable@vger.kernel.org # 3.12+, counting backports
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull siginfo fix from Eric Biederman:
"This fixes a build error that only shows up on blackfin"
* 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
fs/signalfd: fix build error for BUS_MCEERR_AR
During log recovery, the per-AG reservations aren't yet set up, so log
recovery has to reserve enough blocks to handle all possible btree
splits.
Reported-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Apparently different gcc versions have competing and
incompatible notions of how to initialize at declaration,
so just give up and fall back to the time-tested memset().
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Fix build error in fs/signalfd.c by using same method that is used in
kernel/signal.c: separate blocks for different signal si_code values.
./fs/signalfd.c: error: 'BUS_MCEERR_AR' undeclared (first use in this function)
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Each read from a file in efivarfs results in two calls to EFI
(one to get the file size, another to get the actual data).
On X86 these EFI calls result in broadcast system management
interrupts (SMI) which affect performance of the whole system.
A malicious user can loop performing reads from efivarfs bringing
the system to its knees.
Linus suggested per-user rate limit to solve this.
So we add a ratelimit structure to "user_struct" and initialize
it for the root user for no limit. When allocating user_struct for
other users we set the limit to 100 per second. This could be used
for other places that want to limit the rate of some detrimental
user action.
In efivarfs if the limit is exceeded when reading, we take an
interruptible nap for 50ms and check the rate limit again.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The structure nlmclnt_fl_close_lock_ops s local to the source and does
not need to be in global scope, so make it static.
Cleans up sparse warning:
fs/nfs/nfs3proc.c:876:33: warning: symbol 'nlmclnt_fl_close_lock_ops' was not
declared. Should it be static?
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
nfs4_update_server unconditionally releases the nfs_client for the
source server. If migration fails, this can cause the source server's
nfs_client struct to be left with a low reference count, resulting in
use-after-free. Also, adjust reference count handling for ELOOP.
NFS: state manager: migration failed on NFSv4 server nfsvmu10 with error 6
WARNING: CPU: 16 PID: 17960 at fs/nfs/client.c:281 nfs_put_client+0xfa/0x110 [nfs]()
nfs_put_client+0xfa/0x110 [nfs]
nfs4_run_state_manager+0x30/0x40 [nfsv4]
kthread+0xd8/0xf0
BUG: unable to handle kernel NULL pointer dereference at 00000000000002a8
nfs4_xdr_enc_write+0x6b/0x160 [nfsv4]
rpcauth_wrap_req+0xac/0xf0 [sunrpc]
call_transmit+0x18c/0x2c0 [sunrpc]
__rpc_execute+0xa6/0x490 [sunrpc]
rpc_async_schedule+0x15/0x20 [sunrpc]
process_one_work+0x160/0x470
worker_thread+0x112/0x540
? rescuer_thread+0x3f0/0x3f0
kthread+0xd8/0xf0
This bug was introduced by 32e62b7c ("NFS: Add nfs4_update_server"),
but the fix applies cleanly to 52442f9b ("NFS4: Avoid migration loops")
Reported-by: Helen Chao <helen.chao@oracle.com>
Fixes: 52442f9b11 ("NFS4: Avoid migration loops")
Signed-off-by: Bill Baker <bill.baker@oracle.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAlqG8poACgkQxWXV+ddt
WDuHSA//eC+69XpHwohI6pcPQ7Jbr9UCj1L/Gt0U96YSzijGW4Hv3OQEWLIRBu4c
nZbzQYtUunpguLYwfXgUUgXRHBTo2Y5bXZNmF2MtL7JcPOLhLh4h/IcGY7eRd2Vq
qvv2bqr3yAcQo7s6z5U/D8ulohzHQTxG7Jaq/BkVxQqhvu+vdu/9T8ikAWnmSTjw
lONu8soR5QO7tewxz23Cguw/t1bWe1aMXG9Ykd4avyhQHtgzNE+l82i4DYUhK2CM
x8M5/CxnDLPe73IJuA2INCUtpPvR4Qufi5Nz6EN3BrJNCGBkmg18sPIvWlH6LsVh
bsm4Lwz/piq+hkDq2GG+Z79uiGAfCVUWAsnm7yYHwpVyMvwHKlfrcVSAuRZixw5E
/NZ0JEkEOtvzpv4inZFYbAgD+oKfvYvwj9BW5BXfu2aH6hJBImfAeMSd1aHB3uZI
kGgy52k2v2P3WKQOFUbmW417P05DvvGmRvRmU+tSFpB+lXAZqRzoiVIuFm0xwhf1
1SmnYgnSYzPmzIRXAMsSYQeK/8NXDdMZMutaw/AYwX+QBEdIAErf6MWcjI6XZRyG
g8Gr8JcpwSa+H5/LKN5uswfXxfSAsqVHnZhbOVrjyGX0wyR4KJg3ag3KsHd9SCxb
LDEjPSYEDn9yfmw6pK2Q6J26FGYiKpuUXaNiYVNymGe6162IiBM=
=VeA/
-----END PGP SIGNATURE-----
Merge tag 'for-4.16-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"We have a few assorted fixes, some of them show up during fstests so I
gave them more testing"
* tag 'for-4.16-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: Fix use-after-free when cleaning up fs_devs with a single stale device
Btrfs: fix null pointer dereference when replacing missing device
btrfs: remove spurious WARN_ON(ref->count < 0) in find_parent_nodes
btrfs: Ignore errors from btrfs_qgroup_trace_extent_post
Btrfs: fix unexpected -EEXIST when creating new inode
Btrfs: fix use-after-free on root->orphan_block_rsv
Btrfs: fix btrfs_evict_inode to handle abnormal inodes correctly
Btrfs: fix extent state leak from tree log
Btrfs: fix crash due to not cleaning up tree log block's dirty bits
Btrfs: fix deadlock in run_delalloc_nocow
This change relaxes copy up on encode of merge dir with lower layer > 1
and handles the case of encoding a merge dir with lower layer 1, where an
ancestor is a non-indexed merge dir. In that case, decode of the lower
file handle will not have been possible if the non-indexed ancestor is
redirected before or after encode.
Before encoding a non-upper directory file handle from real layer N, we
need to check if it will be possible to reconnect an overlay dentry from
the real lower decoded dentry. This is done by following the overlay
ancestry up to a "layer N connected" ancestor and verifying that all
parents along the way are "layer N connectable". If an ancestor that is
NOT "layer N connectable" is found, we need to copy up an ancestor, which
is "layer N connectable", thus making that ancestor "layer N connected".
For example:
layer 1: /a
layer 2: /a/b/c
The overlay dentry /a is NOT "layer 2 connectable", because if dir /a is
copied up and renamed, upper dir /a will be indexed by lower dir /a from
layer 1. The dir /a from layer 2 will never be indexed, so the algorithm
in ovl_lookup_real_ancestor() (*) will not be able to lookup a connected
overlay dentry from the connected lower dentry /a/b/c.
To avoid this problem on decode time, we need to copy up an ancestor of
/a/b/c, which is "layer 2 connectable", on encode time. That ancestor is
/a/b. After copy up (and index) of /a/b, it will become "layer 2 connected"
and when the time comes to decode the file handle from lower dentry /a/b/c,
ovl_lookup_real_ancestor() will find the indexed ancestor /a/b and decoding
a connected overlay dentry will be accomplished.
(*) the algorithm in ovl_lookup_real_ancestor() can be improved to lookup
an entry /a in the lower layers above layer N and find the indexed dir /a
from layer 1. If that improvement is made, then the check for "layer N
connected" will need to verify there are no redirects in lower layers above
layer N. In the example above, /a will be "layer 2 connectable". However,
if layer 2 dir /a is a target of a layer 1 redirect, then /a will NOT be
"layer 2 connectable":
layer 1: /A (redirect = /a)
layer 2: /a/b/c
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Commit 31747eda41 ("ovl: hash directory inodes for fsnotify")
fixed an issue of inotify watch on directory that stops getting
events after dropping dentry caches.
A similar issue exists for non-dir non-upper files, for example:
$ mkdir -p lower upper work merged
$ touch lower/foo
$ mount -t overlay -o
lowerdir=lower,workdir=work,upperdir=upper none merged
$ inotifywait merged/foo &
$ echo 2 > /proc/sys/vm/drop_caches
$ cat merged/foo
inotifywait doesn't get the OPEN event, because ovl_lookup() called
from 'cat' allocates a new overlay inode and does not reuse the
watched inode.
Fix this by hashing non-dir overlay inodes by lower real inode in
the following cases that were not hashed before this change:
- A non-upper overlay mount
- A lower non-hardlink when index=off
A helper ovl_hash_bylower() was added to put all the logic and
documentation about which real inode an overlay inode is hashed by
into one place.
The issue dates back to initial version of overlayfs, but this
patch depends on ovl_inode code that was introduced in kernel v4.13.
Cc: <stable@vger.kernel.org> #v4.13
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Pull x86 fixes from Ingo Molnar:
"Misc fixes all across the map:
- /proc/kcore vsyscall related fixes
- LTO fix
- build warning fix
- CPU hotplug fix
- Kconfig NR_CPUS cleanups
- cpu_has() cleanups/robustification
- .gitignore fix
- memory-failure unmapping fix
- UV platform fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm, mm/hwpoison: Don't unconditionally unmap kernel 1:1 pages
x86/error_inject: Make just_return_func() globally visible
x86/platform/UV: Fix GAM Range Table entries less than 1GB
x86/build: Add arch/x86/tools/insn_decoder_test to .gitignore
x86/smpboot: Fix uncore_pci_remove() indexing bug when hot-removing a physical CPU
x86/mm/kcore: Add vsyscall page to /proc/kcore conditionally
vfs/proc/kcore, x86/mm/kcore: Fix SMAP fault when dumping vsyscall user page
x86/Kconfig: Further simplify the NR_CPUS config
x86/Kconfig: Simplify NR_CPUS config
x86/MCE: Fix build warning introduced by "x86: do not use print_symbol()"
x86/cpufeature: Update _static_cpu_has() to use all named variables
x86/cpufeature: Reindent _static_cpu_has()
This tag is meant for pulling a patch called gfs2: Fixes to
"Implement iomap for block_map". The patch fixes some
regressions we recently discovered in commit 3974320ca6.
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJahFiJAAoJENeLYdPf93o7JKYH/irlIZM7NPHhiOcot1lXG6HL
x1fV9u6Rjw7QimctgM6ks1lu/R7hamNvOCAPz7TFXIo0grWes2qOcZa7tdWqkpZK
TGmSIv+NfrI9NzB3PwleImClfHR8SOgIh/ZlvHQWu9JvKkPlZ3Ik0mZCXbzUFn0I
Q5ebe+yvaaGeU3QUzsdBgTWuYRE0uQfIylyTz7f8wc9PDp2zB2l01CCCbat/VEWe
Jy1HlXSiQsmR0N5ypm5d3AszXJ0zbHfjQzKpNACP59WrRjnKvxsBan7En5pQBFnP
lhLWClqxgtXlvmSb4Takw+Cu9aS2zCYizQ8eqecX5FKQp1Vufoxs48EqRnq55IY=
=vJqP
-----END PGP SIGNATURE-----
Merge tag 'gfs2-4.16.rc1.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 fix from Bob Peterson:
"Fix regressions in the gfs2 iomap for block_map implementation we
recently discovered in commit 3974320ca6"
* tag 'gfs2-4.16.rc1.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: Fixes to "Implement iomap for block_map"
It turns out that commit 3974320ca6 "Implement iomap for block_map"
introduced a few bugs that trigger occasional failures with xfstest
generic/476:
In gfs2_iomap_begin, we jump to do_alloc when we determine that we are
beyond the end of the allocated metadata (height > ip->i_height).
There, we can end up calling hole_size with a metapath that doesn't
match the current metadata tree, which doesn't make sense. After
untangling the code at do_alloc, fix this by checking if the block we
are looking for is within the range of allocated metadata.
In addition, add a BUG() in case gfs2_iomap_begin is accidentally called
for reading stuffed files: this is handled separately. Make sure we
don't truncate iomap->length for reads beyond the end of the file; in
that case, the entire range counts as a hole.
Finally, revert to taking a bitmap write lock when doing allocations.
It's unclear why that change didn't lead to any failures during testing.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Commit:
df04abfd18 ("fs/proc/kcore.c: Add bounce buffer for ktext data")
... introduced a bounce buffer to work around CONFIG_HARDENED_USERCOPY=y.
However, accessing the vsyscall user page will cause an SMAP fault.
Replace memcpy() with copy_from_user() to fix this bug works, but adding
a common way to handle this sort of user page may be useful for future.
Currently, only vsyscall page requires KCORE_USER.
Signed-off-by: Jia Zhang <zhang.jia@linux.alibaba.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: jolsa@redhat.com
Link: http://lkml.kernel.org/r/1518446694-21124-2-git-send-email-zhang.jia@linux.alibaba.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This is the mindless scripted replacement of kernel use of POLL*
variables as described by Al, done by this script:
for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
done
with de-mangling cleanups yet to come.
NOTE! On almost all architectures, the EPOLL* constants have the same
values as the POLL* constants do. But they keyword here is "almost".
For various bad reasons they aren't the same, and epoll() doesn't
actually work quite correctly in some cases due to this on Sparc et al.
The next patch from Al will sort out the final differences, and we
should be all done.
Scripted-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull more poll annotation updates from Al Viro:
"This is preparation to solving the problems you've mentioned in the
original poll series.
After this series, the kernel is ready for running
for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
done
as a for bulk search-and-replace.
After that, the kernel is ready to apply the patch to unify
{de,}mangle_poll(), and then get rid of kernel-side POLL... uses
entirely, and we should be all done with that stuff.
Basically, that's what you suggested wrt KPOLL..., except that we can
use EPOLL... instead - they already are arch-independent (and equal to
what is currently kernel-side POLL...).
After the preparations (in this series) switch to returning EPOLL...
from ->poll() instances is completely mechanical and kernel-side
POLL... can go away. The last step (killing kernel-side POLL... and
unifying {de,}mangle_poll() has to be done after the
search-and-replace job, since we need userland-side POLL... for
unified {de,}mangle_poll(), thus the cherry-pick at the last step.
After that we will have:
- POLL{IN,OUT,...} *not* in __poll_t, so any stray instances of
->poll() still using those will be caught by sparse.
- eventpoll.c and select.c warning-free wrt __poll_t
- no more kernel-side definitions of POLL... - userland ones are
visible through the entire kernel (and used pretty much only for
mangle/demangle)
- same behavior as after the first series (i.e. sparc et.al. epoll(2)
working correctly)"
* 'work.poll2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
annotate ep_scan_ready_list()
ep_send_events_proc(): return result via esed->res
preparation to switching ->poll() to returning EPOLL...
add EPOLLNVAL, annotate EPOLL... and event_poll->event
use linux/poll.h instead of asm/poll.h
xen: fix poll misannotation
smc: missing poll annotations
Pull misc vfs fixes from Al Viro.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
seq_file: fix incomplete reset on read from zero offset
kernfs: fix regression in kernfs_fop_write caused by wrong type
-----BEGIN PGP SIGNATURE-----
iQGcBAABAgAGBQJafSqsAAoJEIosvXAHck9RHmEMAJyzkwc503WOl9/ZyagcaDli
4mJEplVgxL6ZcgmaPZrZ1qaZvHd0JWq5bDbPeuuNv+wyqIu14DYVHivaORswfI7y
Q0p0gslWf+hyS637CcmBajgEZbgAZIAkUktC+KPa7lZcUFDEvgYwHnQNuK3yvhBR
zRrWeiumWn4l25ahc8GBA5nZ7tDM5xkLpv8DfI0ycCbm5E+Bqnf23m13hTMT7Mt3
4hBc6iEdi+/IcRkwf5BHEO94hNeWSb4oERLIWxXXkZ3XTSlYtJteV/pdIoJfhHnr
Th453VUwPfkRVVw3h4feZaIKM6kGPStGg1435+6lBpgTWQgNImd/Kcg3d181U3rs
/+iORX2KLwwl6orVQnX5IBiUpnB2+ePpRGjMAGedIPSztMVInGInxxT1UZQtMCIg
fJ6PQ1eH/OlY7WiY16+3YBYvtWPPqJc98P7gyfDocne7ZoT0XkoQ+2YejaNzI2Sz
8Qkw6Y8gLSQ8tC2duV14evlLmynbB1qRL9n99iD06w==
=Ps35
-----END PGP SIGNATURE-----
Merge tag '4.16-minor-rc-SMB3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"There are a couple additional security fixes that are still being
tested that are not in this set."
* tag '4.16-minor-rc-SMB3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
Add missing structs and defines from recent SMB3.1.1 documentation
address lock imbalance warnings in smbdirect.c
cifs: silence compiler warnings showing up with gcc-8.0.0
Add some missing debug fields in server and tcon structs
But it's also a fairly small update this time around. Some cleanup,
RDMA fixes, overlayfs fixes, and a fix for an NFSv4 state bug.
The bigger deal for nfsd this time around is Jeff Layton's
already-merged i_version patches. This series has a minor conflict with
that one, and the resolution should be obvious. (Stephen Rothwell has
been carrying it in linux-next for what it's worth.)
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJafNVvAAoJECebzXlCjuG+yZUP/2SctFtkW638z9frLcIVt5M6
x5hluw5jtFrVqq/KoMwi7rVaMzhdvcgwwfaLciqrPCOmcMKlOqiWslyCV0wZVCZS
jabkOeinKVAyPTlESesNyArWKBWaB8QaYDwbkQ5Y76U9Ma5gwSghS1wc8vrNduZY
2StieESOiOs9LljXf5SqCC5nN9s7gs4qtCK7aZ3JIt4661Lh39LqyO5zxLnc78eL
USnJKHjTSreY2Vd1/TdNWyZhiim43wdrB+jpy6IoocTqyhYalkCz1iYdJn1arqtP
iIddPpczKxkHekFVj7/Kfa+ATFtdXIpivOBhhOT0oY8HukTd58bh/oUMrFt4BSuP
MQst0R9h1sanBE18XBPlXuIK51sm3AjjOGaQycl/Mzes+dMRgIP/KspAcnwwXHqG
gyZsF3VzliFTc9s0SyiAz2AxNTUnjd+LV3E0DUeivURa6V3pc+sFlQzi8PRxRaep
0gmhYcZsfwdDKZ/kbQyQdSWN48NxOLFke4fYjmoUtoyILa0NAHEqafeJkR5EiRTm
tZsL9H/3THEGWygYlXGGBo/J4w5jE3uL/8KkfeuZefzSo0Ujqu0pBALMTnGFLKRx
Mpw7JEqfUwqIVZ0Qh6q9yIcjr89qWv96UpBqRRIkFX5zOPN7B1BH8C89g8qy3Hyt
gm/5BTw4FPE0uAM9Nhsd
=icEX
-----END PGP SIGNATURE-----
Merge tag 'nfsd-4.16' of git://linux-nfs.org/~bfields/linux
Pull nfsd update from Bruce Fields:
"A fairly small update this time around. Some cleanup, RDMA fixes,
overlayfs fixes, and a fix for an NFSv4 state bug.
The bigger deal for nfsd this time around was Jeff Layton's
already-merged i_version patches"
* tag 'nfsd-4.16' of git://linux-nfs.org/~bfields/linux:
svcrdma: Fix Read chunk round-up
NFSD: hide unused svcxdr_dupstr()
nfsd: store stat times in fill_pre_wcc() instead of inode times
nfsd: encode stat->mtime for getattr instead of inode->i_mtime
nfsd: return RESOURCE not GARBAGE_ARGS on too many ops
nfsd4: don't set lock stateid's sc_type to CLOSED
nfsd: Detect unhashed stids in nfsd4_verify_open_stid()
sunrpc: remove dead code in svc_sock_setbufsize
svcrdma: Post Receives in the Receive completion handler
nfsd4: permit layoutget of executable-only files
lockd: convert nlm_rqst.a_count from atomic_t to refcount_t
lockd: convert nlm_lockowner.count from atomic_t to refcount_t
lockd: convert nsm_handle.sm_count from atomic_t to refcount_t
1. don't pass garbage return codes back up the call chain (Mike Marshall)
2. fix stale inode test (Martin Brandenburg)
3. fix off-by-one errors (Xiongfeng Wang)
Also: add Martin as a reviewer in the Maintainers file.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJaejneAAoJEM9EDqnrzg2+XhoQAIDF112mOwLwqDPmr4ty0g6/
gBcoHOrRFlYWPlS5aubjoZ3jFX2fAeNuHzYS4LIuqVKUdsC+oTKQ2URJ7KKpvLiK
6zOaz2Y4GLns2sa1ZUKli6nEBbPi6uwoF54FNbwt3b+97wpmJwlnXm9ztyt5REKA
zOHvLgJAcfGNZEJ7gyB1zjwllu4JeD0A4MoN4vJCtkKLAaNClywu4+V0jwZB+SSN
8QjDXNqkcD31ahWhQ/CaU4zXlxOOV+4ZR7/p5IKT693hEhV+ikTvmXy8g0+bksxj
L+FHmQMTO+GqCS5FxuBQd3v1IP5FkoHEmAwvr3C5aMlRAaVJ9eVVIZaC9CpOJBRB
S/CiaG2Mw8vx8VGOm8O93Z+xDi9tCYP8x4i7b5r62h0T9wSyHJSkSIUd6VIkCV9Q
c92bX/N3wHBvCPT+RC898plni5HsFpzs3vSs8hiaAICgp64sC8pIqVlZOAdMtJd8
RL4la/Fited/T+3BpaCTkmnvNk8Ktax7wHYsCt4gSyHN8WRvkzowgC5kV6S30Qlh
zfoXG0K50FcU8T5r3i8slvUHmsiyYxYwJIk/z1iDgXI7y4IIR6FGDxQmw5TxgNS7
+veTo6FCxon6QshtpAOeELCau7qNXhtlDdGqqm4+gDfMWoCn0Jem/LzdA2gPXCOr
iCDwHLiu6WXt7ZHTrgln
=xrih
-----END PGP SIGNATURE-----
Merge tag 'for-linus-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux
Pull orangefs updates from Mike Marshall:
"Mostly cleanups, but three bug fixes:
- don't pass garbage return codes back up the call chain (Mike
Marshall)
- fix stale inode test (Martin Brandenburg)
- fix off-by-one errors (Xiongfeng Wang)
Also add Martin as a reviewer in the Maintainers file"
* tag 'for-linus-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
orangefs: reverse sense of is-inode-stale test in d_revalidate
orangefs: simplify orangefs_inode_is_stale
Orangefs: don't propogate whacky error codes
orangefs: use correct string length
orangefs: make orangefs_make_bad_inode static
orangefs: remove ORANGEFS_KERNEL_DEBUG
orangefs: remove gossip_ldebug and gossip_lerr
orangefs: make orangefs_client_debug_init static
MAINTAINERS: update orangefs list and add myself as reviewer
-----BEGIN PGP SIGNATURE-----
iQIVAwUAWnx0Mvu3V2unywtrAQI3ng//Xdv2rxVjv4znzekb/EkE9QIakH3ET3wt
hBewQjaGkOWhZKgyE7DnhCMh7y6OrX/oVNtjPU8H7EEHDHVs+nyoGoDu282jlppr
qO7yMbxZwDtpja7O9hVtIViFZSqlEey/RCq1KKRUl/HDmyyOmAvOZHCpyowUqcYD
KqJs9Z2/onkP43rwmoKIQPEeKHxRfAs6pTiAG7fUPYC4d6aSskiN5K65N0g4dx4F
G6pDC/mIJWx2qeeI//CzSxnqhzWAhkozOs9UtvquSrIoNcYMSOQRHGne50n7OqkK
rZCttm4gSlrEU11cPDNExjKU4z8UM3tmVdudntC8wbng5PFCHTR7JB5nZu1bEjqw
TpIjb302QnUefzu1AGge03ZnysqDKKBAxKKwD1gYBHaj7Y2CrqP4lo+6QA4ePYTv
qD7nRZCiQ8rF3PJOYJ7xe944Jziktf6PhnOXyxOSNCv3IT90YD7meOR3MldMjny/
hM2ahYqfWXjLAjH20Q+B8z7ab9GDdVsBTl06w/ZX+RMrg5CNdDaYe0nfG/tS7H3A
oD7xIjUwWjqxMBqtXNUe/3GAOnU+ilEiKjq8gmNkBSjRlpO6SMxi02jOp66HwnRs
tD5qG3Bn2F3hdvEtwcKcS0cVWX511lLF5vkhlBhSbs/XkS+BXULr3vDsl5XclwAw
/07q8HsHlnM=
=fSB4
-----END PGP SIGNATURE-----
Merge tag 'afs-next-20180208' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull afs updates from David Howells:
"Four fixes:
- add a missing put
- two fixes to reset the address iteration cursor correctly
- fix setting up the fileserver iteration cursor.
Two cleanups:
- remove some dead code
- rearrange a function to be more logically laid out
And one new feature:
- Support AFS dynamic root.
With this one should be able to do, say:
mkdir /afs
mount -t afs none /afs -o dyn
to create a dynamic root and then, provided you have keyutils
installed, do:
ls /afs/grand.central.org
and:
ls /afs/umich.edu
to list the root volumes of both those organisations' AFS cells
without requiring any other setup (the kernel upcall to a program
in the keyutils package to do DNS access as does NFS)"
* tag 'afs-next-20180208' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
afs: Support the AFS dynamic root
afs: Rearrange afs_select_fileserver() a little
afs: Remove unused code
afs: Fix server list handling
afs: Need to clear responded flag in addr cursor
afs: Fix missing cursor clearance
afs: Add missing afs_put_cell()
big ticket items slated for the next merge window.
On the CephFS side we have a large number of cap handling improvements,
a fix for our long-standing abuse of ->journal_info in ceph_readpages()
and yet another dentry pointer management patch.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABCAAGBQJafGqnAAoJEEp/3jgCEfOLjNcH/R6G/xyytDMfxaN+D8DBqCPF
IaQM7RtgYJeRzDIXYYCkDEBPYqLcD2fjHLzFotFNLcgLdeUcSOyfg7NuCOWWq7o2
t4z6Ekyish3GWZLUmlSdPcToQ+xIlMRshU8ZmzCHTCzx8XjO+CAnCADp5dh8OKZx
mCpRX16sXdc6ozE1hsGKIkUoNrkdj8d3+HseZ2Uxb/4FZBNgH3cmmg7c5y6M+sp6
wT4NEES3baqq2v5cVfw7T+d4MNgRm4/JC1aBy1JBkQlmVFNGteQTT7yzo0X1AfJ+
+kcR10ddg0gD4WGYhL+iZlQCfwyMp7vouHQbgTOgt+rDCitjDy5r1BAamtxnZjM=
=ctaD
-----END PGP SIGNATURE-----
Merge tag 'ceph-for-4.16-rc1' of git://github.com/ceph/ceph-client
Pull ceph updates from Ilya Dryomov:
"Things have been very quiet on the rbd side, as work continues on the
big ticket items slated for the next merge window.
On the CephFS side we have a large number of cap handling
improvements, a fix for our long-standing abuse of ->journal_info in
ceph_readpages() and yet another dentry pointer management patch"
* tag 'ceph-for-4.16-rc1' of git://github.com/ceph/ceph-client:
ceph: improving efficiency of syncfs
libceph: check kstrndup() return value
ceph: try to allocate enough memory for reserved caps
ceph: fix race of queuing delayed caps
ceph: delete unreachable code in ceph_check_caps()
ceph: limit rate of cap import/export error messages
ceph: fix incorrect snaprealm when adding caps
ceph: fix un-balanced fsc->writeback_count update
ceph: track read contexts in ceph_file_info
ceph: avoid dereferencing invalid pointer during cached readdir
ceph: use atomic_t for ceph_inode_info::i_shared_gen
ceph: cleanup traceless reply handling for rename
ceph: voluntarily drop Fx cap for readdir request
ceph: properly drop caps for setattr request
ceph: voluntarily drop Lx cap for link/rename requests
ceph: voluntarily drop Ax cap for requests that create new inode
rbd: whitelist RBD_FEATURE_OPERATIONS feature bit
rbd: don't NULL out ->obj_request in rbd_img_obj_parent_read_full()
rbd: use kmem_cache_zalloc() in rbd_img_request_create()
rbd: obj_request->completion is unused
Commit b9f5fb1800 ("cramfs: fix MTD dependency") did what it says.
Since commit 9059a3493e ("kconfig: fix relational operators for bool
and tristate symbols") it is possible to do it slightly better though.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There is now only one caller left for svcxdr_dupstr() and this is inside
of an #ifdef, so we can get a warning when the option is disabled:
fs/nfsd/nfs4xdr.c:241:1: error: 'svcxdr_dupstr' defined but not used [-Werror=unused-function]
This changes the remaining caller to use a nicer IS_ENABLED() check,
which lets the compiler drop the unused code silently.
Fixes: e40d99e6183e ("NFSD: Clean up symlink argument XDR decoders")
Suggested-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The time values in stat and inode may differ for overlayfs and stat time
values are the correct ones to use. This is also consistent with the fact
that fill_post_wcc() also stores stat time values.
This means introducing a stat call that could fail, where previously we
were just copying values out of the inode. To be conservative about
changing behavior, we fall back to copying values out of the inode in
the error case. It might be better just to clear fh_pre_saved (though
note the BUG_ON in set_change_info).
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The values of stat->mtime and inode->i_mtime may differ for overlayfs
and stat->mtime is the correct value to use when encoding getattr.
This is also consistent with the fact that other attr times are also
encoded from stat values.
Both callers of lease_get_mtime() already have the value of stat->mtime,
so the only needed change is that lease_get_mtime() will not overwrite
this value with inode->i_mtime in case the inode does not have an
exclusive lease.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
A client that sends more than a hundred ops in a single compound
currently gets an rpc-level GARBAGE_ARGS error.
It would be more helpful to return NFS4ERR_RESOURCE, since that gives
the client a better idea how to recover (for example by splitting up the
compound into smaller compounds).
This is all a bit academic since we've never actually seen a reason for
clients to send such long compounds, but we may as well fix it.
While we're there, just use NFSD4_MAX_OPS_PER_COMPOUND == 16, the
constant we already use in the 4.1 case, instead of hard-coding 100.
Chances anyone actually uses even 16 ops per compound are small enough
that I think there's a neglible risk or any regression.
This fixes pynfs test COMP6.
Reported-by: "Lu, Xinyu" <luxy.fnst@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJae0mSAAoJEAAOaEEZVoIVs98P+wSbwfgLeyTufmrRYrD9kxfh
EQXfuvnJqPzRHLJIUXfwzTN3IV9RZ1434ci31lZvQE3PKrgb90QuBLiR6OIKULef
UqpYRmjsg7BfFBdAnyUR8xSmmeN94PjXQk7tG+YQn096HJVZ6cG5qCA8RjJ9dFoq
2haDcOfDU+3e8mbtrrF4doP6jGrVwV+okqRsshFBclQv62Kk3m7L5AjQINyZpTM5
ZKX5JIMOAmlJcHsz/2J1qLAIRQKsvEUbRLV43bzp3E03PuVFPhig3dVtpGPUe+Yi
OW0JX49hIoTCrQ4KZk6uweLG7ZpaSoppXggEi2ERNCUkCf3nhejLlScfye+yLx7f
sItgPkOYU0VVF70Y72XH1DbOekZr/XCLZdEEUNCS/P68hnyK0gBNC9zPGetlxMMi
wjjQ9Qe45vD2JFlrvhHrdUdCnxnE05zC9ckBrmM94uRwIfDR0WVgo6pfebfRkAJd
Wp4/PfbaySY7vk4oyaXlNxcDIH2NvWwYkioI/K9rRGbB2KjTdXonQojBy+rT0LeS
f3mufyZYyCxdwu3Wf8WO36H23L+4fseMthKIIPA0aL4wasB9LgD8gDnkyKx28DT4
S32tdK4UALC8SAVsPr+vSaMVzKOZmuNHac+XB2i+5lHl8G/n4M2a+JFTeR4CnKJ/
9LsBEBL5Oj7ZXL7lfFIO
=iEKM
-----END PGP SIGNATURE-----
Merge tag 'iversion-v4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux
Pull inode->i_version cleanup from Jeff Layton:
"Goffredo went ahead and sent a patch to rename this function, and
reverse its sense, as we discussed last week.
The patch is very straightforward and I figure it's probably best to
go ahead and merge this to get the API as settled as possible"
* tag 'iversion-v4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
iversion: Rename make inode_cmp_iversion{+raw} to inode_eq_iversion{+raw}
Pull UDF and ext2 fixlets from Jan Kara:
"A UDF fix and an ext2 cleanup"
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
ext2: drop unneeded newline
udf: Sanitize nanoseconds for time stamps
The last two updates to MS-SMB2 protocol documentation added various
flags and structs (especially relating to SMB3.1.1 tree connect).
Add missing defines and structs to smb2pdu.h
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Although at least one of these was an overly strict sparse warning
in the new smbdirect code, it is cleaner to fix - so no warnings.
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
This bug was fixed before, but came up again with the latest
compiler in another function:
fs/cifs/cifssmb.c: In function 'CIFSSMBSetEA':
fs/cifs/cifssmb.c:6362:3: error: 'strncpy' offset 8 is out of the bounds [0, 4] [-Werror=array-bounds]
strncpy(parm_data->list[0].name, ea_name, name_len);
Let's apply the same fix that was used for the other instances.
Fixes: b2a3ad9ca5 ("cifs: silence compiler warnings showing up with gcc-4.7.0")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Steve French <smfrench@gmail.com>
Allow dumping out debug information on dialect, signing, unix extensions
and encryption
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
The pipe buffer limits are accessed without any locking, and may be
changed at any time by the sysctl handlers. In theory this could cause
problems for expressions like the following:
pipe_user_pages_hard && user_bufs > pipe_user_pages_hard
... since the assembly code might reference the 'pipe_user_pages_hard'
memory location multiple times, and if the admin removes the limit by
setting it to 0, there is a very brief window where processes could
incorrectly observe the limit to be exceeded.
Fix this by loading the limits with READ_ONCE() prior to use.
Link: http://lkml.kernel.org/r/20180111052902.14409-8-ebiggers3@gmail.com
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: "Luis R . Rodriguez" <mcgrof@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
round_pipe_size() calculates the number of pages the requested size
corresponds to, then rounds the page count up to the next power of 2.
However, it also rounds everything < PAGE_SIZE up to PAGE_SIZE.
Therefore, there's no need to actually translate the size into a page
count; we just need to round the size up to the next power of 2.
We do need to verify the size isn't greater than (1 << 31), since on
32-bit systems roundup_pow_of_two() would be undefined in that case. But
that can just be combined with the UINT_MAX check which we need anyway
now.
Finally, update pipe_set_size() to not redundantly check the return value
of round_pipe_size() for the "invalid size" case twice.
Link: http://lkml.kernel.org/r/20180111052902.14409-7-ebiggers3@gmail.com
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: "Luis R . Rodriguez" <mcgrof@kernel.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A pipe's size is represented as an 'unsigned int'. As expected, writing a
value greater than UINT_MAX to /proc/sys/fs/pipe-max-size fails with
EINVAL. However, the F_SETPIPE_SZ fcntl silently truncates such values to
32 bits, rather than failing with EINVAL as expected. (It *does* fail
with EINVAL for values above (1 << 31) but <= UINT_MAX.)
Fix this by moving the check against UINT_MAX into round_pipe_size() which
is called in both cases.
Link: http://lkml.kernel.org/r/20180111052902.14409-6-ebiggers3@gmail.com
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: "Luis R . Rodriguez" <mcgrof@kernel.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With pipe-user-pages-hard set to 'N', users were actually only allowed up
to 'N - 1' buffers; and likewise for pipe-user-pages-soft.
Fix this to allow up to 'N' buffers, as would be expected.
Link: http://lkml.kernel.org/r/20180111052902.14409-5-ebiggers3@gmail.com
Fixes: b0b91d18e2 ("pipe: fix limit checking in pipe_set_size()")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Willy Tarreau <w@1wt.eu>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: "Luis R . Rodriguez" <mcgrof@kernel.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
pipe-user-pages-hard and pipe-user-pages-soft are only supposed to apply
to unprivileged users, as documented in both Documentation/sysctl/fs.txt
and the pipe(7) man page.
However, the capabilities are actually only checked when increasing a
pipe's size using F_SETPIPE_SZ, not when creating a new pipe. Therefore,
if pipe-user-pages-hard has been set, the root user can run into it and be
unable to create pipes. Similarly, if pipe-user-pages-soft has been set,
the root user can run into it and have their pipes limited to 1 page each.
Fix this by allowing the privileged override in both cases.
Link: http://lkml.kernel.org/r/20180111052902.14409-4-ebiggers3@gmail.com
Fixes: 759c01142a ("pipe: limit the per-user amount of pages allocated in pipes")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: "Luis R . Rodriguez" <mcgrof@kernel.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
pipe_proc_fn() is no longer needed, as it only calls through to
proc_dopipe_max_size(). Just put proc_dopipe_max_size() in the ctl_table
entry directly, and remove the unneeded EXPORT_SYMBOL() and the ENOSYS
stub for it.
(The reason the ENOSYS stub isn't needed is that the pipe-max-size
ctl_table entry is located directly in 'kern_table' rather than being
registered separately. Therefore, the entry is already only defined when
the kernel is built with sysctl support.)
Link: http://lkml.kernel.org/r/20180111052902.14409-3-ebiggers3@gmail.com
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: "Luis R . Rodriguez" <mcgrof@kernel.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "pipe: buffer limits fixes and cleanups", v2.
This series simplifies the sysctl handler for pipe-max-size and fixes
another set of bugs related to the pipe buffer limits:
- The root user wasn't allowed to exceed the limits when creating new
pipes.
- There was an off-by-one error when checking the limits, so a limit of
N was actually treated as N - 1.
- F_SETPIPE_SZ accepted values over UINT_MAX.
- Reading the pipe buffer limits could be racy.
This patch (of 7):
Before validating the given value against pipe_min_size,
do_proc_dopipe_max_size_conv() calls round_pipe_size(), which rounds the
value up to pipe_min_size. Therefore, the second check against
pipe_min_size is redundant. Remove it.
Link: http://lkml.kernel.org/r/20180111052902.14409-2-ebiggers3@gmail.com
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: "Luis R . Rodriguez" <mcgrof@kernel.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 7994e6f725 ("vfs: Move waiting for inode writeback from
end_writeback() to evict_inode()") removed inode_sync_wait() from
end_writeback() and commit dbd5768f87 ("vfs: Rename end_writeback() to
clear_inode()") renamed end_writeback() to clear_inode().
After these patches there is no sleeping operation in clear_inode().
So, remove might_sleep() from it.
Link: http://lkml.kernel.org/r/20171108004354.40308-1-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Greg Thelen <gthelen@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When creating a file inside a directory that has the setgid flag set, give
the new file the group ID of the parent, and also the setgid flag if it is
a directory itself.
Link: http://lkml.kernel.org/r/20171204192705.GA6101@debian.home
Signed-off-by: Ernesto A. Fernandez <ernesto.mnd.fernandez@gmail.com>
Reviewed-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The superblock and segment timestamps are used only internally in nilfs2
and can be read out using sysfs.
Since we are using the old 'get_seconds()' interface and store the data
as timestamps, the behavior differs slightly between 64-bit and 32-bit
kernels, the latter will show incorrect timestamps after 2038 in sysfs,
and presumably fail completely in 2106 as comparisons go wrong.
This changes nilfs2 to use time64_t with ktime_get_real_seconds() to
handle timestamps, making the behavior consistent and correct on both
32-bit and 64-bit machines.
The on-disk format already uses 64-bit timestamps, so nothing changes
there.
Link: http://lkml.kernel.org/r/20180122211050.1286441-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If vm.max_map_count bumped above 2^26 (67+ mil) and system has enough RAM
to allocate all the VMAs (~12.8 GB on Fedora 27 with 200-byte VMAs), then
it should be possible to overflow 32-bit "size", pass paranoia check,
allocate very little vmalloc space and oops while writing into vmalloc
guard page...
But I didn't test this, only coredump of regular process.
Link: http://lkml.kernel.org/r/20180112203427.GA9109@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A single character (line break) should be put into a sequence. Thus use
the corresponding function "seq_putc".
This issue was detected by using the Coccinelle software.
Link: http://lkml.kernel.org/r/04fb69fe-d820-9141-820f-07e9a48f4635@users.sourceforge.net
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rearrange args for smaller code.
lookup revolves around memcmp() which gets len 3rd arg, so propagate
length as 3rd arg.
readdir and lookup add additional arg to VFS ->readdir and ->lookup, so
better add it to the end.
Space savings on x86_64:
add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-18 (-18)
Function old new delta
proc_readdir 22 13 -9
proc_lookup 18 9 -9
proc_match() is smaller if not inlined, I promise!
Link: http://lkml.kernel.org/r/20180104175958.GB5204@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
use_pde() is used at every open/read/write/... of every random /proc
file. Negative refcount happens only if PDE is being deleted by module
(read: never). So it gets "likely".
unuse_pde() gets "unlikely" for the same reason.
close_pdeo() gets unlikely as the completion is filled only if there is a
race between PDE removal and close() (read: never ever).
It even saves code on x86_64 defconfig:
add/remove: 0/0 grow/shrink: 1/2 up/down: 2/-20 (-18)
Function old new delta
close_pdeo 183 185 +2
proc_reg_get_unmapped_area 119 111 -8
proc_reg_poll 85 73 -12
Link: http://lkml.kernel.org/r/20180104175657.GA5204@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
/proc/self inode numbers, value of proc_inode_cache and st_nlink of
/proc/$TGID are fixed constants.
Link: http://lkml.kernel.org/r/20180103184707.GA31849@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
struct proc_dir_entry became bit messy over years:
* move 16-bit ->mode_t before namelen to get rid of padding
* make ->in_use first field: it seems to be most used resulting in
smaller code on x86_64 (defconfig):
add/remove: 0/0 grow/shrink: 7/13 up/down: 24/-67 (-43)
Function old new delta
proc_readdir_de 451 455 +4
proc_get_inode 282 286 +4
pde_put 65 69 +4
remove_proc_subtree 294 297 +3
remove_proc_entry 297 300 +3
proc_register 295 298 +3
proc_notify_change 94 97 +3
unuse_pde 27 26 -1
proc_reg_write 89 85 -4
proc_reg_unlocked_ioctl 85 81 -4
proc_reg_read 89 85 -4
proc_reg_llseek 87 83 -4
proc_reg_get_unmapped_area 123 119 -4
proc_entry_rundown 139 135 -4
proc_reg_poll 91 85 -6
proc_reg_mmap 79 73 -6
proc_get_link 55 49 -6
proc_reg_release 108 101 -7
proc_reg_open 298 291 -7
close_pdeo 228 218 -10
* move writeable fields together to a first cacheline (on x86_64),
those include
* ->in_use: reference count, taken every open/read/write/close etc
* ->count: reference count, taken at readdir on every entry
* ->pde_openers: tracks (nearly) every open, dirtied
* ->pde_unload_lock: spinlock protecting ->pde_openers
* ->proc_iops, ->proc_fops, ->data: writeonce fields,
used right together with previous group.
* other rarely written fields go into 1st/2nd and 2nd/3rd cacheline on
32-bit and 64-bit respectively.
Additionally on 32-bit, ->subdir, ->subdir_node, ->namelen, ->name go
fully into 2nd cacheline, separated from writeable fields. They are all
used during lookup.
Link: http://lkml.kernel.org/r/20171220215914.GA7877@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit df04abfd18 ("fs/proc/kcore.c: Add bounce buffer for ktext
data") added a bounce buffer to avoid hardened usercopy checks. Copying
to the bounce buffer was implemented with a simple memcpy() assuming
that it is always valid to read from kernel memory iff the
kern_addr_valid() check passed.
A simple, but pointless, test case like "dd if=/proc/kcore of=/dev/null"
now can easily crash the kernel, since the former execption handling on
invalid kernel addresses now doesn't work anymore.
Also adding a kern_addr_valid() implementation wouldn't help here. Most
architectures simply return 1 here, while a couple implemented a page
table walk to figure out if something is mapped at the address in
question.
With DEBUG_PAGEALLOC active mappings are established and removed all the
time, so that relying on the result of kern_addr_valid() before
executing the memcpy() also doesn't work.
Therefore simply use probe_kernel_read() to copy to the bounce buffer.
This also allows to simplify read_kcore().
At least on s390 this fixes the observed crashes and doesn't introduce
warnings that were removed with df04abfd18 ("fs/proc/kcore.c: Add
bounce buffer for ktext data"), even though the generic
probe_kernel_read() implementation uses uaccess functions.
While looking into this I'm also wondering if kern_addr_valid() could be
completely removed...(?)
Link: http://lkml.kernel.org/r/20171202132739.99971-1-heiko.carstens@de.ibm.com
Fixes: df04abfd18 ("fs/proc/kcore.c: Add bounce buffer for ktext data")
Fixes: f5509cc18d ("mm: Hardened usercopy")
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
dentry name can be evaluated later, right before calling into VFS.
Also, spend less time under ->mmap_sem.
Link: http://lkml.kernel.org/r/20171110163034.GA2534@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Iterators aren't necessary as you can just grab the first entry and delete
it until no entries left.
Link: http://lkml.kernel.org/r/20171121191121.GA20757@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Current code does:
if (sscanf(dentry->d_name.name, "%lx-%lx", start, end) != 2)
However sscanf() is broken garbage.
It silently accepts whitespace between format specifiers
(did you know that?).
It silently accepts valid strings which result in integer overflow.
Do not use sscanf() for any even remotely reliable parsing code.
OK
# readlink '/proc/1/map_files/55a23af39000-55a23b05b000'
/lib/systemd/systemd
broken
# readlink '/proc/1/map_files/ 55a23af39000-55a23b05b000'
/lib/systemd/systemd
broken
# readlink '/proc/1/map_files/55a23af39000-55a23b05b000 '
/lib/systemd/systemd
very broken
# readlink '/proc/1/map_files/1000000000000000055a23af39000-55a23b05b000'
/lib/systemd/systemd
Andrei said:
: This patch breaks criu. It was a bug in criu. And this bug is on a minor
: path, which works when memfd_create() isn't available. It is a reason why
: I ask to not backport this patch to stable kernels.
:
: In CRIU this bug can be triggered, only if this patch will be backported
: to a kernel which version is lower than v3.16.
Link: http://lkml.kernel.org/r/20171120212706.GA14325@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
READ_ONCE and WRITE_ONCE are useless when there is only one read/write
is being made.
Link: http://lkml.kernel.org/r/20171120204033.GA9446@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
PROC_NUMBUF is 13 which is enough for "negative int + \n + \0".
However PIDs and TGIDs are never negative and newline is not a concern,
so use just 10 per integer.
Link: http://lkml.kernel.org/r/20171120203005.GA27743@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Alexander Viro <viro@ftp.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If a dentry is deleted, then a dentry is recreated with the same handle
but a different type (i.e. it was a file and now it's a symlink), then
its a different inode. The check was backwards, so d_revalidate would
not have noticed.
Due to the design of the OrangeFS server, this is rather unlikely.
It's also possible for the dentry to be deleted and recreated with the
same type. This would be undetectable. It's a bit of a ship of
Theseus.
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Check whether this is a new inode at location of call.
Raises the question of what to do with an unknown inode type. Old code
would've marked the inode bad and returned ESTALE.
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
When we get an error return code from userspace (the client-core)
we check to make sure it is a valid code.
This patch maps the whacky return code to -EINVAL instead of
propagating garbage back up the call chain potentially resulting
in a hard-to-find train-wreck.
The client-core doesn't have any business returning whacky return
codes, but if it does, we don't want the kernel to crash as a result.
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
gcc-8 reports
fs/orangefs/dcache.c: In function 'orangefs_d_revalidate':
./include/linux/string.h:245:9: warning: '__builtin_strncpy' specified
bound 256 equals destination size [-Wstringop-truncation]
fs/orangefs/namei.c: In function 'orangefs_rename':
./include/linux/string.h:245:9: warning: '__builtin_strncpy' specified
bound 256 equals destination size [-Wstringop-truncation]
fs/orangefs/super.c: In function 'orangefs_mount':
./include/linux/string.h:245:9: warning: '__builtin_strncpy' specified
bound 256 equals destination size [-Wstringop-truncation]
We need one less byte or call strlcpy() to make it a nul-terminated
string.
Signed-off-by: Xiongfeng Wang <xiongfeng.wang@linaro.org>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
It wasn't possible to enable it, and it would've had very little effect.
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
gossip_ldebug is unused.
gossip_lerr is used in two places. The messages are unique so line
numbers are unnecessary.
Also remove support for compiling gossip messages out. It wasn't
possible to enable it anyway.
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJacX62AAoJEAhfPr2O5OEVjKYP/R3v+c8ztiHzaeibcZZ8IFNl
58E0Y0yGa8OpoGJx9uqtEOamQmZoHhACfId7joIp/Jv38bgWAdbxOmk3Y4FDCFqG
1bRrpnnmvlfabiMMfLpURLqKhf7rJMtErZkrnmmqg9P/lEMohaZUJAsgBZNfJM8l
fZeacSnCSpzlxVcUb9Bf4vWhLk39R+xFzvFrwzbVUIHf3bDVpf4S4kNorMkhSZSF
HaISYXqVMhpKca7CngVKytbfacUStUY01cXcjdMuB/sD7ySwdtKogbPMvrOSaexz
G/8MB+sGT1JKUgIlh6Qv8hX805KuxBgfP19XSOH46nNU8KbYegdGhN5QXlokwI1m
dAOiozkU93r5yBZl6QzkN3uwXe492PoLgczifg97pzAJP0BfWeFStkYqlugLTwwC
Slmr7g3FZVJajbPl6WyioAGW7xfqBF7ftScZOHYxmhy41CWCGKJctmsJOjncyz5O
GInEIP3KR4CgjR+iM1LoKvE+OvVo4kRc7hrcUsjQNsbfBn6xiixjwH+5M+UVvezA
6UQpmtWGg4pX1djb8j8f6mKF8KZM12Pp3jb4Rl1cLsytN5BOBKaMEKdV3rgL+19P
Yo0x/1wK/unkI20Om71vYyQ0nXVF9j7Tpeij5u0M57TeTVYCwloQgHmrcvQJdo8+
Pqw5XEUiDpAIjvKp0XGh
=H9AS
-----END PGP SIGNATURE-----
Merge tag 'media/v4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media updates from Mauro Carvalho Chehab:
- videobuf2 was moved to a media/common dir, as it is now used by the
DVB subsystem too
- Digital TV core memory mapped support interface
- new sensor driver: ov7740
- several improvements at ddbridge driver
- new V4L2 driver: IPU3 CIO2 CSI-2 receiver unit, found on some Intel
SoCs
- new tuner driver: tda18250
- finally got rid of all LIRC staging drivers
- as we don't have old lirc drivers anymore, restruct the lirc device
code
- add support for UVC metadata
- add a new staging driver for NVIDIA Tegra Video Decoder Engine
- DVB kAPI headers moved to include/media
- synchronize the kAPI and uAPI for the DVB subsystem, removing the gap
for non-legacy APIs
- reduce the kAPI gap for V4L2
- lots of other driver enhancements, cleanups, etc.
* tag 'media/v4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (407 commits)
media: v4l2-compat-ioctl32.c: make ctrl_is_pointer work for subdevs
media: v4l2-compat-ioctl32.c: refactor compat ioctl32 logic
media: v4l2-compat-ioctl32.c: don't copy back the result for certain errors
media: v4l2-compat-ioctl32.c: drop pr_info for unknown buffer type
media: v4l2-compat-ioctl32.c: copy clip list in put_v4l2_window32
media: v4l2-compat-ioctl32.c: fix ctrl_is_pointer
media: v4l2-compat-ioctl32.c: copy m.userptr in put_v4l2_plane32
media: v4l2-compat-ioctl32.c: avoid sizeof(type)
media: v4l2-compat-ioctl32.c: move 'helper' functions to __get/put_v4l2_format32
media: v4l2-compat-ioctl32.c: fix the indentation
media: v4l2-compat-ioctl32.c: add missing VIDIOC_PREPARE_BUF
media: v4l2-ioctl.c: don't copy back the result for -ENOTTY
media: v4l2-ioctl.c: use check_fmt for enum/g/s/try_fmt
media: vivid: fix module load error when enabling fb and no_error_inj=1
media: dvb_demux: improve debug messages
media: dvb_demux: Better handle discontinuity errors
media: cxusb, dib0700: ignore XC2028_I2C_FLUSH
media: ts2020: avoid integer overflows on 32 bit machines
media: i2c: ov7740: use gpio/consumer.h instead of gpio.h
media: entity: Add a nop variant of media_entity_cleanup
...
* Require struct page by default for filesystem DAX to remove a number of
surprising failure cases. This includes failures with direct I/O, gdb and
fork(2).
* Add support for the new Platform Capabilities Structure added to the NFIT in
ACPI 6.2a. This new table tells us whether the platform supports flushing
of CPU and memory controller caches on unexpected power loss events.
* Revamp vmem_altmap and dev_pagemap handling to clean up code and better
support future future PCI P2P uses.
* Deprecate the ND_IOCTL_SMART_THRESHOLD command whose payload has become
out-of-sync with recent versions of the NVDIMM_FAMILY_INTEL spec, and
instead rely on the generic ND_CMD_CALL approach used by the two other IOCTL
families, NVDIMM_FAMILY_{HPE,MSFT}.
* Enhance nfit_test so we can test some of the new things added in version 1.6
of the DSM specification. This includes testing firmware download and
simulating the Last Shutdown State (LSS) status.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJaeOg0AAoJEJ/BjXdf9fLBAFoQAI/IgcgJ2h9lfEpgjBRTC44t
2p8dxwT1Ofw3Y1aR/tI8nYRXjRtAGuP4UIeRVnb1CL/N7PagJyoMGU+6hmzg+ptY
c7cEDvw6nZOhrFwXx/xn7R53sYG8zH+UE6+jTR/PP/G4mQJfFCg4iF9R72Y7z0n7
aurf82Kz137NPUy6dNr4V9bmPMJWAaOci9WOj5SKddR5ZSNbjoxylTwQRvre5y4r
7HQTScEkirABOdSf1JoXTSUXCH/RC9UFFXR03ScHstGb1HjCj3KdcicVc50Q++Ub
qsEudhE6i44PEW1Hh4Qkg6hjHMEa8qHP+ShBuRuVaUmlghYTQn66niJAYLZilwdz
EVjE7vR+toHA5g3YCalEmYVutUEhIDkh/xfpd7vM6ZorUGJy95a2elEJs2fHBffC
gEhnCip7FROPcK5RDNUM8hBgnG/q5wwWPQMKY+6rKDZQx3mXssCrKp2Vlx7kBwMG
rpblkEpYjPonbLEHxsSU8yTg9Uq55ciIWgnOToffcjZvjbihi8WUVlHcwHUMPf/o
DWElg+4qmG0Sdd4S2NeAGwTl1Ewrf2RrtUGMjHtH4OUFs1wo6ZmfrxFzzMfoZ1Od
ko/s65v4uwtTzECh2o+XQaNsReR5YETXxmA40N/Jpo7/7twABIoZ/ASvj/3ZBYj+
sie+u2rTod8/gQWSfHpJ
=MIMX
-----END PGP SIGNATURE-----
Merge tag 'libnvdimm-for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm updates from Ross Zwisler:
- Require struct page by default for filesystem DAX to remove a number
of surprising failure cases. This includes failures with direct I/O,
gdb and fork(2).
- Add support for the new Platform Capabilities Structure added to the
NFIT in ACPI 6.2a. This new table tells us whether the platform
supports flushing of CPU and memory controller caches on unexpected
power loss events.
- Revamp vmem_altmap and dev_pagemap handling to clean up code and
better support future future PCI P2P uses.
- Deprecate the ND_IOCTL_SMART_THRESHOLD command whose payload has
become out-of-sync with recent versions of the NVDIMM_FAMILY_INTEL
spec, and instead rely on the generic ND_CMD_CALL approach used by
the two other IOCTL families, NVDIMM_FAMILY_{HPE,MSFT}.
- Enhance nfit_test so we can test some of the new things added in
version 1.6 of the DSM specification. This includes testing firmware
download and simulating the Last Shutdown State (LSS) status.
* tag 'libnvdimm-for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (37 commits)
libnvdimm, namespace: remove redundant initialization of 'nd_mapping'
acpi, nfit: fix register dimm error handling
libnvdimm, namespace: make min namespace size 4K
tools/testing/nvdimm: force nfit_test to depend on instrumented modules
libnvdimm/nfit_test: adding support for unit testing enable LSS status
libnvdimm/nfit_test: add firmware download emulation
nfit-test: Add platform cap support from ACPI 6.2a to test
libnvdimm: expose platform persistence attribute for nd_region
acpi: nfit: add persistent memory control flag for nd_region
acpi: nfit: Add support for detect platform CPU cache flush on power loss
device-dax: Fix trailing semicolon
libnvdimm, btt: fix uninitialized err_lock
dax: require 'struct page' by default for filesystem dax
ext2: auto disable dax instead of failing mount
ext4: auto disable dax instead of failing mount
mm, dax: introduce pfn_t_special()
mm: Fix devm_memremap_pages() collision handling
mm: Fix memory size alignment in devm_memremap_pages_release()
memremap: merge find_dev_pagemap into get_dev_pagemap
memremap: change devm_memremap_pages interface to use struct dev_pagemap
...
Support the AFS dynamic root which is a pseudo-volume that doesn't connect
to any server resource, but rather is just a root directory that
dynamically creates mountpoint directories where the name of such a
directory is the name of the cell.
Such a mount can be created thus:
mount -t afs none /afs -o dyn
Dynamic root superblocks aren't shared except by bind mounts and
propagation. Cell root volumes can then be mounted by referring to them by
name, e.g.:
ls /afs/grand.central.org/
ls /afs/.grand.central.org/
The kernel will upcall to consult the DNS if the address wasn't supplied
directly.
Signed-off-by: David Howells <dhowells@redhat.com>
Rearrange afs_select_fileserver() a little to put the use_server chunk
before the next_server chunk so that with the removal of a couple of gotos
the main path through the function is all one sequence.
Signed-off-by: David Howells <dhowells@redhat.com>
Fix server list handling in the following ways:
(1) In afs_alloc_volume(), remove duplicate server list build code. This
was already done by afs_alloc_server_list() which afs_alloc_volume()
previously called. This just results in twice as many VL RPCs.
(2) In afs_deliver_vl_get_entry_by_name_u(), use the number of server
records indicated by ->nServers in the UVLDB record returned by the
VL.GetEntryByNameU RPC call rather than scanning all NMAXNSERVERS
slots. Unused slots may contain garbage.
(3) In afs_alloc_server_list(), don't stop converting a UVLDB record into
a server list just because we can't look up one of the servers. Just
skip that server and go on to the next. If we can't look up any of
the servers then we'll fail at the end.
Without this patch, an attempt to view the umich.edu root cell using
something like "ls /afs/umich.edu" on a dynamic root (future patch) mount
or an autocell mount will result in ENOMEDIUM. The failure is due to kafs
not stopping after nServers'worth of records have been read, but then
trying to access a server with a garbage UUID and getting an error, which
aborts the server list build.
Fixes: d2ddc776a4 ("afs: Overhaul volume and server record caching and fileserver rotation")
Reported-by: Jonathan Billings <jsbillings@jsbillings.org>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: stable@vger.kernel.org
In afs_select_fileserver(), we need to clear the ->responded flag in the
address list when reusing it. We should also clear it in
afs_select_current_fileserver().
To this end, just memset() the object before initialising it.
Fixes: d2ddc776a4 ("afs: Overhaul volume and server record caching and fileserver rotation")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: stable@vger.kernel.org
afs_select_fileserver() ends the address cursor it is using in the case in
which we get some sort of network error and run out of addresses to iterate
through, before it jumps to try the next server. This also needs to be
done when the server aborts with some sort of error that means we should
try the next server.
Fix this by:
(1) Move the iterate_address afs_end_cursor() call to the next_server
case.
(2) End the cursor in the failed case.
(3) Make afs_end_cursor() clear the ->begun flag and ->addr pointer in the
address cursor.
(4) Make afs_end_cursor() able to be called on an already cleared cursor.
Without this, something like the following oops may occur:
AFS: Assertion failed
18446612134397189888 == 0 is false
0xffff88007c279f00 == 0x0 is false
------------[ cut here ]------------
kernel BUG at fs/afs/rotate.c:360!
RIP: 0010:afs_select_fileserver+0x79b/0xa30 [kafs]
Call Trace:
afs_statfs+0xcc/0x180 [kafs]
? p9_client_statfs+0x9e/0x110 [9pnet]
? _cond_resched+0x19/0x40
statfs_by_dentry+0x6d/0x90
vfs_statfs+0x1b/0xc0
user_statfs+0x4b/0x80
SYSC_statfs+0x15/0x30
SyS_statfs+0xe/0x10
entry_SYSCALL_64_fastpath+0x20/0x83
Fixes: d2ddc776a4 ("afs: Overhaul volume and server record caching and fileserver rotation")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: stable@vger.kernel.org
afs_alloc_volume() needs to release the cell ref it obtained in the case of
an error. Fix this by adding an afs_put_cell() call into the error path.
This can triggered when a lookup for a cell in a dynamic root or an
autocell mount returns an error whilst trying to look up the server (such
as ENOMEDIUM). This results in an assertion failure oops when the module
is unloaded due to outstanding refs on a cell record.
Fixes: d2ddc776a4 ("afs: Overhaul volume and server record caching and fileserver rotation")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: stable@vger.kernel.org
There's no point I can see to
stp->st_stid.sc_type = NFS4_CLOSED_STID;
given release_lock_stateid immediately sets sc_type to 0.
That set of sc_type to 0 should be enough to prevent it being used where
we don't want it to be; NFS4_CLOSED_STID should only be needed for
actual open stateid's that are actually closed.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The state of the stid is guaranteed by 2 locks:
- The nfs4_client 'cl_lock' spinlock
- The nfs4_ol_stateid 'st_mutex' mutex
so it is quite possible for the stid to be unhashed after lookup,
but before calling nfsd4_lock_ol_stateid(). So we do need to check
for a zero value for 'sc_type' in nfsd4_verify_open_stid().
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Tested-by: Checuk Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org
Fixes: 659aefb68e "nfsd: Ensure we don't recognise lock stateids..."
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
- Print scrub build status in the xfs build info.
- Explicitly call out the remaining two scenarios where we don't
support
reflink and never have.
- Remove EXPERIMENTAL tag from reverse mapping btree!
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCgAGBQJaeJjKAAoJEPh/dxk0SrTrZP8P/RT0bcKc1PkmonX6rZBYa9OB
Mz5X7TpVRsXtZPtGSNM3IBIubjIVEZ/f3s5CZefN08uV8s+AFBjEAdHmeAiGtT/X
qakQyvsBJ3mEyVsMyzuI7eu4TU3/5Xad7kSp9TFPnXfW8z09Z4GygyGVJPRqpKRQ
liFzh8BIVgS/IFcpTL+6wKEHdAHEuyz6u/78ylgCtLMuiNiMY1mYv/+U2f7dEV3u
yiRY4oHGQfOiw1aXy3EO2WUdSKcAQwIJIEsLOllYQRe3f5W2milflFCJF9RoEEuE
OLmur4PBwFWpTfLVl1BqGa6rr/nhaY1y7Lyy3mVrmv0QiHlnNM/BQ5UKICZJdx5O
8Ai4ZyaJ5Q/nQxA6USOBHSlkeexMOH82i7gJCCfPtYqW1l0QjStLcoTYjWXa/0u9
ULEkdnocNm/HSCIGocFrd6dzOKR8TxJDVh3DxIFo8VjTj/XI57+ePfbZT7J+0vuB
elhKcho87xKHeF1RQfsVdgh+518GGAXp5zZjAJ3P/6GpxuB9sa+ShEEtR7OzSf0K
sfkXw3P/tH9ladBxWvMC6Gx0tSUSUTAUeYSbfOC1wRio7iI7sf8Gl8SkU65y4RdE
ZhQp8M4i2+vt9JS/E/mbAVxKIn1iF7L9ZiWlycJXyuqFf7bv1uBXG+tTE7lM7nJA
YjSmXBWN5j6kxQeUR0NE
=U54J
-----END PGP SIGNATURE-----
Merge tag 'xfs-4.16-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull more xfs updates from Darrick Wong:
"As promised, here's a (much smaller) second pull request for the
second week of the merge cycle. This time around we have a couple
patches shutting off unsupported fs configurations, and a couple of
cleanups.
Last, we turn off EXPERIMENTAL for the reverse mapping btree, since
the primary downstream user of that information (online fsck) is now
upstream and I haven't seen any major failures in a few kernel
releases.
Summary:
- Print scrub build status in the xfs build info.
- Explicitly call out the remaining two scenarios where we don't
support reflink and never have.
- Remove EXPERIMENTAL tag from reverse mapping btree!"
* tag 'xfs-4.16-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: remove experimental tag for reverse mapping
xfs: don't allow reflink + realtime filesystems
xfs: don't allow DAX on reflink filesystems
xfs: add scrub to XFS_BUILD_OPTIONS
xfs: fix u32 type usage in sb validation function
Pull overlayfs updates from Miklos Szeredi:
"This work from Amir adds NFS export capability to overlayfs. NFS
exporting an overlay filesystem is a challange because we want to keep
track of any copy-up of a file or directory between encoding the file
handle and decoding it.
This is achieved by indexing copied up objects by lower layer file
handle. The index is already used for hard links, this patchset
extends the use to NFS file handle decoding"
* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: (51 commits)
ovl: check ERR_PTR() return value from ovl_encode_fh()
ovl: fix regression in fsnotify of overlay merge dir
ovl: wire up NFS export operations
ovl: lookup indexed ancestor of lower dir
ovl: lookup connected ancestor of dir in inode cache
ovl: hash non-indexed dir by upper inode for NFS export
ovl: decode pure lower dir file handles
ovl: decode indexed dir file handles
ovl: decode lower file handles of unlinked but open files
ovl: decode indexed non-dir file handles
ovl: decode lower non-dir file handles
ovl: encode lower file handles
ovl: copy up before encoding non-connectable dir file handle
ovl: encode non-indexed upper file handles
ovl: decode connected upper dir file handles
ovl: decode pure upper file handles
ovl: encode pure upper file handles
ovl: document NFS export
vfs: factor out helpers d_instantiate_anon() and d_alloc_anon()
ovl: store 'has_upper' and 'opaque' as bit flags
...
Commit 4fde46f0cc ("Btrfs: free the stale device") introduced
btrfs_free_stale_device which iterates the device lists for all
registered btrfs filesystems and deletes those devices which aren't
mounted. In a btrfs_devices structure has only 1 device attached to it
and it is unused then btrfs_free_stale_devices will proceed to also free
the btrfs_fs_devices struct itself. Currently this leads to a use after
free since list_for_each_entry will try to perform a check on the
already freed memory to see if it has to terminate the loop.
The fix is to use 'break' when we know we are freeing the current
fs_devs.
Fixes: 4fde46f0cc ("Btrfs: free the stale device")
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Another fix for an issue reported by 0-day robot.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 8ed5eec9d6 ("ovl: encode pure upper file handles")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
A re-factoring patch in NFS export series has passed the wrong argument
to ovl_get_inode() causing a regression in the very recent fix to
fsnotify of overlay merge dir.
The regression has caused merge directory inodes to be hashed by upper
instead of lower real inode, when NFS export and directory indexing is
disabled. That caused an inotify watch to become obsolete after directory
copy up and drop caches.
LTP test inotify07 was improved to catch this regression.
The regression also caused multiple redirect dirs to same origin not to
be detected on lookup with NFS export disabled. An xfstest was added to
cover this case.
Fixes: 0aceb53e73 ("ovl: do not pass overlay dentry to ovl_get_inode()")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAlp2R3AACgkQ8vlZVpUN
gaOIdAgApEdlFR2Gf93z2hMj5HxVL5rjkuPJVtVkKu0eH2HMQJyxNmjymrRfuFmM
8W1CrEvVKi5Aj6r8q4KHIdVV247Ya0SVEhLwKM0LX4CvlZUXmwgCmZ/MPDTXA1eq
C4vPVuJAuSNGNVYDlDs3+NiMHINGNVnBVQQFSPBP9P+iNWPD7o486712qaF8maVn
RbfbQ2rWtOIRdlAOD1U5WqgQku59lOsmHk2pc0+X4LHCZFpMoaO80JVjENPAw+BF
daRt6TX+WljMyx6DRIaszqau876CJhe/tqlZcCLOkpXZP0jJS13yodp26dVQmjCh
w8YdiY7uHK2D+S/8eyj7h7DIwzu3vg==
=ZjQP
-----END PGP SIGNATURE-----
Merge tag 'fscrypt_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt
Pull fscrypt updates from Ted Ts'o:
"Refactor support for encrypted symlinks to move common code to fscrypt"
Ted also points out about the merge:
"This makes the f2fs symlink code use the fscrypt_encrypt_symlink()
from the fscrypt tree. This will end up dropping the kzalloc() ->
f2fs_kzalloc() change, which means the fscrypt-specific allocation
won't get tested by f2fs's kmalloc error injection system; which is
fine"
* tag 'fscrypt_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt: (26 commits)
fscrypt: fix build with pre-4.6 gcc versions
fscrypt: remove 'ci' parameter from fscrypt_put_encryption_info()
fscrypt: document symlink length restriction
fscrypt: fix up fscrypt_fname_encrypted_size() for internal use
fscrypt: define fscrypt_fname_alloc_buffer() to be for presented names
fscrypt: calculate NUL-padding length in one place only
fscrypt: move fscrypt_symlink_data to fscrypt_private.h
fscrypt: remove fscrypt_fname_usr_to_disk()
ubifs: switch to fscrypt_get_symlink()
ubifs: switch to fscrypt ->symlink() helper functions
ubifs: free the encrypted symlink target
f2fs: switch to fscrypt_get_symlink()
f2fs: switch to fscrypt ->symlink() helper functions
ext4: switch to fscrypt_get_symlink()
ext4: switch to fscrypt ->symlink() helper functions
fscrypt: new helper function - fscrypt_get_symlink()
fscrypt: new helper functions for ->symlink()
fscrypt: trim down fscrypt.h includes
fscrypt: move fscrypt_is_dot_dotdot() to fs/crypto/fname.c
fscrypt: move fscrypt_valid_enc_modes() to fscrypt_private.h
...
cache objects. This is good, but still leaves a lot of kernel memory
available to be copied to/from userspace in the face of bugs. To further
restrict what memory is available for copying, this creates a way to
whitelist specific areas of a given slab cache object for copying to/from
userspace, allowing much finer granularity of access control. Slab caches
that are never exposed to userspace can declare no whitelist for their
objects, thereby keeping them unavailable to userspace via dynamic copy
operations. (Note, an implicit form of whitelisting is the use of constant
sizes in usercopy operations and get_user()/put_user(); these bypass all
hardened usercopy checks since these sizes cannot change at runtime.)
This new check is WARN-by-default, so any mistakes can be found over the
next several releases without breaking anyone's system.
The series has roughly the following sections:
- remove %p and improve reporting with offset
- prepare infrastructure and whitelist kmalloc
- update VFS subsystem with whitelists
- update SCSI subsystem with whitelists
- update network subsystem with whitelists
- update process memory with whitelists
- update per-architecture thread_struct with whitelists
- update KVM with whitelists and fix ioctl bug
- mark all other allocations as not whitelisted
- update lkdtm for more sensible test overage
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: Kees Cook <kees@outflux.net>
iQIcBAABCgAGBQJabvleAAoJEIly9N/cbcAmO1kQAJnjVPutnLSbnUteZxtsv7W4
43Cggvokfxr6l08Yh3hUowNxZVKjhF9uwMVgRRg9Nl5WdYCN+vCQbHz+ZdzGJXKq
cGqdKWgexMKX+aBdNDrK7BphUeD46sH7JWR+a/lDV/BgPxBCm9i5ZZCgXbPP89AZ
NpLBji7gz49wMsnm/x135xtNlZ3dG0oKETzi7MiR+NtKtUGvoIszSKy5JdPZ4m8q
9fnXmHqmwM6uQFuzDJPt1o+D1fusTuYnjI7EgyrJRRhQ+BB3qEFZApXnKNDRS9Dm
uB7jtcwefJCjlZVCf2+PWTOEifH2WFZXLPFlC8f44jK6iRW2Nc+wVRisJ3vSNBG1
gaRUe/FSge68eyfQj5OFiwM/2099MNkKdZ0fSOjEBeubQpiFChjgWgcOXa5Bhlrr
C4CIhFV2qg/tOuHDAF+Q5S96oZkaTy5qcEEwhBSW15ySDUaRWFSrtboNt6ZVOhug
d8JJvDCQWoNu1IQozcbv6xW/Rk7miy8c0INZ4q33YUvIZpH862+vgDWfTJ73Zy9H
jR/8eG6t3kFHKS1vWdKZzOX1bEcnd02CGElFnFYUEewKoV7ZeeLsYX7zodyUAKyi
Yp5CImsDbWWTsptBg6h9nt2TseXTxYCt2bbmpJcqzsqSCUwOQNQ4/YpuzLeG0ihc
JgOmUnQNJWCTwUUw5AS1
=tzmJ
-----END PGP SIGNATURE-----
Merge tag 'usercopy-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardened usercopy whitelisting from Kees Cook:
"Currently, hardened usercopy performs dynamic bounds checking on slab
cache objects. This is good, but still leaves a lot of kernel memory
available to be copied to/from userspace in the face of bugs.
To further restrict what memory is available for copying, this creates
a way to whitelist specific areas of a given slab cache object for
copying to/from userspace, allowing much finer granularity of access
control.
Slab caches that are never exposed to userspace can declare no
whitelist for their objects, thereby keeping them unavailable to
userspace via dynamic copy operations. (Note, an implicit form of
whitelisting is the use of constant sizes in usercopy operations and
get_user()/put_user(); these bypass all hardened usercopy checks since
these sizes cannot change at runtime.)
This new check is WARN-by-default, so any mistakes can be found over
the next several releases without breaking anyone's system.
The series has roughly the following sections:
- remove %p and improve reporting with offset
- prepare infrastructure and whitelist kmalloc
- update VFS subsystem with whitelists
- update SCSI subsystem with whitelists
- update network subsystem with whitelists
- update process memory with whitelists
- update per-architecture thread_struct with whitelists
- update KVM with whitelists and fix ioctl bug
- mark all other allocations as not whitelisted
- update lkdtm for more sensible test overage"
* tag 'usercopy-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (38 commits)
lkdtm: Update usercopy tests for whitelisting
usercopy: Restrict non-usercopy caches to size 0
kvm: x86: fix KVM_XEN_HVM_CONFIG ioctl
kvm: whitelist struct kvm_vcpu_arch
arm: Implement thread_struct whitelist for hardened usercopy
arm64: Implement thread_struct whitelist for hardened usercopy
x86: Implement thread_struct whitelist for hardened usercopy
fork: Provide usercopy whitelisting for task_struct
fork: Define usercopy region in thread_stack slab caches
fork: Define usercopy region in mm_struct slab caches
net: Restrict unwhitelisted proto caches to size 0
sctp: Copy struct sctp_sock.autoclose to userspace using put_user()
sctp: Define usercopy region in SCTP proto slab cache
caif: Define usercopy region in caif proto slab cache
ip: Define usercopy region in IP proto slab cache
net: Define usercopy region in struct proto slab cache
scsi: Define usercopy region in scsi_sense_cache slab cache
cifs: Define usercopy region in cifs_request slab cache
vxfs: Define usercopy region in vxfs_inode slab cache
ufs: Define usercopy region in ufs_inode_cache slab cache
...
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAlp16xMACgkQ8vlZVpUN
gaP1IAf8C48AKVnqy6ftFphzV1CdeGHDwJLL63lChs97fNr1mxo5TZE/6vdYB55j
k7C7huQ582cEiGWQJ0U4/+En0hF85zkAk5mTfnSao5BqxLr9ANsAocwBUNBXdFSp
B7IyMo4Dct7NCkwfmKLPRcEqZ49vwyv99TqM/9wUkgUStkTjPT7bhHgarB6VPbhp
BxoXVnFYgU0sZN0y71IBt8ngWqCK6j7fjw3gsl37oEenG3/h3SO0H9ih1FrysX8S
VOwwLJq6vfAgEwQvZACnBwWKDYsZpH7akNp9WGeDMByo28t514RNRjIi0mvLHEZa
h72I8Sb3bwHO9MJNvHFe/0b1Say4vw==
=dxAX
-----END PGP SIGNATURE-----
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"Only miscellaneous cleanups and bug fixes for ext4 this cycle"
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: create ext4_kset dynamically
ext4: create ext4_feat kobject dynamically
ext4: release kobject/kset even when init/register fail
ext4: fix incorrect indentation of if statement
ext4: correct documentation for grpid mount option
ext4: use 'sbi' instead of 'EXT4_SB(sb)'
ext4: save error to disk in __ext4_grp_locked_error()
jbd2: fix sphinx kernel-doc build warnings
ext4: fix a race in the ext4 shutdown path
mbcache: make sure c_entry_count is not decremented past zero
ext4: no need flush workqueue before destroying it
ext4: fixed alignment and minor code cleanup in ext4.h
ext4: fix ENOSPC handling in DAX page fault handler
dax: pass detailed error code from dax_iomap_fault()
mbcache: revert "fs/mbcache.c: make count_objects() more robust"
mbcache: initialize entry->e_referenced in mb_cache_entry_create()
ext4: fix up remaining files with SPDX cleanups
merged in this time. Both are regressions:
1. The first fixes another kernel build dependency problem.
2. The second fixes a performance regression in glock dumps.
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJadIS1AAoJENeLYdPf93o7i24H/3orp2uf/0EQFRB3WF7vxuhB
aFyymb35V5+pkoSOqBRpV8plQR3oNxeQX1uo+a08n5UzW7VHQBApS5m5to5w03dI
MRZvDUs84weKwjUm+ndhqOgjoUZuTIQ6+/A6bRDu+24AftqwNE5vHrTBvDdZ94zN
WxCy847aHd21TQ7nKIsLVp7wlllmRuxp1D+VEc7Vmn18eNrGp4TDavP5lq/4YR92
Zsj1AfhJK1GuAY9AJGMT3ZiFL6Mdg9oj7qSyJ2HjT7q/QJE+odwI8uUPs4HKpiko
VPBPhTrfgDE2nD4gAYIR41Aog8s8JnLgGK+0P7CqVxB37rq89BSYvApaHQE8yTg=
=4Ha2
-----END PGP SIGNATURE-----
Merge tag 'gfs2-4.16.fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull GFS2 fixes from Bob Peterson:
"Andreas Gruenbacher wrote two additional patches that we would like
merged in this time. Both are regressions:
- fix another kernel build dependency problem
- fix a performance regression in glock dumps"
* tag 'gfs2-4.16.fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: Glock dump performance regression fix
gfs2: Fix the crc32c dependency
Until v4.14, this warning was very infrequent:
WARNING: CPU: 3 PID: 18172 at fs/btrfs/backref.c:1391 find_parent_nodes+0xc41/0x14e0
Modules linked in: [...]
CPU: 3 PID: 18172 Comm: bees Tainted: G D W L 4.11.9-zb64+ #1
Hardware name: System manufacturer System Product Name/M5A78L-M/USB3, BIOS 2101 12/02/2014
Call Trace:
dump_stack+0x85/0xc2
__warn+0xd1/0xf0
warn_slowpath_null+0x1d/0x20
find_parent_nodes+0xc41/0x14e0
__btrfs_find_all_roots+0xad/0x120
? extent_same_check_offsets+0x70/0x70
iterate_extent_inodes+0x168/0x300
iterate_inodes_from_logical+0x87/0xb0
? iterate_inodes_from_logical+0x87/0xb0
? extent_same_check_offsets+0x70/0x70
btrfs_ioctl+0x8ac/0x2820
? lock_acquire+0xc2/0x200
do_vfs_ioctl+0x91/0x700
? __fget+0x112/0x200
SyS_ioctl+0x79/0x90
entry_SYSCALL_64_fastpath+0x23/0xc6
? trace_hardirqs_off_caller+0x1f/0x140
Starting with v4.14 (specifically 86d5f99442 ("btrfs: convert prelimary
reference tracking to use rbtrees")) the WARN_ON occurs three orders of
magnitude more frequently--almost once per second while running workloads
like bees.
Replace the WARN_ON() with a comment rationale for its removal.
The rationale is paraphrased from an explanation by Edmund Nadolski
<enadolski@suse.de> on the linux-btrfs mailing list.
Fixes: 8da6d5815c ("Btrfs: added btrfs_find_all_roots()")
Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Running generic/019 with qgroups on the scratch device enabled is almost
guaranteed to trigger the BUG_ON in btrfs_free_tree_block. It's supposed
to trigger only on -ENOMEM, in reality, however, it's possible to get
-EIO from btrfs_qgroup_trace_extent_post. This function just finds the
roots of the extent being tracked and sets the qrecord->old_roots list.
If this operation fails nothing critical happens except the quota
accounting can be considered wrong. In such case just set the
INCONSISTENT flag for the quota and print a warning, rather than killing
off the system. Additionally, it's possible to trigger a BUG_ON in
btrfs_truncate_inode_items as well.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
[ error message adjustments ]
Signed-off-by: David Sterba <dsterba@suse.com>
The highest objectid, which is assigned to new inode, is decided at
the time of initializing fs roots. However, in cases where log replay
gets processed, the btree which fs root owns might be changed, so we
have to search it again for the highest objectid, otherwise creating
new inode would end up with -EEXIST.
cc: <stable@vger.kernel.org> v4.4-rc6+
Fixes: f32e48e925 ("Btrfs: Initialize btrfs_root->highest_objectid when loading tree root and subvolume roots")
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This regression is introduced in
commit 3d48d9810d ("btrfs: Handle uninitialised inode eviction").
There are two problems,
a) it is ->destroy_inode() that does the final free on inode, not
->evict_inode(),
b) clear_inode() must be called before ->evict_inode() returns.
This could end up hitting BUG_ON(inode->i_state != (I_FREEING | I_CLEAR));
in evict() because I_CLEAR is set in clear_inode().
Fixes: commit 3d48d9810d ("btrfs: Handle uninitialised inode eviction")
Cc: <stable@vger.kernel.org> # v4.7-rc6+
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
It's possible that btrfs_sync_log() bails out after one of the two
btrfs_write_marked_extents() which convert extent state's state bit into
EXTENT_NEED_WAIT from EXTENT_DIRTY/EXTENT_NEW, however only EXTENT_DIRTY
and EXTENT_NEW are searched by free_log_tree() so that those extent states
with EXTENT_NEED_WAIT lead to memory leak.
cc: <stable@vger.kernel.org>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In cases that the whole fs flips into readonly status due to failures in
critical sections, then log tree's blocks are still dirty, and this leads
to a crash during umount time, the crash is about use-after-free,
umount
-> close_ctree
-> stop workers
-> iput(btree_inode)
-> iput_final
-> write_inode_now
-> ...
-> queue job on stop'd workers
cc: <stable@vger.kernel.org> v3.12+
Fixes: 681ae50917 ("Btrfs: cleanup reserved space when freeing tree log on error")
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
@cur_offset is not set back to what it should be (@cow_start) if
btrfs_next_leaf() returns something wrong, and the range [cow_start,
cur_offset) remains locked forever.
cc: <stable@vger.kernel.org>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Reverse mapping has had a while to soak, so remove the experimental tag.
Now that we've landed space metadata cross-referencing in scrub, the
feature actually has a purpose.
Reject rmap filesystems with an rt device until the code to support it
is actually implemented.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
We don't support realtime filesystems with reflink either, so fail
those mounts.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Now that reflink is no longer experimental, reject attempts to mount
with DAX until that whole mess gets sorted out.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Advertise this config option along with the others.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Pull printk updates from Petr Mladek:
- Add a console_msg_format command line option:
The value "default" keeps the old "[time stamp] text\n" format. The
value "syslog" allows to see the syslog-like "<log
level>[timestamp] text" format.
This feature was requested by people doing regression tests, for
example, 0day robot. They want to have both filtered and full logs
at hands.
- Reduce the risk of softlockup:
Pass the console owner in a busy loop.
This is a new approach to the old problem. It was first proposed by
Steven Rostedt on Kernel Summit 2017. It marks a context in which
the console_lock owner calls console drivers and could not sleep.
On the other side, printk() callers could detect this state and use
a busy wait instead of a simple console_trylock(). Finally, the
console_lock owner checks if there is a busy waiter at the end of
the special context and eventually passes the console_lock to the
waiter.
The hand-off works surprisingly well and helps in many situations.
Well, there is still a possibility of the softlockup, for example,
when the flood of messages stops and the last owner still has too
much to flush.
There is increasing number of people having problems with
printk-related softlockups. We might eventually need to get better
solution. Anyway, this looks like a good start and promising
direction.
- Do not allow to schedule in console_unlock() called from printk():
This reverts an older controversial commit. The reschedule helped
to avoid softlockups. But it also slowed down the console output.
This patch is obsoleted by the new console waiter logic described
above. In fact, the reschedule made the hand-off less effective.
- Deprecate "%pf" and "%pF" format specifier:
It was needed on ia64, ppc64 and parisc64 to dereference function
descriptors and show the real function address. It is done
transparently by "%ps" and "pS" format specifier now.
Sergey Senozhatsky found that all the function descriptors were in
a special elf section and could be easily detected.
- Remove printk_symbol() API:
It has been obsoleted by "%pS" format specifier, and this change
helped to remove few continuous lines and a less intuitive old API.
- Remove redundant memsets:
Sergey removed unnecessary memset when processing printk.devkmsg
command line option.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: (27 commits)
printk: drop redundant devkmsg_log_str memsets
printk: Never set console_may_schedule in console_trylock()
printk: Hide console waiter logic into helpers
printk: Add console owner and waiter logic to load balance console writes
kallsyms: remove print_symbol() function
checkpatch: add pF/pf deprecation warning
symbol lookup: introduce dereference_symbol_descriptor()
parisc64: Add .opd based function descriptor dereference
powerpc64: Add .opd based function descriptor dereference
ia64: Add .opd based function descriptor dereference
sections: split dereference_function_descriptor()
openrisc: Fix conflicting types for _exext and _stext
lib: do not use print_symbol()
irq debug: do not use print_symbol()
sysfs: do not use print_symbol()
drivers: do not use print_symbol()
x86: do not use print_symbol()
unicore32: do not use print_symbol()
sh: do not use print_symbol()
mn10300: do not use print_symbol()
...
The only place that has any business including asm/poll.h
is linux/poll.h. Fortunately, asm/poll.h had only been
included in 3 places beyond that one, and all of them
are trivial to switch to using linux/poll.h.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Merge KASAN word-at-a-time fixups from Andrey Ryabinin.
The word-at-a-time optimizations have caused headaches for KASAN, since
the whole point is that we access byte streams in bigger chunks, and
KASAN can be unhappy about the potential extra access at the end of the
string.
We used to have a horrible hack in dcache, and then people got
complaints from the strscpy() case. This fixes it all up properly, by
adding an explicit helper for the "access byte stream one word at a
time" case.
* emailed patches from Andrey Ryabinin <aryabinin@virtuozzo.com>:
fs: dcache: Revert "manually unpoison dname after allocation to shut up kasan's reports"
fs/dcache: Use read_word_at_a_time() in dentry_string_cmp()
lib/strscpy: Shut up KASAN false-positives in strscpy()
compiler.h: Add read_word_at_a_time() function.
compiler.h, kasan: Avoid duplicating __read_once_size_nocheck()
This reverts commit df4c0e36f1.
It's no longer needed since dentry_string_cmp() now uses
read_word_at_a_time() to avoid kasan's reports.
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
dentry_string_cmp() performs the word-at-a-time reads from 'cs' and may
read slightly more than it was requested in kmallac(). Normally this
would make KASAN to report out-of-bounds access, but this was
workarounded by commit df4c0e36f1 ("fs: dcache: manually unpoison
dname after allocation to shut up kasan's reports").
This workaround is not perfect, since it allows out-of-bounds access to
dentry's name for all the code, not just in dentry_string_cmp().
So it would be better to use read_word_at_a_time() instead and revert
commit df4c0e36f1.
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Restore an optimization removed in commit 7f19449553 "Fix debugfs glocks
dump": keep the glock hash table iterator active while the glock dump
file is held open. This avoids having to rescan the hash table from the
start for each read, with quadratically rising runtime.
In addition, use rhastable_walk_peek for resuming a glock dump at the
current position: when a glock doesn't fit in the provided buffer
anymore, the next read must revisit the same glock.
Finally, also restart the dump from the first entry when we notice that
the hash table has been resized in gfs2_glock_seq_start.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Depend on LIBCRC32C which uses the crypto API to select the appropriate
crc32c implementation. With the CRYPTO and CRYPTO_CRC32C dependencies,
gfs2 would still need to use the crypto API directly like ext4 and btrfs
do, which isn't necessary.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Here is the set of "big" driver core patches for 4.16-rc1.
The majority of the work here is in the firmware subsystem, with reworks
to try to attempt to make the code easier to handle in the long run, but
no functional change. There's also some tree-wide sysfs attribute
fixups with lots of acks from the various subsystem maintainers, as well
as a handful of other normal fixes and changes.
And finally, some license cleanups for the driver core and sysfs code.
All have been in linux-next for a while with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWnLvPw8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ynNzACgkzjPoBytJWbpWFt6SR6L33/u4kEAnRFvVCGL
s6ygQPQhZIjKk2Lxa2hC
=Zihy
-----END PGP SIGNATURE-----
Merge tag 'driver-core-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the set of "big" driver core patches for 4.16-rc1.
The majority of the work here is in the firmware subsystem, with
reworks to try to attempt to make the code easier to handle in the
long run, but no functional change. There's also some tree-wide sysfs
attribute fixups with lots of acks from the various subsystem
maintainers, as well as a handful of other normal fixes and changes.
And finally, some license cleanups for the driver core and sysfs code.
All have been in linux-next for a while with no reported issues"
* tag 'driver-core-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (48 commits)
device property: Define type of PROPERTY_ENRTY_*() macros
device property: Reuse property_entry_free_data()
device property: Move property_entry_free_data() upper
firmware: Fix up docs referring to FIRMWARE_IN_KERNEL
firmware: Drop FIRMWARE_IN_KERNEL Kconfig option
USB: serial: keyspan: Drop firmware Kconfig options
sysfs: remove DEBUG defines
sysfs: use SPDX identifiers
drivers: base: add coredump driver ops
sysfs: add attribute specification for /sysfs/devices/.../coredump
test_firmware: fix missing unlock on error in config_num_requests_store()
test_firmware: make local symbol test_fw_config static
sysfs: turn WARN() into pr_warn()
firmware: Fix a typo in fallback-mechanisms.rst
treewide: Use DEVICE_ATTR_WO
treewide: Use DEVICE_ATTR_RO
treewide: Use DEVICE_ATTR_RW
sysfs.h: Use octal permissions
component: add debugfs support
bus: simple-pm-bus: convert bool SIMPLE_PM_BUS to tristate
...
Here is the big Staging and IIO driver patches for 4.16-rc1.
There is the normal amount of new IIO drivers added, like all releases.
The networking IPX and the ncpfs filesystem are moved into the staging
tree, as they are on their way out of the kernel due to lack of use
anymore.
The visorbus subsystem finall has started moving out of the staging tree
to the "real" part of the kernel, and the most and fsl-mc codebases are
almost ready to move out, that will probably happen for 4.17-rc1 if all
goes well.
Other than that, there is a bunch of license header cleanups in the
tree, along with the normal amount of coding style churn that we all
know and love for this codebase. I also got frustrated at the
Meltdown/Spectre mess and took it out on the dgnc tty driver, deleting
huge chunks of it that were never even being used.
Full details of everything is in the shortlog.
All of these patches have been in linux-next for a while with no
reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWnLxoA8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+yk4vgCgjeMlwhtar65DIticIRj626EFxiQAnjGmH8Kd
d9Xz2Piq8X47uSsC/6AE
=xxMT
-----END PGP SIGNATURE-----
Merge tag 'staging-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging/IIO updates from Greg KH:
"Here is the big Staging and IIO driver patches for 4.16-rc1.
There is the normal amount of new IIO drivers added, like all
releases.
The networking IPX and the ncpfs filesystem are moved into the staging
tree, as they are on their way out of the kernel due to lack of use
anymore.
The visorbus subsystem finall has started moving out of the staging
tree to the "real" part of the kernel, and the most and fsl-mc
codebases are almost ready to move out, that will probably happen for
4.17-rc1 if all goes well.
Other than that, there is a bunch of license header cleanups in the
tree, along with the normal amount of coding style churn that we all
know and love for this codebase. I also got frustrated at the
Meltdown/Spectre mess and took it out on the dgnc tty driver, deleting
huge chunks of it that were never even being used.
Full details of everything is in the shortlog.
All of these patches have been in linux-next for a while with no
reported issues"
* tag 'staging-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (627 commits)
staging: rtlwifi: remove redundant initialization of 'cfg_cmd'
staging: rtl8723bs: remove a couple of redundant initializations
staging: comedi: reformat lines to 80 chars or less
staging: lustre: separate a connection destroy from free struct kib_conn
Staging: rtl8723bs: Use !x instead of NULL comparison
Staging: rtl8723bs: Remove dead code
Staging: rtl8723bs: Change names to conform to the kernel code
staging: ccree: Fix missing blank line after declaration
staging: rtl8188eu: remove redundant initialization of 'pwrcfgcmd'
staging: rtlwifi: remove unused RTLHALMAC_ST and RTLPHYDM_ST
staging: fbtft: remove unused FB_TFT_SSD1325 kconfig
staging: comedi: dt2811: remove redundant initialization of 'ns'
staging: wilc1000: fix alignments to match open parenthesis
staging: wilc1000: removed unnecessary defined enums typedef
staging: wilc1000: remove unnecessary use of parentheses
staging: rtl8192u: remove redundant initialization of 'timeout'
staging: sm750fb: fix CamelCase for dispSet var
staging: lustre: lnet/selftest: fix compile error on UP build
staging: rtl8723bs: hal_com_phycfg: Remove unneeded semicolons
staging: rts5208: Fix "seg_no" calculation in reset_ms_card()
...
gcc versions prior to 4.6 require an extra level of braces when using a
designated initializer for a member in an anonymous struct or union.
This caused a compile error with the 'struct qstr' initialization in
__fscrypt_encrypt_symlink().
Fix it by using QSTR_INIT().
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Fixes: 76e81d6d50 ("fscrypt: new helper functions for ->symlink()")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The function inode_cmp_iversion{+raw} is counter-intuitive, because it
returns true when the counters are different and false when these are equal.
Rename it to inode_eq_iversion{+raw}, which will returns true when
the counters are equal and false otherwise.
Signed-off-by: Goffredo Baroncelli <kreijack@inwind.it>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Don't use u32, use uint32_t, because this won't work in xfsprogs.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
documentation, errseq documentation, kernel-doc support for nested
structure definitions, the removal of lots of crufty kernel-doc support for
unused formats, SPDX tag documentation, the beginnings of a manual for
subsystem maintainers, and lots of fixes and updates.
As usual, some of the changesets reach outside of Documentation/ to effect
kerneldoc comment fixes. It also adds the new LICENSES directory, of which
Thomas promises I do not need to be the maintainer.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJab11TAAoJEI3ONVYwIuV6i1UP/1LgGPHW9Ygq5qaLFbReZd/u
Mx/orrhHX0PdkbCCE+CbL8Vm1m4UKFDTBdlpk3s542zxeeG0ZBXuTnvq4Kyk+cTN
p4/vsIEzk/Ih13/glGE5MlV+EjiEK+8hK69TIUj7bAyuHmpzofjRz9/1M6RLDGDC
HY6UI58AXG0yOQWMWCGRMYpQAFUGij2equ7Doe1ugXRq14dx7V4RsOhI140iRk7t
bquAq1rS2fXniiuPFmLBUe4dWW28isVa/Vl/aXcaWQDKMyT0OLhjOMW36wWKqtPi
WdVCpHv1NLZNyZZr9S3kvfOwW+BUqpEzfVwssyBLW4h0tsnIx0U0HVhSTY8/TvFZ
QD9yCSana4LB/e5CHXIX5lBHbjHxf+rETXqVV4MgwDaMvM3mCo4X6WUTJDmZADo6
vQISEKeb4su5uWAbc9T9xwRSLhZnFVdJ/QuYdNQ5+EpFJYLhzQ9eBvEz6JstSIXL
p9ASBiPNY3ulpVZ8q0JOHJRBhq5mHJH6Dy8achzbILy2l/ZI4b8lJ53mw9II04cp
puF96E6HpvuZ8Tgjjrg9U3ZdxXNrUgc/tjk2ZDkyTglk1XF2jKSq2tiNSZ3oLrJm
XqJPnpCeyJM5UDvwkIBzgC41WEHwe8uvoNbUnc4X7UJSZegFzcSLQXf5qaprHS5k
XeQ7sbd+S+jzVVjFi0W5
=Z15Z
-----END PGP SIGNATURE-----
Merge tag 'docs-4.16' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
"Documentation updates for 4.16.
New stuff includes refcount_t documentation, errseq documentation,
kernel-doc support for nested structure definitions, the removal of
lots of crufty kernel-doc support for unused formats, SPDX tag
documentation, the beginnings of a manual for subsystem maintainers,
and lots of fixes and updates.
As usual, some of the changesets reach outside of Documentation/ to
effect kerneldoc comment fixes. It also adds the new LICENSES
directory, of which Thomas promises I do not need to be the
maintainer"
* tag 'docs-4.16' of git://git.lwn.net/linux: (65 commits)
linux-next: docs-rst: Fix typos in kfigure.py
linux-next: DOC: HWPOISON: Fix path to debugfs in hwpoison.txt
Documentation: Fix misconversion of #if
docs: add index entry for networking/msg_zerocopy
Documentation: security/credentials.rst: explain need to sort group_list
LICENSES: Add MPL-1.1 license
LICENSES: Add the GPL 1.0 license
LICENSES: Add Linux syscall note exception
LICENSES: Add the MIT license
LICENSES: Add the BSD-3-clause "Clear" license
LICENSES: Add the BSD 3-clause "New" or "Revised" License
LICENSES: Add the BSD 2-clause "Simplified" license
LICENSES: Add the LGPL-2.1 license
LICENSES: Add the LGPL 2.0 license
LICENSES: Add the GPL 2.0 license
Documentation: Add license-rules.rst to describe how to properly identify file licenses
scripts: kernel_doc: better handle show warnings logic
fs/*/Kconfig: drop links to 404-compliant http://acl.bestbits.at
doc: md: Fix a file name to md-fault.c in fault-injection.txt
errseq: Add to documentation tree
...
Pull dcache updates from Al Viro:
"Neil Brown's d_move()/d_path() race fix"
* 'work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
VFS: close race between getcwd() and d_move()
Merge updates from Andrew Morton:
- misc fixes
- ocfs2 updates
- most of MM
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits)
mm: remove PG_highmem description
tools, vm: new option to specify kpageflags file
mm/swap.c: make functions and their kernel-doc agree
mm, memory_hotplug: fix memmap initialization
mm: correct comments regarding do_fault_around()
mm: numa: do not trap faults on shared data section pages.
hugetlb, mbind: fall back to default policy if vma is NULL
hugetlb, mempolicy: fix the mbind hugetlb migration
mm, hugetlb: further simplify hugetlb allocation API
mm, hugetlb: get rid of surplus page accounting tricks
mm, hugetlb: do not rely on overcommit limit during migration
mm, hugetlb: integrate giga hugetlb more naturally to the allocation path
mm, hugetlb: unify core page allocation accounting and initialization
mm/memcontrol.c: try harder to decrease [memory,memsw].limit_in_bytes
mm/memcontrol.c: make local symbol static
mm/hmm: fix uninitialized use of 'entry' in hmm_vma_walk_pmd()
include/linux/mmzone.h: fix explanation of lower bits in the SPARSEMEM mem_map pointer
mm/compaction.c: fix comment for try_to_compact_pages()
mm/page_ext.c: make page_ext_init a noop when CONFIG_PAGE_EXTENSION but nothing uses it
zsmalloc: use U suffix for negative literals being shifted
...
Nothing actually calls userfaultfd_file_create() besides the
userfaultfd() system call itself. So simplify things by folding it into
the system call and using anon_inode_getfd() instead of
anon_inode_getfile(). Do the same in resolve_userfault_fork() as well.
This removes over 50 lines with no change in functionality.
Link: http://lkml.kernel.org/r/20171229212403.22800-1-ebiggers3@gmail.com
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Implements memfd sealing, similar to shmem:
- WRITE: deny fallocate(PUNCH_HOLE). mmap() write is denied in
memfd_add_seals(). write() doesn't exist for hugetlbfs.
- SHRINK: added similar check as shmem_setattr()
- GROW: added similar check as shmem_setattr() & shmem_fallocate()
Except write() operation that doesn't exist with hugetlbfs, that should
make sealing as close as it can be to shmem support.
Link: http://lkml.kernel.org/r/20171107122800.25517-5-marcandre.lureau@redhat.com
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
hugetlbfs inode information will need to be accessed by code in
mm/shmem.c for file sealing operations. Move inode information
definition from .c file to header for needed access.
Link: http://lkml.kernel.org/r/20171107122800.25517-4-marcandre.lureau@redhat.com
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Those functions are called for memfd files, backed by shmem or hugetlb
(the next patches will handle hugetlb).
Link: http://lkml.kernel.org/r/20171107122800.25517-3-marcandre.lureau@redhat.com
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use the modifed pmdp_invalidate() that returns the previous value of pmd
to transfer dirty and accessed bits.
Link: http://lkml.kernel.org/r/20171213105756.69879-12-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Daney <david.daney@cavium.com>
Cc: David Miller <davem@davemloft.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Nitin Gupta <nitin.m.gupta@oracle.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Several users of unmap_mapping_range() would prefer to express their
range in pages rather than bytes. Unfortuately, on a 32-bit kernel, you
have to remember to cast your page number to a 64-bit type before
shifting it, and four places in the current tree didn't remember to do
that. That's a sign of a bad interface.
Conveniently, unmap_mapping_range() actually converts from bytes into
pages, so hoist the guts of unmap_mapping_range() into a new function
unmap_mapping_pages() and convert the callers which want to use pages.
Link: http://lkml.kernel.org/r/20171206142627.GD32044@bombadil.infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Reported-by: "zhangyi (F)" <yi.zhang@huawei.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If THP migration is enabled, for a VMA handled by userfaultfd, consider
the following situation,
do_page_fault()
__do_huge_pmd_anonymous_page()
handle_userfault()
userfault_msg()
/* a huge page is allocated and mapped at fault address */
/* the huge page is under migration, leaves migration entry
in page table */
userfaultfd_must_wait()
/* return true because !pmd_present() */
/* may wait in loop until fatal signal */
That is, it may be possible for userfaultfd_must_wait() encounters a PMD
entry which is !pmd_none() && !pmd_present(). In the current
implementation, we will wait for such PMD entries, which may cause
unnecessary waiting, and potential soft lockup.
This is fixed via avoiding to wait when !pmd_none() && !pmd_present(),
only wait when pmd_none().
This may be not a problem in practice, because userfaultfd_must_wait()
is always called with mm->mmap_sem read-locked. mremap() will
write-lock mm->mmap_sem. And UFFDIO_COPY doesn't support to copy THP
mapping. But the change introduced still makes the code more correct,
and makes the PMD and PTE code more consistent.
Link: http://lkml.kernel.org/r/20171207011752.3292-1-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Alexander Viro <viro@zeniv.linux.org.UK>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If start_code / end_code pointers are screwed then "VmExe" could be
bigger than total executable virtual memory and "VmLib" becomes
negative:
VmExe: 294320 kB
VmLib: 18446744073709327564 kB
VmExe and VmLib documented as text segment and shared library code size.
Now their sum will be always equal to mm->exec_vm which sums size of
executable and not writable and not stack areas.
I've seen this for huge (>2Gb) statically linked binary which has whole
world inside. For it start_code .. end_code range also covers one of
rodata sections. Probably this is bug in customized linker, elf loader
or both.
Anyway CONFIG_CHECKPOINT_RESTORE allows to change these pointers, thus
we cannot trust them without validation.
Link: http://lkml.kernel.org/r/150728955451.743749.11276392315459539583.stgit@buzz
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We should not reuse the dirty bh in jbd2 directly due to the following
situation:
1. When removing extent rec, we will dirty the bhs of extent rec and
truncate log at the same time, and hand them over to jbd2.
2. The bhs are submitted to jbd2 area successfully.
3. The write-back thread of device help flush the bhs to disk but
encounter write error due to abnormal storage link.
4. After a while the storage link become normal. Truncate log flush
worker triggered by the next space reclaiming found the dirty bh of
truncate log and clear its 'BH_Write_EIO' and then set it uptodate in
__ocfs2_journal_access():
ocfs2_truncate_log_worker
ocfs2_flush_truncate_log
__ocfs2_flush_truncate_log
ocfs2_replay_truncate_records
ocfs2_journal_access_di
__ocfs2_journal_access // here we clear io_error and set 'tl_bh' uptodata.
5. Then jbd2 will flush the bh of truncate log to disk, but the bh of
extent rec is still in error state, and unfortunately nobody will
take care of it.
6. At last the space of extent rec was not reduced, but truncate log
flush worker have given it back to globalalloc. That will cause
duplicate cluster problem which could be identified by fsck.ocfs2.
Sadly we can hardly revert this but set fs read-only in case of ruining
atomicity and consistency of space reclaim.
Link: http://lkml.kernel.org/r/5A6E8092.8090701@huawei.com
Fixes: acf8fdbe6a ("ocfs2: do not BUG if buffer not uptodate in __ocfs2_journal_access")
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Reviewed-by: Changwei Ge <ge.changwei@h3c.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We should unlock bh_stat if bg->bg_free_bits_count > bg->bg_bits
Link: http://lkml.kernel.org/r/1516843095-23680-1-git-send-email-ge.changwei@h3c.com
Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Return EAGAIN if any of the following checks fail for direct I/O:
- Cannot get the related locks immediately
- Blocks are not allocated at the write location, it will trigger block
allocation and block IO operations.
[ghe@suse.com: v4]
Link: http://lkml.kernel.org/r/1516007283-29932-4-git-send-email-ghe@suse.com
[ghe@suse.com: v2]
Link: http://lkml.kernel.org/r/1511944612-9629-4-git-send-email-ghe@suse.com
Link: http://lkml.kernel.org/r/1511775987-841-4-git-send-email-ghe@suse.com
Signed-off-by: Gang He <ghe@suse.com>
Reviewed-by: Alex Chen <alex.chen@huawei.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add ocfs2_overwrite_io function, which is used to judge if overwrite
allocated blocks, otherwise, the write will bring extra block allocation
overhead.
[ghe@suse.com: v3]
Link: http://lkml.kernel.org/r/1514455665-16325-3-git-send-email-ghe@suse.com
[ghe@suse.com: v2]
Link: http://lkml.kernel.org/r/1511944612-9629-3-git-send-email-ghe@suse.com
Link: http://lkml.kernel.org/r/1511775987-841-3-git-send-email-ghe@suse.com
Signed-off-by: Gang He <ghe@suse.com>
Reviewed-by: Changwei Ge <ge.changwei@h3c.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: alex chen <alex.chen@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "ocfs2: add nowait aio support", v4.
VFS layer has introduced the non-blocking aio flag IOCB_NOWAIT, which
tells the kernel to bail out if an AIO request will block for reasons
such as file allocations, or writeback triggering, or would block while
allocating requests while performing direct I/O.
Subsequently, pwritev2/preadv2 also can leverage this part of kernel
code. So far, ext4/xfs/btrfs have supported this feature. Add the
related code for the ocfs2 file system.
This patch (of 3):
Add ocfs2_try_rw_lock and ocfs2_try_inode_lock functions, which will be
used in non-blocking IO scenarios.
[ghe@suse.com: v2]
Link: http://lkml.kernel.org/r/1511944612-9629-2-git-send-email-ghe@suse.com
Link: http://lkml.kernel.org/r/1511775987-841-2-git-send-email-ghe@suse.com
Signed-off-by: Gang He <ghe@suse.com>
Reviewed-by: Jun Piao <piaojun@huawei.com>
Acked-by: alex chen <alex.chen@huawei.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ocfs2 supports trimming the underlying disk via the fstrim command. But
there is a problem, ocfs2 is a shared disk cluster file system, if the
user configures a scheduled fstrim job on each file system node, this
will trigger multiple nodes trimming a shared disk simultaneously, which
is very wasteful for CPU and IO consumption. This also might negatively
affect the lifetime of poor-quality SSD devices.
So we introduce a trimfs dlm lock to communicate with each other in this
case, which will make only one fstrim command to do the trimming on a
shared disk among the cluster. The fstrim commands from the other nodes
should wait for the first fstrim to finish and return success directly,
to avoid running the same trim on the shared disk again.
Link: http://lkml.kernel.org/r/1513228484-2084-2-git-send-email-ghe@suse.com
Signed-off-by: Gang He <ghe@suse.com>
Reviewed-by: Changwei Ge <ge.changwei@h3c.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Introduce a new dlm lock resource, which will be used to communicate
during fstrimming of an ocfs2 device from cluster nodes.
Link: http://lkml.kernel.org/r/1513228484-2084-1-git-send-email-ghe@suse.com
Signed-off-by: Gang He <ghe@suse.com>
Reviewed-by: Changwei Ge <ge.changwei@h3c.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A crash issue was reported by John Lightsey with a call trace as follows:
ocfs2_split_extent+0x1ad3/0x1b40 [ocfs2]
ocfs2_change_extent_flag+0x33a/0x470 [ocfs2]
ocfs2_mark_extent_written+0x172/0x220 [ocfs2]
ocfs2_dio_end_io+0x62d/0x910 [ocfs2]
dio_complete+0x19a/0x1a0
do_blockdev_direct_IO+0x19dd/0x1eb0
__blockdev_direct_IO+0x43/0x50
ocfs2_direct_IO+0x8f/0xa0 [ocfs2]
generic_file_direct_write+0xb2/0x170
__generic_file_write_iter+0xc3/0x1b0
ocfs2_file_write_iter+0x4bb/0xca0 [ocfs2]
__vfs_write+0xae/0xf0
vfs_write+0xb8/0x1b0
SyS_write+0x4f/0xb0
system_call_fastpath+0x16/0x75
The BUG code told that extent tree wants to grow but no metadata was
reserved ahead of time. From my investigation into this issue, the root
cause it that although enough metadata is not reserved, there should be
enough for following use. Rightmost extent is merged into its left one
due to a certain times of marking extent written. Because during
marking extent written, we got many physically continuous extents. At
last, an empty extent showed up and the rightmost path is removed from
extent tree.
Add a new mechanism to reuse extent block cached in dealloc which were
just unlinked from extent tree to solve this crash issue.
Criteria is that during marking extents *written*, if extent rotation
and merging results in unlinking extent with growing extent tree later
without any metadata reserved ahead of time, try to reuse those extents
in dealloc in which deleted extents are cached.
Also, this patch addresses the issue John reported that ::dw_zero_count
is not calculated properly.
After applying this patch, the issue John reported was gone. Thanks for
the reproducer provided by John. And this patch has passed
ocfs2-test(29 cases) suite running by New H3C Group.
[ge.changwei@h3c.com: fix static checker warnning]
Link: http://lkml.kernel.org/r/63ADC13FD55D6546B7DECE290D39E373F29196AE@H3CMLB12-EX.srv.huawei-3com.com
[akpm@linux-foundation.org: brelse(NULL) is legal]
Link: http://lkml.kernel.org/r/1515479070-32653-2-git-send-email-ge.changwei@h3c.com
Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
Reported-by: John Lightsey <john@nixnuts.net>
Tested-by: John Lightsey <john@nixnuts.net>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Current code assume that ::w_unwritten_list always has only one item on.
This is not right and hard to get understood. So improve how to count
unwritten item.
Link: http://lkml.kernel.org/r/1515479070-32653-1-git-send-email-ge.changwei@h3c.com
Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
Reported-by: John Lightsey <john@nixnuts.net>
Tested-by: John Lightsey <john@nixnuts.net>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The race between *set_acl and *get_acl will cause getting incomplete
xattr data as below:
processA processB
ocfs2_set_acl
ocfs2_xattr_set
__ocfs2_xattr_set_handle
ocfs2_get_acl_nolock
ocfs2_xattr_get_nolock:
processB may get incomplete xattr data if processA hasn't set_acl done.
So we should use 'ip_xattr_sem' to protect getting extended attribute in
ocfs2_get_acl_nolock(), as other processes could be changing it
concurrently.
Link: http://lkml.kernel.org/r/5A5DDCFF.7030001@huawei.com
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Alex Chen <alex.chen@huawei.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some stack variables are no longer used but still assigned. Trim them.
Link: http://lkml.kernel.org/r/1516105069-12643-1-git-send-email-ge.changwei@h3c.com
Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
Reviewed-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Alex Chen <alex.chen@huawei.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We need catch the errno returned by ocfs2_xattr_get_nolock() and assign
it to 'ret' for printing and noticing upper callers.
Link: http://lkml.kernel.org/r/5A571CAF.8050709@huawei.com
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Alex Chen <alex.chen@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Acked-by: Gang He <ghe@suse.com>
Acked-by: Changwei Ge <ge.changwei@h3c.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If metadata is corrupted such as 'invalid inode block', we will get
failed by calling 'mount()' and then set filesystem readonly as below:
ocfs2_mount
ocfs2_initialize_super
ocfs2_init_global_system_inodes
ocfs2_iget
ocfs2_read_locked_inode
ocfs2_validate_inode_block
ocfs2_error
ocfs2_handle_error
ocfs2_set_ro_flag(osb, 0); // set readonly
In this situation we need return -EROFS to 'mount.ocfs2', so that user
can fix it by fsck. And then mount again. In addition, 'mount.ocfs2'
should be updated correspondingly as it only return 1 for all errno.
And I will post a patch for 'mount.ocfs2' too.
Link: http://lkml.kernel.org/r/5A4302FA.2010606@huawei.com
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Alex Chen <alex.chen@huawei.com>
Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
Reviewed-by: Changwei Ge <ge.changwei@h3c.com>
Reviewed-by: Gang He <ghe@suse.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stack variable fe is no longer used, so trim it to save some CPU cycles
and stack space.
Link: http://lkml.kernel.org/r/63ADC13FD55D6546B7DECE290D39E373F1F5A8DD@H3CMLB14-EX.srv.huawei-3com.com
Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use the OCFS2_XATTR_ROOT_SIZE macro improves the readability of the
code.
Link: http://lkml.kernel.org/r/5A2E2488.70301@huawei.com
Signed-off-by: Alex Chen <alex.chen@huawei.com>
Reviewed-by: Jun Piao <piaojun@huawei.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When some nodes of cluster face with TCP connection fault, ocfs2 will
pick up a quorum to continue to work and other nodes will be fenced by
resetting host.
In order to decide which node should be fenced, ocfs2 leverages
o2quo_state::qs_holds. If that variable is reduced to zero, then a try
to decide if fence local node is performed. However, under a specific
scenario that local node is not disconnected from others at the same
time, above method has a problem to reduce ::qs_holds to zero.
Because, o2net 90s idle timer corresponding to different nodes is
triggered one after another.
node 2 node 3
90s idle timer elapses
clear ::qs_conn_bm
set hold
40s is passed
90 idle timer elapses
clear ::qs_conn_bm
set hold
still up timer elapses
clear hold (NOT to zero )
90s idle timer elapses AGAIN
still up timer elapses.
clear hold
still up timer elapses
To solve this issue, a node which has already be evicted from
::qs_conn_bm can't set hold again and again invoked from idle timer.
Link: http://lkml.kernel.org/r/63ADC13FD55D6546B7DECE290D39E373F1F3F93B@H3CMLB12-EX.srv.huawei-3com.com
Signed-off-by: Yang Zhang <zhang.yangB@h3c.com>
Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add an obvious error message, due to mismatched cluster names between
on-disk and in the current cluster. We can meet this case during OCFS2
cluster migration.
If we can give the user an obvious tip for why they can not mount the
file system after migration, they can quickly fix this mismatch problem.
Second, also move printing ocfs2_fill_super() errno to the front of
ocfs2_dismount_volume(), since ocfs2_dismount_volume() will also print
its own message.
I looked through all the code of OCFS2 (include o2cb); there is not any
place which returns this error. In fact, the function calling path
ocfs2_fill_super -> ocfs2_mount_volume -> ocfs2_dlm_init ->
dlm_new_lockspace is a very specific one. We can use this errno to give
the user a more clear tip, since this case is a little common during
cluster migration, but the customer can quickly get the failure cause if
there is a error printed. Also, I think it is not possible to add this
errno in the o2cb path during ocfs2_dlm_init(), since the o2cb code has
been stable for a long time.
We only print this error tip when the user uses pcmk stack, since using
the o2cb stack the user will not meet this error.
[ghe@suse.com: v2]
Link: http://lkml.kernel.org/r/1495419305-3780-1-git-send-email-ghe@suse.com
Link: http://lkml.kernel.org/r/1495089336-19312-1-git-send-email-ghe@suse.com
Signed-off-by: Gang He <ghe@suse.com>
Reviewed-by: Mark Fasheh <mfasheh@versity.com>
Acked-by: Joseph Qi <jiangqi903@gmail.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It's odd that o2net_msg_handler::nh_func_data is declared as type
o2net_msg_handler_func*. So neaten it.
Link: http://lkml.kernel.org/r/63ADC13FD55D6546B7DECE290D39E373F1F554DA@H3CMLB14-EX.srv.huawei-3com.com
Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
Reviewed-by: Alex Chen <alex.chen@huawei.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>