Firstly, check_shmem_swap variable is actually not necessary, because
it's always set with pte_hole hook; checking each would work.
Meanwhile, the check within smaps_pte_entry is not easy to follow.
E.g., pte_none() check is not needed as "!pte_present && !is_swap_pte"
is the same. Since at it, use the pte_hole() helper rather than dup the
page cache lookup.
Still keep the CONFIG_SHMEM part so the code can be optimized to nop for
!SHMEM.
There will be a very slight functional change in smaps_pte_entry(), that
for !SHMEM we'll return early for pte_none (before checking page==NULL),
but that's even nicer.
Link: https://lkml.kernel.org/r/20210917164756.8586-4-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "mm/smaps: Fixes and optimizations on shmem swap handling".
This patch (of 3):
The shmem swap calculation on the privately writable mappings are using
wrong parameters as spotted by Vlastimil. Fix them. This was
introduced in commit 48131e03ca ("mm, proc: reduce cost of
/proc/pid/smaps for unpopulated shmem mappings"), when shmem_swap_usage
was reworked to shmem_partial_swap_usage.
Test program:
void main(void)
{
char *buffer, *p;
int i, fd;
fd = memfd_create("test", 0);
assert(fd > 0);
/* isize==2M*3, fill in pages, swap them out */
ftruncate(fd, SIZE_2M * 3);
buffer = mmap(NULL, SIZE_2M * 3, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
assert(buffer);
for (i = 0, p = buffer; i < SIZE_2M * 3 / 4096; i++) {
*p = 1;
p += 4096;
}
madvise(buffer, SIZE_2M * 3, MADV_PAGEOUT);
munmap(buffer, SIZE_2M * 3);
/*
* Remap with private+writtable mappings on partial of the inode (<= 2M*3),
* while the size must also be >= 2M*2 to make sure there's a none pmd so
* smaps_pte_hole will be triggered.
*/
buffer = mmap(NULL, SIZE_2M * 2, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
printf("pid=%d, buffer=%p\n", getpid(), buffer);
/* Check /proc/$PID/smap_rollup, should see 4MB swap */
sleep(1000000);
}
Before the patch, smaps_rollup shows <4MB swap and the number will be
random depending on the alignment of the buffer of mmap() allocated.
After this patch, it'll show 4MB.
Link: https://lkml.kernel.org/r/20210917164756.8586-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20210917164756.8586-2-peterx@redhat.com
Fixes: 48131e03ca ("mm, proc: reduce cost of /proc/pid/smaps for unpopulated shmem mappings")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reported-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "ocfs2: Truncate data corruption fix".
As further testing has shown, commit 5314454ea3 ("ocfs2: fix data
corruption after conversion from inline format") didn't fix all the data
corruption issues the customer started observing after 6dbf7bb555
("fs: Don't invalidate page buffers in block_write_full_page()") This
time I have tracked them down to two bugs in ocfs2 truncation code.
One bug (truncating page cache before clearing tail cluster and setting
i_size) could cause data corruption even before 6dbf7bb555, but before
that commit it needed a race with page fault, after 6dbf7bb555 it
started to be pretty deterministic.
Another bug (zeroing pages beyond old i_size) used to be harmless
inefficiency before commit 6dbf7bb555. But after commit 6dbf7bb555
in combination with the first bug it resulted in deterministic data
corruption.
Although fixing only the first problem is needed to stop data
corruption, I've fixed both issues to make the code more robust.
This patch (of 2):
ocfs2_truncate_file() did unmap invalidate page cache pages before
zeroing partial tail cluster and setting i_size. Thus some pages could
be left (and likely have left if the cluster zeroing happened) in the
page cache beyond i_size after truncate finished letting user possibly
see stale data once the file was extended again. Also the tail cluster
zeroing was not guaranteed to finish before truncate finished causing
possible stale data exposure. The problem started to be particularly
easy to hit after commit 6dbf7bb555 "fs: Don't invalidate page buffers
in block_write_full_page()" stopped invalidation of pages beyond i_size
from page writeback path.
Fix these problems by unmapping and invalidating pages in the page cache
after the i_size is reduced and tail cluster is zeroed out.
Link: https://lkml.kernel.org/r/20211025150008.29002-1-jack@suse.cz
Link: https://lkml.kernel.org/r/20211025151332.11301-1-jack@suse.cz
Fixes: ccd979bdbc ("[PATCH] OCFS2: The Second Oracle Cluster Filesystem")
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix the length of holes reported at the end of a file: the length is
relative to the beginning of the extent, not the seek position which is
rounded down to the filesystem block size.
This bug went unnoticed for some time, but is now caught by the
following assertion in iomap_iter_done():
WARN_ON_ONCE(iter->iomap.offset + iter->iomap.length <= iter->pos)
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Before this patch, evict would clear the iopen glock's gl_object after
releasing the inode glock. In the meantime, another process could reuse
the same block and thus glocks for a new inode. It would lock the inode
glock (exclusively), and then the iopen glock (shared). The shared
locking mode doesn't provide any ordering against the evict, so by the
time the iopen glock is reused, evict may not have gotten to setting
gl_object to NULL.
Fix that by releasing the iopen glock before the inode glock in
gfs2_evict_inode.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>gl_object
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
For creating fattrs with the label field already allocated for us. I
also update nfs_free_fattr() to free the label in the end.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
We're about to add a check in nfs_free_fattr() for whether or not the
label is non-zero.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
The return value of xdr_inline_decode() is not being checked, leading to
a potential Oops. Just replace the open coded array decode with the
generic XDR version.
Reported-by: <rtm@csail.mit.edu>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
The server is supposed to return the same tag that the client sends in
the outgoing RPC call, but we should still sanity check the length just
in case.
Reported-by: <rtm@csail.mit.edu>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Replace test_bit() + set_bit() with test_and_set_bit() where we need an atomic
operation. Use clear_and_wake_up_bit() instead of open coding it.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
The kernel test robot correctly identifies that we store sqe twice,
remove the earlier one that is done before validating the index.
Fixes: f75d118349 ("io_uring: harder fdinfo sq/cq ring iterating")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Move SMB2_SessionSetup, SMB2_Close, SMB2_Read, SMB2_Write and
SMB2_ChangeNotify commands into smbfs_common/smb2pdu.h
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
This file will contain all the definitions we need for SMB2 packets
and will follow the naming convention of MS-SMB2.PDF as closely
as possible to make it easier to cross-reference beween the definitions
and the standard.
The content of this file will mostly consist of migration of existing
definitions in the cifs/smb2.pdu.h and ksmbd/smb2pdu.h files
with some additional tweaks as the two files have diverged.
This patch introduces the new smbfs_common/smb2pdu.h file
and migrates the SMB2 header as well as TREE_CONNECT and TREE_DISCONNECT
to the shared file.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull driver core updates from Greg KH:
"Here is the big set of driver core changes for 5.16-rc1.
All of these have been in linux-next for a while now with no reported
problems.
Included in here are:
- big update and cleanup of the sysfs abi documentation files and
scripts from Mauro. We are almost at the place where we can
properly check that the running kernel's sysfs abi is documented
fully.
- firmware loader updates
- dyndbg updates
- kernfs cleanups and fixes from Christoph
- device property updates
- component fix
- other minor driver core cleanups and fixes"
* tag 'driver-core-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (122 commits)
device property: Drop redundant NULL checks
x86/build: Tuck away built-in firmware under FW_LOADER
vmlinux.lds.h: wrap built-in firmware support under FW_LOADER
firmware_loader: move struct builtin_fw to the only place used
x86/microcode: Use the firmware_loader built-in API
firmware_loader: remove old DECLARE_BUILTIN_FIRMWARE()
firmware_loader: formalize built-in firmware API
component: do not leave master devres group open after bind
dyndbg: refine verbosity 1-4 summary-detail
gpiolib: acpi: Replace custom code with device_match_acpi_handle()
i2c: acpi: Replace custom function with device_match_acpi_handle()
driver core: Provide device_match_acpi_handle() helper
dyndbg: fix spurious vNpr_info change
dyndbg: no vpr-info on empty queries
dyndbg: vpr-info on remove-module complete, not starting
device property: Add missed header in fwnode.h
Documentation: dyndbg: Improve cli param examples
dyndbg: Remove support for ddebug_query param
dyndbg: make dyndbg a known cli param
dyndbg: show module in vpr-info in dd-exec-queries
...
ext4_abort will eventually call ext4_errno_to_code, which translates the
errno to an EXT4_ERR specific error. This means that ext4_abort expects
an errno. By using EXT4_ERR_ here, it gets misinterpreted (as an errno),
and ends up saving EXT4_ERR_EBUSY on the superblock during an abort,
which makes no sense.
ESHUTDOWN will get properly translated to EXT4_ERR_SHUTDOWN, so use that
instead.
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Link: https://lore.kernel.org/r/20211026173302.84000-1-krisman@collabora.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Since there are no blocks in an inline data inode, there's no point in
fixing iblocks field in fast commit replay path for this inode.
Similarly, there's no point in fixing any block bitmaps / global block
counters with respect to such an inode. Just bail out from these
functions if an inline data inode is encountered.
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20211015182513.395917-2-harshads@google.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>