Commit Graph

1216 Commits

Author SHA1 Message Date
Baokun Li
ed5d285b3f ext4: make ext4_es_remove_extent() return void
Now ext4_es_remove_extent() never fails, so make it return void.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230424033846.4732-10-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-06-26 19:35:12 -04:00
Baokun Li
de25d6e961 ext4: only update i_reserved_data_blocks on successful block allocation
In our fault injection test, we create an ext4 file, migrate it to
non-extent based file, then punch a hole and finally trigger a WARN_ON
in the ext4_da_update_reserve_space():

EXT4-fs warning (device sda): ext4_da_update_reserve_space:369:
ino 14, used 11 with only 10 reserved data blocks

When writing back a non-extent based file, if we enable delalloc, the
number of reserved blocks will be subtracted from the number of blocks
mapped by ext4_ind_map_blocks(), and the extent status tree will be
updated. We update the extent status tree by first removing the old
extent_status and then inserting the new extent_status. If the block range
we remove happens to be in an extent, then we need to allocate another
extent_status with ext4_es_alloc_extent().

       use old    to remove   to add new
    |----------|------------|------------|
              old extent_status

The problem is that the allocation of a new extent_status failed due to a
fault injection, and __es_shrink() did not get free memory, resulting in
a return of -ENOMEM. Then do_writepages() retries after receiving -ENOMEM,
we map to the same extent again, and the number of reserved blocks is again
subtracted from the number of blocks in that extent. Since the blocks in
the same extent are subtracted twice, we end up triggering WARN_ON at
ext4_da_update_reserve_space() because used > ei->i_reserved_data_blocks.

For non-extent based file, we update the number of reserved blocks after
ext4_ind_map_blocks() is executed, which causes a problem that when we call
ext4_ind_map_blocks() to create a block, it doesn't always create a block,
but we always reduce the number of reserved blocks. So we move the logic
for updating reserved blocks to ext4_ind_map_blocks() to ensure that the
number of reserved blocks is updated only after we do succeed in allocating
some new blocks.

Fixes: 5f634d064c ("ext4: Fix quota accounting error with fallocate")
Cc: stable@kernel.org
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230424033846.4732-2-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-06-26 19:34:56 -04:00
Ritesh Harjani
d19500da4b ext4: Make ext4_write_inline_data_end() use folio
ext4_write_inline_data_end() is completely converted to work with folio.
Also all callers of ext4_write_inline_data_end() already works on folio
except ext4_da_write_end(). Mostly for consistency and saving few
instructions maybe, this patch just converts ext4_da_write_end() to work
with folio which makes the last caller of ext4_write_inline_data_end()
also converted to work with folio.
We then make ext4_write_inline_data_end() take folio instead of page.

Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/1bcea771720ff451a5a59b3f1bcd5fae51cb7ce7.1684122756.git.ritesh.list@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-06-15 00:02:10 -04:00
Ritesh Harjani
80be8c5cc9 ext4: Make mpage_journal_page_buffers use folio
This patch converts mpage_journal_page_buffers() to use folio and also
removes the PAGE_SIZE assumption.

Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/ebc3ac80e6a54b53327740d010ce684a83512021.1684122756.git.ritesh.list@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-06-15 00:02:10 -04:00
Ritesh Harjani
36c9b45040 ext4: Change remaining tracepoints to use folio
ext4_readpage() is converted to ext4_read_folio() hence change the
related tracepoint from trace_ext4_readpage(page) to
trace_ext4_read_folio(folio). Do the same for
trace_ext4_releasepage(page) to trace_ext4_release_folio(folio)

As a minor bit of optimization to avoid an extra dereferencing,
since both of the above functions already were dereferencing
folio->mapping->host, hence change the tracepoint argument to take
(inode, folio).

Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/caba2b3c0147bed4ea7706767dc1d19cd0e29ab0.1684122756.git.ritesh.list@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-06-15 00:02:10 -04:00
Theodore Ts'o
2bc7e7c1a3 ext4: disallow ea_inodes with extended attributes
An ea_inode stores the value of an extended attribute; it can not have
extended attributes itself, or this will cause recursive nightmares.
Add a check in ext4_iget() to make sure this is the case.

Cc: stable@kernel.org
Reported-by: syzbot+e44749b6ba4d0434cd47@syzkaller.appspotmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230524034951.779531-4-tytso@mit.edu
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-05-30 15:33:57 -04:00
Theodore Ts'o
b3e6bcb945 ext4: add EA_INODE checking to ext4_iget()
Add a new flag, EXT4_IGET_EA_INODE which indicates whether the inode
is expected to have the EA_INODE flag or not.  If the flag is not
set/clear as expected, then fail the iget() operation and mark the
file system as corrupted.

This commit also makes the ext4_iget() always perform the
is_bad_inode() check even when the inode is already inode cache.  This
allows us to remove the is_bad_inode() check from the callers of
ext4_iget() in the ea_inode code.

Reported-by: syzbot+cbb68193bdb95af4340a@syzkaller.appspotmail.com
Reported-by: syzbot+62120febbd1ee3c3c860@syzkaller.appspotmail.com
Reported-by: syzbot+edce54daffee36421b4c@syzkaller.appspotmail.com
Cc: stable@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230524034951.779531-2-tytso@mit.edu
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-05-28 14:18:03 -04:00
Baokun Li
fa83c34e3e ext4: check iomap type only if ext4_iomap_begin() does not fail
When ext4_iomap_overwrite_begin() calls ext4_iomap_begin() map blocks may
fail for some reason (e.g. memory allocation failure, bare disk write), and
later because "iomap->type ! = IOMAP_MAPPED" triggers WARN_ON(). When ext4
iomap_begin() returns an error, it is normal that the type of iomap->type
may not match the expectation. Therefore, we only determine if iomap->type
is as expected when ext4_iomap_begin() is executed successfully.

Cc: stable@kernel.org
Reported-by: syzbot+08106c4b7d60702dbc14@syzkaller.appspotmail.com
Link: https://lore.kernel.org/all/00000000000015760b05f9b4eee9@google.com
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230505132429.714648-1-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-05-13 18:05:04 -04:00
Jan Kara
00d873c17e ext4: avoid deadlock in fs reclaim with page writeback
Ext4 has a filesystem wide lock protecting ext4_writepages() calls to
avoid races with switching of journalled data flag or inode format. This
lock can however cause a deadlock like:

CPU0                            CPU1

ext4_writepages()
  percpu_down_read(sbi->s_writepages_rwsem);
                                ext4_change_inode_journal_flag()
                                  percpu_down_write(sbi->s_writepages_rwsem);
                                    - blocks, all readers block from now on
  ext4_do_writepages()
    ext4_init_io_end()
      kmem_cache_zalloc(io_end_cachep, GFP_KERNEL)
        fs_reclaim frees dentry...
          dentry_unlink_inode()
            iput() - last ref =>
              iput_final() - inode dirty =>
                write_inode_now()...
                  ext4_writepages() tries to acquire sbi->s_writepages_rwsem
                    and blocks forever

Make sure we cannot recurse into filesystem reclaim from writeback code
to avoid the deadlock.

Reported-by: syzbot+6898da502aef574c5f8a@syzkaller.appspotmail.com
Link: https://lore.kernel.org/all/0000000000004c66b405fa108e27@google.com
Fixes: c8585c6fca ("ext4: fix races between changing inode journal mode and ext4_writepages")
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230504124723.20205-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-05-13 18:05:04 -04:00
Linus Torvalds
06936aaf49 Some ext4 regression and bug fixes for -rc1
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmROracACgkQ8vlZVpUN
 gaPqLQf/ZvzvspL4o3SNsHE/M2tKNBVY/z/vsfmAZwMgrGoK5qCkDsNA7c7+oUwE
 xjiHiVHOaYjJVWwkdODAwe7xNbWB6FoKptBaBi89fAyibMY/N7BZ8rad69NQTvyc
 JbKjorvEBc+qgsUEt2+ZpMogN9KHlVh3NJwlovesmucQtg2gWLKs8wrxW2bC7uAh
 2uR9GWUnhDrs6jHbjHkG3/lgB0aS0StLRxfsbchjZvCsniTDZymLmmgkA1ln17ce
 6iRg2ESjYUryPX09YFtUuQVvObtUTM+z8DzwyQuAJ4VfmdoPA4L6mpdqzPGFuKQc
 gJrLSB8VZJDvPoGjaHZ+Qdl1tHlFRw==
 =2SEf
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
 "Some ext4 regression and bug fixes"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: clean up error handling in __ext4_fill_super()
  ext4: reflect error codes from ext4_multi_mount_protect() to its callers
  ext4: fix lost error code reporting in __ext4_fill_super()
  ext4: fix unused iterator variable warnings
  ext4: fix use-after-free read in ext4_find_extent for bigalloc + inline
  ext4: fix i_disksize exceeding i_size problem in paritally written case
2023-05-01 11:00:04 -07:00
Zhihao Cheng
1dedde6903 ext4: fix i_disksize exceeding i_size problem in paritally written case
It is possible for i_disksize can exceed i_size, triggering a warning.

generic_perform_write
 copied = iov_iter_copy_from_user_atomic(len) // copied < len
 ext4_da_write_end
 | ext4_update_i_disksize
 |  new_i_size = pos + copied;
 |  WRITE_ONCE(EXT4_I(inode)->i_disksize, newsize) // update i_disksize
 | generic_write_end
 |  copied = block_write_end(copied, len) // copied = 0
 |   if (unlikely(copied < len))
 |    if (!PageUptodate(page))
 |     copied = 0;
 |  if (pos + copied > inode->i_size) // return false
 if (unlikely(copied == 0))
  goto again;
 if (unlikely(iov_iter_fault_in_readable(i, bytes))) {
  status = -EFAULT;
  break;
 }

We get i_disksize greater than i_size here, which could trigger WARNING
check 'i_size_read(inode) < EXT4_I(inode)->i_disksize' while doing dio:

ext4_dio_write_iter
 iomap_dio_rw
  __iomap_dio_rw // return err, length is not aligned to 512
 ext4_handle_inode_extension
  WARN_ON_ONCE(i_size_read(inode) < EXT4_I(inode)->i_disksize) // Oops

 WARNING: CPU: 2 PID: 2609 at fs/ext4/file.c:319
 CPU: 2 PID: 2609 Comm: aa Not tainted 6.3.0-rc2
 RIP: 0010:ext4_file_write_iter+0xbc7
 Call Trace:
  vfs_write+0x3b1
  ksys_write+0x77
  do_syscall_64+0x39

Fix it by updating 'copied' value before updating i_disksize just like
ext4_write_inline_data_end() does.

A reproducer can be found in the buganizer link below.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=217209
Fixes: 64769240bd ("ext4: Add delayed allocation support in data=writeback mode")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230321013721.89818-1-chengzhihao1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-28 11:07:19 -04:00
Linus Torvalds
7fa8a8ee94 - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of
switching from a user process to a kernel thread.
 
 - More folio conversions from Kefeng Wang, Zhang Peng and Pankaj Raghav.
 
 - zsmalloc performance improvements from Sergey Senozhatsky.
 
 - Yue Zhao has found and fixed some data race issues around the
   alteration of memcg userspace tunables.
 
 - VFS rationalizations from Christoph Hellwig:
 
   - removal of most of the callers of write_one_page().
 
   - make __filemap_get_folio()'s return value more useful
 
 - Luis Chamberlain has changed tmpfs so it no longer requires swap
   backing.  Use `mount -o noswap'.
 
 - Qi Zheng has made the slab shrinkers operate locklessly, providing
   some scalability benefits.
 
 - Keith Busch has improved dmapool's performance, making part of its
   operations O(1) rather than O(n).
 
 - Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultd,
   permitting userspace to wr-protect anon memory unpopulated ptes.
 
 - Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive rather
   than exclusive, and has fixed a bunch of errors which were caused by its
   unintuitive meaning.
 
 - Axel Rasmussen give userfaultfd the UFFDIO_CONTINUE_MODE_WP feature,
   which causes minor faults to install a write-protected pte.
 
 - Vlastimil Babka has done some maintenance work on vma_merge():
   cleanups to the kernel code and improvements to our userspace test
   harness.
 
 - Cleanups to do_fault_around() by Lorenzo Stoakes.
 
 - Mike Rapoport has moved a lot of initialization code out of various
   mm/ files and into mm/mm_init.c.
 
 - Lorenzo Stoakes removd vmf_insert_mixed_prot(), which was added for
   DRM, but DRM doesn't use it any more.
 
 - Lorenzo has also coverted read_kcore() and vread() to use iterators
   and has thereby removed the use of bounce buffers in some cases.
 
 - Lorenzo has also contributed further cleanups of vma_merge().
 
 - Chaitanya Prakash provides some fixes to the mmap selftesting code.
 
 - Matthew Wilcox changes xfs and afs so they no longer take sleeping
   locks in ->map_page(), a step towards RCUification of pagefaults.
 
 - Suren Baghdasaryan has improved mmap_lock scalability by switching to
   per-VMA locking.
 
 - Frederic Weisbecker has reworked the percpu cache draining so that it
   no longer causes latency glitches on cpu isolated workloads.
 
 - Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig
   logic.
 
 - Liu Shixin has changed zswap's initialization so we no longer waste a
   chunk of memory if zswap is not being used.
 
 - Yosry Ahmed has improved the performance of memcg statistics flushing.
 
 - David Stevens has fixed several issues involving khugepaged,
   userfaultfd and shmem.
 
 - Christoph Hellwig has provided some cleanup work to zram's IO-related
   code paths.
 
 - David Hildenbrand has fixed up some issues in the selftest code's
   testing of our pte state changing.
 
 - Pankaj Raghav has made page_endio() unneeded and has removed it.
 
 - Peter Xu contributed some rationalizations of the userfaultfd
   selftests.
 
 - Yosry Ahmed has fixed an issue around memcg's page recalim accounting.
 
 - Chaitanya Prakash has fixed some arm-related issues in the
   selftests/mm code.
 
 - Longlong Xia has improved the way in which KSM handles hwpoisoned
   pages.
 
 - Peter Xu fixes a few issues with uffd-wp at fork() time.
 
 - Stefan Roesch has changed KSM so that it may now be used on a
   per-process and per-cgroup basis.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZEr3zQAKCRDdBJ7gKXxA
 jlLoAP0fpQBipwFxED0Us4SKQfupV6z4caXNJGPeay7Aj11/kQD/aMRC2uPfgr96
 eMG3kwn2pqkB9ST2QpkaRbxA//eMbQY=
 =J+Dj
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of
   switching from a user process to a kernel thread.

 - More folio conversions from Kefeng Wang, Zhang Peng and Pankaj
   Raghav.

 - zsmalloc performance improvements from Sergey Senozhatsky.

 - Yue Zhao has found and fixed some data race issues around the
   alteration of memcg userspace tunables.

 - VFS rationalizations from Christoph Hellwig:
     - removal of most of the callers of write_one_page()
     - make __filemap_get_folio()'s return value more useful

 - Luis Chamberlain has changed tmpfs so it no longer requires swap
   backing. Use `mount -o noswap'.

 - Qi Zheng has made the slab shrinkers operate locklessly, providing
   some scalability benefits.

 - Keith Busch has improved dmapool's performance, making part of its
   operations O(1) rather than O(n).

 - Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultd,
   permitting userspace to wr-protect anon memory unpopulated ptes.

 - Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive
   rather than exclusive, and has fixed a bunch of errors which were
   caused by its unintuitive meaning.

 - Axel Rasmussen give userfaultfd the UFFDIO_CONTINUE_MODE_WP feature,
   which causes minor faults to install a write-protected pte.

 - Vlastimil Babka has done some maintenance work on vma_merge():
   cleanups to the kernel code and improvements to our userspace test
   harness.

 - Cleanups to do_fault_around() by Lorenzo Stoakes.

 - Mike Rapoport has moved a lot of initialization code out of various
   mm/ files and into mm/mm_init.c.

 - Lorenzo Stoakes removd vmf_insert_mixed_prot(), which was added for
   DRM, but DRM doesn't use it any more.

 - Lorenzo has also coverted read_kcore() and vread() to use iterators
   and has thereby removed the use of bounce buffers in some cases.

 - Lorenzo has also contributed further cleanups of vma_merge().

 - Chaitanya Prakash provides some fixes to the mmap selftesting code.

 - Matthew Wilcox changes xfs and afs so they no longer take sleeping
   locks in ->map_page(), a step towards RCUification of pagefaults.

 - Suren Baghdasaryan has improved mmap_lock scalability by switching to
   per-VMA locking.

 - Frederic Weisbecker has reworked the percpu cache draining so that it
   no longer causes latency glitches on cpu isolated workloads.

 - Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig
   logic.

 - Liu Shixin has changed zswap's initialization so we no longer waste a
   chunk of memory if zswap is not being used.

 - Yosry Ahmed has improved the performance of memcg statistics
   flushing.

 - David Stevens has fixed several issues involving khugepaged,
   userfaultfd and shmem.

 - Christoph Hellwig has provided some cleanup work to zram's IO-related
   code paths.

 - David Hildenbrand has fixed up some issues in the selftest code's
   testing of our pte state changing.

 - Pankaj Raghav has made page_endio() unneeded and has removed it.

 - Peter Xu contributed some rationalizations of the userfaultfd
   selftests.

 - Yosry Ahmed has fixed an issue around memcg's page recalim
   accounting.

 - Chaitanya Prakash has fixed some arm-related issues in the
   selftests/mm code.

 - Longlong Xia has improved the way in which KSM handles hwpoisoned
   pages.

 - Peter Xu fixes a few issues with uffd-wp at fork() time.

 - Stefan Roesch has changed KSM so that it may now be used on a
   per-process and per-cgroup basis.

* tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits)
  mm,unmap: avoid flushing TLB in batch if PTE is inaccessible
  shmem: restrict noswap option to initial user namespace
  mm/khugepaged: fix conflicting mods to collapse_file()
  sparse: remove unnecessary 0 values from rc
  mm: move 'mmap_min_addr' logic from callers into vm_unmapped_area()
  hugetlb: pte_alloc_huge() to replace huge pte_alloc_map()
  maple_tree: fix allocation in mas_sparse_area()
  mm: do not increment pgfault stats when page fault handler retries
  zsmalloc: allow only one active pool compaction context
  selftests/mm: add new selftests for KSM
  mm: add new KSM process and sysfs knobs
  mm: add new api to enable ksm per process
  mm: shrinkers: fix debugfs file permissions
  mm: don't check VMA write permissions if the PTE/PMD indicates write permissions
  migrate_pages_batch: fix statistics for longterm pin retry
  userfaultfd: use helper function range_in_vma()
  lib/show_mem.c: use for_each_populated_zone() simplify code
  mm: correct arg in reclaim_pages()/reclaim_clean_pages_from_list()
  fs/buffer: convert create_page_buffers to folio_create_buffers
  fs/buffer: add folio_create_empty_buffers helper
  ...
2023-04-27 19:42:02 -07:00
Jan Kara
d0ab8368c1 Revert "ext4: Fix warnings when freezing filesystem with journaled data"
After making ext4_writepages() properly clean all pages there is no need
for special treatment of filesystem freezing. Revert commit
e6c28a26b7.

Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230329154950.19720-13-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-14 20:02:37 -04:00
Jan Kara
ab382539ad ext4: Update comment in mpage_prepare_extent_to_map()
Since filemap_write_and_wait() is now enough to get journalled data to
final location update the comment in mpage_prepare_extent_to_map().

Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230329154950.19720-12-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-14 19:58:33 -04:00
Jan Kara
951cafa6b8 ext4: Simplify handling of journalled data in ext4_bmap()
Now that ext4_writepages() gets journalled data into its final location
we just use filemap_write_and_wait() instead of special handling of
journalled data in ext4_bmap(). We can also drop EXT4_STATE_JDATA flag
as it is not used anymore.

Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230329154950.19720-11-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-14 19:58:33 -04:00
Jan Kara
56c2a0e3d9 ext4: Drop special handling of journalled data from ext4_evict_inode()
Now that ext4_writepages() makes sure journalled data is on stable
storage, write_inode_now() call in iput_final() is enough to make
pagecache pages with journalled data really clean (data committed and
checkpointed). So we can drop special handling of journalled data in
ext4_evict_inode().

Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230329154950.19720-9-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-14 19:56:53 -04:00
Jan Kara
1f1a55f0bf ext4: Commit transaction before writing back pages in data=journal mode
When journalling data we currently just walk over pages, journal those
that are marked for delayed dirtying (only pinned pages dirtied behing
our back these days) and checkpoint other dirty pages. Because some
pages may be part of running transaction the result is that after
filemap_write_and_wait() we are not guaranteed pages are stable on disk.
Thus places that want to flush current pagecache content need to jump
through hoops to make sure journalled data is not lost. This is
manageable in cases completely controlled by ext4 (such as extent
shifting operations or inode eviction) but it gets ugly for stuff like
fsverity. Furthermore it is rather error prone as people often do not
realize journalled data needs special handling.

So change ext4_writepages() to commit transaction with inode's data
before going through the writeback loop in WB_SYNC_ALL mode. As a result
filemap_write_and_wait() is now really getting pages to stable storage
and makes pagecache pages safe to reclaim. Consequently we can remove
the special handling of journalled data from several places in follow up
patches.

Note that this will make fsync(2) for journalled data more expensive as
we will end up not only committing the transaction we need but also
checkpointing the data (which we may have previously skipped if the data
was part of the running transaction). If we really cared, we would need
to introduce special VFS function for writing out & invalidating page
cache for a range, use ->launder_page callback to perform checkpointing,
and use it from all the places that need this functionality. But at this
point I'm not convinced the complexity is worth it.

Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230329154950.19720-5-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-14 19:56:53 -04:00
Jan Kara
5e1bdea639 ext4: Clear dirty bit from pages without data to write
With journalled data it can happen that checkpointing code will write
out page contents without clearing the page dirty bit. The logic in
ext4_page_nomap_can_writeout() then results in us never calling
mpage_submit_page() and thus clearing the dirty bit. Drop the
optimization with ext4_page_nomap_can_writeout() and just always call to
mpage_submit_page(). ext4_bio_write_page() knows when to redirty the
page and the additional clearing & setting of page dirty bit for ordered
mode writeout is not that expensive to jump through the hoops for it.

Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230329154950.19720-4-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-14 19:55:44 -04:00
Jan Kara
265e72efa9 ext4: Keep pages with journalled data dirty
Currently we clear page dirty bit when we checkpoint some buffers from a
page with journalled data or when we perform delayed dirtying of a page
in ext4_writepages(). In a quest to simplify handling of journalled data
we want to keep page dirty as long as it has either buffers to
checkpoint or journalled dirty data. So make sure to keep page dirty in
ext4_writepages() if it still has journalled data attached to it.

Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230329154950.19720-3-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-14 19:44:00 -04:00
Jan Kara
d84c9ebdac ext4: Mark pages with journalled data dirty
Currently pages with journalled data written by write(2) or modified by
block zeroing during truncate(2) are not marked as dirty. They are
dirtied only once the transaction commits. This however makes writeback
code think inode has no pages to write and so ext4_writepages() is not
called to make pages with journalled data persistent. Mark pages with
journalled data dirty (similarly as it happens for writes through mmap)
so that writeback code knows about them and ext4_writepages() can do
what it needs to to the inode.

Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230329154950.19720-2-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-14 19:38:50 -04:00
Matthew Wilcox
9ea0e45bd2 ext4: Use a folio in ext4_page_mkwrite()
Convert to the folio API, saving a few calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20230324180129.1220691-26-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 13:39:52 -04:00
Matthew Wilcox
86b38c273c ext4: Convert ext4_block_write_begin() to take a folio
All the callers now have a folio, so pass that in and operate on folios.
Removes four calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20230324180129.1220691-25-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 13:39:52 -04:00
Matthew Wilcox
c0be8e6f08 ext4: Convert ext4_mpage_readpages() to work on folios
This definitely doesn't include support for large folios; there
are all kinds of assumptions about the number of buffers attached
to a folio.  But it does remove several calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20230324180129.1220691-24-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 13:39:52 -04:00
Matthew Wilcox
0b5a254395 ext4: Use a folio in ext4_da_write_begin()
Remove a few calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20230324180129.1220691-23-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 13:39:52 -04:00
Matthew Wilcox
02e4b04c56 ext4: Convert ext4_page_nomap_can_writeout to ext4_folio_nomap_can_writeout
Its one caller already uses a folio.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230324180129.1220691-22-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 13:39:51 -04:00
Matthew Wilcox
9d3973de9a ext4: Convert __ext4_block_zero_page_range() to use a folio
Use folio APIs throughout.  Saves many calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20230324180129.1220691-21-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 13:39:51 -04:00
Matthew Wilcox
86324a2162 ext4: Convert ext4_journalled_zero_new_buffers() to use a folio
Remove a call to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20230324180129.1220691-20-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 13:39:51 -04:00
Matthew Wilcox
feb22b77b8 ext4: Use a folio in ext4_journalled_write_end()
Convert the incoming page to a folio to remove a few calls to
compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230324180129.1220691-19-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 13:39:51 -04:00
Matthew Wilcox
64fb313675 ext4: Convert ext4_write_end() to use a folio
Convert the incoming struct page to a folio.  Replaces two implicit
calls to compound_head() with one explicit call.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230324180129.1220691-18-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 13:39:51 -04:00
Matthew Wilcox
4d934a5e6c ext4: Convert ext4_write_begin() to use a folio
Remove a lot of calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20230324180129.1220691-17-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 13:39:51 -04:00
Matthew Wilcox
3edde93e07 ext4: Convert ext4_readpage_inline() to take a folio
Use the folio API in this function, saves a few calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230324180129.1220691-10-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 13:39:51 -04:00
Matthew Wilcox
e8d6062c50 ext4: Convert ext4_bio_write_page() to ext4_bio_write_folio()
The only caller now has a folio so pass it in directly and avoid the call
to page_folio() at the beginning.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230324180129.1220691-9-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 13:39:51 -04:00
Matthew Wilcox
33483b3b6e ext4: Convert mpage_page_done() to mpage_folio_done()
All callers now have a folio so we can pass one in and use the folio
APIs to support large folios as well as save instructions by eliminating
a call to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20230324180129.1220691-8-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 13:39:51 -04:00
Matthew Wilcox
81a0d3e126 ext4: Convert mpage_submit_page() to mpage_submit_folio()
All callers now have a folio so we can pass one in and use the folio
APIs to support large folios as well as save instructions by eliminating
calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230324180129.1220691-7-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 13:39:50 -04:00
Matthew Wilcox
4da2f6e3c4 ext4: Turn mpage_process_page() into mpage_process_folio()
The page/folio is only used to extract the buffers, so this is a
simple change.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230324180129.1220691-6-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-06 13:39:50 -04:00
Christoph Hellwig
66dabbb65d mm: return an ERR_PTR from __filemap_get_folio
Instead of returning NULL for all errors, distinguish between:

 - no entry found and not asked to allocated (-ENOENT)
 - failed to allocate memory (-ENOMEM)
 - would block (-EAGAIN)

so that callers don't have to guess the error based on the passed in
flags.

Also pass through the error through the direct callers: filemap_get_folio,
filemap_lock_folio filemap_grab_folio and filemap_get_incore_folio.

[hch@lst.de: fix null-pointer deref]
  Link: https://lkml.kernel.org/r/20230310070023.GA13563@lst.de
  Link: https://lkml.kernel.org/r/20230310043137.GA1624890@u2004
Link: https://lkml.kernel.org/r/20230307143410.28031-8-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> [nilfs2]
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-05 19:42:42 -07:00
Theodore Ts'o
98ccceee3e ext4: fix comment: "start start" -> "start" in mpage_prepare_extent_to_map()
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-03-23 23:00:07 -04:00
Jan Kara
e6c28a26b7 ext4: Fix warnings when freezing filesystem with journaled data
Test generic/390 in data=journal mode often triggers a warning that
ext4_do_writepages() tries to start a transaction on frozen filesystem.
This happens because although all dirty data is properly written, jbd2
checkpointing code writes data through submit_bh() and as a result only
buffer dirty bits are cleared but page dirty bits stay set. Later when
the filesystem is frozen, writeback code comes, tries to write
supposedly dirty pages and the warning triggers. Fix the problem by
calling sync_filesystem() once more after flushing the whole journal to
clear stray page dirty bits.

[ Applied fixup patches to address crashes when running data=journal
  tests; see links for more details -- TYT ]

Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230308142528.12384-1-jack@suse.cz
Reported-by: Eric Biggers <ebiggers@kernel.org>
Link: https://lore.kernel.org/all/20230319183617.GA896@sol.localdomain
Link: https://lore.kernel.org/r/20230323145404.21381-1-jack@suse.cz
Link: https://lore.kernel.org/r/20230323145404.21381-2-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-03-23 22:53:00 -04:00
Jan Kara
3f079114bf ext4: Convert data=journal writeback to use ext4_writepages()
Add support for writeback of journalled data directly into
ext4_writepages() instead of offloading it to write_cache_pages(). This
actually significantly simplifies the code and reduces code duplication.
For checkpointing of committed data we can use ext4_writepages()
rightaway the same way as writeback of ordered data uses it on
transaction commit. For journalling of dirty mapped pages, we need to
add a special case to mpage_prepare_extent_to_map() to add all page
buffers to the journal.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230228051319.4085470-8-tytso@mit.edu
2023-03-23 10:06:08 -04:00
Jan Kara
d8be7607de ext4: Move mpage_page_done() calls after error handling
In case mpage_submit_page() returns error, it doesn't really matter
whether we call mpage_page_done() and then return error or whether we
return directly because in that case page cleanup will be done by
mpage_release_unused_pages() instead. Logically, it makes more sense to
leave the cleanup to mpage_release_unused_pages() because we didn't
succeed in writing the page. So move mpage_page_done() calls after the
error handling.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230228051319.4085470-7-tytso@mit.edu
2023-03-23 10:06:08 -04:00
Jan Kara
eaf2ca10ca ext4: Move page unlocking out of mpage_submit_page()
Move page unlocking during page writeback out of mpage_submit_page()
into the callers. This will allow writeback in data=journal mode to keep
the page locked for a bit longer. Since page unlocking it tightly
connected to increment of mpd->first_page (as that determines cleanup of
locked but unwritten pages), move page unlocking as well as
mpd->first_page handling into a helper function.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230228051319.4085470-6-tytso@mit.edu
2023-03-23 10:06:07 -04:00
Jan Kara
f1496362e9 ext4: Don't unlock page in ext4_bio_write_page()
Do not unlock the written page in ext4_bio_write_page(). Instead leave
the page locked and unlock it in the callers. We'll need to keep the
page locked for data=journal writeback for a bit longer.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230228051319.4085470-5-tytso@mit.edu
2023-03-23 10:06:07 -04:00
Jan Kara
3f5d30636d ext4: Mark page for delayed dirtying only if it is pinned
In data=journal mode, page should be dirtied only when it has buffers
for checkpoint or it is writeably mapped. In the first case, we don't
need to do anything special. In the second case, page was already added
to the journal by ext4_page_mkwrite() and since transaction commit
writeprotects mapped pages again, page should be writeable (and thus
dirtied) only while it is part of the running transaction. So nothing
needs to be done either. The only special case is when someone pins the
page and uses this pin for modifying page data. So recognize this
special case and only then mark the page as having data that needs
adding to the journal.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230228051319.4085470-4-tytso@mit.edu
2023-03-23 10:06:07 -04:00
Jan Kara
c8e8e16dbb ext4: Use nr_to_write directly in mpage_prepare_extent_to_map()
When looking up extent of pages to map in mpage_prepare_extent_to_map()
we count how many pages we still need to find in a copy of
wbc->nr_to_write counter. With more complex page handling for
data=journal mode, it will be easier to use wbc->nr_to_write directly so
that we don't forget to carry over changes back to nr_to_write counter.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230228051319.4085470-3-tytso@mit.edu
2023-03-23 10:06:07 -04:00
Jan Kara
9462f770ed ext4: Update stale comment about write constraints
The comment above do_journal_get_write_access() is very stale. Most of
it just does not refer to what the function does today or how jbd2
works. The bit about transaction handling during write(2) is still
correct so just update the function names in that part and move the
comment to a more appropriate place.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230228051319.4085470-2-tytso@mit.edu
2023-03-23 10:06:07 -04:00
Linus Torvalds
40d0c0901e Bug fixes and regressions for ext4, the most serious of which is a
potential deadlock during directory renames that was introduced during
 the merge window discovered by a combination of syzbot and lockdep.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmQNVwIACgkQ8vlZVpUN
 gaMwmgf/ZAasXZEMV0zaQZa8zP4KvMKZjWe6azkcJg4sb/HG9Q7JzeJDCurhhWUj
 8+QnyUcuKTyWKYWjGf0f5CZaYEM5AZYij41UJzu2qMkz5hVXSqBVuY8KywxuiJv5
 kfuIvQh0Onv0Yrg2qAc52/kZkq1lu2sl/F5ertBWjdpTUXdBUdrCxkUk+1BgQWAj
 vNwi1/+gNuX7RxMboHqYmwXFP39vECd+wteNdsiK1hR8bLqL68duLLq8xQdHt4gS
 sbVmJKR4j2Giw4ZnlYi9RiwKIO0beqocanp+cfOPulyj5mTM8X1lr0uvaLZgx2AF
 lqrS3/5ksp45cRT70qCIz8je70hTSg==
 =nN3T
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
 "Bug fixes and regressions for ext4, the most serious of which is a
  potential deadlock during directory renames that was introduced during
  the merge window discovered by a combination of syzbot and lockdep"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: zero i_disksize when initializing the bootloader inode
  ext4: make sure fs error flag setted before clear journal error
  ext4: commit super block if fs record error when journal record without error
  ext4, jbd2: add an optimized bmap for the journal inode
  ext4: fix WARNING in ext4_update_inline_data
  ext4: move where set the MAY_INLINE_DATA flag is set
  ext4: Fix deadlock during directory rename
  ext4: Fix comment about the 64BIT feature
  docs: ext4: modify the group desc size to 64
  ext4: fix another off-by-one fsmap error on 1k block filesystems
  ext4: fix RENAME_WHITEOUT handling for inline directories
  ext4: make kobj_type structures constant
  ext4: fix cgroup writeback accounting with fs-layer encryption
2023-03-12 08:55:55 -07:00
Ye Bin
1dcdce5919 ext4: move where set the MAY_INLINE_DATA flag is set
The only caller of ext4_find_inline_data_nolock() that needs setting of
EXT4_STATE_MAY_INLINE_DATA flag is ext4_iget_extra_inode().  In
ext4_write_inline_data_end() we just need to update inode->i_inline_off.
Since we are going to add one more caller that does not need to set
EXT4_STATE_MAY_INLINE_DATA, just move setting of EXT4_STATE_MAY_INLINE_DATA
out to ext4_iget_extra_inode().

Signed-off-by: Ye Bin <yebin10@huawei.com>
Cc: stable@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230307015253.2232062-2-yebin@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-03-11 00:44:24 -05:00
Linus Torvalds
b07ce43db6 Improve performance for ext4 by allowing multiple process to perform
direct I/O writes to preallocated blocks by using a shared inode lock
 instead of taking an exclusive lock.
 
 In addition, multiple bug fixes and cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmP9gYkACgkQ8vlZVpUN
 gaNN0AgAqwS873C9QX7QQK8tE+VvKT7iteNaJ68c/CMymSP7o5RdalbQRiAsSy/Q
 88PjBFVFQOsIa1d7OAUr50RHQODjOuOz6SJpitKKPnVC89gAzDt7Pk1AQzABjR37
 GY7nneHTQs6fGXLMUz/SlsU+7a08Bz5BeAxVBQxzkRL6D28/sbpT6Iw1tDhUUsug
 0o3kz/RolEopCzjhmH/Fpxt5RlBnTya5yX8IgmfEV3y7CfQ+XcTWgRebqDXxVCBE
 /VCZOl2cv5n4PFlRH8eUihmyO5iu7p9W9ro6HbLEuxQXwcRNY7skONidceim2EYh
 KzWZt59/JAs0DyvRWqZ9irtPDkuYqA==
 =OIYo
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "Improve performance for ext4 by allowing multiple process to perform
  direct I/O writes to preallocated blocks by using a shared inode lock
  instead of taking an exclusive lock.

  In addition, multiple bug fixes and cleanups"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix incorrect options show of original mount_opt and extend mount_opt2
  ext4: Fix possible corruption when moving a directory
  ext4: init error handle resource before init group descriptors
  ext4: fix task hung in ext4_xattr_delete_inode
  jbd2: fix data missing when reusing bh which is ready to be checkpointed
  ext4: update s_journal_inum if it changes after journal replay
  ext4: fail ext4_iget if special inode unallocated
  ext4: fix function prototype mismatch for ext4_feat_ktype
  ext4: remove unnecessary variable initialization
  ext4: fix inode tree inconsistency caused by ENOMEM
  ext4: refuse to create ea block when umounted
  ext4: optimize ea_inode block expansion
  ext4: remove dead code in updating backup sb
  ext4: dio take shared inode lock when overwriting preallocated blocks
  ext4: don't show commit interval if it is zero
  ext4: use ext4_fc_tl_mem in fast-commit replay path
  ext4: improve xattr consistency checking and error reporting
2023-02-28 09:05:47 -08:00
Linus Torvalds
d2980d8d82 There is no particular theme here - mainly quick hits all over the tree.
Most notable is a set of zlib changes from Mikhail Zaslonko which enhances
 and fixes zlib's use of S390 hardware support: "lib/zlib: Set of s390
 DFLTCC related patches for kernel zlib".
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY/QC4QAKCRDdBJ7gKXxA
 jtKdAQCbDCBdY8H45d1fONzQW2UDqCPnOi77MpVUxGL33r+1SAEA807C7rvDEmlf
 yP1Ft+722fFU5jogVU8ZFh+vapv2/gI=
 =Q9YK
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2023-02-20-15-29' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:
 "There is no particular theme here - mainly quick hits all over the
  tree.

  Most notable is a set of zlib changes from Mikhail Zaslonko which
  enhances and fixes zlib's use of S390 hardware support: 'lib/zlib: Set
  of s390 DFLTCC related patches for kernel zlib'"

* tag 'mm-nonmm-stable-2023-02-20-15-29' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (55 commits)
  Update CREDITS file entry for Jesper Juhl
  sparc: allow PM configs for sparc32 COMPILE_TEST
  hung_task: print message when hung_task_warnings gets down to zero.
  arch/Kconfig: fix indentation
  scripts/tags.sh: fix the Kconfig tags generation when using latest ctags
  nilfs2: prevent WARNING in nilfs_dat_commit_end()
  lib/zlib: remove redundation assignement of avail_in dfltcc_gdht()
  lib/Kconfig.debug: do not enable DEBUG_PREEMPT by default
  lib/zlib: DFLTCC always switch to software inflate for Z_PACKET_FLUSH option
  lib/zlib: DFLTCC support inflate with small window
  lib/zlib: Split deflate and inflate states for DFLTCC
  lib/zlib: DFLTCC not writing header bits when avail_out == 0
  lib/zlib: fix DFLTCC ignoring flush modes when avail_in == 0
  lib/zlib: fix DFLTCC not flushing EOBS when creating raw streams
  lib/zlib: implement switching between DFLTCC and software
  lib/zlib: adjust offset calculation for dfltcc_state
  nilfs2: replace WARN_ONs for invalid DAT metadata block requests
  scripts/spelling.txt: add "exsits" pattern and fix typo instances
  fs: gracefully handle ->get_block not mapping bh in __mpage_writepage
  cramfs: Kconfig: fix spelling & punctuation
  ...
2023-02-23 17:55:40 -08:00
Linus Torvalds
3822a7c409 - Daniel Verkamp has contributed a memfd series ("mm/memfd: add
F_SEAL_EXEC") which permits the setting of the memfd execute bit at
   memfd creation time, with the option of sealing the state of the X bit.
 
 - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset()
   thread-safe for pmd unshare") which addresses a rare race condition
   related to PMD unsharing.
 
 - Several folioification patch serieses from Matthew Wilcox, Vishal
   Moola, Sidhartha Kumar and Lorenzo Stoakes
 
 - Johannes Weiner has a series ("mm: push down lock_page_memcg()") which
   does perform some memcg maintenance and cleanup work.
 
 - SeongJae Park has added DAMOS filtering to DAMON, with the series
   "mm/damon/core: implement damos filter".  These filters provide users
   with finer-grained control over DAMOS's actions.  SeongJae has also done
   some DAMON cleanup work.
 
 - Kairui Song adds a series ("Clean up and fixes for swap").
 
 - Vernon Yang contributed the series "Clean up and refinement for maple
   tree".
 
 - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series.  It
   adds to MGLRU an LRU of memcgs, to improve the scalability of global
   reclaim.
 
 - David Hildenbrand has added some userfaultfd cleanup work in the
   series "mm: uffd-wp + change_protection() cleanups".
 
 - Christoph Hellwig has removed the generic_writepages() library
   function in the series "remove generic_writepages".
 
 - Baolin Wang has performed some maintenance on the compaction code in
   his series "Some small improvements for compaction".
 
 - Sidhartha Kumar is doing some maintenance work on struct page in his
   series "Get rid of tail page fields".
 
 - David Hildenbrand contributed some cleanup, bugfixing and
   generalization of pte management and of pte debugging in his series "mm:
   support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap
   PTEs".
 
 - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation
   flag in the series "Discard __GFP_ATOMIC".
 
 - Sergey Senozhatsky has improved zsmalloc's memory utilization with his
   series "zsmalloc: make zspage chain size configurable".
 
 - Joey Gouly has added prctl() support for prohibiting the creation of
   writeable+executable mappings.  The previous BPF-based approach had
   shortcomings.  See "mm: In-kernel support for memory-deny-write-execute
   (MDWE)".
 
 - Waiman Long did some kmemleak cleanup and bugfixing in the series
   "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF".
 
 - T.J.  Alumbaugh has contributed some MGLRU cleanup work in his series
   "mm: multi-gen LRU: improve".
 
 - Jiaqi Yan has provided some enhancements to our memory error
   statistics reporting, mainly by presenting the statistics on a per-node
   basis.  See the series "Introduce per NUMA node memory error
   statistics".
 
 - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog
   regression in compaction via his series "Fix excessive CPU usage during
   compaction".
 
 - Christoph Hellwig does some vmalloc maintenance work in the series
   "cleanup vfree and vunmap".
 
 - Christoph Hellwig has removed block_device_operations.rw_page() in ths
   series "remove ->rw_page".
 
 - We get some maple_tree improvements and cleanups in Liam Howlett's
   series "VMA tree type safety and remove __vma_adjust()".
 
 - Suren Baghdasaryan has done some work on the maintainability of our
   vm_flags handling in the series "introduce vm_flags modifier functions".
 
 - Some pagemap cleanup and generalization work in Mike Rapoport's series
   "mm, arch: add generic implementation of pfn_valid() for FLATMEM" and
   "fixups for generic implementation of pfn_valid()"
 
 - Baoquan He has done some work to make /proc/vmallocinfo and
   /proc/kcore better represent the real state of things in his series
   "mm/vmalloc.c: allow vread() to read out vm_map_ram areas".
 
 - Jason Gunthorpe rationalized the GUP system's interface to the rest of
   the kernel in the series "Simplify the external interface for GUP".
 
 - SeongJae Park wishes to migrate people from DAMON's debugfs interface
   over to its sysfs interface.  To support this, we'll temporarily be
   printing warnings when people use the debugfs interface.  See the series
   "mm/damon: deprecate DAMON debugfs interface".
 
 - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes
   and clean-ups" series.
 
 - Huang Ying has provided a dramatic reduction in migration's TLB flush
   IPI rates with the series "migrate_pages(): batch TLB flushing".
 
 - Arnd Bergmann has some objtool fixups in "objtool warning fixes".
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY/PoPQAKCRDdBJ7gKXxA
 jlvpAPsFECUBBl20qSue2zCYWnHC7Yk4q9ytTkPB/MMDrFEN9wD/SNKEm2UoK6/K
 DmxHkn0LAitGgJRS/W9w81yrgig9tAQ=
 =MlGs
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - Daniel Verkamp has contributed a memfd series ("mm/memfd: add
   F_SEAL_EXEC") which permits the setting of the memfd execute bit at
   memfd creation time, with the option of sealing the state of the X
   bit.

 - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset()
   thread-safe for pmd unshare") which addresses a rare race condition
   related to PMD unsharing.

 - Several folioification patch serieses from Matthew Wilcox, Vishal
   Moola, Sidhartha Kumar and Lorenzo Stoakes

 - Johannes Weiner has a series ("mm: push down lock_page_memcg()")
   which does perform some memcg maintenance and cleanup work.

 - SeongJae Park has added DAMOS filtering to DAMON, with the series
   "mm/damon/core: implement damos filter".

   These filters provide users with finer-grained control over DAMOS's
   actions. SeongJae has also done some DAMON cleanup work.

 - Kairui Song adds a series ("Clean up and fixes for swap").

 - Vernon Yang contributed the series "Clean up and refinement for maple
   tree".

 - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It
   adds to MGLRU an LRU of memcgs, to improve the scalability of global
   reclaim.

 - David Hildenbrand has added some userfaultfd cleanup work in the
   series "mm: uffd-wp + change_protection() cleanups".

 - Christoph Hellwig has removed the generic_writepages() library
   function in the series "remove generic_writepages".

 - Baolin Wang has performed some maintenance on the compaction code in
   his series "Some small improvements for compaction".

 - Sidhartha Kumar is doing some maintenance work on struct page in his
   series "Get rid of tail page fields".

 - David Hildenbrand contributed some cleanup, bugfixing and
   generalization of pte management and of pte debugging in his series
   "mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with
   swap PTEs".

 - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation
   flag in the series "Discard __GFP_ATOMIC".

 - Sergey Senozhatsky has improved zsmalloc's memory utilization with
   his series "zsmalloc: make zspage chain size configurable".

 - Joey Gouly has added prctl() support for prohibiting the creation of
   writeable+executable mappings.

   The previous BPF-based approach had shortcomings. See "mm: In-kernel
   support for memory-deny-write-execute (MDWE)".

 - Waiman Long did some kmemleak cleanup and bugfixing in the series
   "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF".

 - T.J. Alumbaugh has contributed some MGLRU cleanup work in his series
   "mm: multi-gen LRU: improve".

 - Jiaqi Yan has provided some enhancements to our memory error
   statistics reporting, mainly by presenting the statistics on a
   per-node basis. See the series "Introduce per NUMA node memory error
   statistics".

 - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog
   regression in compaction via his series "Fix excessive CPU usage
   during compaction".

 - Christoph Hellwig does some vmalloc maintenance work in the series
   "cleanup vfree and vunmap".

 - Christoph Hellwig has removed block_device_operations.rw_page() in
   ths series "remove ->rw_page".

 - We get some maple_tree improvements and cleanups in Liam Howlett's
   series "VMA tree type safety and remove __vma_adjust()".

 - Suren Baghdasaryan has done some work on the maintainability of our
   vm_flags handling in the series "introduce vm_flags modifier
   functions".

 - Some pagemap cleanup and generalization work in Mike Rapoport's
   series "mm, arch: add generic implementation of pfn_valid() for
   FLATMEM" and "fixups for generic implementation of pfn_valid()"

 - Baoquan He has done some work to make /proc/vmallocinfo and
   /proc/kcore better represent the real state of things in his series
   "mm/vmalloc.c: allow vread() to read out vm_map_ram areas".

 - Jason Gunthorpe rationalized the GUP system's interface to the rest
   of the kernel in the series "Simplify the external interface for
   GUP".

 - SeongJae Park wishes to migrate people from DAMON's debugfs interface
   over to its sysfs interface. To support this, we'll temporarily be
   printing warnings when people use the debugfs interface. See the
   series "mm/damon: deprecate DAMON debugfs interface".

 - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes
   and clean-ups" series.

 - Huang Ying has provided a dramatic reduction in migration's TLB flush
   IPI rates with the series "migrate_pages(): batch TLB flushing".

 - Arnd Bergmann has some objtool fixups in "objtool warning fixes".

* tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (505 commits)
  include/linux/migrate.h: remove unneeded externs
  mm/memory_hotplug: cleanup return value handing in do_migrate_range()
  mm/uffd: fix comment in handling pte markers
  mm: change to return bool for isolate_movable_page()
  mm: hugetlb: change to return bool for isolate_hugetlb()
  mm: change to return bool for isolate_lru_page()
  mm: change to return bool for folio_isolate_lru()
  objtool: add UACCESS exceptions for __tsan_volatile_read/write
  kmsan: disable ftrace in kmsan core code
  kasan: mark addr_has_metadata __always_inline
  mm: memcontrol: rename memcg_kmem_enabled()
  sh: initialize max_mapnr
  m68k/nommu: add missing definition of ARCH_PFN_OFFSET
  mm: percpu: fix incorrect size in pcpu_obj_full_size()
  maple_tree: reduce stack usage with gcc-9 and earlier
  mm: page_alloc: call panic() when memoryless node allocation fails
  mm: multi-gen LRU: avoid futile retries
  migrate_pages: move THP/hugetlb migration support check to simplify code
  migrate_pages: batch flushing TLB
  migrate_pages: share more code between _unmap and _move
  ...
2023-02-23 17:09:35 -08:00