Commit Graph

1138711 Commits

Author SHA1 Message Date
Filipe Manana
ceb707da9a btrfs: directly pass the inode to btrfs_is_data_extent_shared()
Currently we pass a root and an inode number as arguments for
btrfs_is_data_extent_shared() and the inode number is always from an
inode that belongs to that root (it wouldn't make sense otherwise).
In every context that we call btrfs_is_data_extent_shared() (fiemap only),
we have an inode available, so directly pass the inode to the function
instead of a root and inode number. This reduces the number of parameters
and it makes the function's signature conform to most other functions we
have.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:39 +01:00
Filipe Manana
a0a5472ad8 btrfs: remove checks for a 0 inode number during backref walking
When doing backref walking to determine if an extent is shared, we are
testing if the inode number, stored in the 'inum' field of struct
share_check, is 0. However that can never be case, since the all instances
of the structure are created at btrfs_is_data_extent_shared(), which
always initializes it with the inode number from a fs tree (and the number
for any inode from any tree can never be 0). So remove the checks.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:38 +01:00
Filipe Manana
c902421927 btrfs: remove checks for a root with id 0 during backref walking
When doing backref walking to determine if an extent is shared, we are
testing the root_objectid of the given share_check struct is 0, but that
is an impossible case, since btrfs_is_data_extent_shared() always
initializes the root_objectid field with the id of the given root, and
no root can have an objectid of 0. So remove those checks.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:38 +01:00
Filipe Manana
206c1d32f3 btrfs: drop redundant bflags initialization when allocating extent buffer
When allocating an extent buffer, at __alloc_extent_buffer(), there's no
point in explicitly assigning zero to the bflags field of the new extent
buffer because we allocated it with kmem_cache_zalloc().

So just remove the redundant initialization, it saves one mov instruction
in the generated assembly code for x86_64 ("movq $0x0,0x10(%rax)").

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:38 +01:00
Filipe Manana
b98c6cd59e btrfs: drop pointless memset when cloning extent buffer
At btrfs_clone_extent_buffer(), before allocating the pages array for the
new extent buffer we are calling memset() to zero out the pages array of
the extent buffer. This is pointless however, because the extent buffer
already has every element in its pages array pointing to NULL, as it was
allocated with kmem_cache_zalloc(). The memset() was introduced with
commit dd137dd1f2 ("btrfs: factor out allocating an array of pages"),
but even before that commit we already depended on the pages array being
initialized to NULL for the error paths that need to call
btrfs_release_extent_buffer().

So remove the memset(), it's useless and slightly increases the object
text size.

Before this change:

   $ size fs/btrfs/extent_io.o
      text	   data	    bss	    dec	    hex	filename
     70580	   5469	     40	  76089	  12939	fs/btrfs/extent_io.o

After this change:

   $ size fs/btrfs/extent_io.o
      text	   data	    bss	    dec	    hex	filename
     70564	   5469	     40	  76073	  12929	fs/btrfs/extent_io.o

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:38 +01:00
Filipe Manana
a2853ffc2e btrfs: skip unnecessary delalloc search during fiemap and lseek
During fiemap and lseek (hole and data seeking), there's no point in
iterating the inode's io tree to count delalloc bits if the inode's
delalloc bytes counter has a value of zero, as that counter is updated
whenever we set a range for delalloc or clear a range from delalloc.

So skip the counting and io tree iteration if the inode's delalloc bytes
counter has a value of zero. This helps save time when processing a file
range corresponding to a hole or prealloc (unwritten) extent.

This patch is part of a series comprised of the following patches:

  btrfs: get the next extent map during fiemap/lseek more efficiently
  btrfs: skip unnecessary extent map searches during fiemap and lseek
  btrfs: skip unnecessary delalloc search during fiemap and lseek

The following test was performed on a release kernel (Debian's default
kernel config) before and after applying those 3 patches.

   # Wrapper to call fiemap in extent count only mode.
   # (struct fiemap::fm_extent_count set to 0)
   $ cat fiemap.c
   #include <stdio.h>
   #include <unistd.h>
   #include <stdlib.h>
   #include <fcntl.h>
   #include <errno.h>
   #include <string.h>
   #include <sys/ioctl.h>
   #include <linux/fs.h>
   #include <linux/fiemap.h>

   int main(int argc, char **argv)
   {
            struct fiemap fiemap = { 0 };
            int fd;

            if (argc != 2) {
                    printf("usage: %s <path>\n", argv[0]);
                    return 1;
            }
            fd = open(argv[1], O_RDONLY);
            if (fd < 0) {
                    fprintf(stderr, "error opening file: %s\n",
                            strerror(errno));
                    return 1;
            }

            /* fiemap.fm_extent_count set to 0, to count extents only. */
            fiemap.fm_length = FIEMAP_MAX_OFFSET;
            if (ioctl(fd, FS_IOC_FIEMAP, &fiemap) < 0) {
                    fprintf(stderr, "fiemap error: %s\n",
                            strerror(errno));
                    return 1;
            }
            close(fd);
            printf("fm_mapped_extents = %d\n", fiemap.fm_mapped_extents);

            return 0;
   }

   $ gcc -o fiemap fiemap.c

And the wrapper shell script that creates a file with many holes and runs
fiemap against it:

   $ cat test.sh
   #!/bin/bash

   DEV=/dev/sdi
   MNT=/mnt/sdi

   mkfs.btrfs -f $DEV
   mount $DEV $MNT

   FILE_SIZE=$((1 * 1024 * 1024 * 1024))
   echo -n > $MNT/foobar
   for ((off = 0; off < $FILE_SIZE; off += 8192)); do
           xfs_io -c "pwrite -S 0xab $off 4K" $MNT/foobar > /dev/null
   done

   # flush all delalloc
   sync

   start=$(date +%s%N)
   ./fiemap $MNT/foobar
   end=$(date +%s%N)
   dur=$(( (end - start) / 1000000 ))
   echo "fiemap took $dur milliseconds"

   umount $MNT

Result before applying patchset:

   fm_mapped_extents = 131072
   fiemap took 63 milliseconds

Result after applying patchset:

   fm_mapped_extents = 131072
   fiemap took 39 milliseconds   (-38.1%)

Running the same test for a 512M file instead of a 1G file, gave the
following results.

Result before applying patchset:

   fm_mapped_extents = 65536
   fiemap took 29 milliseconds

Result after applying patchset:

   fm_mapped_extents = 65536
   fiemap took 20 milliseconds    (-31.0%)

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:38 +01:00
Filipe Manana
013f9c70d2 btrfs: skip unnecessary extent map searches during fiemap and lseek
If we have no outstanding extents it means we don't have any extent maps
corresponding to delalloc that is flushing, as when an ordered extent is
created we increment the number of outstanding extents to 1 and when we
remove the ordered extent we decrement them by 1. So skip extent map tree
searches if the number of outstanding ordered extents is 0, saving time as
the tree is not empty if we have previously made some reads or flushed
delalloc, as in those cases it can have a very large number of extent maps
for files with many extents.

This helps save time when processing a file range corresponding to a hole
or prealloc (unwritten) extent.

The next patch in the series has a performance test in its changelog and
its subject is:

    "btrfs: skip unnecessary delalloc search during fiemap and lseek"

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:38 +01:00
Filipe Manana
d47704bd1c btrfs: get the next extent map during fiemap/lseek more efficiently
At find_delalloc_subrange(), when we need to get the next extent map, we
do a full search on the extent map tree (a red black tree). This is fine
but it's a lot more efficient to simply use rb_next(), which typically
requires iterating over less nodes of the tree and never needs to compare
the ranges of nodes with the one we are looking for.

So add a public helper to extent_map.{h,c} to get the extent map that
immediately follows another extent map, using rb_next(), and use that
helper at find_delalloc_subrange().

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:38 +01:00
Qu Wenruo
88074c8b13 btrfs: raid56: make it more explicit that cache rbio should have all its data sectors uptodate
For Btrfs RAID56, we have a caching system for btrfs raid bios (rbio).

We call cache_rbio_pages() to mark a qualified rbio ready for cache.

The timing happens at:

- finish_rmw()

  At this timing, we have already read all necessary sectors, along with
  the rbio sectors, we have covered all data stripes.

- __raid_recover_end_io()

  At this timing, we have rebuild the rbio, thus all data sectors
  involved (either from stripe or bio list) are uptodate now.

Thus at the timing of cache_rbio_pages(), we should have all data
sectors uptodate.

This patch will make it explicit that all data sectors are uptodate at
cache_rbio_pages() timing, mostly to prepare for the incoming
verification at RMW time.

This patch will add:

- Extra ASSERT()s in cache_rbio_pages()
  This is to make sure all data sectors, which are not covered by bio,
  are already uptodate.

- Extra ASSERT()s in steal_rbio()
  Since only cached rbio can be stolen, thus every data sector should
  already be uptodate in the source rbio.

- Update __raid_recover_end_io() to update recovered sector->uptodate
  Previously __raid_recover_end_io() will only mark failed sectors
  uptodate if it's doing an RMW.

  But this can trigger new ASSERT()s, as for recovery case, a recovered
  failed sector will not be marked uptodate, and trigger ASSERT() in
  later cache_rbio_pages() call.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:38 +01:00
Qu Wenruo
797d74b749 btrfs: raid56: allocate memory separately for rbio pointers
Currently inside alloc_rbio(), we allocate a larger memory to contain
the following members:

- struct btrfs_raid_rbio itself
- stripe_pages array
- bio_sectors array
- stripe_sectors array
- finish_pointers array

Then update rbio pointers to point the extra space after the rbio
structure itself.

Thus it introduced a complex CONSUME_ALLOC() macro to help the thing.

This is too hacky, and is going to make later pointers expansion harder.

This patch will change it to use regular kcalloc() for each pointer
inside btrfs_raid_bio, making the later expansion much easier.

And introduce a helper free_raid_bio_pointers() to free up all the
pointer members in btrfs_raid_bio, which will be used in both
free_raid_bio() and error path of alloc_rbio().

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:38 +01:00
Qu Wenruo
ff2b64a22a btrfs: raid56: cleanup for function __free_raid_bio()
The cleanup involves two things:

- Remove the "__" prefix
  There is no naming confliction.

- Remove the forward declaration
  There is no special function call involved.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:38 +01:00
Josef Bacik
765c3fe99b btrfs: introduce BTRFS_RESERVE_FLUSH_EMERGENCY
Inside of FB, as well as some user reports, we've had a consistent
problem of occasional ENOSPC transaction aborts.  Inside FB we were
seeing ~100-200 ENOSPC aborts per day in the fleet, which is a really
low occurrence rate given the size of our fleet, but it's not nothing.

There are two causes of this particular problem.

First is delayed allocation.  The reservation system for delalloc
assumes that contiguous dirty ranges will result in 1 file extent item.
However if there is memory pressure that results in fragmented writeout,
or there is fragmentation in the block groups, this won't necessarily be
true.  Consider the case where we do a single 256MiB write to a file and
then close it.  We will have 1 reservation for the inode update, the
reservations for the checksum updates, and 1 reservation for the file
extent item.  At some point later we decide to write this entire range
out, but we're so fragmented that we break this into 100 different file
extents.  Since we've already closed the file and are no longer writing
to it there's nothing to trigger a refill of the delalloc block rsv to
satisfy the 99 new file extent reservations we need.  At this point we
exhaust our delalloc reservation, and we begin to steal from the global
reserve.  If you have enough of these cases going in parallel you can
easily exhaust the global reserve, get an ENOSPC at
btrfs_alloc_tree_block() time, and then abort the transaction.

The other case is the delayed refs reserve.  The delayed refs reserve
updates its size based on outstanding delayed refs and dirty block
groups.  However we only refill this block reserve when returning
excess reservations and when we call btrfs_start_transaction(root, X).
We will reserve 2*X credits at transaction start time, and fill in X
into the delayed refs reserve to make sure it stays topped off.
Generally this works well, but clearly has downsides.  If we do a
particularly delayed ref heavy operation we may never catch up in our
reservations.  Additionally running delayed refs generates more delayed
refs, and at that point we may be committing the transaction and have no
way to trigger a refill of our delayed refs rsv.  Then a similar thing
occurs with the delalloc reserve.

Generally speaking we well over-reserve in all of our block rsvs.  If we
reserve 1 credit we're usually reserving around 264k of space, but we'll
often not use any of that reservation, or use a few blocks of that
reservation.  We can be reasonably sure that as long as you were able to
reserve space up front for your operation you'll be able to find space
on disk for that reservation.

So introduce a new flushing state, BTRFS_RESERVE_FLUSH_EMERGENCY.  This
gets used in the case that we've exhausted our reserve and the global
reserve.  It simply forces a reservation if we have enough actual space
on disk to make the reservation, which is almost always the case.  This
keeps us from hitting ENOSPC aborts in these odd occurrences where we've
not kept up with the delayed work.

Fixing this in a complete way is going to be relatively complicated and
time consuming.  This patch is what I discussed with Filipe earlier this
year, and what I put into our kernels inside FB.  With this patch we're
down to 1-2 ENOSPC aborts per week, which is a significant reduction.
This is a decent stop gap until we can work out a more wholistic
solution to these two corner cases.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:38 +01:00
Josef Bacik
7a66eda351 btrfs: move the btrfs_verity_descriptor_item defs up in ctree.h
These are wrapped in CONFIG_FS_VERITY, but we can have the definitions
without verity enabled.  Move these definitions up with the other
accessor helpers.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:37 +01:00
Josef Bacik
890d2b1aa3 btrfs: move btrfs_next_old_item into ctree.c
This uses btrfs_header_nritems, which I will be moving out of ctree.h.
In order to avoid needing to include the relevant header in ctree.h,
simply move this helper function into ctree.c.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ rename parameters ]
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:37 +01:00
Josef Bacik
eda517fd0c btrfs: move free space cachep's out of ctree.h
This is local to the free-space-cache.c code, remove it from ctree.h and
inode.c, create new init/exit functions for the cachep, and move it
locally to free-space-cache.c.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:37 +01:00
Josef Bacik
226463d7b1 btrfs: move btrfs_path_cachep out of ctree.h
This is local to the ctree code, remove it from ctree.h and inode.c,
create new init/exit functions for the cachep, and move it locally to
ctree.c.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:37 +01:00
Josef Bacik
956504a331 btrfs: move trans_handle_cachep out of ctree.h
This is local to the transaction code, remove it from ctree.h and
inode.c, create new helpers in the transaction to handle the init work
and move the cachep locally to transaction.c.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:37 +01:00
Josef Bacik
f60acad355 btrfs: move btrfs_print_data_csum_error into inode.c
This isn't used outside of inode.c, there's no reason to define it in
btrfs_inode.h. Drop the inline and add __cold as it's for errors that
are not in any hot path.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:37 +01:00
Josef Bacik
f1e5c6185c btrfs: move flush related definitions to space-info.h
This code is used in space-info.c, move the definitions to space-info.h.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:37 +01:00
Josef Bacik
06d61cb101 btrfs: move btrfs_should_fragment_free_space into block-group.c
This function uses functions that are not defined in block-group.h, move
it into block-group.c in order to keep the header clean.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:37 +01:00
Josef Bacik
390d89ccf6 btrfs: move discard stat defs to free-space-cache.h
These definitions are used for discard statistics, move them out of
ctree.h and put them in free-space-cache.h.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:37 +01:00
Josef Bacik
ed4c491a3d btrfs: move BTRFS_MAX_MIRRORS into scrub.c
This is only used locally in scrub.c, move it out of ctree.h into
scrub.c.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:37 +01:00
Josef Bacik
ad4b63caf5 btrfs: move maximum limits to btrfs_tree.h
We have maximum link and name length limits, move these to btrfs_tree.h
as they're on disk limitations.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ reformat comments ]
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:37 +01:00
Josef Bacik
51129b33d3 btrfs: move btrfs_get_block_group helper out of disk-io.h
This inline helper calls btrfs_fs_compat_ro(), which is defined in
another header.  To avoid weird header dependency problems move this
helper into disk-io.c with the rest of the global root helpers.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:36 +01:00
Josef Bacik
4300c58f80 btrfs: move btrfs on-disk definitions out of ctree.h
The bulk of our on-disk definitions exist in btrfs_tree.h, which user
space can use.  Keep things consistent and move the rest of the on disk
definitions out of ctree.h into btrfs_tree.h.  Note I did have to update
all u8's to __u8, but otherwise this is a strict copy and paste.

Most of the definitions are mainly for internal use and are not
guaranteed stable public API and may change as we need. Compilation
failures by user applications can happen.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ reformat comments, style fixups ]
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:36 +01:00
Josef Bacik
4ce76e8e78 btrfs: remove unused BTRFS_IOPRIO_READA
The last user of this definition was removed in patch f26c923860
("btrfs: remove reada infrastructure") so we can remove this definition.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:36 +01:00
Josef Bacik
ea206640a6 btrfs: remove unused BTRFS_TOTAL_BYTES_PINNED_BATCH
This hasn't been used since 138a12d865 ("btrfs: rip out
btrfs_space_info::total_bytes_pinned") so it is safe to remove.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:36 +01:00
Josef Bacik
d60d956eb4 btrfs: remove unused set/clear_pending_info helpers
The last users of these helpers were removed in 5297199a8b ("btrfs:
remove inode number cache feature") so delete these helpers.

The point was for mount options that were applicable after transaction
commit so they could not be applied immediately. We don't have such
options anymore and if we do the patch can be reverted.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:36 +01:00
Peng Hao
c1b078545e btrfs: simplify cleanup after error in btrfs_create_tree
Since leaf is already NULL, and no other branch will go to fail_unlock,
the fail_unlock label is useless and can be removed

Signed-off-by: Peng Hao <flyingpeng@tencent.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:36 +01:00
Josef Bacik
e5e886bad9 btrfs: add cached_state to read_extent_buffer_subpage
We don't use a cached state here at all, which generally makes sense as
async reads are going to unlock at endio time.  However for blocking
reads we will call wait_extent_bit() for our range.  Since the
lock_extent() stuff will return the cached_state for the start of the
range this is a helpful optimization to have for this case, we'll have
the exact state we want to wait on.  Add a cached state here and simply
throw it away if we're a non-blocking read, otherwise we'll get a small
improvement by eliminating some tree searches.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:36 +01:00
Josef Bacik
123a7f008c btrfs: cache the failed state when locking extents
Currently if we fail to lock a range we'll return the start of the range
that we failed to lock.  We'll then search down to this range and wait
on any extent states in this range.

However we can avoid this search altogether if we simply cache the
extent_state that had the contention.  We can pass this into
wait_extent_bit() and start from that extent_state without doing the
search.  In the most optimistic case we can avoid all searches, more
likely we'll avoid the initial search and have to perform the search
after we wait on the failed state, or worst case we must search both
times which is what currently happens.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:36 +01:00
Josef Bacik
9c5c960463 btrfs: use a cached_state everywhere in relocation
All of the relocation code avoids using the cached state, despite
everywhere using the normal

  lock_extent()
  // do something
  unlock_extent()

pattern.  Fix this by plumbing a cached state throughout all of these
functions in order to allow for less tree searches.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:36 +01:00
Josef Bacik
632ddfa213 btrfs: use cached_state for btrfs_check_nocow_lock
Now that try_lock_extent() takes a cached_state, plumb the cached_state
through btrfs_try_lock_ordered_range() and then use a cached_state in
btrfs_check_nocow_lock everywhere to avoid extra tree searches on the
extent_io_tree.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:36 +01:00
Josef Bacik
83ae4133ac btrfs: add a cached_state to try_lock_extent
With nowait becoming more pervasive throughout our codebase go ahead and
add a cached_state to try_lock_extent().  This allows us to be faster
about clearing the locked area if we have contention, and then gives us
the same optimization for unlock if we are able to lock the range.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05 18:00:35 +01:00
Linus Torvalds
76dcd734ec Linux 6.1-rc8 2022-12-04 14:48:12 -08:00
Linus Torvalds
0ba09b1733 Revert "mm: align larger anonymous mappings on THP boundaries"
This reverts commit f35b5d7d67.

It has been reported to cause huge performance regressions on some loads
(will-it-scale.per_process_ops, but also building the kernel with
clang).

The commit did speed up gcc builds by a small amount, so it's not an
unambiguous regression, but until the big regressions are understood,
let's revert it.

Reported-by: kernel test robot <yujie.liu@intel.com>
Link: https://lore.kernel.org/r/202210181535.7144dd15-yujie.liu@intel.com
Reported-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/lkml/Y1DNQaoPWxE%2BrGce@dev-arch.thelio-3990X/
Cc: Huang, Ying <ying.huang@intel.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-12-04 12:51:59 -08:00
Jan Dabros
23393c6461 char: tpm: Protect tpm_pm_suspend with locks
Currently tpm transactions are executed unconditionally in
tpm_pm_suspend() function, which may lead to races with other tpm
accessors in the system.

Specifically, the hw_random tpm driver makes use of tpm_get_random(),
and this function is called in a loop from a kthread, which means it's
not frozen alongside userspace, and so can race with the work done
during system suspend:

  tpm tpm0: tpm_transmit: tpm_recv: error -52
  tpm tpm0: invalid TPM_STS.x 0xff, dumping stack for forensics
  CPU: 0 PID: 1 Comm: init Not tainted 6.1.0-rc5+ #135
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-20220807_005459-localhost 04/01/2014
  Call Trace:
   tpm_tis_status.cold+0x19/0x20
   tpm_transmit+0x13b/0x390
   tpm_transmit_cmd+0x20/0x80
   tpm1_pm_suspend+0xa6/0x110
   tpm_pm_suspend+0x53/0x80
   __pnp_bus_suspend+0x35/0xe0
   __device_suspend+0x10f/0x350

Fix this by calling tpm_try_get_ops(), which itself is a wrapper around
tpm_chip_start(), but takes the appropriate mutex.

Signed-off-by: Jan Dabros <jsd@semihalf.com>
Reported-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Jason A. Donenfeld <Jason@zx2c4.com>
Tested-by: Vlastimil Babka <vbabka@suse.cz>
Link: https://lore.kernel.org/all/c5ba47ef-393f-1fba-30bd-1230d1b4b592@suse.cz/
Cc: stable@vger.kernel.org
Fixes: e891db1a18 ("tpm: turn on TPM on suspend for TPM 1.x")
[Jason: reworked commit message, added metadata]
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-12-04 12:49:13 -08:00
Linus Torvalds
0c3b5bcb48 - Fix a use-after-free case where the perf pending task callback would
see an already freed event
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmOMqHUACgkQEsHwGGHe
 VUpRzw/9Gow+0wbm2XhMuweUA6t3LgNweOmzDl9w8k1f55OD6niCvuDiF9jSaiKZ
 UwGyErasp2dlEVjuNGnp42qSHos3vRiR7sdZZQG+7opWV2FFyxyFpx5x8UEgVnFy
 gOuEij5vLXBApUdNRAcVqCbvivs4Lv6SggDyQ075zGzuOmUv57vw2jDt8YfKaFcp
 jZTiL+j5GKwihndDB6ayx+7Gwo9a9ASKrTgz8JK2tPOIHZR4X9y9ot1IanZnxzwF
 d0kFpLgF/ZqjPRpJoaFn/jgk1AfahQyYHXh7lQ1aP7rLSLRRGcfTBX4n9nC3BYT+
 EHaA94l151L1mzbR69ij9tryAERU4NlguD/FIuCeW+6IEPiuwBNGklXF+rRegNj4
 IYC0ZSld/NyWKtOrwNSrFRMsxFm583Pg6TaBkvU1rGd5YVQ7GImrj7UjecXO/W71
 iXpfarF7ur2zmd+5+F5FB34VYw8GumRo+D+XIb34+8UMBURTX36hgXvSC3sVyyCw
 b0c758F3+1zTwm8z52T1RhOOp47t5iWAznwTq6k1cT7788PDXJ9sGYXIpdLpwKcI
 Fuj61alwamGeUciCr0iKGtCLRHayZII7OeQh1VjXuqgCwI3hI2j3EaI9C74WSApn
 ttVInS0Ka2xcu//A1VFltkMOWNMQK9JeTlqdqctwypTL3WVb2XA=
 =jo4r
 -----END PGP SIGNATURE-----

Merge tag 'perf_urgent_for_v6.1_rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fix from Borislav Petkov:

 - Fix a use-after-free case where the perf pending task callback would
   see an already freed event

* tag 'perf_urgent_for_v6.1_rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Fix perf_pending_task() UaF
2022-12-04 12:36:23 -08:00
Linus Torvalds
eea8bebd51 - Revert a fix to RISC-V timers supposed to address an uncertainty
whether clock events are received during S3 or not which locks up other
 RISC-V platforms. The issue will be fixed differently later.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmOMnhwACgkQEsHwGGHe
 VUrQsBAAt8ILDPAfXSvnak9p6qWywfa5/eMzychUFx4z8UBaTgeTTq5MHIADSkTj
 m4vSroX/IzhEzYMDZtLTrggNKta0pTddQkw1wZXZztu4QeixHkGyHrVOaGaWLbPd
 8Z+D4yjPuhdvCP8cXq0X3YynmNRDOaNMZMuwq9AtZplZHmBHTdSpDFU5ZbhSlhPT
 DXABL5wVOJ1lOzGxtUPCjzgGj/Vo7wSfwA+XGCprj7+1/CO9iMF6LaFhnAf4huLl
 alscLysRxbonZ/HKydWFMLMWo7/hcb2kr69QZ2qWlIfCSXHIDE3jF7m/7lpF0FrQ
 Ggn9DrcS5uTLWwxZEnbHqKJKQ+JNz9S9gBY2pv6omKVhT0iGkCI9V/h/26QhY6DK
 4MQ0PeV+Jrb9rpl6xv41Zqg9S+JzjOrnJPSDgJKK+DUPba4L6vkEJvOErQ1rfB+N
 3E/+s2IKXjHFz281jtwmjM37lcNxI8ULvvrw3o8SVxPVyldWJyXIgOndSvYMsKpg
 0usXPiTruNc+l39WP2Gf1GgEKlML21GfeFMeuC8ekbTnIYRbHbgUYRhOrOv0nOf8
 KUDmp4J/0Ko478jQG2WsTXQ77KM88X2tWdTgfqVpj+Tl+FvVvDuLuREbnBFijBnd
 ED/Hyu6i5snCe8ZMwqOHezFBFAZrMgUZwcnvR9NwvIxmzd8Nr0M=
 =Urji
 -----END PGP SIGNATURE-----

Merge tag 'timers_urgent_for_v6.1_rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fix from Borislav Petkov:

 - Revert a fix to RISC-V timers supposed to address an uncertainty
   whether clock events are received during S3 or not which locks up
   other RISC-V platforms. The issue will be fixed differently later.

* tag 'timers_urgent_for_v6.1_rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Revert "clocksource/drivers/riscv: Events are stopped during CPU suspend"
2022-12-04 12:33:44 -08:00
Linus Torvalds
ae6bb71711 powerpc fixes for 6.1 #6
- Fix oops in 32-bit BPF tail call tests
  - Add missing declaration for machine_check_early_boot()
 
 Thanks to: Christophe Leroy, Naveen N. Rao.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmOMnI8THG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgN7mD/9+9S6UugT20o82MtWLuyBMsNC+ILcf
 op23x6uYXp9aS/1/8ktRFhH3C6r74QvRxvnDs6He/Ai7jz4jlYNLNT2SfJ70WZGy
 SLlQKJwV1eUz502zCqV0s5/G/77wlrEfCrL3wToa0G6aw8/u1ECtSSXFx/fCAG0K
 jehHASbGxcyJiNZboMPJ940CXdeQNIK0ICbpi06Qvr4Uc3tybeICzgWGNzBROGIH
 q+kdOrMN2mF5LQONOKyC2OI94CNWJmBdqTTOle0jqWJT6h23vh4Oys+oO6uHyszU
 6+qM1Ze4oLrU/oCaQXA1y4oFB2VL1pkFo2aO1rRot+GuqAIjy7BJlUdSfB6hQto5
 JmtU0u7p3gBxU8BhMxufErNYODiXYQ0IVRR1YWHBoRs9aXyyQ/I0Ux46hvMI8qEN
 HyUt4wPWUt1L0QDP9hlGjSbwz/rs0lyDc+L46+TH45CozSHssWSQo5VRSmuyMuQ1
 juQgNvydDdC7S+JIa6ppmNmhLlIm0jGyp7fM5SN4bopcc3bBT6LMIK+Am1O0bWsU
 petyukpuXD8kjxDazid56JkJNva0nnhScLfTQuvxxkViiVyCcuXSZh8g8/C1iMmk
 7apvijphBG3ZwC0JZbwJjQxsi74uzdN8vxJMc7kZilY/qMngjVE3NoI9rPgbwvi+
 lYdwu5F1KnxxAA==
 =Ea1J
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-6.1-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Fix oops in 32-bit BPF tail call tests

 - Add missing declaration for machine_check_early_boot()

Thanks to Christophe Leroy and Naveen N. Rao.

* tag 'powerpc-6.1-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/64s: Add missing declaration for machine_check_early_boot()
  powerpc/bpf/32: Fix Oops on tail call tests
2022-12-04 12:24:58 -08:00
Linus Torvalds
50f36c5aa1 Input updates for v6.1-rc7
- a fix for Raydium touchscreen driver to stop leaking memory when
   sending commands to the chip.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQST2eWILY88ieB2DOtAj56VGEWXnAUCY4wjZwAKCRBAj56VGEWX
 nJI2AQCcfR1zDi8yQOtR2KjKK0DJX7QmKQVK/SbodlFNUnYNPgD8CpDM67vH6Sle
 g2TpbVin8186G+a1PBu21NbdbicHYQA=
 =AbZG
 -----END PGP SIGNATURE-----

Merge tag 'input-for-v6.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

Pull input fix from Dmitry Torokhov:

 - a fix for Raydium touchscreen driver to stop leaking memory when
   sending commands to the chip

* tag 'input-for-v6.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: raydium_ts_i2c - fix memory leak in raydium_i2c_send()
2022-12-04 12:18:37 -08:00
Linus Torvalds
c2bf05db6c I2C has this time a power state fix in the core for ACPI devices, a
regression fix regarding bus recovery for the cadence driver, a DMA
 handling fix for the imx driver, and two error path fixes (npcm7xx and
 qcom-geni).
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEOZGx6rniZ1Gk92RdFA3kzBSgKbYFAmOLav8PHHdzYUBrZXJu
 ZWwub3JnAAoJEBQN5MwUoCm2s7gP+wZTI7tJutMLBHOPji2lvoinD9J19Gk7CLbU
 95DrL54VSpt9QB3FzH5du7fsEmmpKaepcG6hXDiG3XNoGyBBPxd8tmEU/SOnyvRI
 ucMIBb+DNum+CEWdf3XyTu3VTMpi4iuvxOTlhFkvfZqZXIsjSoiCjVOtA/npiW25
 svAflgdm69eBNpr6/w5oJbCsh+cRzmV8V3Un2iwouWV0kUWTlDU11Iu93snzUSEe
 fPFOJZVm3R8gyALTlE4v0i7irRWaeKKuoS+dpV5h/hComqL+lvZ4jc+KCiYetimE
 jhdWz9RjgX3FKnCk5zap1lagdjDcJ3L0s4m4/LFm7t/OJiLJEkVByqgrftlR3FhM
 T4aFFYPegsbvcXz4Gmx4cMILbzIYoh3mN4uaspmCLi3B9fe7NK8iRLN66DmMfoKI
 HCZ8FbWuUFk2w/2pPaz5GKfwXDO2YUgKtANdn+zHK8wWJnNQzGPGVkL1XHJeFbJS
 dXNka6YITm2Tra3MePT+ra3SfACfS2fGBgH8s0tnyaRQNOUYI6fqokM1IGrCKbTr
 nEN8VXIWFVm+3++AlVJcQw26DIN0jGE2PRJQhyiZnxTyryvW6yrkE/KFOOMKA7Ro
 5CMccDjY2pkLO9uQmpqeOsQscQh3X36gN8TL2RkTTdB97t71XX+//CkBE9VcJOPM
 Ovw2YYWO
 =WDqW
 -----END PGP SIGNATURE-----

Merge tag 'i2c-for-6.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
 "A power state fix in the core for ACPI devices, a regression fix
  regarding bus recovery for the cadence driver, a DMA handling fix for
  the imx driver, and two error path fixes (npcm7xx and qcom-geni)"

* tag 'i2c-for-6.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: imx: Only DMA messages with I2C_M_DMA_SAFE flag set
  i2c: qcom-geni: fix error return code in geni_i2c_gpi_xfer
  i2c: cadence: Fix regression with bus recovery
  i2c: Restore initial power state if probe fails
  i2c: npcm7xx: Fix error handling in npcm_i2c_init()
2022-12-03 13:51:37 -08:00
Linus Torvalds
6085bc9579 dax fixes for v6.1-rc8
- Fix duplicate overlapping device-dax instances for HMAT described
   "Soft Reserved" Memory
 
 - Fix missing node targets in the sysfs representation of memory tiers
 
 - Remove a confusing variable initialization
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCY4q2jAAKCRDfioYZHlFs
 Z1P/AQCbMguw+Nj0oTj64TxvrJ6JjFbmJXI8YTFuSt7yOK4XLgD+OlH4SmZyQ1rH
 HSY2kAl1mPKiqdoO0tKwcNtYYrOZtQQ=
 =4hxx
 -----END PGP SIGNATURE-----

Merge tag 'dax-fixes-6.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull dax fixes from Dan Williams:
 "A few bug fixes around the handling of "Soft Reserved" memory and
  memory tiering information.

  Linux is starting to enounter more real world systems that deploy an
  ACPI HMAT to describe different performance classes of memory, as well
  the "special purpose" (Linux "Soft Reserved") designation from EFI.

  These fixes result from that testing.

  It has all appeared in -next for a while with no known issues.

   - Fix duplicate overlapping device-dax instances for HMAT described
     "Soft Reserved" Memory

   - Fix missing node targets in the sysfs representation of memory
     tiers

   - Remove a confusing variable initialization"

* tag 'dax-fixes-6.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  device-dax: Fix duplicate 'hmem' device registration
  ACPI: HMAT: Fix initiator registration for single-initiator systems
  ACPI: HMAT: remove unnecessary variable initialization
2022-12-03 13:43:38 -08:00
Linus Torvalds
97ee9d1c16 block-6.1-2022-12-02
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmOKM1MQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgprErD/4vyIhYg4ZM9HOWNjpuT8oZCG6yRZ4gLhz0
 GT7VRcb8GKEkKUMmeazaxocWbC3fc+yvj49Oan1Uj7/teHTmJDM0pF/fMpJdkJrF
 z+PAy2++MGF++QNBq+wrDEIDsJ4QvRxDDJe9N+KDTtX6UsoBFYxJhem4JzZpM4BI
 4GY8jYiKlx42WM58stZ0DXOucG1DsKaOQKYRQGjtKYvA0dTn7dj9btY+n6rGerEX
 4265huzW5iY+MZWc5KLXGSr0wIJqAiKMoecN03JSBHONFVB4cjMQpZuQfSChqkUS
 3fhVmFOZnYMzMIZgiwhFxuIP/QzLjctdibwU9JusqChYP9Mx7HQ2+gs7H7i5PSdS
 9m64g2u+GuRjbgIeeGPVMPnBR3UG2GE8BDRfFBBCtbdmHXIKoolXdKvG9enRjXit
 e4wjGQDHk6x9iV6LITH1Jn82kzk6TTuBkdSBJN6u8KASeOCoPwWuhgyRXo6+jh5D
 1wd2mYxtM1UB2mZilPpflDSpzZCrp/CMjbLVPIV0aTxmmeEJN+Ao2PnduNjEBxoh
 kYwlScoz9DPvMf59UU45MLc9/vYchL14VoPOl59osLlQrWf9vPMATlU1CaRgQSVa
 apBNAMzWFTMGxXCtIsUoClNX7uuHrqrMEjBbhWuWp4DSOVQoJORrU5ymX9M92MYP
 f0incJSEZQ==
 =Gdkx
 -----END PGP SIGNATURE-----

Merge tag 'block-6.1-2022-12-02' of git://git.kernel.dk/linux

Pull block fixes from Jens Axboe:
 "Just a small NVMe merge for this week, fixing protection of the name
  space list, and a missing clear of a reserved field when unused"

* tag 'block-6.1-2022-12-02' of git://git.kernel.dk/linux:
  nvme: fix SRCU protection of nvme_ns_head list
  nvme-pci: clear the prp2 field when not used
2022-12-02 16:27:15 -08:00
Linus Torvalds
63050a5ca1 Pin control fixes for the v6.1 series:
- Fix a potential divide by zero in pinctrl-singe (OMAP
   and HiSilicon)
 
 - Disable IRQs on startup in the Mediatek driver. This
   is a classic, we should be looking out for this more.
 
 - Save and restore pins in "direct IRQ" mode in the Intel
   driver, this works around firmware bugs.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEElDRnuGcz/wPCXQWMQRCzN7AZXXMFAmOKZB4ACgkQQRCzN7AZ
 XXMsnw/+PSijwUPsikfZYtgp2BTxMb/xr0XKmggqheNzEs1e0xYHBTJO9nLuUCxG
 v0zfpTMgwhx9/988Y39/fRih/DEKB5s/CMV7Ic5DQCmPo9NC5apeWvW80aQPbEun
 jwEEmjIUHHx9nX9z4B9CSICZA7XUiTb9vbHIG2KJCX7L3atzkOMmTYNi62qLQ6CU
 fo6JYZm1V3zdqLX3dD8HlDdVfzyywvG9MAhFlRgxPk/s2E8BMQdRL93rejPYKvWh
 fFH6aQrJMgEymzgRq+vfI62XRKK0ebE6A4084BMHSxflh+LNpjFwZfaNTotaqPHY
 uVVmPOGH2wjLHRFit0mp+6xWL9sGjggawJ4Y56gYpsUnNN+aKhkpjdvm9UFscnql
 6MZFx6hKbx91czhSD0M5nSWTR7AQwP3YLgOPZnGS0bt7WvuX306eh1CxYcbHlBFq
 KM4u7B36Q89b0Ac2+CjyXo4rUdXyeMRY6kDFuVaqVGyU1SEIWaqP9wwGDDY5ZXWx
 Kqc+mP5Zr6TzUbx4Amry/EswynT5zeqr6N8DFWcDZW2VJwiDqPs7g3ZIVxqpv719
 OOFzwNGtCkrjYs2SH9o697gC5xPofw2OgIFUeYMFNoCNjmjhegym6qrVAT45IOV1
 SYYoRKEFFdof9DbhJrWUmOBkMqtuPhycZClbHpRHoI/309Cq7eo=
 =bP9I
 -----END PGP SIGNATURE-----

Merge tag 'pinctrl-v6.1-5' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull pin control fixes from Linus Walleij:
 "Three driver fixes. The Intel fix looks like the most important.

   - Fix a potential divide by zero in pinctrl-singe (OMAP and
     HiSilicon)

   - Disable IRQs on startup in the Mediatek driver. This is a classic,
     we should be looking out for this more.

   - Save and restore pins in 'direct IRQ' mode in the Intel driver,
     this works around firmware bugs"

* tag 'pinctrl-v6.1-5' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: intel: Save and restore pins in "direct IRQ" mode
  pinctrl: meditatek: Startup with the IRQs disabled
  pinctrl: single: Fix potential division by zero
2022-12-02 16:22:17 -08:00
Linus Torvalds
0e15c3c75a RISC-V Fixes for 6.1-rc8
* A build-time fix for the NR_CPUS Kconfig SBI version dependency.
 * A pair of fixes to early memory initialization, to fix page
   permissions in EFI and post-initmem-free.
 * A build-time fix for the VDSO, to avoid trying to profile the VDSO
   functions.
 * A pair of fixes for kexec crash handling, to fix multi-core and
   interrupt related initialization inside the crash kernel.
 * A fix to avoid a race condition when handling multiple concurrect
   kernel stack overflows.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmOKLK8THHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRAuExnzX7sYidb0EACeA33Y1YMY/0W679rJgPgeas2xLfRU
 RIpSFZk1ylBQi8upQi6XWjh8xb/kd9LVglORRazaCcVYzmRNWOtcLfpIfYccqFhv
 7aszAXbajHtXJHN8GK0XKf2S4PazchtQ6tTsmDT12VwnWDn8pEOdW3BOBEvh6DPX
 Mn+tMZeCmcI9jzaR7OwaZYyZmc4u16MTsh9stCfnmcU9tS9oq1JTPY1UHUqGzeiC
 W8zzHyREHoKO5fU4JZYQYDoXtuqqfjiBXWVxIogQduBzMwyXKP6RR1+qMtDvLc8k
 OhThrde1NCIFD6se6IQlvjMdUaroMZf0gprhahbcjABdtvsPYwAG0TBLMNaHYUZT
 Pl+np/xmFocTOPcMQ1A57qlPUfeAsR55eE0bEjxLiy5H7ygnEu3D2st+uBtiO69v
 d6gie9qmrEF230dHJ7qJnbMtrJcAL/u671ylmRS8iwFZlbOE+Ra2aqsBgf+9ri56
 syZY8ovnPUl72ZNZtLiBxnDSIegMfLr7As1vFlAXT+ZntDRKR1ZGkXDvSk9apOMd
 oxIiIOPTQHQQKlzH8oZEIDTnuL7T6+6CtwvlF74keSF+y4YMQJTmDTIARJ7z5rab
 aiR+pU4HdvF6Koujv4imlO/9Ahwk9G/vCQ9zyz/AGG21kic4gACvA45Z5AjKoakP
 PgIh0Uintun+yw==
 =u62e
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-6.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:

 - build fix for the NR_CPUS Kconfig SBI version dependency

 - fixes to early memory initialization, to fix page permissions in EFI
   and post-initmem-free

 - build fix for the VDSO, to avoid trying to profile the VDSO functions

 - fixes for kexec crash handling, to fix multi-core and interrupt
   related initialization inside the crash kernel

 - fix for a race condition when handling multiple concurrect kernel
   stack overflows

* tag 'riscv-for-linus-6.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: kexec: Fixup crash_smp_send_stop without multi cores
  riscv: kexec: Fixup irq controller broken in kexec crash path
  riscv: mm: Proper page permissions after initmem free
  riscv: vdso: fix section overlapping under some conditions
  riscv: fix race when vmap stack overflow
  riscv: Sync efi page table's kernel mappings before switching
  riscv: Fix NR_CPUS range conditions
2022-12-02 16:04:53 -08:00
Linus Torvalds
2df2adc3e6 MMC core:
- Fix ambiguous TRIM and DISCARD args
  - Fix removal of debugfs file for mmc_test
 
 MMC host:
  - mtk-sd: Add missing clk_disable_unprepare() in an error path
  - sdhci: Fix I/O voltage switch delay for UHS-I SD cards
  - sdhci-esdhc-imx: Fix CQHCI exit halt state check
  - sdhci-sprd: Fix voltage switch
 -----BEGIN PGP SIGNATURE-----
 
 iQJLBAABCgA1FiEEugLDXPmKSktSkQsV/iaEJXNYjCkFAmOJt2wXHHVsZi5oYW5z
 c29uQGxpbmFyby5vcmcACgkQ/iaEJXNYjCnIfhAAzDpsdF1zBYQDHelN6DsqMX4c
 vHmBO8P9DE9xfhmdt3bnCa+26WIzmXGJ/8/jZLkV9ZGYLeAjkj6sYPQ2Zgvndecd
 f+9l4sGBiL1b26ON2wQqnrsZcEedtDh3xYdAtuHyEwqb4hRs+ryl9vMGvwdfE685
 T0Y+rvIxsT9m+X0kQfJzc7hedJ+K7wytkY5MmQhh5bMzhm7+6BhQJf/ABG2CTdUm
 Wilx9VJIxeVfORg1jEgQ+ssR0K9RmbuzAb3690xUYKobAK034JbSCvhodXIzVMYU
 g4iJ2m5rZrvdYKweuO98AAoRQ4DzNo2scGjmF9V2ImBrIbkIc2Mq2wms3PhNoYCu
 Rvzoa6fkoOR8acSo4dU3433xeZfdOIX9h0o5sBI+esERfdST1FwQ5FpF4SAiAr3u
 wXo/KZV/PfSZUAPHzbKCvIiEd330MJD6z18ORUYviqAcQNjqEhYyeARrzKxbkJA7
 zOn3yirLR6yGm5cZ1YS0+A8wj4GBcf7XwkSJs2ospQqeTCpqZwceOxogs03myey9
 Igx7IGT/PRHbMWFli584iERL+L6LbHUtZguJGabr/xh7YHt/vbOniH9BiG6AexXZ
 UzOjDaddzVJeAmvARQMowV7WssxvdVg8jnO19T4v93At0LKmTwUC015AQigaAvDP
 PZ3jUC/QBty5d7N3GBg=
 =7SFO
 -----END PGP SIGNATURE-----

Merge tag 'mmc-v6.1-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc

Pull MMC fixes from Ulf Hansson:
 "MMC core:
   - Fix ambiguous TRIM and DISCARD args
   - Fix removal of debugfs file for mmc_test

  MMC host:
   - mtk-sd: Add missing clk_disable_unprepare() in an error path
   - sdhci: Fix I/O voltage switch delay for UHS-I SD cards
   - sdhci-esdhc-imx: Fix CQHCI exit halt state check
   - sdhci-sprd: Fix voltage switch"

* tag 'mmc-v6.1-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: sdhci-sprd: Fix no reset data and command after voltage switch
  mmc: sdhci: Fix voltage switch delay
  mmc: mtk-sd: Fix missing clk_disable_unprepare in msdc_of_clock_parse()
  mmc: mmc_test: Fix removal of debugfs file
  mmc: sdhci-esdhc-imx: correct CQHCI exit halt state check
  mmc: core: Fix ambiguous TRIM and DISCARD arg
2022-12-02 15:58:07 -08:00
Linus Torvalds
f66f62f83d IOMMU Fixes for Linux v6.1-rc7
Including:
 
 	- Intel VT-d fixes:
 	  - IO/TLB flush fix
 	  - Various pci_dev refcount fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmOKBQkACgkQK/BELZcB
 GuOzPhAAtJfmbfbvGLjCz/e9DgLl1sdfFrR1FwzwpXmQ3iV1isBy8AG/PX2+uBMs
 Qcge0BkzX0X/8I/lbnJYHbix3Z0cjDuYl4kPyYP8V5+tqSuJRnAODw+GJK17ntn8
 EfsDG4fZzEIUAgPE8PP4qXZwXI2pLfF6A4CT0ztB46976fpzcLAUicG0H2Opy9vQ
 DmDNOsg3R0yBB/1XaN0QSavfnoLKmaB37aHv0GeN4l5aue6tgWzxKUxBKSWnA7nF
 ZS+3XFe0tAhmxPH3JGmHqloxQrR52zqq9vMsbn0PTND6UKCN/pEo+3TkJQ9FLxvm
 qQi1lrAf9zRoIcsodXVAvgWbEgbR5LWxAffSwz+oJBv9MwMA8pfCG95HGBVX90fD
 WY01XcsnHmo1BqOHg5P9lSC979xGdltL71IjbKi1r31njZ2VByfDNcsa9OSBCD0L
 9Y8JJ0vW8ipbpDEDoxZUuElY+UkKUyJFurNVPxpCiKQhIdWdTPUurnvCBQgi5uas
 zVtI6OP/I7MIZbc00C4Y7KfsLm0MqlVOYzhvG+8vGW9GLUTVtWF0MkP6sfUEiQmS
 OqsxqTiLjbGfIOBvhZxyVZ7sCVY2d776KS6d9LlYINmRn8UAzIQC01szyr+Jx5m4
 jqs/ujTVIr2UiZ2QSrdQ2wNsrab/4vUrAN/O+uoJ5eV537ryKl8=
 =rgA1
 -----END PGP SIGNATURE-----

Merge tag 'iommu-fixes-v6.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu fixes from Joerg Roedel:
 "Intel VT-d fixes:

   - IO/TLB flush fix

   - Various pci_dev refcount fixes"

* tag 'iommu-fixes-v6.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/vt-d: Fix PCI device refcount leak in dmar_dev_scope_init()
  iommu/vt-d: Fix PCI device refcount leak in has_external_pci()
  iommu/vt-d: Fix PCI device refcount leak in prq_event_thread()
  iommu/vt-d: Add a fix for devices need extra dtlb flush
2022-12-02 15:54:12 -08:00
Pawan Gupta
6606515742 x86/bugs: Make sure MSR_SPEC_CTRL is updated properly upon resume from S3
The "force" argument to write_spec_ctrl_current() is currently ambiguous
as it does not guarantee the MSR write. This is due to the optimization
that writes to the MSR happen only when the new value differs from the
cached value.

This is fine in most cases, but breaks for S3 resume when the cached MSR
value gets out of sync with the hardware MSR value due to S3 resetting
it.

When x86_spec_ctrl_current is same as x86_spec_ctrl_base, the MSR write
is skipped. Which results in SPEC_CTRL mitigations not getting restored.

Move the MSR write from write_spec_ctrl_current() to a new function that
unconditionally writes to the MSR. Update the callers accordingly and
rename functions.

  [ bp: Rework a bit. ]

Fixes: caa0ff24d5 ("x86/bugs: Keep a per-CPU IA32_SPEC_CTRL value")
Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/806d39b0bfec2fe8f50dc5446dff20f5bb24a959.1669821572.git.pawan.kumar.gupta@linux.intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-12-02 15:45:33 -08:00
Zhang Xiaoxu
8c9a59939d Input: raydium_ts_i2c - fix memory leak in raydium_i2c_send()
There is a kmemleak when test the raydium_i2c_ts with bpf mock device:

  unreferenced object 0xffff88812d3675a0 (size 8):
    comm "python3", pid 349, jiffies 4294741067 (age 95.695s)
    hex dump (first 8 bytes):
      11 0e 10 c0 01 00 04 00                          ........
    backtrace:
      [<0000000068427125>] __kmalloc+0x46/0x1b0
      [<0000000090180f91>] raydium_i2c_send+0xd4/0x2bf [raydium_i2c_ts]
      [<000000006e631aee>] raydium_i2c_initialize.cold+0xbc/0x3e4 [raydium_i2c_ts]
      [<00000000dc6fcf38>] raydium_i2c_probe+0x3cd/0x6bc [raydium_i2c_ts]
      [<00000000a310de16>] i2c_device_probe+0x651/0x680
      [<00000000f5a96bf3>] really_probe+0x17c/0x3f0
      [<00000000096ba499>] __driver_probe_device+0xe3/0x170
      [<00000000c5acb4d9>] driver_probe_device+0x49/0x120
      [<00000000264fe082>] __device_attach_driver+0xf7/0x150
      [<00000000f919423c>] bus_for_each_drv+0x114/0x180
      [<00000000e067feca>] __device_attach+0x1e5/0x2d0
      [<0000000054301fc2>] bus_probe_device+0x126/0x140
      [<00000000aad93b22>] device_add+0x810/0x1130
      [<00000000c086a53f>] i2c_new_client_device+0x352/0x4e0
      [<000000003c2c248c>] of_i2c_register_device+0xf1/0x110
      [<00000000ffec4177>] of_i2c_notify+0x100/0x160
  unreferenced object 0xffff88812d3675c8 (size 8):
    comm "python3", pid 349, jiffies 4294741070 (age 95.692s)
    hex dump (first 8 bytes):
      22 00 36 2d 81 88 ff ff                          ".6-....
    backtrace:
      [<0000000068427125>] __kmalloc+0x46/0x1b0
      [<0000000090180f91>] raydium_i2c_send+0xd4/0x2bf [raydium_i2c_ts]
      [<000000001d5c9620>] raydium_i2c_initialize.cold+0x223/0x3e4 [raydium_i2c_ts]
      [<00000000dc6fcf38>] raydium_i2c_probe+0x3cd/0x6bc [raydium_i2c_ts]
      [<00000000a310de16>] i2c_device_probe+0x651/0x680
      [<00000000f5a96bf3>] really_probe+0x17c/0x3f0
      [<00000000096ba499>] __driver_probe_device+0xe3/0x170
      [<00000000c5acb4d9>] driver_probe_device+0x49/0x120
      [<00000000264fe082>] __device_attach_driver+0xf7/0x150
      [<00000000f919423c>] bus_for_each_drv+0x114/0x180
      [<00000000e067feca>] __device_attach+0x1e5/0x2d0
      [<0000000054301fc2>] bus_probe_device+0x126/0x140
      [<00000000aad93b22>] device_add+0x810/0x1130
      [<00000000c086a53f>] i2c_new_client_device+0x352/0x4e0
      [<000000003c2c248c>] of_i2c_register_device+0xf1/0x110
      [<00000000ffec4177>] of_i2c_notify+0x100/0x160

After BANK_SWITCH command from i2c BUS, no matter success or error
happened, the tx_buf should be freed.

Fixes: 3b384bd6c3 ("Input: raydium_ts_i2c - do not split tx transactions")
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Link: https://lore.kernel.org/r/20221202103412.2120169-1-zhangxiaoxu5@huawei.com
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2022-12-02 15:42:21 -08:00