On a zoned filesystem, never clear the dirty flag of an extent buffer,
but instead mark it as zeroout.
On writeout, when encountering a marked extent_buffer, zero it out.
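A minimal userspace sketch of the idea described above. The types and
helpers (struct eb_sketch, clear_dirty_sketch(), writeout_sketch()) are
hypothetical stand-ins and not the actual btrfs code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for an extent buffer. */
struct eb_sketch {
	bool dirty;
	bool zoned_zeroout;
	unsigned char data[64];
};

/*
 * On a zoned filesystem keep the buffer dirty and only remember that it
 * must be written as zeros; otherwise drop the dirty flag as usual.
 */
static void clear_dirty_sketch(struct eb_sketch *eb, bool zoned)
{
	assert(eb != NULL);
	if (zoned) {
		eb->zoned_zeroout = true;
		return;
	}
	eb->dirty = false;
}

/* Returns true if the buffer was submitted for write. */
static bool writeout_sketch(struct eb_sketch *eb)
{
	if (!eb->dirty)
		return false;
	/* A marked buffer is zeroed out before it goes to disk. */
	if (eb->zoned_zeroout)
		memset(eb->data, 0, sizeof(eb->data));
	eb->dirty = false;
	return true;
}
```

In this model a marked buffer still takes part in writeout, so the
write position of the zone is not skipped.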
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
EXTENT_BUFFER_ZONED_ZEROOUT better describes the state of the extent buffer,
namely it is written as all zeros. This is needed in zoned mode, to
preserve I/O ordering.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The extent_io_tree is embedded in several structures, notably in struct
btrfs_inode. The fs_info is only used for reporting errors and for
reference in trace points. We can get to the pointer through the inode,
but not all io trees set it. However, we always know the owner and
can recognize whether the inode is valid. Access helpers are provided,
with a const variant for the trace points.
This reduces the size of extent_io_tree by 8 bytes and of the following
structures in turn:
- btrfs_inode 1104 -> 1088
- btrfs_device 520 -> 512
- btrfs_root 1360 -> 1344
- btrfs_transaction 456 -> 440
- btrfs_fs_info 3600 -> 3592
- reloc_control 1520 -> 1512
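A minimal sketch of how one pointer slot can serve both cases, assuming
the owner decides which pointer is valid. All type and helper names
below are hypothetical simplifications, not the btrfs definitions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the btrfs structures. */
struct fs_info_sketch { int id; };
struct root_sketch { struct fs_info_sketch *fs_info; };
struct inode_sketch { struct root_sketch *root; };

enum { IO_TREE_FS_SKETCH, IO_TREE_INODE_IO_SKETCH };

struct io_tree_sketch {
	/* Only one pointer is stored; the owner tells which one is valid. */
	union {
		struct fs_info_sketch *fs_info;
		struct inode_sketch *inode;
	};
	uint8_t owner;
};

/* Const accessor, the kind of helper a trace point would use. */
static const struct fs_info_sketch *
io_tree_to_fs_info_sketch(const struct io_tree_sketch *tree)
{
	if (tree->owner == IO_TREE_INODE_IO_SKETCH)
		return tree->inode->root->fs_info;
	return tree->fs_info;
}
```

Sharing the slot is what saves one pointer (8 bytes on 64-bit) per
embedded tree in this model.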
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Pass the type of the extent io tree operation which failed to the report
helper. The message wording and contents are updated: though locking
might be the cause of the error, it's probably not the only one and we're
interested in the state.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The printk helpers take a const fs_info if it's used just for the
identifier in the messages; __btrfs_panic() lacks that.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The helper insert_state errors are handled in all callers and reported
by extent_io_tree_panic so we don't need to do it twice.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The per-inode file extent tree was added in 41a2ee75aa ("btrfs:
introduce per-inode file extent tree"), it's the only tree type
that requires the lockdep class. Move it to the file where it is
actually used.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There's no need for a local variable to store the stripe size at
insert_dev_extents(), we can just take it from the chunk map as it's only
used once and typing 'map->stripe_size' is not much more verbose than
simply typing 'stripe_size'. So remove the local variable.
This was added before the recent addition of a dedicated structure for
chunk mappings because the stripe size was encoded in the 'orig_block_len'
field of an extent_map structure, so the use of the local variable made
things more readable.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently we abuse the extent_map structure for two purposes:
1) To actually represent extents for inodes;
2) To represent chunk mappings.
This is odd and has several disadvantages:
1) To create a chunk map, we need to do two memory allocations: one for
an extent_map structure and another one for a map_lookup structure, so
more potential for an allocation failure and more complicated code to
manage and link two structures;
2) For a chunk map we actually only use 3 fields (24 bytes) of the
respective extent map structure: the 'start' field to have the logical
start address of the chunk, the 'len' field to have the chunk's size,
and the 'orig_block_len' field to contain the chunk's stripe size.
Besides wasting memory, it's also odd and not intuitive at all to
have the stripe size in a field named 'orig_block_len'.
We are also using 'block_len' of the extent_map structure to contain
the chunk size, so we have 2 fields for the same value, 'len' and
'block_len', which is pointless;
3) When an extent map is associated to a chunk mapping, we set the bit
EXTENT_FLAG_FS_MAPPING on its flags and then make its member named
'map_lookup' point to the associated map_lookup structure. This means
that for an extent map associated to an inode extent, we are not using
this 'map_lookup' pointer, so wasting 8 bytes (on a 64 bits platform);
4) Extent maps associated to a chunk mapping are never merged or split so
it's pointless to use the existing extent map infrastructure.
So add a dedicated data structure named 'btrfs_chunk_map' to represent
chunk mappings, this is basically the existing map_lookup structure with
some extra fields:
1) 'start' to contain the chunk logical address;
2) 'chunk_len' to contain the chunk's length;
3) 'stripe_size' for the stripe size;
4) 'rb_node' for insertion into a rb tree;
5) 'refs' for reference counting.
This way we do a single memory allocation for chunk mappings and we don't
waste memory for them with unused/unnecessary fields from an extent_map.
We also save 8 bytes from the extent_map structure by removing the
'map_lookup' pointer, so the size of struct extent_map is reduced from
144 bytes down to 136 bytes, and we can now have 30 extent maps per 4K
page instead of 28.
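A rough userspace sketch of the single allocation and of the per-page
arithmetic quoted above. The layout is an assumption for illustration:
the rb_node is left out, and the trailing stripes array and all names
are hypothetical, not the real btrfs_chunk_map definition:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stripe description. */
struct stripe_sketch {
	uint64_t physical;
};

struct chunk_map_sketch {
	uint64_t start;        /* chunk logical address */
	uint64_t chunk_len;    /* chunk length */
	uint64_t stripe_size;  /* stripe size */
	int refs;              /* reference count */
	int num_stripes;
	struct stripe_sketch stripes[];  /* allocated together with the map */
};

/* One allocation covers the map and all of its stripes. */
static struct chunk_map_sketch *alloc_chunk_map_sketch(int num_stripes)
{
	struct chunk_map_sketch *map;

	map = calloc(1, sizeof(*map) +
		     (size_t)num_stripes * sizeof(map->stripes[0]));
	if (!map)
		return NULL;
	map->refs = 1;
	map->num_stripes = num_stripes;
	return map;
}

/* How many objects of a given size fit into one 4K page. */
static unsigned int objs_per_4k_page(unsigned int obj_size)
{
	return 4096 / obj_size;
}
```

With these helpers, objs_per_4k_page(144) gives 28 and
objs_per_4k_page(136) gives 30, matching the numbers in the text.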
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There's no reason to open code what btrfs_next_item() does when searching
for extent items at scrub.c:find_first_extent_item(), so remove
the logic to find the next item and use btrfs_next_item() instead, making
the code shorter and with fewer nested code blocks. While at it also fix the
comment to the plural "items" instead of "item" and end it with proper
punctuation.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The helper extent_map_block_end() is currently not used anywhere outside
extent_map.c, so move it from extent_map.h into extent_map.c. While at
it, also make the extent map pointer argument const.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When starting a transaction to remove a block group we have one ASSERT
that checks we found an extent map and that the extent map's start offset
matches the desired chunk offset. In case one of the conditions fails, we
get a stack trace that points to the respective line of code, however we
can't tell which condition failed: either there's no extent map or we got
one with an unexpected start offset. To make such an issue easier to debug
and analyse, split the assertion into two, one for each condition. This
was actually triggered during development of another upcoming change.
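A small userspace sketch of the split, using assert() from <assert.h> as
a stand-in for the kernel's ASSERT(). The struct and function names are
hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified map with only the field being checked. */
struct map_check_sketch {
	uint64_t start;
};

static void check_map_sketch(const struct map_check_sketch *map,
			     uint64_t chunk_offset)
{
	/*
	 * Before: assert(map != NULL && map->start == chunk_offset);
	 * A failure there does not say which half was false.
	 */
	assert(map != NULL);
	assert(map->start == chunk_offset);
}
```

Each assertion now fails on its own line, so the reported line number
identifies the violated condition.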
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When getting a chunk map, at btrfs_get_chunk_map(), we do some sanity
checks to verify that we found an extent map and that it includes the
requested logical address. These are never expected to fail, so mark
them as unlikely to make it more clear as well as to allow a compiler
to generate more efficient code.
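A sketch of the annotation in userspace, assuming GCC or Clang for
__builtin_expect(). The macro, struct, function and the error codes
chosen here are hypothetical and only illustrate the pattern:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's unlikely(); needs GCC or Clang. */
#define unlikely_sketch(x) __builtin_expect(!!(x), 0)

/* Hypothetical, simplified map covering [start, start + len). */
struct map_range_sketch {
	uint64_t start;
	uint64_t len;
};

/* Returns 0 if the map exists and contains the logical address. */
static int check_chunk_map_sketch(const struct map_range_sketch *map,
				  uint64_t logical)
{
	if (unlikely_sketch(map == NULL))
		return -ENOENT;
	if (unlikely_sketch(logical < map->start ||
			    logical >= map->start + map->len))
		return -EINVAL;
	return 0;
}
```

The hint does not change the result of the checks; it only tells the
compiler which branch to lay out as the cold path.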
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Looks like the struct member was added in 2007 in 2.6.29 in commit
87ee04eb0f ("Btrfs: Add simple stripe size parameter") but hasn't been
used at all since. So let's remove it. This was found by tool
https://github.com/jirislaby/clang-struct, then build tested after
removing the struct member.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The declaration was temporarily moved in a4055213bf ("btrfs: unexport
all the temporary exports for extent-io-tree.c") and then should have
been removed in 6.0 in 071d19f513 ("btrfs: remove struct tree_entry in
extent-io-tree.c") but was not. This was found by tool
https://github.com/jirislaby/clang-struct .
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The raid56 changes in 6.2 reworked the IO path to RMW, commit
93723095b5 ("btrfs: raid56: switch write path to rmw_rbio()") in
particular removed the last use of the work member so it can be removed
as well. This was found by tool https://github.com/jirislaby/clang-struct .
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The whole isize code was deleted in 5.6 in 3f1c64ce04 ("btrfs: delete the
ordered isize update code"), except the struct member. This was found
by tool https://github.com/jirislaby/clang-struct .
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The recent scrub rewrite forgot to remove the sectors_per_bio in
6.3 in 13a62fd997 ("btrfs: scrub: remove scrub_bio structure").
This was found by tool https://github.com/jirislaby/clang-struct .
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
As a cleanup and preparation for future folio migration, this patch
replaces all page->private usage with the folio versions. This includes:
- PagePrivate() -> folio_test_private()
- page->private -> folio_get_private()
- attach_page_private() -> folio_attach_private()
- detach_page_private() -> folio_detach_private()
Since we're here, also remove the forced cast on page->private, since
it's (void *) already, we don't really need to do the cast.
For now, even if we missed some call sites, it won't cause any problem,
as we're only using order 0 folios (single page), so all those
folio/page flags should stay in sync.
But after the future conversion to higher order folios, the page
<-> folio flag sync is no longer guaranteed, so we have to migrate to
folio flags.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The pages are now allocated and freed centrally, so we can extend the
logic to manage the lifetime. The main idea is to keep a few recently
used pages and hand them to all writers. Ideally we won't have to go to
the allocator at all (a slight performance gain) and we also raise the
chance that we'll have the pages available (slightly increased reliability).
In order to avoid gathering too many pages, a shrinker is attached to
the cache so we can free them when MM demands that. The first
implementation will drain the whole cache. Further this can be refined
to keep some minimal number of pages for emergency purposes. The
ultimate goal is to avoid memory allocation failures on the write out path
from the compression.
The pool threshold is set to cover full BTRFS_MAX_COMPRESSED / PAGE_SIZE
for minimal thread pool, which is 8 (btrfs_init_fs_info()). This is 128K
/ 4K * 8 = 256 pages at maximum, which is 1MiB.
This is for all filesystems currently mounted; with heavy use of
compression IO the allocator is still needed. The cache helps for short
bursts of IO.
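A simplified userspace model of such a bounded page cache, assuming a
plain LIFO array and no locking. All names are hypothetical; malloc()
and free() stand in for the page allocator, and pool_shrink_all() plays
the role of the shrinker draining the whole cache:

```c
#include <assert.h>
#include <stdlib.h>

#define PAGE_SIZE_SKETCH       4096u
#define MAX_COMPRESSED_SKETCH  (128u * 1024u)
#define MIN_THREADS_SKETCH     8u
/* 128K / 4K * 8 = 256 pages, which is 1MiB with 4K pages. */
#define POOL_THRESH_SKETCH \
	(MAX_COMPRESSED_SKETCH / PAGE_SIZE_SKETCH * MIN_THREADS_SKETCH)

struct page_pool_sketch {
	void *pages[POOL_THRESH_SKETCH];
	unsigned int count;
};

/* Prefer a cached page, fall back to the allocator. */
static void *pool_alloc_page(struct page_pool_sketch *pool)
{
	if (pool->count > 0)
		return pool->pages[--pool->count];
	return malloc(PAGE_SIZE_SKETCH);
}

/* Cache the page if below the threshold, otherwise really free it. */
static void pool_free_page(struct page_pool_sketch *pool, void *page)
{
	if (pool->count < POOL_THRESH_SKETCH) {
		pool->pages[pool->count++] = page;
		return;
	}
	free(page);
}

/* What a shrinker callback would do here: drain the whole cache. */
static unsigned int pool_shrink_all(struct page_pool_sketch *pool)
{
	unsigned int freed = pool->count;

	while (pool->count > 0)
		free(pool->pages[--pool->count]);
	return freed;
}
```

A page returned with pool_free_page() is handed out again by the next
pool_alloc_page() without touching the allocator, which is the short
burst case the cache is meant to help.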
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is a preparation for managing compression pages in a cache-like
manner, instead of asking the allocator each time. The common allocation
and free wrappers are introduced and are functionally equivalent to the
current code.
The freeing helpers need to be carefully placed where the last reference
is dropped. This is either after directly allocating (error handling)
or when there are no other users of the pages (after copying the contents).
It's safe to not use the helper and use put_page(), which will handle the
reference count. Not using the helper means there's a lower number of
pages that could be reused without passing them back to the allocator.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[PROBLEM]
The function __btrfs_update_delayed_inode() is doing something not
meeting the code standard of today:
    path->slots[0]++;
    if (path->slots[0] >= btrfs_header_nritems(leaf))
        goto search;
  again:
    if (!is_the_target_inode_ref())
        goto out;
    ret = btrfs_delete_item();
    /* Some cleanup. */
    return ret;
  search:
    ret = search_for_the_last_inode_ref();
    goto again;
With the tag named "again", it's pretty common to think it's a loop, but
the truth is, we only need to do the search once, to locate the last
(also the first, since there should only be one INODE_REF or
INODE_EXTREF now) ref of the inode.
[FIX]
Instead of the weird jumps, just do them in a stream-lined fashion.
This removes those weird labels, and adds extra comments on why we can do
the different searches.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The logic in btrfs_block_can_be_shared() is hard to follow as we have a
lot of conditions in a single if statement including a subexpression with
a logical or and two nested if statements inside the main if statement.
Make this easier to read by using separate if statements that return
immediately when we find a condition that determines if a block can be
or can not be shared.
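A generic sketch of the refactoring pattern. The fields and conditions
below are hypothetical and chosen only for illustration; they are not
claimed to be the actual checks of btrfs_block_can_be_shared():

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inputs; the real function inspects btrfs structures. */
struct block_sketch {
	bool root_shareable;
	bool is_root_node;
	bool reloc_flag;
	uint64_t generation;
	uint64_t last_snapshot;
};

/* Before: one large condition that is hard to scan. */
static bool can_be_shared_compound(const struct block_sketch *b)
{
	if (b->root_shareable && !b->is_root_node &&
	    (b->generation <= b->last_snapshot || b->reloc_flag))
		return true;
	return false;
}

/* After: each deciding condition returns immediately. */
static bool can_be_shared_flat(const struct block_sketch *b)
{
	if (!b->root_shareable)
		return false;
	if (b->is_root_node)
		return false;
	if (b->generation > b->last_snapshot && !b->reloc_flag)
		return false;
	return true;
}
```

Both variants return the same result for every input; only the second
one can be read top to bottom without unpacking nested logic.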
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently btrfs_block_can_be_shared() returns an int that is used as a
boolean. Since all it needs is to return true or false, and it can't
return errors for example, change the return type from int to bool to
make it a bit more readable and obvious.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The logged_list[2] and log_extents_lock[2] members of struct btrfs_root
are no longer used, their last use was removed in commit 5636cf7d6d
("btrfs: remove the logged extents infrastructure"). So remove these
fields. This reduces the size of struct btrfs_root, on a release kernel,
from 1392 bytes down to 1352 bytes.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The prototype for btrfs_clear_buffer_dirty() is declared in both disk-io.h
and extent_io.h, but the function is defined at extent_io.c. So remove the
prototype declaration from disk-io.h.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Merge tag '6.7-rc5-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
"Address OOBs and NULL dereference found by Dr. Morris's recent
analysis and fuzzing.
All marked for stable as well"
* tag '6.7-rc5-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: fix OOB in smb2_query_reparse_point()
smb: client: fix NULL deref in asn1_ber_decoder()
smb: client: fix potential OOBs in smb2_parse_contexts()
smb: client: fix OOB in receive_encrypted_standard()
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Merge tag 'net-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Current release - regressions:
- tcp: fix tcp_disordered_ack() vs usec TS resolution
Current release - new code bugs:
- dpll: sanitize possible null pointer dereference in
dpll_pin_parent_pin_set()
- eth: octeon_ep: initialise control mbox tasks before using APIs
Previous releases - regressions:
- io_uring/af_unix: disable sending io_uring over sockets
- eth: mlx5e:
- TC, don't offload post action rule if not supported
- fix possible deadlock on mlx5e_tx_timeout_work
- eth: iavf: fix iavf_shutdown to call iavf_remove instead iavf_close
- eth: bnxt_en: fix skb recycling logic in bnxt_deliver_skb()
- eth: ena: fix DMA syncing in XDP path when SWIOTLB is on
- eth: team: fix use-after-free when an option instance allocation
fails
Previous releases - always broken:
- neighbour: don't let neigh_forced_gc() disable preemption for long
- net: prevent mss overflow in skb_segment()
- ipv6: support reporting otherwise unknown prefix flags in
RTM_NEWPREFIX
- tcp: remove acked SYN flag from packet in the transmit queue
correctly
- eth: octeontx2-af:
- fix a use-after-free in rvu_nix_register_reporters
- fix promisc mcam entry action
- eth: dwmac-loongson: make sure MDIO is initialized before use
- eth: atlantic: fix double free in ring reinit logic"
* tag 'net-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (62 commits)
net: atlantic: fix double free in ring reinit logic
appletalk: Fix Use-After-Free in atalk_ioctl
net: stmmac: Handle disabled MDIO busses from devicetree
net: stmmac: dwmac-qcom-ethqos: Fix drops in 10M SGMII RX
dpaa2-switch: do not ask for MDB, VLAN and FDB replay
dpaa2-switch: fix size of the dma_unmap
net: prevent mss overflow in skb_segment()
vsock/virtio: Fix unsigned integer wrap around in virtio_transport_has_space()
Revert "tcp: disable tcp_autocorking for socket when TCP_NODELAY flag is set"
MIPS: dts: loongson: drop incorrect dwmac fallback compatible
stmmac: dwmac-loongson: drop useless check for compatible fallback
stmmac: dwmac-loongson: Make sure MDIO is initialized before use
tcp: disable tcp_autocorking for socket when TCP_NODELAY flag is set
dpll: sanitize possible null pointer dereference in dpll_pin_parent_pin_set()
net: ena: Fix XDP redirection error
net: ena: Fix DMA syncing in XDP path when SWIOTLB is on
net: ena: Fix xdp drops handling due to multibuf packets
net: ena: Destroy correct number of xdp queues upon failure
net: Remove acked SYN flag from packet in the transmit queue correctly
qed: Fix a potential use-after-free in qed_cxt_tables_alloc
...
Merge tag 'for-6.7-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"Some fixes to quota accounting code, mostly around error handling and
correctness:
- free reserves on various error paths, after IO errors or
transaction abort
- don't clear reserved range at the folio release time, it'll be
properly cleared after final write
- fix integer overflow due to int used when passing around size of
freed reservations
- fix a regression in squota accounting that missed some cases with
delayed refs"
* tag 'for-6.7-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: ensure releasing squota reserve on head refs
btrfs: don't clear qgroup reserved bit in release_folio
btrfs: free qgroup pertrans reserve on transaction abort
btrfs: fix qgroup_free_reserved_data int overflow
btrfs: free qgroup reserve when ORDERED_IOERR is set
The driver has a logic flaw in ring data allocation/free,
where a double free may happen in aq_ring_free if the system is under
stress and driver init/deinit is happening.
The probability of hitting this is higher during a suspend/resume cycle.
Verification was done by simulating the same conditions with
    stress -m 2000 --vm-bytes 20M --vm-hang 10 --backoff 1000
    while true; do sudo ifconfig enp1s0 down; sudo ifconfig enp1s0 up; done
Fixed by explicitly clearing pointers to NULL on deallocation.
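A minimal userspace sketch of the pattern, with hypothetical names and
plain heap memory instead of the driver's DMA allocations:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified ring with two separately allocated areas. */
struct ring_sketch {
	void *buff_ring;
	void *dx_ring;
};

/* On failure the caller is expected to call ring_free_sketch(). */
static int ring_alloc_sketch(struct ring_sketch *ring, size_t n)
{
	ring->buff_ring = calloc(n, 16);
	ring->dx_ring = calloc(n, 16);
	if (!ring->buff_ring || !ring->dx_ring)
		return -1;
	return 0;
}

/* Safe to call more than once: freed pointers are reset to NULL. */
static void ring_free_sketch(struct ring_sketch *ring)
{
	free(ring->buff_ring);
	ring->buff_ring = NULL;
	free(ring->dx_ring);
	ring->dx_ring = NULL;
}
```

Because free(NULL) is a no-op, a second ring_free_sketch() call on the
same ring cannot release the same memory twice.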
Fixes: 018423e90b ("net: ethernet: aquantia: Add ring support code")
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Closes: https://lore.kernel.org/netdev/CAHk-=wiZZi7FcvqVSUirHBjx0bBUZ4dFrMDVLc3+3HCrtq0rBA@mail.gmail.com/
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Link: https://lore.kernel.org/r/20231213094044.22988-1-irusskikh@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Because atalk_ioctl() accesses sk->sk_receive_queue
without holding a sk->sk_receive_queue.lock, it can
cause a race with atalk_recvmsg().
A use-after-free for skb occurs with the following flow.
```
atalk_ioctl() -> skb_peek()
atalk_recvmsg() -> skb_recv_datagram() -> skb_free_datagram()
```
Add sk->sk_receive_queue.lock to atalk_ioctl() to fix this issue.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Hyunwoo Kim <v4bel@theori.io>
Link: https://lore.kernel.org/r/20231213041056.GA519680@v4bel-B760M-AORUS-ELITE-AX
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Many hardware configurations have the MDIO bus disabled, and are instead
using some other MDIO bus to talk to the MAC's phy.
of_mdiobus_register() returns -ENODEV in this case. Let's handle it
gracefully instead of failing to probe the MAC.
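A small sketch of the error handling, with hypothetical stand-in
functions; register_mdio_sketch() only imitates a registration call that
reports -ENODEV for a disabled bus:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a bus registration that may return -ENODEV. */
static int register_mdio_sketch(bool node_disabled)
{
	return node_disabled ? -ENODEV : 0;
}

/* Treat a disabled bus as "nothing to register", not as a probe failure. */
static int probe_sketch(bool node_disabled, bool *has_own_mdio)
{
	int ret = register_mdio_sketch(node_disabled);

	if (ret == -ENODEV) {
		*has_own_mdio = false;
		return 0;
	}
	if (ret)
		return ret;
	*has_own_mdio = true;
	return 0;
}
```

Any other error is still propagated, so only the disabled-bus case is
tolerated.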
Fixes: 47dd7a540b ("net: add support for STMicroelectronics Ethernet controllers.")
Signed-off-by: Andrew Halaney <ahalaney@redhat.com>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Link: https://lore.kernel.org/r/20231212-b4-stmmac-handle-mdio-enodev-v2-1-600171acf79f@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
In 10M SGMII mode all packets are dropped due to a wrong Rx clock.
SGMII 10Mbps mode needs the Rx clock divider programmed to avoid drops in Rx.
Update the SGMII configure function to program the Rx clock divider.
Fixes: 463120c31c ("net: stmmac: dwmac-qcom-ethqos: add support for SGMII")
Tested-by: Andrew Halaney <ahalaney@redhat.com>
Signed-off-by: Sneh Shah <quic_snehshah@quicinc.com>
Reviewed-by: Bjorn Andersson <quic_bjorande@quicinc.com>
Link: https://lore.kernel.org/r/20231212092208.22393-1-quic_snehshah@quicinc.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-12-12 (iavf)
This series contains updates to iavf driver only.
Piotr reworks Flow Director states to deal with issues in restoring
filters.
Slawomir fixes shutdown processing as it was missing needed calls.
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
iavf: Fix iavf_shutdown to call iavf_remove instead iavf_close
iavf: Handle ntuple on/off based on new state machines for flow director
iavf: Introduce new state machines for flow director
====================
Link: https://lore.kernel.org/r/20231212203613.513423-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ioana Ciornei says:
====================
dpaa2-switch: various fixes
The first patch fixes the size passed to two dma_unmap_single() calls
which was wrongly put as the size of the pointer.
The second patch is new to this series and reverts the behavior of the
dpaa2-switch driver to not ask for object replay upon offloading so that
we avoid the errors encountered when a VLAN is installed multiple times
on the same port.
====================
Link: https://lore.kernel.org/r/20231212164326.2753457-1-ioana.ciornei@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Starting with commit 4e51bf44a0 ("net: bridge: move the switchdev
object replay helpers to "push" mode") the switchdev_bridge_port_offload()
helper was extended with the intention to provide switchdev drivers easy
access to object addition and deletion replays. This works by calling
the replay helpers with non-NULL notifier blocks.
In the same commit, the dpaa2-switch driver was updated so that it
passes valid notifier blocks to the helper. At that moment, no
regression was identified through testing.
In the meantime, the blamed commit changed the behavior in terms of
which ports get hit by the replay. Before this commit, only the initial
port which identified itself as offloaded through
switchdev_bridge_port_offload() got a replay of all port objects and
FDBs. After this, the newly joining port will trigger a replay of
objects on all bridge ports and on the bridge itself.
This behavior leads to errors in dpaa2_switch_port_vlans_add() when a
VLAN gets installed on the same interface multiple times.
The intended mechanism to address this is to pass a non-NULL ctx to the
switchdev_bridge_port_offload() helper and then check it against the
port's private structure. But since the driver does not have any use for
the replayed port objects and FDBs until it gains support for LAG
offload, it's better to fix the issue by reverting the dpaa2-switch
driver to not ask for replay. The pointers will be added back when we
are prepared to ignore replays on unrelated ports.
Fixes: b28d580e29 ("net: bridge: switchdev: replay all VLAN groups")
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Link: https://lore.kernel.org/r/20231212164326.2753457-3-ioana.ciornei@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The size of the DMA unmap was wrongly put as a sizeof of a pointer.
Change the value of the DMA unmap to be the actual macro used for the
allocation and the DMA map.
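The bug class can be shown in userspace as follows. The macro and
function names are hypothetical; only the sizeof behaviour is the point:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical buffer size macro used for allocation and mapping. */
#define CMD_BUF_SIZE_SKETCH 256

/* Wrong: this is the size of the pointer, typically 4 or 8 bytes. */
static size_t unmap_size_wrong(const unsigned char *cmd_buff)
{
	return sizeof(cmd_buff);
}

/* Right: the same size that was used to allocate and map the buffer. */
static size_t unmap_size_right(const unsigned char *cmd_buff)
{
	(void)cmd_buff;
	return CMD_BUF_SIZE_SKETCH;
}
```

The map and unmap sizes have to match, which is why the fix reuses the
allocation macro instead of sizeof on the pointer.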
Fixes: 1110318d83 ("dpaa2-switch: add tc flower hardware offload on ingress traffic")
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Link: https://lore.kernel.org/r/20231212164326.2753457-2-ioana.ciornei@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We need to do signed arithmetic if we expect the condition
`if (bytes < 0)` to be possible.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
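A userspace sketch of the difference, assuming 32-bit unsigned counters.
The struct and function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified credit counters. */
struct credit_sketch {
	uint32_t peer_buf_alloc;
	uint32_t tx_cnt;
	uint32_t peer_fwd_cnt;
};

/* Unsigned arithmetic: a "negative" result wraps to a huge value. */
static uint32_t space_unsigned(const struct credit_sketch *c)
{
	return c->peer_buf_alloc - (c->tx_cnt - c->peer_fwd_cnt);
}

/* Signed arithmetic: the result can go below zero and be clamped. */
static int64_t space_signed(const struct credit_sketch *c)
{
	int64_t bytes = (int64_t)c->peer_buf_alloc -
			(int64_t)(c->tx_cnt - c->peer_fwd_cnt);

	if (bytes < 0)
		bytes = 0;
	return bytes;
}
```

With more bytes in flight than the peer's buffer allows, the unsigned
variant reports a very large amount of free space while the signed
variant reports none.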
Fixes: 06a8fc7836 ("VSOCK: Introduce virtio_vsock_common.ko")
Signed-off-by: Nikolay Kuratov <kniv@yandex-team.ru>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20231211162317.4116625-1-kniv@yandex-team.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Some return value checks in sign-file are wrong when calling the OpenSSL
API. The ERR() condition only checks whether the return value is < 0,
which ignores a return value of 0. For example:
1. CMS_final() returns 1 for success or 0 for failure.
2. i2d_CMS_bio_stream() returns 1 for success or 0 for failure.
3. i2d_TYPE_bio() returns 1 for success and 0 for failure.
4. BIO_free() returns 1 for success and 0 for failure.
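A small sketch of why the check matters. fake_final_sketch() is a
hypothetical stand-in that merely follows the "1 for success, 0 for
failure" convention listed above; it is not an OpenSSL function:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in: returns 1 on success and 0 on failure. */
static int fake_final_sketch(bool ok)
{
	return ok ? 1 : 0;
}

/* Wrong: a "< 0" test never fires for the 0 failure value. */
static bool failed_check_negative(int rc)
{
	return rc < 0;
}

/* Right for this convention: anything other than 1 is a failure. */
static bool failed_check_not_one(int rc)
{
	return rc != 1;
}
```

For a failing call, failed_check_negative() reports success and the
error goes unnoticed, while failed_check_not_one() catches it.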
Link: https://www.openssl.org/docs/manmaster/man3/
Fixes: e5a2e3c847 ("scripts/sign-file.c: Add support for signing with a raw signature")
Signed-off-by: Yusong Gao <a869920004@gmail.com>
Reviewed-by: Juerg Haefliger <juerg.haefliger@canonical.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20231213024405.624692-1-a869920004@gmail.com/ # v5
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull ufs fix from Al Viro:
"ufs got broken this merge window on folio conversion - calling
conventions for filemap_lock_folio() are not the same as for
find_lock_page()"
* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fix ufs_get_locked_folio() breakage
Merge tag 'efi-urgent-for-v6.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI fixes from Ard Biesheuvel:
- Deal with a regression in the recently refactored x86 EFI stub code
on older Dell systems by disabling randomization of the physical load
address
- Use the correct load address for relocatable Loongarch kernels
* tag 'efi-urgent-for-v6.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
efi/x86: Avoid physical KASLR on older Dell systems
efi/loongarch: Use load address to calculate kernel entry address
filemap_lock_folio() returns ERR_PTR(-ENOENT) if the thing is not
in cache - not NULL like find_lock_page() used to.
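Why a NULL check misses this case can be shown with a small userspace
sketch. The helpers below only imitate the kernel's ERR_PTR()/IS_ERR()
idea and are not the kernel implementation; lookup_cached() is a
hypothetical function that follows the return convention described
above.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace imitation of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* True if the pointer falls in the range reserved for error codes. */
static int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/*
 * Hypothetical lookup: returns a valid pointer when the object is
 * cached, and an encoded -ENOENT (never NULL) when it is not.
 */
static void *lookup_cached(int present)
{
	static int object;

	return present ? (void *)&object : err_ptr(-ENOENT);
}
```

A caller that only tests the result against NULL would treat the encoded
-ENOENT as a valid pointer; testing with is_err() catches it.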
Fixes: 5fb7bd50b3 ("ufs: add ufs_get_locked_folio and ufs_put_locked_folio")
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Yanteng Si says:
====================
stmmac: Some bug fixes
* Put Krzysztof's patch into my thread, and picked up Conor's Reviewed-by
tag and Jiaxun's Acked-by tag. (The previous version was an RFC patch.)
* Fixed an Oops related to mdio, mainly to ensure that
mdio is initialized before use, because it will be used
in a series of patches I am working on.
See <https://lore.kernel.org/loongarch/cover.1699533745.git.siyanteng@loongson.cn/T/#t>
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The device binds to the proper PCI ID (LOONGSON, 0x7a03), already listed in
the DTS, so checking for some other compatible does not make sense. It cannot
be bound to an unsupported platform.
Drop the useless, incorrect (space in between) and undocumented compatible.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Yanteng Si <siyanteng@loongson.cn>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Generic code will use mdio. If it is not initialized before use,
the kernel will Oops.
Fixes: 30bba69d7d ("stmmac: pci: Add dwmac support for Loongson")
Signed-off-by: Yanteng Si <siyanteng@loongson.cn>
Signed-off-by: Feiyang Chen <chenfeiyang@loongson.cn>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Based on the tcp man page, if TCP_NODELAY is set, it disables Nagle's algorithm
and packets are sent as soon as possible. However, in the `tcp_push` function,
where autocorking is evaluated, the `nonagle` value set by TCP_NODELAY is not
considered, which can trigger unexpected corking of packets and induce delays.
For example, if two packets are generated as part of a server's reply and the
first one is not transmitted on the wire quickly enough, the second packet can
trigger autocorking in `tcp_push` and be delayed instead of sent as soon as
possible. It will wait either for additional packets to be coalesced or for an
ACK from the client before the corked packet is transmitted. This can interact
badly if the receiver has TCP delayed ACKs enabled, introducing 40ms of extra
delay in completion times. It is not always possible to control who has delayed
ACKs set, but it is possible to adjust when and how autocorking is triggered.
This patch prevents autocorking if the TCP_NODELAY flag is set on the socket.
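The intended change in behavior can be sketched as a simplified
userspace model. This is not the kernel code: struct model_sock and both
functions are hypothetical, and the real autocorking conditions are
reduced here to three booleans for illustration.

```c
#include <stdbool.h>

/* Hypothetical, simplified socket state; not the kernel's struct sock. */
struct model_sock {
	bool autocorking_enabled;    /* models tcp_autocorking=1 */
	bool nodelay;                /* models TCP_NODELAY set on the socket */
	bool earlier_data_in_flight; /* an earlier packet has not left yet */
	bool pending_data_is_small;  /* pending data is below the size goal */
};

/* Model of the decision that ignores TCP_NODELAY. */
static bool would_autocork_ignoring_nodelay(const struct model_sock *sk)
{
	return sk->autocorking_enabled &&
	       sk->pending_data_is_small &&
	       sk->earlier_data_in_flight;
}

/* Model of the decision that never corks when TCP_NODELAY is set. */
static bool would_autocork_honoring_nodelay(const struct model_sock *sk)
{
	return !sk->nodelay && would_autocork_ignoring_nodelay(sk);
}
```

In this model, a socket with TCP_NODELAY set and an earlier packet still
in flight is corked by the first function but not by the second.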
The patch has been tested using an AWS c7g.2xlarge instance with Ubuntu 22.04
and Apache Tomcat 9.0.83 running the basic servlet below:
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
public class HelloWorldServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {
        response.setContentType("text/html;charset=utf-8");
        OutputStreamWriter osw = new OutputStreamWriter(response.getOutputStream(),"UTF-8");
        String s = "a".repeat(3096);
        osw.write(s,0,s.length());
        osw.flush();
    }
}
Load was applied using wrk2 (https://github.com/kinvolk/wrk2) from an AWS
c6i.8xlarge instance. With the current autocorking behavior and TCP_NODELAY
set, an additional 40ms of latency is observed at P99.99 and above. With the
patch applied, we see no occurrences of 40ms latencies. The patch has also been
tested with iperf and uperf benchmarks, and no regression was observed.
# No patch with tcp_autocorking=1 and TCP_NODELAY set on all sockets
./wrk -t32 -c128 -d40s --latency -R10000 http://172.31.49.177:8080/hello/hello
...
50.000% 0.91ms
75.000% 1.12ms
90.000% 1.46ms
99.000% 1.73ms
99.900% 1.96ms
99.990% 43.62ms <<< 40+ ms extra latency
99.999% 48.32ms
100.000% 49.34ms
# With patch
./wrk -t32 -c128 -d40s --latency -R10000 http://172.31.49.177:8080/hello/hello
...
50.000% 0.89ms
75.000% 1.13ms
90.000% 1.44ms
99.000% 1.67ms
99.900% 1.78ms
99.990% 2.27ms <<< no 40+ ms extra latency
99.999% 3.71ms
100.000% 4.57ms
Fixes: f54b311142 ("tcp: auto corking")
Signed-off-by: Salvatore Dipietro <dipiets@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>