linux/fs/btrfs
Filipe Manana c67d970f0e Btrfs: fix memory leak due to concurrent append writes with fiemap
When we have a buffered write that starts at an offset greater than or
equals to the file's size happening concurrently with a full ranged
fiemap, we can end up leaking an extent state structure.

Suppose we have a file with a size of 1Mb, and before the buffered write
and fiemap are performed, it has a single extent state in its io tree
representing the range from 0 to 1Mb, with the EXTENT_DELALLOC bit set.

The following sequence diagram shows how the memory leak happens if a
fiemap a buffered write, starting at offset 1Mb and with a length of
4Kb, are performed concurrently.

          CPU 1                                                  CPU 2

  extent_fiemap()
    --> it's a full ranged fiemap
        range from 0 to LLONG_MAX - 1
        (9223372036854775807)

    --> locks range in the inode's
        io tree
      --> after this we have 2 extent
          states in the io tree:
          --> 1 for range [0, 1Mb[ with
              the bits EXTENT_LOCKED and
              EXTENT_DELALLOC_BITS set
          --> 1 for the range
              [1Mb, LLONG_MAX[ with
              the EXTENT_LOCKED bit set

                                                  --> start buffered write at offset
                                                      1Mb with a length of 4Kb

                                                  btrfs_file_write_iter()

                                                    btrfs_buffered_write()
                                                      --> cached_state is NULL

                                                      lock_and_cleanup_extent_if_need()
                                                        --> returns 0 and does not lock
                                                            range because it starts
                                                            at current i_size / eof

                                                      --> cached_state remains NULL

                                                      btrfs_dirty_pages()
                                                        btrfs_set_extent_delalloc()
                                                          (...)
                                                          __set_extent_bit()

                                                            --> splits extent state for range
                                                                [1Mb, LLONG_MAX[ and now we
                                                                have 2 extent states:

                                                                --> one for the range
                                                                    [1Mb, 1Mb + 4Kb[ with
                                                                    EXTENT_LOCKED set
                                                                --> another one for the range
                                                                    [1Mb + 4Kb, LLONG_MAX[ with
                                                                    EXTENT_LOCKED set as well

                                                            --> sets EXTENT_DELALLOC on the
                                                                extent state for the range
                                                                [1Mb, 1Mb + 4Kb[
                                                            --> caches extent state
                                                                [1Mb, 1Mb + 4Kb[ into
                                                                @cached_state because it has
                                                                the bit EXTENT_LOCKED set

                                                    --> btrfs_buffered_write() ends up
                                                        with a non-NULL cached_state and
                                                        never calls anything to release its
                                                        reference on it, resulting in a
                                                        memory leak

Fix this by calling free_extent_state() on cached_state if the range was
not locked by lock_and_cleanup_extent_if_need().

The same issue can happen if anything else other than fiemap locks a range
that covers eof and beyond.

This could be triggered, sporadically, by test case generic/561 from the
fstests suite, which makes duperemove run concurrently with fsstress, and
duperemove does plenty of calls to fiemap. When CONFIG_BTRFS_DEBUG is set
the leak is reported in dmesg/syslog when removing the btrfs module with
a message like the following:

  [77100.039461] BTRFS: state leak: start 6574080 end 6582271 state 16402 in tree 0 refs 1

Otherwise (CONFIG_BTRFS_DEBUG not set) detectable with kmemleak.

CC: stable@vger.kernel.org # 4.16+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-10-01 18:40:58 +02:00
..
tests Btrfs: fix selftests failure due to uninitialized i_mode in test inodes 2019-09-24 14:45:02 +02:00
acl.c btrfs: cleanup btrfs_setxattr_trans and drop transaction parameter 2019-04-29 19:02:44 +02:00
async-thread.c btrfs: async-thread: convert defines to enums 2019-09-09 14:59:03 +02:00
async-thread.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
backref.c Btrfs: fix deadlock between fiemap and transaction commits 2019-07-30 18:25:12 +02:00
backref.h btrfs: fiemap: preallocate ulists for btrfs_check_shared 2019-07-01 13:34:53 +02:00
block-group.c btrfs: add space reservation tracepoint for reserved bytes 2019-09-09 14:59:17 +02:00
block-group.h btrfs: move struct io_ctl to free-space-cache.h 2019-09-09 14:59:15 +02:00
block-rsv.c btrfs: use btrfs_try_granting_tickets in update_global_rsv 2019-09-09 14:59:19 +02:00
block-rsv.h btrfs: migrate the global_block_rsv helpers to block-rsv.c 2019-07-02 12:30:55 +02:00
btrfs_inode.h btrfs: remove assumption about csum type form btrfs_print_data_csum_error() 2019-07-01 13:35:02 +02:00
check-integrity.c btrfs: reduce stack usage for btrfsic_process_written_block 2019-09-09 14:58:58 +02:00
check-integrity.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
compression.c btrfs: move cond_wake_up functions out of ctree 2019-09-09 14:59:15 +02:00
compression.h btrfs: compression: replace set_level callbacks by a common helper 2019-09-09 14:59:11 +02:00
ctree.c btrfs: Don't assign retval of btrfs_try_tree_write_lock/btrfs_tree_read_lock_atomic 2019-09-09 14:59:20 +02:00
ctree.h btrfs: create structure to encode checksum type and length 2019-09-09 14:59:19 +02:00
delalloc-space.c btrfs: roll tracepoint into btrfs_space_info_update helper 2019-09-09 14:59:17 +02:00
delalloc-space.h btrfs: migrate the delalloc space stuff to it's own home 2019-07-04 17:26:17 +02:00
delayed-inode.c btrfs: move cond_wake_up functions out of ctree 2019-09-09 14:59:15 +02:00
delayed-inode.h Btrfs: delayed-inode: use rb_first_cached for ins_root and del_root 2018-10-15 17:23:33 +02:00
delayed-ref.c btrfs: rename btrfs_space_info_add_old_bytes 2019-09-09 14:59:18 +02:00
delayed-ref.h btrfs: migrate the delayed refs rsv code 2019-07-04 17:26:17 +02:00
dev-replace.c btrfs: move cond_wake_up functions out of ctree 2019-09-09 14:59:15 +02:00
dev-replace.h btrfs: get fs_info from trans in btrfs_run_dev_replace 2019-04-29 19:02:43 +02:00
dir-item.c btrfs: remove unused parameter fs_info from btrfs_extend_item 2019-04-29 19:02:50 +02:00
disk-io.c btrfs: Detect unbalanced tree with empty leaf before crashing btree operations 2019-09-09 14:59:14 +02:00
disk-io.h btrfs: Make reada_tree_block_flagged private 2019-09-09 14:59:11 +02:00
export.c btrfs: Remove 'objectid' member from struct btrfs_root 2018-10-15 17:23:25 +02:00
export.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
extent_io.c Btrfs: fix missing error return if writeback for extent buffer never started 2019-09-24 14:45:23 +02:00
extent_io.h btrfs: Remove delalloc_end argument from extent_clear_unlock_delalloc 2019-09-09 14:58:59 +02:00
extent_map.c btrfs: assert extent map tree lock in add_extent_mapping 2019-09-09 14:59:00 +02:00
extent_map.h btrfs: Remove impossible condition from mergable_maps 2019-02-25 14:13:21 +01:00
extent-tree.c btrfs: refactor the ticket wakeup code 2019-09-09 14:59:18 +02:00
file-item.c btrfs: directly call into crypto framework for checksumming 2019-07-01 13:35:02 +02:00
file.c Btrfs: fix memory leak due to concurrent append writes with fiemap 2019-10-01 18:40:58 +02:00
free-space-cache.c btrfs: stop clearing EXTENT_DIRTY in inode I/O tree 2019-09-09 14:59:17 +02:00
free-space-cache.h btrfs: move struct io_ctl to free-space-cache.h 2019-09-09 14:59:15 +02:00
free-space-tree.c btrfs: move basic block_group definitions to their own header 2019-09-09 14:59:03 +02:00
free-space-tree.h btrfs: move basic block_group definitions to their own header 2019-09-09 14:59:03 +02:00
inode-item.c btrfs: Make btrfs_find_name_in_ext_backref return struct btrfs_inode_extref 2019-09-09 14:59:16 +02:00
inode-map.c btrfs: rename the btrfs_calc_*_metadata_size helpers 2019-09-09 14:59:13 +02:00
inode-map.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
inode.c btrfs: stop clearing EXTENT_DIRTY in inode I/O tree 2019-09-09 14:59:17 +02:00
ioctl.c btrfs: stop clearing EXTENT_DIRTY in inode I/O tree 2019-09-09 14:59:17 +02:00
Kconfig btrfs: Fix build error while LIBCRC32C is module 2019-07-17 17:03:30 +02:00
locking.c btrfs: move cond_wake_up functions out of ctree 2019-09-09 14:59:15 +02:00
locking.h btrfs: Remove unused locking functions 2019-09-09 14:58:59 +02:00
lzo.c btrfs: compression: replace set_level callbacks by a common helper 2019-09-09 14:59:11 +02:00
Makefile btrfs: migrate the block group lookup code 2019-09-09 14:59:04 +02:00
misc.h btrfs: move math functions to misc.h 2019-09-09 14:59:15 +02:00
ordered-data.c btrfs: move cond_wake_up functions out of ctree 2019-09-09 14:59:15 +02:00
ordered-data.h btrfs: don't assume ordered sums to be 4 bytes 2019-07-01 13:35:00 +02:00
orphan.c btrfs: replace GPL boilerplate by SPDX -- sources 2018-04-12 16:29:51 +02:00
print-tree.c btrfs: switch extent_buffer write_locks from atomic to int 2019-07-02 12:30:47 +02:00
print-tree.h btrfs: print-tree: debugging output enhancement 2018-04-20 19:18:16 +02:00
props.c btrfs: rename the btrfs_calc_*_metadata_size helpers 2019-09-09 14:59:13 +02:00
props.h btrfs: delete unused function btrfs_set_prop_trans 2019-04-29 19:02:54 +02:00
qgroup.c btrfs: qgroup: Fix reserved data space leak if we have multiple reserve calls 2019-09-27 15:24:34 +02:00
qgroup.h btrfs: qgroup: Move reserved data accounting from btrfs_delayed_ref_head to btrfs_qgroup_extent_record 2019-02-25 14:13:39 +01:00
raid56.c btrfs: move private raid56 definitions from ctree.h 2019-09-09 14:59:15 +02:00
raid56.h btrfs: constify map parameter for nr_parity_stripes and nr_data_stripes 2019-07-01 13:34:58 +02:00
rcu-string.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
reada.c btrfs: Make reada_tree_block_flagged private 2019-09-09 14:59:11 +02:00
ref-verify.c Wimplicit-fallthrough patches for 5.2-rc1 2019-05-07 12:48:10 -07:00
ref-verify.h btrfs: ref-verify: Use btrfs_ref to refactor btrfs_ref_tree_mod() 2019-04-29 19:02:49 +02:00
relocation.c btrfs: relocation: fix use-after-free on dead relocation roots 2019-09-25 15:23:42 +02:00
root-tree.c btrfs: rename the btrfs_calc_*_metadata_size helpers 2019-09-09 14:59:13 +02:00
scrub.c btrfs: move basic block_group definitions to their own header 2019-09-09 14:59:03 +02:00
send.c btrfs: Relinquish CPUs in btrfs_compare_trees 2019-09-09 14:59:20 +02:00
send.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
space-info.c btrfs: add enospc debug messages for ticket failure 2019-09-09 14:59:19 +02:00
space-info.h btrfs: rename btrfs_space_info_add_old_bytes 2019-09-09 14:59:18 +02:00
struct-funcs.c btrfs: tie extent buffer and it's token together 2019-09-09 14:59:16 +02:00
super.c btrfs: move sysfs declarations out of ctree.h 2019-09-09 14:59:06 +02:00
sysfs.c btrfs: sysfs: move helper macros to sysfs.c 2019-09-09 14:59:08 +02:00
sysfs.h btrfs: sysfs: move helper macros to sysfs.c 2019-09-09 14:59:08 +02:00
transaction.c btrfs: move cond_wake_up functions out of ctree 2019-09-09 14:59:15 +02:00
transaction.h Btrfs: fix deadlock between fiemap and transaction commits 2019-07-30 18:25:12 +02:00
tree-checker.c btrfs: Detect unbalanced tree with empty leaf before crashing btree operations 2019-09-09 14:59:14 +02:00
tree-checker.h btrfs: get fs_info from eb in btrfs_check_chunk_valid 2019-04-29 19:02:39 +02:00
tree-defrag.c btrfs: open code now trivial btrfs_set_lock_blocking 2019-02-25 14:13:27 +01:00
tree-log.c btrfs: tie extent buffer and it's token together 2019-09-09 14:59:16 +02:00
tree-log.h btrfs: get fs_info from trans in btrfs_set_log_full_commit 2019-04-29 19:02:41 +02:00
ulist.c btrfs: replace GPL boilerplate by SPDX -- sources 2018-04-12 16:29:51 +02:00
ulist.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
uuid-tree.c btrfs: remove unused parameter fs_info from btrfs_extend_item 2019-04-29 19:02:50 +02:00
volumes.c btrfs: Fix a regression which we can't convert to SINGLE profile 2019-09-25 16:00:37 +02:00
volumes.h btrfs: reset device stat using btrfs_dev_stat_set 2019-09-09 14:59:06 +02:00
xattr.c Btrfs: fix failure to persist compression property xattr deletion on fsync 2019-06-17 16:37:17 +02:00
xattr.h btrfs: cleanup btrfs_setxattr_trans and drop transaction parameter 2019-04-29 19:02:44 +02:00
zlib.c btrfs: compression: replace set_level callbacks by a common helper 2019-09-09 14:59:11 +02:00
zstd.c btrfs: move cond_wake_up functions out of ctree 2019-09-09 14:59:15 +02:00