btrfs: don't commit transaction for every subvol create

Recently a Meta-internal workload encountered subvolume creation taking
up to 2s each, significantly slower than directory creation. As they
were hoping to be able to use subvolumes instead of directories, and
were looking to create hundreds, this was a significant issue. After
Josef investigated, it turned out to be due to the transaction commit
currently performed at the end of subvolume creation.

This change improves the workload by not doing transaction commit for every
subvolume creation, and merely requiring a transaction commit on fsync.
In the worst case, of doing a subvolume create and fsync in a loop, this
should require an equal amount of time to the current scheme; and in the
best case, the internal workload creating hundreds of subvolumes before
fsyncing is greatly improved.

While it would be nice to be able to use the log tree and use the normal
fsync path, log tree replay can't deal with new subvolume inodes
presently.

It's possible that there's some reason that the transaction commit is
necessary for correctness during subvolume creation; however,
git logs indicate that the commit dates back to the beginning of
subvolume creation, and there are no notes on why it would be necessary.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Neal Gompa <neal@gompa.dev>
Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This commit is contained in:
Sweet Tea Dorminy 2023-04-11 15:10:53 -04:00 committed by David Sterba
parent f469c8bd90
commit 1b53e51a4a

View File

@ -649,6 +649,8 @@ static noinline int create_subvol(struct mnt_idmap *idmap,
}
trans->block_rsv = &block_rsv;
trans->bytes_reserved = block_rsv.size;
/* Tree log can't currently deal with an inode which is a new root. */
btrfs_set_log_full_commit(trans);
ret = btrfs_qgroup_inherit(trans, 0, objectid, inherit);
if (ret)
@ -757,10 +759,7 @@ out:
trans->bytes_reserved = 0;
btrfs_subvolume_release_metadata(root, &block_rsv);
if (ret)
btrfs_end_transaction(trans);
else
ret = btrfs_commit_transaction(trans);
btrfs_end_transaction(trans);
out_new_inode_args:
btrfs_new_inode_args_destroy(&new_inode_args);
out_inode: