linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-18 10:01:43 +00:00

Author	SHA1	Message	Date
Wang Shilong	467bb1d27c	Btrfs: make sure we cleanup all reloc roots if error happens I hit an oops when merging reloc roots fails, the reason is that new reloc roots may be added and we should make sure we cleanup all reloc roots. Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>	2013-12-12 07:12:51 -08:00
Wang Shilong	6646374863	Btrfs: skip building backref tree for uuid and quota tree when doing balance relocation Quota tree and UUID Tree is only cowed, they can not be snapshoted. Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>	2013-12-12 07:12:36 -08:00
Wang Shilong	c974c4642f	Btrfs: fix an oops when doing balance relocation I hit an oops when inserting reloc root into @reloc_root_tree(it can be easily triggered when forcing cow for relocation root) [ 866.494539] [<ffffffffa0499579>] btrfs_init_reloc_root+0x79/0xb0 [btrfs] [ 866.495321] [<ffffffffa044c240>] record_root_in_trans+0xb0/0x110 [btrfs] [ 866.496109] [<ffffffffa044d758>] btrfs_record_root_in_trans+0x48/0x80 [btrfs] [ 866.496908] [<ffffffffa0494da8>] select_reloc_root+0xa8/0x210 [btrfs] [ 866.497703] [<ffffffffa0495c8a>] do_relocation+0x16a/0x540 [btrfs] This is because reloc root inserted into @reloc_root_tree is not within one transaction,reloc root may be cowed and root block bytenr will be reused then oops happens.We should update reloc root in @reloc_root_tree when cow reloc root node, fix it. Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Reviewed-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>	2013-12-12 07:12:20 -08:00
Miao Xie	91aef86f3b	Btrfs: rename btrfs_start_all_delalloc_inodes rename the function -- btrfs_start_all_delalloc_inodes(), and make its name be compatible to btrfs_wait_ordered_roots(), since they are always used at the same place. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:13:58 -05:00
Miao Xie	b02441999e	Btrfs: don't wait for the completion of all the ordered extents It is very likely that there are lots of ordered extents in the filesytem, if we wait for the completion of all of them when we want to reclaim some space for the metadata space reservation, we would be blocked for a long time. The performance would drop down suddenly for a long time. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:13:44 -05:00
Dulshani Gunawardhana	fae7f21cec	btrfs: Use WARN_ON()'s return value in place of WARN_ON(1) Use WARN_ON()'s return value in place of WARN_ON(1) for cleaner source code that outputs a more descriptive warnings. Also fix the styling warning of redundant braces that came up as a result of this fix. Signed-off-by: Dulshani Gunawardhana <dulshani.gunawardhana89@gmail.com> Reviewed-by: Zach Brown <zab@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:11:53 -05:00
Wang Shilong	7fdf4b608d	Btrfs: use 'u64' rather than 'int' to get extent's generation We define a 'int' to get extent's generation by mistake,fix it. Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:10:06 -05:00
Josef Bacik	9e6a0c52b7	Btrfs: stop committing the transaction so much during relocate I noticed with my horrible snapshot excercisor that we were taking forever to relocate the larger the file system got. This appeared to be because we were committing the transaction _constantly_. There were a few places where we do braindead things with metadata reservation, like start a transaction and then try to refill the block rsv, which not only keeps us from committing a transaction during the enospc stuff, but keeps us from doing some of the harder flushing work which will make us more likely to need to commit the transaction. We also were checking the block rsv and committing the transaction if the block rsv was below a certain threshold, but we were doing this in a place where we don't actually keep anything in the block rsv so this was always ending up false so we always committed the transaction in this case. I tested this to make sure it didn't break anything, but it takes about 10 hours to get the box to this state so I don't know how much of an impact it will really make. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:08:31 -05:00
Josef Bacik	0ef8b72607	Btrfs: return an error from btrfs_wait_ordered_range I noticed that if the free space cache has an error writing out it's data it won't actually error out, it will just carry on. This is because it doesn't check the return value of btrfs_wait_ordered_range, which didn't actually return anything. So fix this in order to keep us from making free space cache look valid when it really isnt. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:07:35 -05:00
Miao Xie	20dd2cbf01	Btrfs: fix BUG_ON() casued by the reserved space migration When we did space balance and snapshot creation at the same time, we might meet the following oops: kernel BUG at fs/btrfs/inode.c:3038! [SNIP] Call Trace: [<ffffffffa0411ec7>] btrfs_orphan_cleanup+0x293/0x407 [btrfs] [<ffffffffa042dc45>] btrfs_mksubvol.isra.28+0x259/0x373 [btrfs] [<ffffffffa042de85>] btrfs_ioctl_snap_create_transid+0x126/0x156 [btrfs] [<ffffffffa042dff1>] btrfs_ioctl_snap_create_v2+0xd0/0x121 [btrfs] [<ffffffffa0430b2c>] btrfs_ioctl+0x414/0x1854 [btrfs] [<ffffffff813b60b7>] ? __do_page_fault+0x305/0x379 [<ffffffff811215a9>] vfs_ioctl+0x1d/0x39 [<ffffffff81121d7c>] do_vfs_ioctl+0x32d/0x3e2 [<ffffffff81057fe7>] ? finish_task_switch+0x80/0xb8 [<ffffffff81121e88>] SyS_ioctl+0x57/0x83 [<ffffffff813b39ff>] ? do_device_not_available+0x12/0x14 [<ffffffff813b99c2>] system_call_fastpath+0x16/0x1b [SNIP] RIP [<ffffffffa040da40>] btrfs_orphan_add+0xc3/0x126 [btrfs] The reason of the problem is that the relocation root creation stole the reserved space, which was reserved for orphan item deletion. There are several ways to fix this problem, one is to increasing the reserved space size of the space balace, and then we can use that space to create the relocation tree for each fs/file trees. But it is hard to calculate the suitable size because we doesn't know how many fs/file trees we need relocate. We fixed this problem by reserving the space for relocation root creation actively since the space it need is very small (one tree block, used for root node copy), then we use that reserved space to create the relocation tree. If we don't reserve space for relocation tree creation, we will use the reserved space of the balance. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 21:54:28 -05:00
Josef Bacik	4577b014d1	Btrfs: relocate csums properly with prealloc extents A user reported a problem where they were getting csum errors when running a balance and running systemd's journal. This is because systemd is awesome and fallocate()'s its log space and writes into it. Unfortunately we assume that when we read in all the csums for an extent that they are sequential starting at the bytenr we care about. This obviously isn't the case for prealloc extents, where we could have written to the middle of the prealloc extent only, which means the csum would be for the bytenr in the middle of our range and not the front of our range. Fix this by offsetting the new bytenr we are logging to based on the original bytenr the csum was for. With this patch I no longer see the csum errors I was seeing. Thanks, Cc: stable@vger.kernel.org Reported-by: Chris Murphy <lists@colorremedies.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 21:53:22 -05:00
Filipe David Borba Manana	7451432394	Btrfs: remove path arg from btrfs_truncate_free_space_cache Not used for anything, and removing it avoids caller's need to allocate a path structure. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 21:51:33 -05:00
Miao Xie	c00869f1ae	Btrfs: fix oops caused by the space balance and dead roots When doing space balance and subvolume destroy at the same time, we met the following oops: kernel BUG at fs/btrfs/relocation.c:2247! RIP: 0010: [<ffffffffa04cec16>] prepare_to_merge+0x154/0x1f0 [btrfs] Call Trace: [<ffffffffa04b5ab7>] relocate_block_group+0x466/0x4e6 [btrfs] [<ffffffffa04b5c7a>] btrfs_relocate_block_group+0x143/0x275 [btrfs] [<ffffffffa0495c56>] btrfs_relocate_chunk.isra.27+0x5c/0x5a2 [btrfs] [<ffffffffa0459871>] ? btrfs_item_key_to_cpu+0x15/0x31 [btrfs] [<ffffffffa048b46a>] ? btrfs_get_token_64+0x7e/0xcd [btrfs] [<ffffffffa04a3467>] ? btrfs_tree_read_unlock_blocking+0xb2/0xb7 [btrfs] [<ffffffffa049907d>] btrfs_balance+0x9c7/0xb6f [btrfs] [<ffffffffa049ef84>] btrfs_ioctl_balance+0x234/0x2ac [btrfs] [<ffffffffa04a1e8e>] btrfs_ioctl+0xd87/0x1ef9 [btrfs] [<ffffffff81122f53>] ? path_openat+0x234/0x4db [<ffffffff813c3b78>] ? __do_page_fault+0x31d/0x391 [<ffffffff810f8ab6>] ? vma_link+0x74/0x94 [<ffffffff811250f5>] vfs_ioctl+0x1d/0x39 [<ffffffff811258c8>] do_vfs_ioctl+0x32d/0x3e2 [<ffffffff811259d4>] SyS_ioctl+0x57/0x83 [<ffffffff813c3bfa>] ? do_page_fault+0xe/0x10 [<ffffffff813c73c2>] system_call_fastpath+0x16/0x1b It is because we returned the error number if the reference of the root was 0 when doing space relocation. It was not right here, because though the root was dead(refs == 0), but the space it held still need be relocated, or we could not remove the block group. So in this case, we should return the root no matter it is dead or not. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-10-10 21:31:02 -04:00
Josef Bacik	f0de181c9b	Btrfs: kill delay_iput arg to the wait_ordered functions This is a left over of how we used to wait for ordered extents, which was to grab the inode and then run filemap flush on it. However if we have an ordered extent then we already are holding a ref on the inode, and we just use btrfs_start_ordered_extent anyway, so there is no reason to have an extra ref on the inode to start work on the ordered extent. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-09-21 11:05:27 -04:00
Josef Bacik	83d4cfd4da	Btrfs: fixup error handling in btrfs_reloc_cow If we failed to actually allocate the correct size of the extent to relocate we will end up in an infinite loop because we won't return an error, we'll just move on to the next extent. So fix this up by returning an error, and then fix all the callers to return an error up the stack rather than BUG_ON()'ing. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-09-21 10:58:54 -04:00
Stefan Behrens	23fa76b0ba	Btrf: cleanup: don't check for root_refs == 0 twice btrfs_read_fs_root_no_name() already checks if btrfs_root_refs() is zero and returns ENOENT in this case. There is no need to do it again in three more places. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-09-01 08:16:29 -04:00
Geert Uytterhoeven	c1c9ff7c94	Btrfs: Remove superfluous casts from u64 to unsigned long long u64 is "unsigned long long" on all architectures now, so there's no need to cast it when formatting it using the "ll" length modifier. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-09-01 08:16:08 -04:00
Josef Bacik	b6c60c8018	Btrfs: change how we queue blocks for backref checking Previously we only added blocks to the list to have their backrefs checked if the level of the block is right above the one we are searching for. This is because we want to make sure we don't add the entire path up to the root to the lists to make sure we process things one at a time. This assumes that if any blocks in the path to the root are going to be not checked (shared in other words) then they will be in the level right above the current block on up. This isn't quite right though since we can have blocks higher up the list that are shared because they are attached to a reloc root. But we won't add this block to be checked and then later on we will BUG_ON(!upper->checked). So instead keep track of wether or not we've queued a block to be checked in this current search, and if we haven't go ahead and queue it to be checked. This patch fixed the panic I was seeing where we BUG_ON(!upper->checked). Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-09-01 08:04:41 -04:00
Josef Bacik	d062d13cf1	Btrfs: check to see if we have an inline item properly If our item isn't big enough to have an actual inline item when we have skinny metadata enabled just return 1 in find_inline_backref so we can move on to the next item. This probably wasn't causing a problem since we check the values of ptr and end properly, but just in case this will keep us from doing extra work. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-09-01 08:04:40 -04:00
Josef Bacik	b37b39cd6b	Btrfs: cleanup reloc roots properly on error I was hitting the BUG_ON() at the end of merge_reloc_roots() because we were aborting the transaction at some point previously and then getting an error when we tried to drop the reloc root. I fixed btrfs_drop_snapshot to re-add us to the dead roots list if we failed, but this isn't the right thing to do for reloc roots since it uses root->root_list for it's own stuff in order to know what needs to be cleaned up. So fix btrfs_drop_snapshot to only do the re-add if we aren't dropping for reloc, and handle errors from merge_reloc_root() by dropping the reloc root we are processing since it won't be on the list of roots to cleanup. With this patch my reproducer no longer panics. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-09-01 08:04:24 -04:00
Filipe David Borba Manana	647f63bd36	Btrfs: add missing error checks to add_data_references The function relocation.c:add_data_references() was not checking if all calls to __add_tree_block() and find_data_references() were succeeding or not. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-09-01 07:57:35 -04:00
Miao Xie	f51a4a1826	Btrfs: remove btrfs_sector_sum structure Using the structure btrfs_sector_sum to keep the checksum value is unnecessary, because the extents that btrfs_sector_sum points to are continuous, we can find out the expected checksums by btrfs_ordered_sum's bytenr and the offset, so we can remove btrfs_sector_sum's bytenr. After removing bytenr, there is only one member in the structure, so it makes no sense to keep the structure, just remove it, and use a u32 array to store the checksum value. By this change, we don't use the while loop to get the checksums one by one. Now, we can get several checksum value at one time, it improved the performance by ~74% on my SSD (31MB/s -> 54MB/s). test command: # dd if=/dev/zero of=/mnt/btrfs/file0 bs=1M count=1024 oflag=sync Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2013-07-02 11:50:47 -04:00
Josef Bacik	aee68ee5f5	Btrfs: fix not being able to find skinny extents during relocate We unconditionally search for the EXTENT_ITEM_KEY for metadata during balance, and then check the key that we found to see if it is actually a METADATA_ITEM_KEY, but this doesn't work right because METADATA is a higher key value, so if what we are looking for happens to be the first item in the leaf the search will dump us out at the previous leaf, and we won't find our item. So instead do what we do everywhere else, search for the skinny extent first and if we don't find it go back and re-search for the extent item. This patch fixes the panic I was hitting when balancing a large file system with skinny extents. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2013-07-01 08:52:30 -04:00
Miao Xie	5bc7247ac4	Btrfs: fix broken nocow after balance Balance will create reloc_root for each fs root, and it's going to record last_snapshot to filter shared blocks. The side effect of setting last_snapshot is to break nocow attributes of files. Since the extents are not shared by the relocation tree after the balance, we can recover the old last_snapshot safely if no one snapshoted the source tree. We fix the above problem by this way. Reported-by: Kyle Gates <kylegates@hotmail.com> Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2013-07-01 08:52:25 -04:00
Miao Xie	199c2a9c3d	Btrfs: introduce per-subvolume ordered extent list The reason we introduce per-subvolume ordered extent list is the same as the per-subvolume delalloc inode list. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2013-06-14 11:29:41 -04:00
Miao Xie	eb73c1b7ce	Btrfs: introduce per-subvolume delalloc inode list When we create a snapshot, we need flush all delalloc inodes in the fs, just flushing the inodes in the source tree is OK. So we introduce per-subvolume delalloc inode list. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2013-06-14 11:29:40 -04:00
Miao Xie	cb517eabba	Btrfs: cleanup the similar code of the fs root read There are several functions whose code is similar, such as btrfs_find_last_root() btrfs_read_fs_root_no_radix() Besides that, some functions are invoked twice, it is unnecessary, for example, we are sure that all roots which is found in btrfs_find_orphan_roots() have their orphan items, so it is unnecessary to check the orphan item again. So cleanup it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2013-06-14 11:29:37 -04:00
Josef Bacik	a9995eece3	Btrfs: init relocate extent_io_tree with a mapping Dave reported a NULL pointer deref. This is caused because he thought he'd be smart and add sanity checks to the extent_io bit operations, but he didn't expect a tree to have a NULL mapping. To fix this we just need to init the relocation's processed_blocks with the btree_inode->i_mapping. Thanks, Reported-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-06-08 15:07:53 -04:00
Miao Xie	7b61cd9224	Btrfs: don't use global block reservation for inode cache truncation It is very likely that there are lots of subvolumes/snapshots in the filesystem, so if we use global block reservation to do inode cache truncation, we may hog all the free space that is reserved in global rsv. So it is better that we do the free space reservation for inode cache truncation by ourselves. Cc: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2013-05-17 21:40:22 -04:00
Stefan Behrens	379cde741b	Btrfs: fix possible memory leak in replace_path() In replace_path(), if read_tree_block() fails, we cannot return directly, we should free some allocated memory otherwise memory leak happens. Similar to Wang's "Btrfs: fix possible memory leak in the find_parent_nodes()" patch, the current commit fixes an issue that is related to the "Btrfs: fix all callers of read_tree_block" commit. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2013-05-17 21:40:19 -04:00
David Sterba	34c2b29079	btrfs: handle errors returned from get_tree_block_key Signed-off-by: David Sterba <dsterba@suse.cz> Reviewed-by: Zach Brown <zab@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2013-05-06 15:55:24 -04:00
Eric Sandeen	48a3b6366f	btrfs: make static code static & remove dead code Big patch, but all it does is add statics to functions which are in fact static, then remove the associated dead-code fallout. removed functions: btrfs_iref_to_path() __btrfs_lookup_delayed_deletion_item() __btrfs_search_delayed_insertion_item() __btrfs_search_delayed_deletion_item() find_eb_for_page() btrfs_find_block_group() range_straddles_pages() extent_range_uptodate() btrfs_file_extent_length() btrfs_scrub_cancel_devid() btrfs_start_transaction_lflush() btrfs_print_tree() is left because it is used for debugging. btrfs_start_transaction_lflush() and btrfs_reada_detach() are left for symmetry. ulist.c functions are left, another patch will take care of those. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2013-05-06 15:55:23 -04:00
Josef Bacik	416bc6580b	Btrfs: fix all callers of read_tree_block We kept leaking extent buffers when mounting a broken file system and it turns out it's because not everybody uses read_tree_block properly. You need to check and make sure the extent_buffer is uptodate before you use it. This patch fixes everybody who calls read_tree_block directly to make sure they check that it is uptodate and free it and return an error if it is not. With this we no longer leak EB's when things go horribly wrong. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2013-05-06 15:55:07 -04:00
Josef Bacik	09a2a8f96e	Btrfs: fix bad extent logging A user sent me a btrfs-image of a file system that was panicing on mount during the log recovery. I had originally thought these problems were from a bug in the free space cache code, but that was just a symptom of the problem. The problem is if your application does something like this [prealloc][prealloc][prealloc] the internal extent maps will merge those all together into one extent map, even though on disk they are 3 separate extents. So if you go to write into one of these ranges the extent map will be right since we use the physical extent when doing the write, but when we log the extents they will use the wrong sizes for the remainder prealloc space. If this doesn't happen to trip up the free space cache (which it won't in a lot of cases) then you will get bogus entries in your extent tree which will screw stuff up later. The data and such will still work, but everything else is broken. This patch fixes this by not allowing extents that are on the modified list to be merged. This has the side effect that we are no longer adding everything to the modified list all the time, which means we now have to call btrfs_drop_extents every time we log an extent into the tree. So this allows me to drop all this speciality code I was using to get around calling btrfs_drop_extents. With this patch the testcase I've created no longer creates a bogus file system after replaying the log. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2013-05-06 15:54:34 -04:00
David Sterba	9d1a2a3ad5	btrfs: clean snapshots one by one Each time pick one dead root from the list and let the caller know if it's needed to continue. This should improve responsiveness during umount and balance which at some point waits for cleaning all currently queued dead roots. A new dead root is added to the end of the list, so the snapshots disappear in the order of deletion. The snapshot cleaning work is now done only from the cleaner thread and the others wake it if needed. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2013-05-06 15:54:21 -04:00
Josef Bacik	3173a18f70	Btrfs: add a incompatible format change for smaller metadata extent refs We currently store the first key of the tree block inside the reference for the tree block in the extent tree. This takes up quite a bit of space. Make a new key type for metadata which holds the level as the offset and completely removes storing the btrfs_tree_block_info inside the extent ref. This reduces the size from 51 bytes to 33 bytes per extent reference for each tree block. In practice this results in a 30-35% decrease in the size of our extent tree, which means we COW less and can keep more of the extent tree in memory which makes our heavy metadata operations go much faster. This is not an automatic format change, you must enable it at mkfs time or with btrfstune. This patch deals with having metadata stored as either the old format or the new format so it is easy to convert. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2013-05-06 15:54:18 -04:00
Linus Torvalds	0aefda3e81	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "These are scattered fixes and one performance improvement. The biggest functional change is in how we throttle metadata changes. The new code bumps our average file creation rate up by ~13% in fs_mark, and lowers CPU usage. Stefan bisected out a regression in our allocation code that made balance loop on extents larger than 256MB." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: improve the delayed inode throttling Btrfs: fix a mismerge in btrfs_balance() Btrfs: enforce min_bytes parameter during extent allocation Btrfs: allow running defrag in parallel to administrative tasks Btrfs: avoid deadlock on transaction waiting list Btrfs: do not BUG_ON on aborted situation Btrfs: do not BUG_ON in prepare_to_reloc Btrfs: free all recorded tree blocks on error Btrfs: build up error handling for merge_reloc_roots Btrfs: check for NULL pointer in updating reloc roots Btrfs: fix unclosed transaction handler when the async transaction commitment fails Btrfs: fix wrong handle at error path of create_snapshot() when the commit fails Btrfs: use set_nlink if our i_nlink is 0	2013-03-08 17:33:20 -08:00
Liu Bo	0f788c5819	Btrfs: do not BUG_ON on aborted situation Btrfs balance can easily hit BUG_ON in these places, but we want to it bail out gracefully after we force the whole filesystem to readonly. So we use btrfs_std_error hook in place of BUG_ON. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2013-03-04 16:33:23 -05:00
Liu Bo	288189471d	Btrfs: do not BUG_ON in prepare_to_reloc We can bail out from here gracefully instead of a cold BUG_ON. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2013-03-04 16:33:23 -05:00
Liu Bo	e1a1267054	Btrfs: free all recorded tree blocks on error We've missed the 'free blocks' part on ENOMEM error. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2013-03-04 16:33:23 -05:00
Liu Bo	aca1bba6f9	Btrfs: build up error handling for merge_reloc_roots We first use btrfs_std_error hook to replace with BUG_ON, and we also need to cleanup what is left, including reloc roots rbtree and reloc roots list. Here we use a helper function to cleanup both rbtree and list, and since this function can also be used in the balance recover path, we also make the change as well to keep code simple. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2013-03-04 16:33:22 -05:00
Liu Bo	8f71f3e0e4	Btrfs: check for NULL pointer in updating reloc roots Add a check for NULL pointer to avoid invalid reference. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2013-03-04 16:33:22 -05:00
Linus Torvalds	b695188dd3	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs update from Chris Mason: "The biggest feature in the pull is the new (and still experimental) raid56 code that David Woodhouse started long ago. I'm still working on the parity logging setup that will avoid inconsistent parity after a crash, so this is only for testing right now. But, I'd really like to get it out to a broader audience to hammer out any performance issues or other problems. scrub does not yet correct errors on raid5/6 either. Josef has another pass at fsync performance. The big change here is to combine waiting for metadata with waiting for data, which is a big latency win. It is also step one toward using atomics from the hardware during a commit. Mark Fasheh has a new way to use btrfs send/receive to send only the metadata changes. SUSE is using this to make snapper more efficient at finding changes between snapshosts. Snapshot-aware defrag is also included. Otherwise we have a large number of fixes and cleanups. Eric Sandeen wins the award for removing the most lines, and I'm hoping we steal this idea from XFS over and over again." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (118 commits) btrfs: fixup/remove module.h usage as required Btrfs: delete inline extents when we find them during logging btrfs: try harder to allocate raid56 stripe cache Btrfs: cleanup to make the function btrfs_delalloc_reserve_metadata more logic Btrfs: don't call btrfs_qgroup_free if just btrfs_qgroup_reserve fails Btrfs: remove reduplicate check about root in the function btrfs_clean_quota_tree Btrfs: return ENOMEM rather than use BUG_ON when btrfs_alloc_path fails Btrfs: fix missing deleted items in btrfs_clean_quota_tree btrfs: use only inline_pages from extent buffer Btrfs: fix wrong reserved space when deleting a snapshot/subvolume Btrfs: fix wrong reserved space in qgroup during snap/subv creation Btrfs: remove unnecessary dget_parent/dput when creating the pending snapshot btrfs: remove a printk from scan_one_device Btrfs: fix NULL pointer after aborting a transaction Btrfs: fix memory leak of log roots Btrfs: copy everything if we've created an inline extent btrfs: cleanup for open-coded alignment Btrfs: do not change inode flags in rename Btrfs: use reserved space for creating a snapshot clear chunk_alloc flag on retryable failure ...	2013-03-02 16:41:54 -08:00
Miao Xie	4eee4fa4f8	Btrfs: use wrapper page_offset Use wrapper page_offset to get byte-offset into filesystem object for page. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2013-02-20 09:36:43 -05:00
Jiri Kosina	617677295b	Merge branch 'master' into for-next Conflicts: drivers/devfreq/exynos4_bus.c Sync with Linus' tree to be able to apply patches that are against newer code (mvneta).	2013-01-29 10:48:30 +01:00
Liu Bo	2c016dc2cb	btrfs: fix comment typos Convert 'hepler' to 'helper'. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2013-01-09 11:40:52 +01:00
Liu Bo	b53d3f5db2	Btrfs: cleanup for btrfs_btree_balance_dirty - 'nr' is no more used. - btrfs_btree_balance_dirty() and __btrfs_btree_balance_dirty() can share a bunch of code. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2012-12-12 17:15:28 -05:00
Miao Xie	8ccf6f19b6	Btrfs: make delalloc inodes be flushed by multi-task This patch introduce a new worker pool named "flush_workers", and if we want to force all the inode with pending delalloc to the disks, we can queue those inodes into the work queue of the worker pool, in this way, those inodes will be flushed by multi-task. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2012-12-11 13:31:37 -05:00
Miao Xie	08e007d2e5	Btrfs: improve the noflush reservation In some places(such as: evicting inode), we just can not flush the reserved space of delalloc, flushing the delayed directory index and delayed inode is OK, but we don't try to flush those things and just go back when there is no enough space to be reserved. This patch fixes this problem. We defined 3 types of the flush operations: NO_FLUSH, FLUSH_LIMIT and FLUSH_ALL. If we can in the transaction, we should not flush anything, or the deadlock would happen, so use NO_FLUSH. If we flushing the reserved space of delalloc would cause deadlock, use FLUSH_LIMIT. In the other cases, FLUSH_ALL is used, and we will flush all things. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2012-12-11 13:31:31 -05:00
Josef Bacik	e6138876ad	Btrfs: cache extent state when writing out dirty metadata pages Everytime we write out dirty pages we search for an offset in the tree, convert the bits in the state, and then when we wait we search for the offset again and clear the bits. So for every dirty range in the io tree we are doing 4 rb searches, which is suboptimal. With this patch we are only doing 2 searches for every cycle (modulo weird things happening). Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2012-10-09 09:15:41 -04:00

1 2 3

131 Commits