linux

Author	SHA1	Message	Date
Josef Bacik	2cdc342c20	Btrfs: fix bitmap regression In cleaning up the clustering code I accidently introduced a regression by adding bitmap entries to the cluster rb tree. The problem is if we've maxed out the number of bitmaps we can have for the block group we can only add free space to the bitmaps, but since the bitmap is on the cluster we can't find it and we try to create another one. This would result in a panic because the total bitmaps was bigger than the max bitmaps that were allowed. This patch fixes this by checking to see if we have a cluster, and then looking at the cluster rb tree to see if it has a bitmap entry and if it does and that space belongs to that bitmap, go ahead and add it to that bitmap. I could hit this panic every time with an fs_mark test within a couple of minutes. With this patch I no longer hit the panic and fs_mark goes to completion. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-06-08 16:37:28 -04:00
Josef Bacik	f2bb8f5cfb	Btrfs: don't commit the transaction if we dont have enough pinned bytes I noticed when running an enospc test that we would get stuck committing the transaction in check_data_space even though we truly didn't have enough space. So check to see if bytes_pinned is bigger than num_bytes, if it's not don't commit the transaction. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-06-08 15:08:31 -04:00
Josef Bacik	3de85bb95c	Btrfs: noinline the cluster searching functions When profiling the find cluster code it's hard to tell where we are spending our time because the bitmap and non-bitmap functions get inlined by the compiler, so make that not happen. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-06-08 15:08:30 -04:00
Josef Bacik	86d4a77ba3	Btrfs: cache bitmaps when searching for a cluster If we are looking for a cluster in a particularly sparse or fragmented block group, we will do a lot of looping through the free space tree looking for various things, and if we need to look at bitmaps we will endup doing the whole dance twice. So instead add the bitmap entries to a temporary list so if we have to do the bitmap search we can just look through the list of entries we've found quickly instead of having to loop through the entire tree again. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-06-08 15:08:28 -04:00
David Sterba	aa0467d8d2	btrfs: fix uninitialized variable warning With Linus' tree, today's linux-next build (powercp ppc64_defconfig) produced this warning: fs/btrfs/delayed-inode.c: In function 'btrfs_delayed_update_inode': fs/btrfs/delayed-inode.c:1598:6: warning: 'ret' may be used uninitialized in this function Introduced by commit `16cdcec736` ("btrfs: implement delayed inode items operation"). This fixes a bug in btrfs_update_inode(): if the returned value from btrfs_delayed_update_inode is a nonzero garbage, inode stat data are not updated and several call paths may hit a BUG_ON or fail with strange code. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David Sterba <dsterba@suse.cz>	2011-06-04 08:11:38 -04:00
David Sterba	7841cb2898	btrfs: add helper for fs_info->closing wrap checking of filesystem 'closing' flag and fix a few missing memory barriers. Signed-off-by: David Sterba <dsterba@suse.cz>	2011-06-04 08:11:22 -04:00
Chris Mason	4b9465cb9e	Btrfs: add mount -o inode_cache This makes the inode map cache default to off until we fix the overflow problem when the free space crcs don't fit inside a single page. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-06-04 08:03:47 -04:00
Arne Jansen	e7786c3ae5	btrfs: scrub: add explicit plugging With the removal of the implicit plugging scrub ends up doing more and smaller I/O than necessary. This patch adds explicit plugging per chunk. Signed-off-by: Arne Jansen <sensille@gmx.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-06-04 08:03:46 -04:00
David Sterba	a4689d2bd3	btrfs: use btrfs_ino to access inode number commit `4cb5300bc` ("Btrfs: add mount -o auto_defrag") accesses inode number directly while it should use the helper with the new inode number allocator. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-06-04 08:03:46 -04:00
Josef Bacik	d132a538d2	Btrfs: don't save the inode cache if we are deleting this root With xfstest 254 I can panic the box every time with the inode number caching stuff on. This is because we clean the inodes out when we delete the subvolume, but then we write out the inode cache which adds an inode to the subvolume inode tree, and then when it gets evicted again the root gets added back on the dead roots list and is deleted again, so we have a double free. To stop this from happening just return 0 if refs is 0 (and we're not the tree root since tree root always has refs of 0). With this fix 254 no longer panics. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Tested-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-06-04 08:03:45 -04:00
Arne Jansen	5f3f302a6f	btrfs: false BUG_ON when degraded In degraded mode the struct btrfs_device of missing devs don't have device->name set. A kstrdup of NULL correctly returns NULL. Don't BUG in this case. Signed-off-by: Arne Jansen <sensille@gmx.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-06-04 08:03:44 -04:00
liubo	ca456ae280	Btrfs: don't save the inode cache in non-FS roots This adds extra checks to make sure the inode map we are caching really belongs to a FS root instead of a special relocation tree. It prevents crashes during balancing operations. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-06-04 08:03:44 -04:00
Chris Mason	211f96c24f	Btrfs: make sure we don't overflow the free space cache crc page The free space cache uses only one page for crcs right now, which means we can't have a cache file bigger than the crcs we can fit in the first page. This adds a check to enforce that restriction. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-06-04 08:03:43 -04:00
Chris Mason	17aca1c987	Btrfs: fix uninit variable in the delayed inode code The nitems counter needs to start at zero Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-06-04 08:03:43 -04:00
Arne Jansen	1bc8779349	btrfs: scrub: don't reuse bios and pages The current scrub implementation reuses bios and pages as often as possible, allocating them only on start and releasing them when finished. This leads to more problems with the block layer than it's worth. The elevator gets confused when there are more pages added to the bio than bi_size suggests. This patch completely rips out the reuse of bios and pages and allocates them freshly for each submit. Signed-off-by: Arne Jansen <sensille@gmx.net> Signed-off-by: Chris Maosn <chris.mason@oracle.com>	2011-06-04 08:03:17 -04:00
Chris Mason	ff5714cca9	Merge branch 'for-chris' of git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-work into for-linus Conflicts: fs/btrfs/disk-io.c fs/btrfs/extent-tree.c fs/btrfs/free-space-cache.c fs/btrfs/inode.c fs/btrfs/transaction.c Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-28 07:00:39 -04:00
Chris Mason	174ba50915	Btrfs: use the device_list_mutex during write_dev_supers write_dev_supers was changed to use RCU to protect the list of devices, but it was then sleeping while it actually wrote the supers. This fixes it to just use the mutex, since we really don't any concurrency in write_dev_supers anyway. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-27 10:03:58 -04:00
Li Zefan	a47d6b70e2	Btrfs: setup free ino caching in a more asynchronous way For a filesystem that has lots of files in it, the first time we mount it with free ino caching support, it can take quite a long time to setup the caching before we can create new files. Here we fill the cache with [highest_ino, BTRFS_LAST_FREE_OBJECTID] before we start the caching thread to search through the extent tree. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-26 17:53:04 -04:00
Arne Jansen	00d01bc17c	btrfs scrub: don't coalesce pages that are logically discontiguous scrub_page collects several pages into one bio as long as they are physically contiguous. As we only save one logical address for the whole bio, don't collect pages that are physically contiguous but logically discontiguous. Signed-off-by: Arne Jansen <sensille@gmx.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-26 17:52:52 -04:00
Chris Mason	c309df0786	Btrfs: return -ENOMEM in clear_extent_bit The btrfs releasepage function depends on ENOMEM coming back when it is called atomic. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-26 17:52:44 -04:00
Chris Mason	4cb5300bc8	Btrfs: add mount -o auto_defrag This will detect small random writes into files and queue the up for an auto defrag process. It isn't well suited to database workloads yet, but works for smaller files such as rpm, sqlite or bdb databases. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-26 17:52:15 -04:00
Chris Mason	d6c0cb379c	Merge branch 'cleanups_and_fixes' into inode_numbers Conflicts: fs/btrfs/tree-log.c fs/btrfs/volumes.c Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-23 14:37:47 -04:00
Xiao Guangrong	1f78160ce1	Btrfs: using rcu lock in the reader side of devices list fs_devices->devices is only updated on remove and add device paths, so we can use rcu to protect it in the reader side Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-23 13:24:43 -04:00
Xiao Guangrong	4622470565	Btrfs: drop unnecessary device lock Drop device_list_mutex for the reader side on clone_fs_devices and btrfs_rm_device pathes since the fs_info->volume_mutex can ensure the device list is not updated btrfs_close_extra_devices is the initialized path, we can not add or remove device at this time, so we can simply drop the mutex safely, like other initialized function does(add_missing_dev, __find_device, __btrfs_open_devices ...). Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-23 13:24:43 -04:00
Xiao Guangrong	0c1daee085	Btrfs: fix the race between remove dev and alloc chunk On remove device path, it updates device->dev_alloc_list but does not hold chunk lock Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-23 13:24:43 -04:00
Xiao Guangrong	c9513edb00	Btrfs: fix the race between reading and updating devices On btrfs_congested_fn and __unplug_io_fn paths, we should hold device_list_mutex to avoid remove/add device path to update fs_devices->devices On __btrfs_close_devices and btrfs_prepare_sprout paths, the devices in fs_devices->devices or fs_devices->devices is updated, so we should hold the mutex to avoid the reader side to reach them Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-23 13:24:42 -04:00
Xiao Guangrong	4f6c9328c6	Btrfs: fix bh leak on __btrfs_open_devices path 'bh' is forgot to release if no error is detected Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-23 13:24:42 -04:00
Xiao Guangrong	c7f895a2b2	Btrfs: fix unsafe usage of merge_state merge_state can free the current state if it can be merged with the next node, but in set_extent_bit(), after merge_state, we still use the current extent to get the next node and cache it into cached_state Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-23 13:24:41 -04:00
Xiao Guangrong	8233767a22	Btrfs: allocate extent state and check the result properly It doesn't allocate extent_state and check the result properly: - in set_extent_bit, it doesn't allocate extent_state if the path is not allowed wait - in clear_extent_bit, it doesn't check the result after atomic-ly allocate, we trigger BUG_ON() if it's fail - if allocate fail, we trigger BUG_ON instead of returning -ENOMEM since the return value of clear_extent_bit() is ignored by many callers Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-23 13:24:41 -04:00
Julia Lawall	b083916638	fs/btrfs: Add missing btrfs_free_path Btrfs_alloc_path should be matched with btrfs_free_path in error-handling code. A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @r exists@ local idexpression struct btrfs_path * x; expression ra,rb; position p1,p2; @@ x = btrfs_alloc_path@p1(...) ... when != btrfs_free_path(x,...) when != if (...) { ... btrfs_free_path(x,...) ...} when != x = ra if(...) { ... when != x = rb when forall when != btrfs_free_path(x,...) $return <+...x...+>; \\| return@p2...; $ } @script:python@ p1 << r.p1; p2 << r.p2; @@ cocci.print_main("alloc",p1) cocci.print_secs("return",p2) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-23 13:24:41 -04:00
Tsutomu Itoh	37daa4f968	Btrfs: check return value of btrfs_inc_extent_ref() If return value of btrfs_inc_extent_ref() is not 0, BUG() is called. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-23 13:24:40 -04:00
Tsutomu Itoh	c00e9493f1	Btrfs: return error to caller if read_one_inode() fails When read_one_inode() fails, error code is returned to caller instead of BUG_ON(). Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-23 13:24:40 -04:00
Tsutomu Itoh	1cd307990d	Btrfs: BUG_ON is deleted from the caller of btrfs_truncate_item & btrfs_extend_item Currently, btrfs_truncate_item and btrfs_extend_item returns only 0. So, the check by BUG_ON in the caller is unnecessary. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-23 13:24:39 -04:00
Tsutomu Itoh	65a246c5ff	Btrfs: return error code to caller when btrfs_del_item fails The error code is returned instead of calling BUG_ON when btrfs_del_item returns the error. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-23 13:24:39 -04:00
Tsutomu Itoh	b0b802d7e3	Btrfs: return error code to caller when btrfs_previous_item fails The error code is returned instead of calling BUG_ON when btrfs_previous_item returns the error. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-23 13:24:39 -04:00
Sergei Trofimovich	27160b6b5a	btrfs: fix typo 'testeing' -> 'testing' Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-23 13:24:32 -04:00
Sergei Trofimovich	9694b3fcbb	btrfs: typo: 'btrfS' -> 'btrfs' Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-23 13:24:25 -04:00
Sergei Trofimovich	c4f675cd40	btrfs: don't spin in shrink_delalloc if there is nothing to free Observed as a large delay when --mixed filesystem is filled up. Test example: 1. create tiny --mixed FS: $ dd if=/dev/zero of=2G.img seek=$((2048 * 1024 * 1024 - 1)) count=1 bs=1 $ mkfs.btrfs --mixed 2G.img $ mount -oloop 2G.img /mnt/ut/ 2. Try to fill it up: $ dd if=/dev/urandom of=10M.file bs=10240 count=1024 $ seq 1 256 \| while read file_no; do echo $file_no; time cp 10M.file ${file_no}.copy; done Up to '200.copy' it goes fast, but when disk fills-up each -ENOSPC message takes 3 seconds to pop-up _every_ ENOSPC (and in usermode linux it's even more: 30-60 seconds!). (Maybe, time depends on kernel's timer resolution). No IO, no CPU load, just rescheduling. Some debugging revealed busy spinning in shrink_delalloc. Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org> Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-23 13:24:14 -04:00
Jamey Sharp	0f3b708c11	btrfs: Delete unused version.sh script. In 2008, commit `b4f6c45dfb` dropped the use of fs/btrfs/version.sh, but left the script behind. Kill it. Commit by Jamey Sharp and Josh Triplett. Signed-off-by: Jamey Sharp <jamey@minilop.net> Signed-off-by: Josh Triplett <josh@joshtriplett.org> Cc: Chris Mason <chris.mason@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-23 13:05:39 -04:00
Hugo Mills	e215686715	btrfs: Ensure the tree search ioctl returns the right number of records Btrfs's tree search ioctl has a field to indicate that no more than a given number of records should be returned. The ioctl doesn't honour this, as the tested value is not incremented until the end of the copy_to_sk function. This patch removes an unnecessary local variable, and updates the num_found counter as each key is found in the tree. Signed-off-by: Hugo Mills <hugo@carfax.org.uk> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-23 13:05:39 -04:00
Andi Kleen	0956c798ef	BTRFS: Remove unused node_lock `240f62c875` replaced the node_lock with rcu_read_lock, but forgot to remove the actual lock in the data structure. Remove it here. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-05-23 13:05:39 -04:00
Josef Bacik	d90c732122	Btrfs: leave spinning on lookup and map the leaf On lookup we only want to read the inode item, so leave the path spinning. Also we're just wholesale reading the leaf off, so map the leaf so we don't do a bunch of kmap/kunmaps. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-05-23 13:03:17 -04:00
Josef Bacik	207dde8289	Btrfs: check for duplicate entries in the free space cache If there are duplicate entries in the free space cache, discard the entire cache and load it the old fashioned way. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-05-23 13:03:16 -04:00
Josef Bacik	cca1c81f43	Btrfs: don't try to allocate from a block group that doesn't have enough space If we have a very large filesystem, we can spend a lot of time in find_free_extent just trying to allocate from empty block groups. So instead check to see if the block group even has enough space for the allocation, and if not go on to the next block group. Signed-off-by: Josef Bacik <josef@redhat.com>	2011-05-23 13:03:15 -04:00
Josef Bacik	026fd31782	Btrfs: don't always do readahead Our readahead is sort of sloppy, and really isn't always needed. For example if ls is doing a stating ls (which is the default) it's going to stat in non-disk order, so if say you have a directory with a stupid amount of files, readahead is going to do nothing but waste time in the case of doing the stat. Taking the unconditional readahead out made my test go from 57 minutes to 36 minutes. This means that everywhere we do loop through the tree we want to make sure we do set path->reada properly, so I went through and found all of the places where we loop through the path and set reada to 1. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-05-23 13:03:14 -04:00
Josef Bacik	589d8ade83	Btrfs: try not to sleep as much when doing slow caching When the fs is super full and we unmount the fs, we could get stuck in this thing where unmount is waiting for the caching kthread to make progress and the caching kthread keeps scheduling because we're in the middle of a commit. So instead just let the caching kthread keep going and only yeild if need_resched(). This makes my horrible umount case go from taking up to 10 minutes to taking less than 20 seconds. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-05-23 13:03:13 -04:00
Josef Bacik	d82a6f1d7e	Btrfs: kill BTRFS_I(inode)->block_group Originally this was going to be used as a way to give hints to the allocator, but frankly we can get much better hints elsewhere and it's not even used at all for anything usefull. In addition to be completely useless, when we initialize an inode we try and find a freeish block group to set as the inodes block group, and with a completely full 40gb fs this takes _forever_, so I imagine with say 1tb fs this is just unbearable. So just axe the thing altoghether, we don't need it and it saves us 8 bytes in the inode and saves us 500 microseconds per inode lookup in my testcase. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-05-23 13:03:12 -04:00
Josef Bacik	7e2355ba1a	Btrfs: don't look at the extent buffer level 3 times in a row We have a bit of debugging in btrfs_search_slot to make sure the level of the cow block is the same as the original block we were cow'ing. I don't think I've ever seen this tripped, so kill it. This saves us 2 kmap's per level in our search. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-05-23 13:03:11 -04:00
Josef Bacik	cb25c2ea6a	Btrfs: map the node block when looking for readahead targets If we have particularly full nodes, we could call btrfs_node_blockptr up to 32 times, which is 32 pairs of kmap/kunmap, which _sucks_. So go ahead and map the extent buffer while we look for readahead targets. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-05-23 13:03:10 -04:00
Josef Bacik	af60bed24e	Btrfs: set range_start to the right start in count_range_bits In count_range_bits we are adjusting total_bytes based on the range we are searching for, but we don't adjust the range start according to the range we are searching for, which makes for weird results. For example, if the range [0-8192] is set DELALLOC, but I search for 4096-8192, I will get back 4096 for the number of bytes found, but the range_start will be 0, which makes it look like the range is [0-4096]. So instead set range_start = max(cur_start, state->start). This makes everything come out right. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-05-23 13:03:09 -04:00

1 2 3 4 5 ...

245290 Commits