linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-17 01:22:07 +00:00

Author	SHA1	Message	Date
Mike Snitzer	018debea8d	dm thin: emit ignore_discard in status when discards disabled If "ignore_discard" is specified when creating the thin pool device then discard support is disabled for that device. The pool device's status should reflect this fact rather than stating "no_discard_passdown" (which implies discards are enabled but passdown is disabled). Reported-by: Zdenek Kabelac <zkabelac@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-12-21 20:23:32 +00:00
Joe Thornber	563af186df	dm thin: wake worker when discard is prepared When discards are prepared it is best to directly wake the worker that will process them. The worker will be woken anyway, via periodic commit, but there is no reason to not wake_worker here. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-12-21 20:23:31 +00:00
Joe Thornber	e8088073c9	dm thin: fix race between simultaneous io and discards to same block There is a race when discard bios and non-discard bios are issued simultaneously to the same block. Discard support is expensive for all thin devices precisely because you have to be careful to quiesce the area you're discarding. DM thin must handle this conflicting IO pattern (simultaneous non-discard vs discard) even though a sane application shouldn't be issuing such IO. The race manifests as follows: 1. A non-discard bio is mapped in thin_bio_map. This doesn't lock out parallel activity to the same block. 2. A discard bio is issued to the same block as the non-discard bio. 3. The discard bio is locked in a dm_bio_prison_cell in process_discard to lock out parallel activity against the same block. 4. The non-discard bio's mapping continues and its all_io_entry is incremented so the bio is accounted for in the thin pool's all_io_ds which is a dm_deferred_set used to track time locality of non-discard IO. 5. The non-discard bio is finally locked in a dm_bio_prison_cell in process_bio. The race can result in deadlock, leaving the block layer hanging waiting for completion of a discard bio that never completes, e.g.: INFO: task ruby:15354 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. ruby D ffffffff8160f0e0 0 15354 15314 0x00000000 ffff8802fb08bc58 0000000000000082 ffff8802fb08bfd8 0000000000012900 ffff8802fb08a010 0000000000012900 0000000000012900 0000000000012900 ffff8802fb08bfd8 0000000000012900 ffff8803324b9480 ffff88032c6f14c0 Call Trace: [<ffffffff814e5a19>] schedule+0x29/0x70 [<ffffffff814e3d85>] schedule_timeout+0x195/0x220 [<ffffffffa06b9bc1>] ? _dm_request+0x111/0x160 [dm_mod] [<ffffffff814e589e>] wait_for_common+0x11e/0x190 [<ffffffff8107a170>] ? try_to_wake_up+0x2b0/0x2b0 [<ffffffff814e59ed>] wait_for_completion+0x1d/0x20 [<ffffffff81233289>] blkdev_issue_discard+0x219/0x260 [<ffffffff81233e79>] blkdev_ioctl+0x6e9/0x7b0 [<ffffffff8119a65c>] block_ioctl+0x3c/0x40 [<ffffffff8117539c>] do_vfs_ioctl+0x8c/0x340 [<ffffffff8119a547>] ? block_llseek+0x67/0xb0 [<ffffffff811756f1>] sys_ioctl+0xa1/0xb0 [<ffffffff810561f6>] ? sys_rt_sigprocmask+0x86/0xd0 [<ffffffff814ef099>] system_call_fastpath+0x16/0x1b The thinp-test-suite's test_discard_random_sectors reliably hits this deadlock on fast SSD storage. The fix for this race is that the all_io_entry for a bio must be incremented whilst the dm_bio_prison_cell is held for the bio's associated virtual and physical blocks. That cell locking wasn't occurring early enough in thin_bio_map. This patch fixes this. Care is taken to always call the new function inc_all_io_entry() with the relevant cells locked, but they are generally unlocked before calling issue() to try to avoid holding the cells locked across generic_submit_request. Also, now that thin_bio_map may lock bios in a cell, process_bio() is no longer the only thread that will do so. Because of this we must be sure to use cell_defer_except() to release all non-holder entries, that were added by the other thread, because they must be deferred. This patch depends on "dm thin: replace dm_cell_release_singleton with cell_defer_except". Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: stable@vger.kernel.org	2012-12-21 20:23:31 +00:00
Joe Thornber	b7ca9c9273	dm thin: replace dm_cell_release_singleton with cell_defer_except Change existing users of the function dm_cell_release_singleton to share cell_defer_except instead, and then remove the now-unused function. Everywhere that calls dm_cell_release_singleton, the bio in question is the holder of the cell. If there are no non-holder entries in the cell then cell_defer_except behaves exactly like dm_cell_release_singleton. Conversely, if there are non-holder entries then dm_cell_release_singleton must not be used because those entries would need to be deferred. Consequently, it is safe to replace use of dm_cell_release_singleton with cell_defer_except. This patch is a pre-requisite for "dm thin: fix race between simultaneous io and discards to same block". Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-12-21 20:23:31 +00:00
Mike Snitzer	4f81a41762	dm thin: move bio_prison code to separate module The bio prison code will be useful to other future DM targets so move it to a separate module. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-10-12 21:02:13 +01:00
Mike Snitzer	44feb387f6	dm thin: prepare to separate bio_prison code The bio prison code will be useful to share with future DM targets. Prepare to move this code into a separate module, adding a dm prefix to structures and functions that will be exported. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-10-12 21:02:10 +01:00
Mike Snitzer	28eed34e76	dm thin: support discard with non power of two block size Support discards when the pool's block size is not a power of 2. The block layer assumes discard_granularity is a power of 2 (in blkdev_issue_discard), so we set this to the largest power of 2 that is a divides into the number of sectors in each block, but never less than DATA_DEV_BLOCK_SIZE_MIN_SECTORS. This patch eliminates the "Discard support must be disabled when the block size is not a power of 2" constraint that was imposed in commit `55f2b8b` ("dm thin: support for non power of 2 pool blocksize"). That commit was incomplete: using a block size that is not a power of 2 shouldn't mean disabling discard support on the device completely. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-10-12 21:02:07 +01:00
Mike Snitzer	0424caa145	dm thin: fix discard support for data devices The discard limits that get established for a thin-pool or thin device may be incompatible with the pool's data device. Avoid this by checking the discard limits of the pool's data device. If an incompatibility is found then the pool's 'discard passdown' feature is disabled. Change thin_io_hints to ensure that a thin device always uses the same queue limits as its pool device. Introduce requested_pf to track whether or not the table line originally contained the no_discard_passdown flag and use this directly for table output. We prepare the correct setting for discard_passdown directly in bind_control_target (called from pool_io_hints) and store it in adjusted_pf rather than waiting until we have access to pool->pf in pool_preresume. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-09-26 23:45:47 +01:00
Mike Snitzer	9bc142dd75	dm thin: tidy discard support A little thin discard code refactoring to make the next patch (dm thin: fix discard support for data devices) more readable. Pull out a couple of functions (and uses bools instead of unsigned for features). No functional changes. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-09-26 23:45:46 +01:00
Mike Snitzer	307615a26e	dm thin: do not set discard_zeroes_data The dm thin pool target claims to support the zeroing of discarded data areas. This turns out to be incorrect when processing discards that do not exactly cover a complete number of blocks, so the target must always set discard_zeroes_data_unsupported. The thin pool target will zero blocks when they are allocated if the skip_block_zeroing feature is not specified. The block layer may send a discard that only partly covers a block. If a thin pool block is partially discarded then there is no guarantee that the discarded data will get zeroed before it is accessed again. Due to this, thin devices cannot claim discards will always zero data. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Cc: stable@vger.kernel.org # 3.4+ Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-09-26 23:45:39 +01:00
Alasdair G Kergon	1f4e0ff079	dm thin: commit before gathering status Commit outstanding metadata before returning the status for a dm thin pool so that the numbers reported are as up-to-date as possible. The commit is not performed if the device is suspended or if the DM_NOFLUSH_FLAG is supplied by userspace and passed to the target through a new 'status_flags' parameter in the target's dm_status_fn. The userspace dmsetup tool will support the --noflush flag with the 'dmsetup status' and 'dmsetup wait' commands from version 1.02.76 onwards. Tested-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-07-27 15:08:16 +01:00
Joe Thornber	e49e582965	dm thin: add read only and fail io modes Add read-only and fail-io modes to thin provisioning. If a transaction commit fails the pool's metadata device will transition to "read-only" mode. If a commit fails once already in read-only mode the transition to "fail-io" mode occurs. Once in fail-io mode the pool and all associated thin devices will report a status of "Fail". Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-07-27 15:08:16 +01:00
Joe Thornber	4afdd680f7	dm thin: reduce number of metadata commits Reduce the number of metadata commits by using dm_thin_changed_this_transaction to check if metadata was changed on a per thin device granularity. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-07-27 15:08:14 +01:00
Joe Thornber	66b1edc05e	dm thin metadata: add format option to dm_pool_metadata_open Add a parameter to dm_pool_metadata_open to indicate whether or not an unformatted metadata area should be formatted. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-07-27 15:08:14 +01:00
Alasdair G Kergon	0ac55489d9	dm: use bool bitfields in struct dm_target Use boolean bit fields for flags in struct dm_target. Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-07-27 15:08:08 +01:00
Joe Thornber	16ad3d103d	dm thin: set flush_supported The thin provisioning target commits internal metadata on flush. So it should receive flushes regardless of whether the underlying devices support them. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-07-27 15:08:07 +01:00
Joe Thornber	6004970136	dm thin: avoid unnecessarily breaking sharing for flushes There's no need to break sharing, triggering a copy, for a write that has no data (i.e. a flush). Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-07-27 15:08:06 +01:00
Joe Thornber	905386f82d	dm thin: fix memory leak in process_prepared_mapping error paths Fix memory leak in process_prepared_mapping by always freeing the dm_thin_new_mapping structs from the mapping_pool mempool on the error paths. Signed-off-by: Joe Thornber <ejt@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-07-27 15:08:05 +01:00
Mikulas Patocka	f9a8e0cd26	dm thin: optimize power of two block size dm-thin will be most likely used with a block size that is a power of two. So it should be optimized for this case. This patch changes division and modulo operations to shifts and bit masks if block size is a power of two. A test that bi_sector is divisible by a block size is removed from io_overlaps_block. Device mapper never sends bios that span a block boundary. Consequently, if we tested that bi_size is equivalent to block size, bi_sector must already be on a block boundary. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-07-27 15:08:03 +01:00
Mikulas Patocka	4929630901	dm thin: split discards on block boundary This patch sets the variable "ti->split_discard_requests" for the dm thin target so that device mapper core splits discard requests on a block boundary. Consequently, a discard request that spans multiple blocks is never sent to dm-thin. The patch also removes some code in process_discard that deals with discards that span multiple blocks. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-07-27 15:08:03 +01:00
Mike Snitzer	55f2b8bdb0	dm thin: support for non power of 2 pool blocksize Non power of 2 blocksize support is needed to properly align thinp IO on storage that has non power of 2 optimal IO sizes (e.g. RAID6 10+2). Use sector_div to support non power of 2 blocksize for the pool's data device. This provides comparable performance to the power of 2 math that was performed until now (as tested on modern x86_64 hardware). The kernel currently assumes that limits->discard_granularity is a power of two so the thin target only enables discard support if the block size is a power of two. Eliminate pool structure's 'block_shift', 'offset_mask' and remaining 4 byte holes. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-07-27 15:08:02 +01:00
Mike Snitzer	542f903814	dm: support non power of two target max_io_len Remove the restriction that limits a target's specified maximum incoming I/O size to be a power of 2. Rename this setting from 'split_io' to the less-ambiguous 'max_io_len'. Change it from sector_t to uint32_t, which is plenty big enough, and introduce a wrapper function dm_set_target_max_io_len() to set it. Use sector_div() to process it now that it is not necessarily a power of 2. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-07-27 15:08:00 +01:00
Mike Snitzer	f09996c993	dm thin: provide specific errors for two table load failure cases Provide specific error message strings for two pool_ctr() failure cases that currently give just "Unknown error". Reference: test_two_pools_pointing_to_the_same_metadata_fails and test_different_pool_cant_replace_pool in thinp-test-suite. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-07-27 15:07:59 +01:00
Mike Snitzer	17b7d63f7e	dm thin: clean up compiler warning Clean up "warning: dubious: !x & y". Also make it clear that __snapshotted_since() returns a bool and that dm_thin_lookup_result's 'shared' member is a flag. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-07-27 15:07:57 +01:00
Alasdair G Kergon	7768ed33cc	dm thin: reduce endio_hook pool size Reduce the slab size used for the dm_thin_endio_hook mempool. Allocation has been seen to fail on machines with smaller amounts of memory due to fragmentation. lvm: page allocation failure. order:5, mode:0xd0 device-mapper: table: 253:38: thin-pool: Error creating pool's endio_hook mempool Cc: stable@vger.kernel.org Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-07-27 15:07:57 +01:00
Mikulas Patocka	650d2a06b4	dm thin: do not send discards to shared blocks When process_discard receives a partial discard that doesn't cover a full block, it sends this discard down to that block. Unfortunately, the block can be shared and the discard would corrupt the other snapshots sharing this block. This patch detects block sharing and ends the discard with success when sending it to the shared block. The above change means that if the device supports discard it can't be guaranteed that a discard request zeroes data. Therefore, we set ti->discard_zeroes_data_unsupported. Thin target discard support with this bug arrived in commit `104655fd4d` (dm thin: support discards). Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@kernel.org Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-07-20 14:25:05 +01:00
Joe Thornber	0d200aefd4	dm thin: commit metadata before creating metadata snapshot Userland sometimes sees a corrupt metadata block if metadata is changing rapidly when a metadata snapshot is reserved for userland, To make the problem go away, commit before we take the metadata snapshot (which is a sensible thing to do anyway). The checksums mean userland spots this corruption immediately so there's no risk of acting on incorrect data. No corruption exists from the kernel's point of view, and thin_check passes after pool shutdown. I believe this is to do with shared blocks at the first level of the {device, mapping} btree. Prior to the metadata-snap support no sharing at this level was possible, so this patch is only required after commit `cc8394d86f` ("dm thin: provide userspace access to pool metadata"). Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-07-03 12:55:31 +01:00
Joe Thornber	cc8394d86f	dm thin: provide userspace access to pool metadata This patch implements two new messages that can be sent to the thin pool target allowing it to take a snapshot of the _metadata_. This, read-only snapshot can be accessed by userland, concurrently with the live target. Only one metadata snapshot can be held at a time. The pool's status line will give the block location for the current msnap. Since version 0.1.5 of the userland thin provisioning tools, the thin_dump program displays the msnap as follows: thin_dump -m <msnap root> <metadata dev> Available here: https://github.com/jthornber/thin-provisioning-tools Now that userland can access the metadata we can do various things that have traditionally been kernel side tasks: i) Incremental backups. By using metadata snapshots we can work out what blocks have changed over time. Combined with data snapshots we can ensure the data doesn't change while we back it up. A short proof of concept script can be found here: https://github.com/jthornber/thinp-test-suite/blob/master/incremental_backup_example.rb ii) Migration of thin devices from one pool to another. iii) Merging snapshots back into an external origin. iv) Asyncronous replication. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-06-03 00:30:01 +01:00
Mike Snitzer	a24c25696b	dm thin: use slab mempools Use dedicated caches prefixed with a "dm_" name rather than relying on kmalloc mempools backed by generic slab caches so the memory usage of thin provisioning (and any leaks) can be accounted for independently. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-06-03 00:30:00 +01:00
Mike Snitzer	f402693d06	dm thin: fix table output when pool target disables discard passdown internally When the thin pool target clears the discard_passdown parameter internally, it incorrectly changes the table line reported to userspace. This breaks dumb string comparisons on these table lines in generic userspace device-mapper library code and leads to tables being reloaded repeatedly when nothing is actually meant to be changing. This patch corrects this by no longer changing the table line when discard passdown was disabled. We can still tell when discard passdown is overridden by looking for the message "Discard unsupported by data device (sdX): Disabling discard passdown." This automatic detection is also moved from the 'load' to the 'resume' so that it is re-evaluated should the properties of underlying devices change. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-05-19 01:01:01 +01:00
Alasdair G Kergon	7cab8bf160	dm thin: correct module description Remove duplicate copy of string "device-mapper" (DM_NAME) from MODULE_DESCRIPTION. Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-05-12 01:43:19 +01:00
Mike Snitzer	c3a0ce2eab	dm thin: fix unprotected use of prepared_discards list Fix two places in commit `104655fd4d` ("dm thin: support discards") that didn't use pool->lock to protect against concurrent changes to the prepared_discards list. Without this fix, thin_endio() can race with process_discard(), leading to concurrent list_add()s that result in the processes locking up with an error like the following: WARNING: at lib/list_debug.c:32 __list_add+0x8f/0xa0() ... list_add corruption. next->prev should be prev (ffff880323b96140), but was ffff8801d2c48440. (next=ffff8801d2c485c0). ... Pid: 17205, comm: kworker/u:1 Tainted: G W O 3.4.0-rc3.snitm+ #1 Call Trace: [<ffffffff8103ca1f>] warn_slowpath_common+0x7f/0xc0 [<ffffffff8103cb16>] warn_slowpath_fmt+0x46/0x50 [<ffffffffa04f6ce6>] ? bio_detain+0xc6/0x210 [dm_thin_pool] [<ffffffff8124ff3f>] __list_add+0x8f/0xa0 [<ffffffffa04f70d2>] process_discard+0x2a2/0x2d0 [dm_thin_pool] [<ffffffffa04f6a78>] ? remap_and_issue+0x38/0x50 [dm_thin_pool] [<ffffffffa04f7c3b>] process_deferred_bios+0x7b/0x230 [dm_thin_pool] [<ffffffffa04f7df0>] ? process_deferred_bios+0x230/0x230 [dm_thin_pool] [<ffffffffa04f7e42>] do_worker+0x52/0x60 [dm_thin_pool] [<ffffffff81056fa9>] process_one_work+0x129/0x450 [<ffffffff81059b9c>] worker_thread+0x17c/0x3c0 [<ffffffff81059a20>] ? manage_workers+0x120/0x120 [<ffffffff8105eabe>] kthread+0x9e/0xb0 [<ffffffff814ceda4>] kernel_thread_helper+0x4/0x10 [<ffffffff8105ea20>] ? kthread_freezable_should_stop+0x70/0x70 [<ffffffff814ceda0>] ? gs_change+0x13/0x13 ---[ end trace 7e0a523bc5e52692 ]--- Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-05-12 01:43:16 +01:00
Mike Snitzer	03aaae7cdc	dm thin: reinstate missing mempool_free in cell_release_singleton Fix a significant memory leak inadvertently introduced during simplification of cell_release_singleton() in commit `6f94a4c45a` ("dm thin: fix stacked bi_next usage"). A cell's hlist_del() must be accompanied by a mempool_free(). Use __cell_release() to do this, like before. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-05-12 01:43:12 +01:00
Joe Thornber	67e2e2b281	dm thin: add pool target flags to control discard Add dm thin target arguments to control discard support. ignore_discard: Disables discard support no_discard_passdown: Don't pass discards down to the underlying data device, but just remove the mapping within the thin provisioning target. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-03-28 18:41:29 +01:00
Joe Thornber	104655fd4d	dm thin: support discards Support discards in the thin target. On discard the corresponding mapping(s) are removed from the thin device. If the associated block(s) are no longer shared the discard is passed to the underlying device. All bios other than discards now have an associated deferred_entry that is saved to the 'all_io_entry' in endio_hook. When non-discard IO completes and associated mappings are quiesced any discards that were deferred, via ds_add_work() in process_discard(), will be queued for processing by the worker thread. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> drivers/md/dm-thin.c \| 173 ++++++++++++++++++++++++++++++++++++++++++++++---- drivers/md/dm-thin.c \| 172 ++++++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 158 insertions(+), 14 deletions(-)	2012-03-28 18:41:28 +01:00
Joe Thornber	eb2aa48d4e	dm thin: prepare to support discard This patch contains the ground work needed for dm-thin to support discard. - Adds endio function that replaces shared_read_endio. - Introduce an explicit 'quiesced' flag into the new_mapping structure. Before, this was implicitly indicated by m->list being empty. - The map_info->ptr remains constant for the duration of a bio's trip through the thin target. Make it easier to reason about it. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-03-28 18:41:28 +01:00
Alasdair G Kergon	6efd6e8309	dm thin: use dm_target_offset Use dm_target_offset wrapper instead of referencing the awkward ti->begin explicitly. Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-03-28 18:41:28 +01:00
Joe Thornber	2dd9c257fb	dm thin: support read only external snapshot origins Support the use of an external _read only_ device as an origin for a thin device. Any read to an unprovisioned area of the thin device will be passed through to the origin. Writes trigger allocation of new blocks as usual. One possible use case for this would be VM hosts that want to run guests on thinly-provisioned volumes but have the base image on another device (possibly shared between many VMs). Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-03-28 18:41:28 +01:00
Mike Snitzer	c4a69ecdb4	dm thin: relax hard limit on the maximum size of a metadata device The thin metadata format can only make use of a device that is <= THIN_METADATA_MAX_SECTORS (currently 15.9375 GB). Therefore, there is no practical benefit to using a larger device. However, it may be that other factors impose a certain granularity for the space that is allocated to a device (E.g. lvm2 can impose a coarse granularity through the use of large, >= 1 GB, physical extents). Rather than reject a larger metadata device, during thin-pool device construction, switch to allowing it but issue a warning if a device larger than THIN_METADATA_MAX_SECTORS_WARNING (16 GB) is provided. Any space over 15.9375 GB will not be used. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-03-28 18:41:28 +01:00
Joe Thornber	905e51b39a	dm thin: commit outstanding data every second Commit unwritten data every second to prevent too much building up. Released blocks don't become available until after the next commit (for crash resilience). Prior to this patch commits were only triggered by a message to the target or a REQ_{FLUSH,FUA} bio. This allowed far too big a position to build up. The interval is hard-coded to 1 second. This is a sensible setting. I'm not making this user configurable, since there isn't much to be gained by tweaking this - and a lot lost by setting it far too high. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-03-28 18:41:27 +01:00
Joe Thornber	fe878f34df	dm thin: correct comments Remove documentation for unimplemented 'trim' message. I'd planned a 'trim' target message for shrinking thin devices, but this is better handled via the discard ioctl. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-03-28 18:41:24 +01:00
Joe Thornber	6f94a4c45a	dm thin: fix stacked bi_next usage Avoid using the bi_next field for the holder of a cell when deferring bios because a stacked device below might change it. Store the holder in a new field in struct cell instead. When a cell is created, the bio that triggered creation (the holder) was added to the same bio list as subsequent bios. In some cases we pass this holder bio directly to devices underneath. If those devices use the bi_next field there will be trouble... This also simplifies some code that had to work out which bio was the holder. Signed-off-by: Joe Thornber <ejt@redhat.com> Cc: stable@kernel.org Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-03-28 18:41:23 +01:00
Joe Thornber	991d9fa02d	dm: add thin provisioning target Initial EXPERIMENTAL implementation of device-mapper thin provisioning with snapshot support. The 'thin' target is used to create instances of the virtual devices that are hosted in the 'thin-pool' target. The thin-pool target provides data sharing among devices. This sharing is made possible using the persistent-data library in the previous patch. The main highlight of this implementation, compared to the previous implementation of snapshots, is that it allows many virtual devices to be stored on the same data volume, simplifying administration and allowing sharing of data between volumes (thus reducing disk usage). Another big feature is support for arbitrary depth of recursive snapshots (snapshots of snapshots of snapshots ...). The previous implementation of snapshots did this by chaining together lookup tables, and so performance was O(depth). This new implementation uses a single data structure so we don't get this degradation with depth. For further information and examples of how to use this, please read Documentation/device-mapper/thin-provisioning.txt Signed-off-by: Joe Thornber <thornber@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2011-10-31 20:21:18 +00:00

1 2 3 4

193 Commits