linux/drivers/md
Mike Snitzer 0f640dca08 dm thin: fix queue limits stacking
thin_io_hints() is blindly copying the queue limits from the thin-pool
which can lead to incorrect limits being set.  The fix here simply
deletes the thin_io_hints() hook which leaves the existing stacking
infrastructure to set the limits correctly.

When a thin-pool uses an MD device for the data device a thin device
from the thin-pool must respect MD's constraints about disallowing a bio
from spanning multiple chunks.  Otherwise we can see problems.  If the raid0
chunksize is 1152K and thin-pool chunksize is 256K I see the following
md/raid0 error (with extra debug tracing added to thin_endio) when
mkfs.xfs is executed against the thin device:

md/raid0:md99: make_request bug: can't convert block across chunks or bigger than 1152k 6688 127
device-mapper: thin: bio sector=2080 err=-5 bi_size=130560 bi_rw=17 bi_vcnt=32 bi_idx=0

This extra DM debugging shows that the failing bio is spanning across
the first and second logical 1152K chunk (sector 2080 + 255 takes the
bio beyond the first chunk's boundary of sector 2304).  So the bio
splitting that DM is doing clearly isn't respecting the MD limits.

max_hw_sectors_kb is 127 for both the thin-pool and thin device
(queue_max_hw_sectors returns 255 so we'll excuse sysfs's lack of
precision).  So this explains why bi_size is 130560.

But the thin device's max_hw_sectors_kb should be 4 (PAGE_SIZE) given
that it doesn't have a .merge function (for bio_add_page to consult
indirectly via dm_merge_bvec) yet the thin-pool does sit above an MD
device that has a compulsory merge_bvec_fn.  This scenario is exactly
why DM must resort to sending single PAGE_SIZE bios to the underlying
layer. Some additional context for this is available in the header for
commit 8cbeb67a ("dm: avoid unsupported spanning of md stripe boundaries").

Long story short, the reason a thin device doesn't properly get
configured to have a max_hw_sectors_kb of 4 (PAGE_SIZE) is that
thin_io_hints() is blindly copying the queue limits from the thin-pool
device directly to the thin device's queue limits.

Fix this by eliminating thin_io_hints.  Doing so is safe because the
block layer's queue limits stacking already enables the upper level thin
device to inherit the thin-pool device's discard and minimum_io_size and
optimal_io_size limits that get set in pool_io_hints.  But avoiding the
queue limits copy allows the thin and thin-pool limits to be different
where it is important, namely max_hw_sectors_kb.

Reported-by: Daniel Browning <db@kavod.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-01-31 14:11:14 +00:00
..
persistent-data Miscellaneous device-mapper fixes, cleanups and performance improvements. 2012-12-21 17:08:06 -08:00
bitmap.c md/bitmap:Don't use IS_ERR to judge alloc_page(). 2012-10-11 13:45:36 +11:00
bitmap.h md/bitmap: record the space available for the bitmap in the superblock. 2012-05-22 13:55:34 +10:00
dm-bio-prison.c dm thin: replace dm_cell_release_singleton with cell_defer_except 2012-12-21 20:23:31 +00:00
dm-bio-prison.h dm thin: replace dm_cell_release_singleton with cell_defer_except 2012-12-21 20:23:31 +00:00
dm-bio-record.h dm: preserve bi_io_vec when resubmitting bios 2009-04-02 19:55:23 +01:00
dm-bufio.c dm: use ACCESS_ONCE for sysfs values 2012-10-12 16:59:46 +01:00
dm-bufio.h dm bufio: prefetch 2012-03-28 18:41:29 +01:00
dm-crypt.c dm: remove map_info 2012-12-21 20:23:41 +00:00
dm-delay.c dm: remove map_info 2012-12-21 20:23:41 +00:00
dm-exception-store.c dm: replace simple_strtoul 2012-07-27 15:07:59 +01:00
dm-exception-store.h dm snapshot: test chunk size against both origin and snapshot 2010-08-12 04:13:51 +01:00
dm-flakey.c dm: remove map_info 2012-12-21 20:23:41 +00:00
dm-io.c dm kcopyd: add WRITE SAME support to dm_kcopyd_zero 2012-12-21 20:23:37 +00:00
dm-ioctl.c dm ioctl: use kmalloc if possible 2012-12-21 20:23:36 +00:00
dm-kcopyd.c dm kcopyd: add WRITE SAME support to dm_kcopyd_zero 2012-12-21 20:23:37 +00:00
dm-linear.c dm: remove map_info 2012-12-21 20:23:41 +00:00
dm-log-userspace-base.c Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux 2011-11-06 19:44:47 -08:00
dm-log-userspace-transfer.c connector/userns: replace netlink uses of cap_raised() with capable() 2012-05-10 23:21:39 -04:00
dm-log-userspace-transfer.h dm log: userspace add luid to distinguish between concurrent log instances 2009-09-04 20:40:34 +01:00
dm-log.c dm: use memweight() 2012-07-30 17:25:16 -07:00
dm-mpath.c dm mpath: fix check for null mpio in end_io fn 2012-10-12 16:59:42 +01:00
dm-mpath.h dm mpath: remove is_active from struct dm_path 2008-10-10 13:36:58 +01:00
dm-path-selector.c md: Add module.h to all files using it implicitly 2011-10-31 19:31:18 -04:00
dm-path-selector.h dm mpath: add start_io and nr_bytes to path selectors 2009-06-22 10:12:27 +01:00
dm-queue-length.c dm: reject trailing characters in sccanf input 2012-03-28 18:41:26 +01:00
dm-raid1.c dm: remove map_info 2012-12-21 20:23:41 +00:00
dm-raid.c dm: remove map_info 2012-12-21 20:23:41 +00:00
dm-region-hash.c dm raid1: fix crash with mirror recovery and discard 2012-07-20 14:25:03 +01:00
dm-round-robin.c dm: reject trailing characters in sccanf input 2012-03-28 18:41:26 +01:00
dm-service-time.c dm: reject trailing characters in sccanf input 2012-03-28 18:41:26 +01:00
dm-snap-persistent.c md: Add in export.h for files using EXPORT_SYMBOL 2011-10-31 19:31:19 -04:00
dm-snap-transient.c md: Add in export.h for files using EXPORT_SYMBOL 2011-10-31 19:31:19 -04:00
dm-snap.c dm: remove map_info 2012-12-21 20:23:41 +00:00
dm-stripe.c dm stripe: add WRITE SAME support 2012-12-21 20:23:41 +00:00
dm-sysfs.c Driver core: Constify struct sysfs_ops in struct kobj_type 2010-03-07 17:04:49 -08:00
dm-table.c dm: introduce per_bio_data 2012-12-21 20:23:38 +00:00
dm-target.c dm: remove map_info 2012-12-21 20:23:41 +00:00
dm-thin-metadata.c dm persistent data: fix nested btree deletion 2012-12-21 20:23:32 +00:00
dm-thin-metadata.h dm thin metadata: introduce dm_pool_abort_metadata 2012-07-27 15:08:15 +01:00
dm-thin.c dm thin: fix queue limits stacking 2013-01-31 14:11:14 +00:00
dm-uevent.c md: Add in export.h for files using EXPORT_SYMBOL 2011-10-31 19:31:19 -04:00
dm-uevent.h
dm-verity.c dm: remove map_info 2012-12-21 20:23:41 +00:00
dm-zero.c dm: remove map_info 2012-12-21 20:23:41 +00:00
dm.c dm: remove map_info 2012-12-21 20:23:41 +00:00
dm.h dm: introduce per_bio_data 2012-12-21 20:23:38 +00:00
faulty.c md faulty: use disk_stack_limits() 2012-10-22 10:44:55 +11:00
Kconfig dm thin: move bio_prison code to separate module 2012-10-12 21:02:13 +01:00
linear.c md: linear supports TRIM 2012-10-11 13:08:44 +11:00
linear.h md/linear: typedef removal: linear_conf_t -> struct linear_conf 2011-10-11 16:48:54 +11:00
Makefile dm thin: move bio_prison code to separate module 2012-10-12 21:02:13 +01:00
md.c md update for 3.8 2012-12-18 09:32:44 -08:00
md.h md update for 3.8 2012-12-18 09:32:44 -08:00
multipath.c MD: change the parameter of md thread 2012-10-11 13:34:00 +11:00
multipath.h md/multipath: typedef removal: multipath_conf_t -> struct mpconf 2011-10-11 16:48:57 +11:00
raid0.c md: raid 0 supports TRIM 2012-10-11 13:25:44 +11:00
raid0.h md: add proper merge_bvec handling to RAID0 and Linear. 2012-03-19 12:46:39 +11:00
raid1.c Merge branch 'for-3.8/drivers' of git://git.kernel.dk/linux-block 2012-12-17 13:39:11 -08:00
raid1.h md/raid1: prevent merging too large request 2012-07-31 10:03:53 +10:00
raid5.c md update for 3.8 2012-12-18 09:32:44 -08:00
raid5.h MD: raid5 trim support 2012-10-11 13:49:05 +11:00
raid10.c Merge branch 'for-3.8/drivers' of git://git.kernel.dk/linux-block 2012-12-17 13:39:11 -08:00
raid10.h md/raid10: fix problem with on-stack allocation of r10bio structure. 2012-08-18 09:51:42 +10:00