linux/drivers/md
Xiao Ni 5a409b4f56 MD: fix lock contention for flush bios
There is a lock contention when there are many processes which send flush bios
to md device. eg. Create many lvs on one raid device and mkfs.xfs on each lv.

Now it just can handle flush request sequentially. It needs to wait mddev->flush_bio
to be NULL, otherwise get mddev->lock.

This patch remove mddev->flush_bio and handle flush bio asynchronously.
I did a test with command dbench -s 128 -t 300. This is the test result:

=================Without the patch============================
 Operation                Count    AvgLat    MaxLat
 --------------------------------------------------
 Flush                    11165   167.595  5879.560
 Close                   107469     1.391  2231.094
 LockX                      384     0.003     0.019
 Rename                    5944     2.141  1856.001
 ReadX                   208121     0.003     0.074
 WriteX                   98259  1925.402 15204.895
 Unlink                   25198    13.264  3457.268
 UnlockX                    384     0.001     0.009
 FIND_FIRST               47111     0.012     0.076
 SET_FILE_INFORMATION     12966     0.007     0.065
 QUERY_FILE_INFORMATION   27921     0.004     0.085
 QUERY_PATH_INFORMATION  124650     0.005     5.766
 QUERY_FS_INFORMATION     22519     0.003     0.053
 NTCreateX               141086     4.291  2502.812

Throughput 3.7181 MB/sec (sync open)  128 clients  128 procs  max_latency=15204.905 ms

=================With the patch============================
 Operation                Count    AvgLat    MaxLat
 --------------------------------------------------
 Flush                     4500   174.134   406.398
 Close                    48195     0.060   467.062
 LockX                      256     0.003     0.029
 Rename                    2324     0.026     0.360
 ReadX                    78846     0.004     0.504
 WriteX                   66832   562.775  1467.037
 Unlink                    5516     3.665  1141.740
 UnlockX                    256     0.002     0.019
 FIND_FIRST               16428     0.015     0.313
 SET_FILE_INFORMATION      6400     0.009     0.520
 QUERY_FILE_INFORMATION   17865     0.003     0.089
 QUERY_PATH_INFORMATION   47060     0.078   416.299
 QUERY_FS_INFORMATION      7024     0.004     0.032
 NTCreateX                55921     0.854  1141.452

Throughput 11.744 MB/sec (sync open)  128 clients  128 procs  max_latency=1467.041 ms

The test is done on raid1 disk with two rotational disks

V5: V4 is more complicated than the version with memory pool. So revert to the memory pool
version

V4: use address of fbio to do hash to choose free flush info.
V3:
Shaohua suggests mempool is overkill. In v3 it allocs memory during creating raid device
and uses a simple bitmap to record which resource is free.

Fix a bug from v2. It should set flush_pending to 1 at first.

V2:
Neil pointed out two problems. One is counting error problem and another is return value
when allocat memory fails.
1. counting error problem
This isn't safe.  It is only safe to call rdev_dec_pending() on rdevs
that you previously called
                          atomic_inc(&rdev->nr_pending);
If an rdev was added to the list between the start and end of the flush,
this will do something bad.

Now it doesn't use bio_chain. It uses specified call back function for each
flush bio.
2. Returned on IO error when kmalloc fails is wrong.
I use mempool suggested by Neil in V2
3. Fixed some places pointed by Guoqing

Suggested-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Shaohua Li <shli@fb.com>
2018-05-21 09:30:26 -07:00
..
bcache for-4.17/block-20180402 2018-04-05 14:27:02 -07:00
persistent-data dm bufio: move dm-bufio.h to include/linux/ 2018-04-03 15:04:23 -04:00
dm-bio-prison-v1.c dm bio prison: use rb_entry() rather than container_of() 2017-06-19 11:03:50 -04:00
dm-bio-prison-v1.h block: switch bios to blk_status_t 2017-06-09 09:27:32 -06:00
dm-bio-prison-v2.c dm bio prison: use rb_entry() rather than container_of() 2017-06-19 11:03:50 -04:00
dm-bio-prison-v2.h
dm-bio-record.h block: replace bi_bdev with a gendisk pointer and partitions index 2017-08-23 12:49:55 -06:00
dm-bufio.c dm bufio: don't embed a bio in the dm_buffer structure 2018-04-03 15:04:29 -04:00
dm-builtin.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
dm-cache-background-tracker.c dm cache background tracker: limit amount of background work that may be issued at once 2017-11-10 15:45:03 -05:00
dm-cache-background-tracker.h
dm-cache-block-types.h
dm-cache-metadata.c dm cache: convert dm_cache_metadata.ref_count from atomic_t to refcount_t 2017-10-24 15:09:51 -04:00
dm-cache-metadata.h
dm-cache-policy-internal.h
dm-cache-policy-smq.c dm cache policy smq: allocate cache blocks in order 2017-11-10 15:45:05 -05:00
dm-cache-policy.c
dm-cache-policy.h
dm-cache-target.c dm: allow targets to return output from messages they are sent 2018-04-03 15:04:10 -04:00
dm-core.h dm: various cleanups to md->queue initialization code 2018-01-29 13:44:55 -05:00
dm-crypt.c dm crypt: limit the number of allocated pages 2018-04-03 15:04:11 -04:00
dm-delay.c dm: backfill missing calls to mutex_destroy() 2018-01-17 09:16:15 -05:00
dm-era-target.c dm: allow targets to return output from messages they are sent 2018-04-03 15:04:10 -04:00
dm-exception-store.c
dm-exception-store.h
dm-flakey.c dm: remove fmode_t argument from .prepare_ioctl hook 2018-04-04 12:12:39 -04:00
dm-integrity.c dm bufio: move dm-bufio.h to include/linux/ 2018-04-03 15:04:23 -04:00
dm-io.c dm io: remove BIOSET_NEED_RESCUER flag from bios bioset 2017-12-13 12:15:56 -05:00
dm-ioctl.c dm: allow targets to return output from messages they are sent 2018-04-03 15:04:10 -04:00
dm-kcopyd.c dm: backfill missing calls to mutex_destroy() 2018-01-17 09:16:15 -05:00
dm-linear.c libnvdimm for 4.17 2018-04-10 10:25:57 -07:00
dm-log-userspace-base.c
dm-log-userspace-transfer.c
dm-log-userspace-transfer.h
dm-log-writes.c libnvdimm for 4.17 2018-04-10 10:25:57 -07:00
dm-log.c
dm-mpath.c dm: remove fmode_t argument from .prepare_ioctl hook 2018-04-04 12:12:39 -04:00
dm-mpath.h
dm-path-selector.c
dm-path-selector.h
dm-queue-length.c dm mpath selector: more evenly distribute ties 2018-01-29 13:44:58 -05:00
dm-raid1.c md: Convert timers to use timer_setup() 2017-11-14 20:11:57 -07:00
dm-raid.c dm raid: fix parse_raid_params() variable range issue 2018-04-04 12:12:37 -04:00
dm-region-hash.c
dm-round-robin.c
dm-rq.c for-linus-20180204 2018-02-04 11:16:35 -08:00
dm-rq.h dm rq: do not update rq partially in each ending bio 2017-08-28 10:23:28 -04:00
dm-service-time.c dm mpath selector: more evenly distribute ties 2018-01-29 13:44:58 -05:00
dm-snap-persistent.c dm bufio: move dm-bufio.h to include/linux/ 2018-04-03 15:04:23 -04:00
dm-snap-transient.c
dm-snap.c dm snapshot: use mutex instead of rw_semaphore 2018-01-17 09:16:14 -05:00
dm-stats.c dm: backfill missing calls to mutex_destroy() 2018-01-17 09:16:15 -05:00
dm-stats.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
dm-stripe.c libnvdimm for 4.17 2018-04-10 10:25:57 -07:00
dm-switch.c dm: remove fmode_t argument from .prepare_ioctl hook 2018-04-04 12:12:39 -04:00
dm-sysfs.c
dm-table.c - DM core passthrough ioctl fix to retain reference to DM table, and 2018-04-06 11:50:19 -07:00
dm-target.c dm: remove unused macro DM_MOD_NAME_SIZE 2018-04-03 15:04:15 -04:00
dm-thin-metadata.c dm thin metadata: THIN_MAX_CONCURRENT_LOCKS should be 6 2018-01-17 09:07:54 -05:00
dm-thin-metadata.h
dm-thin.c dm: allow targets to return output from messages they are sent 2018-04-03 15:04:10 -04:00
dm-uevent.c
dm-uevent.h
dm-unstripe.c dm unstripe: remove unnecessary header includes 2018-04-03 15:04:15 -04:00
dm-verity-fec.c dm verity fec: fix GFP flags used with mempool_alloc() 2017-07-26 15:55:44 -04:00
dm-verity-fec.h
dm-verity-target.c dm: remove fmode_t argument from .prepare_ioctl hook 2018-04-04 12:12:39 -04:00
dm-verity.h dm verity: add 'check_at_most_once' option to only validate hashes once 2018-04-03 15:04:29 -04:00
dm-zero.c dm: don't return errnos from ->map 2017-06-09 09:27:32 -06:00
dm-zoned-metadata.c dm: backfill missing calls to mutex_destroy() 2018-01-17 09:16:15 -05:00
dm-zoned-reclaim.c dm zoned: use GFP_NOIO in I/O path 2017-07-26 15:55:43 -04:00
dm-zoned-target.c dm: remove fmode_t argument from .prepare_ioctl hook 2018-04-04 12:12:39 -04:00
dm-zoned.h dm zoned: drive-managed zoned block device target 2017-06-19 11:05:20 -04:00
dm.c libnvdimm for 4.17 2018-04-10 10:25:57 -07:00
dm.h dm: move dm_table_destroy() to same header as dm_table_create() 2018-01-17 09:16:06 -05:00
Kconfig dax, dm: allow device-mapper to operate without dax support 2018-04-03 05:41:19 -07:00
Makefile dm: add unstriped target 2018-01-17 09:16:00 -05:00
md-bitmap.c Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md 2017-11-14 16:07:26 -08:00
md-bitmap.h Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md 2017-11-14 16:07:26 -08:00
md-cluster.c md-cluster: update document for raid10 2017-11-01 21:32:25 -07:00
md-cluster.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
md-faulty.c md: rename some drivers/md/ files to have an "md-" prefix 2017-10-16 19:06:36 -07:00
md-linear.c block: Use blk_queue_flag_*() in drivers instead of queue_flag_*() 2018-03-08 14:13:48 -07:00
md-linear.h Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md 2017-11-14 16:07:26 -08:00
md-multipath.c md-multipath: Use seq_putc() in multipath_status() 2018-02-17 13:00:35 -08:00
md-multipath.h Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md 2017-11-14 16:07:26 -08:00
md.c MD: fix lock contention for flush bios 2018-05-21 09:30:26 -07:00
md.h MD: fix lock contention for flush bios 2018-05-21 09:30:26 -07:00
raid0.c block: Use blk_queue_flag_*() in drivers instead of queue_flag_*() 2018-03-08 14:13:48 -07:00
raid0.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
raid1-10.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
raid1.c md/raid1: add error handling of read error from FailFast device 2018-05-17 09:55:59 -07:00
raid1.h md: document lifetime of internal rdev pointer. 2018-02-18 10:22:27 -08:00
raid5-cache.c raid5-ppl: PPL support for disks with write-back cache enabled 2018-01-15 14:29:42 -08:00
raid5-log.h raid5-ppl: fix handling flush requests 2018-02-21 09:40:40 -08:00
raid5-ppl.c raid5-ppl: fix handling flush requests 2018-02-21 09:40:40 -08:00
raid5.c md/raid5: Assigning NULL to sh->batch_head before testing bit R5_Overlap of a stripe 2018-05-17 09:56:00 -07:00
raid5.h raid5: copy write hint from origin bio to stripe 2018-05-17 09:55:58 -07:00
raid10.c raid10: check bio in r10buf_pool_free to void NULL pointer dereference 2018-05-01 09:47:50 -07:00
raid10.h md: document lifetime of internal rdev pointer. 2018-02-18 10:22:27 -08:00