linux/drivers
Xiao Ni 5a409b4f56 MD: fix lock contention for flush bios
There is a lock contention when there are many processes which send flush bios
to md device. eg. Create many lvs on one raid device and mkfs.xfs on each lv.

Now it just can handle flush request sequentially. It needs to wait mddev->flush_bio
to be NULL, otherwise get mddev->lock.

This patch remove mddev->flush_bio and handle flush bio asynchronously.
I did a test with command dbench -s 128 -t 300. This is the test result:

=================Without the patch============================
 Operation                Count    AvgLat    MaxLat
 --------------------------------------------------
 Flush                    11165   167.595  5879.560
 Close                   107469     1.391  2231.094
 LockX                      384     0.003     0.019
 Rename                    5944     2.141  1856.001
 ReadX                   208121     0.003     0.074
 WriteX                   98259  1925.402 15204.895
 Unlink                   25198    13.264  3457.268
 UnlockX                    384     0.001     0.009
 FIND_FIRST               47111     0.012     0.076
 SET_FILE_INFORMATION     12966     0.007     0.065
 QUERY_FILE_INFORMATION   27921     0.004     0.085
 QUERY_PATH_INFORMATION  124650     0.005     5.766
 QUERY_FS_INFORMATION     22519     0.003     0.053
 NTCreateX               141086     4.291  2502.812

Throughput 3.7181 MB/sec (sync open)  128 clients  128 procs  max_latency=15204.905 ms

=================With the patch============================
 Operation                Count    AvgLat    MaxLat
 --------------------------------------------------
 Flush                     4500   174.134   406.398
 Close                    48195     0.060   467.062
 LockX                      256     0.003     0.029
 Rename                    2324     0.026     0.360
 ReadX                    78846     0.004     0.504
 WriteX                   66832   562.775  1467.037
 Unlink                    5516     3.665  1141.740
 UnlockX                    256     0.002     0.019
 FIND_FIRST               16428     0.015     0.313
 SET_FILE_INFORMATION      6400     0.009     0.520
 QUERY_FILE_INFORMATION   17865     0.003     0.089
 QUERY_PATH_INFORMATION   47060     0.078   416.299
 QUERY_FS_INFORMATION      7024     0.004     0.032
 NTCreateX                55921     0.854  1141.452

Throughput 11.744 MB/sec (sync open)  128 clients  128 procs  max_latency=1467.041 ms

The test is done on raid1 disk with two rotational disks

V5: V4 is more complicated than the version with memory pool. So revert to the memory pool
version

V4: use address of fbio to do hash to choose free flush info.
V3:
Shaohua suggests mempool is overkill. In v3 it allocs memory during creating raid device
and uses a simple bitmap to record which resource is free.

Fix a bug from v2. It should set flush_pending to 1 at first.

V2:
Neil pointed out two problems. One is counting error problem and another is return value
when allocat memory fails.
1. counting error problem
This isn't safe.  It is only safe to call rdev_dec_pending() on rdevs
that you previously called
                          atomic_inc(&rdev->nr_pending);
If an rdev was added to the list between the start and end of the flush,
this will do something bad.

Now it doesn't use bio_chain. It uses specified call back function for each
flush bio.
2. Returned on IO error when kmalloc fails is wrong.
I use mempool suggested by Neil in V2
3. Fixed some places pointed by Guoqing

Suggested-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Shaohua Li <shli@fb.com>
2018-05-21 09:30:26 -07:00
..
accessibility
acpi xen: fixes for 4.17-rc1 2018-04-12 11:04:35 -07:00
amba
android
ata Merge branch 'for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata 2018-04-03 17:42:25 -07:00
atm atm: iphase: fix spelling mistake: "Tansmit" -> "Transmit" 2018-04-19 13:41:49 -04:00
auxdisplay
base mm: check __highest_present_section_nr directly in memory_dev_init() 2018-04-11 10:28:31 -07:00
bcma
block rbd: notrim map option 2018-04-16 09:38:40 +02:00
bluetooth Bluetooth: Set HCI_QUIRK_SIMULTANEOUS_DISCOVERY for BTUSB_QCA_ROME 2018-04-01 21:43:02 +03:00
bus pci-v4.17-changes 2018-04-06 18:31:06 -07:00
cdrom
char RTC for 4.17 2018-04-10 10:22:27 -07:00
clk The large diff this time around is from the addition of a new clk driver 2018-04-13 15:51:06 -07:00
clocksource Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-04-16 12:44:03 -07:00
connector
cpufreq cpufreq: Drop cpufreq_table_validate_and_show() 2018-04-10 08:40:45 +02:00
cpuidle cpuidle: menu: Avoid selecting shallow states with stopped tick 2018-04-09 11:54:57 +02:00
crypto .gitignore: move *-asn1.[ch] patterns to the top-level .gitignore 2018-04-07 19:04:02 +09:00
dax libnvdimm for 4.17 2018-04-10 10:25:57 -07:00
dca
devfreq
dio
dma DMAengine updates for v4.17-rc1 2018-04-10 12:14:37 -07:00
dma-buf
edac * Add NVDIMM support to EDAC (Tony Luck) 2018-04-05 14:21:13 -07:00
eisa
extcon Char/Misc patches for 4.17-rc1 2018-04-04 20:07:20 -07:00
firewire
firmware Merge branch 'dmi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging 2018-04-13 16:32:16 -07:00
fmc treewide: Fix typos in printk 2018-03-27 09:51:22 +02:00
fpga
fsi
gpio DeviceTree updates for 4.17: 2018-04-05 21:03:42 -07:00
gpu amdgpu, omap and snd regression fix 2018-04-12 20:56:10 -07:00
hid HID: i2c-hid: fix inverted return value from i2c_hid_command() 2018-04-19 09:25:15 +02:00
hsi
hv ARM: 2018-04-09 11:42:31 -07:00
hwmon hwmon updates for v4.17 2018-04-09 19:59:54 -07:00
hwspinlock
hwtracing Char/Misc patches for 4.17-rc1 2018-04-04 20:07:20 -07:00
i2c i2c: add param sanity check to i2c_transfer() 2018-04-11 23:33:46 +02:00
ide for-4.17/block-20180402 2018-04-05 14:27:02 -07:00
idle
iio This is the bulk of GPIO changes for the v4.17 kernel cycle: 2018-04-05 09:51:41 -07:00
infiniband Merge candidates for 4.17 merge window 2018-04-06 17:35:43 -07:00
input Changes to chrome-platform for v4.17 2018-04-13 16:20:36 -07:00
iommu IOMMU Updates for Linux v4.17 2018-04-11 18:50:41 -07:00
ipack
irqchip IOMMU Updates for Linux v4.17 2018-04-11 18:50:41 -07:00
isdn mISDN: Remove VLAs 2018-04-12 21:46:10 -04:00
leds
lightnvm lightnvm: pblk: remove some unnecessary NULL checks 2018-03-29 17:29:09 -06:00
macintosh powerpc updates for 4.17 2018-04-07 12:08:19 -07:00
mailbox
mcb
md MD: fix lock contention for flush bios 2018-05-21 09:30:26 -07:00
media remoteproc updates for v4.17 2018-04-10 12:09:27 -07:00
memory
memstick
message
mfd platform/chrome: mfd/cros_ec_dev: Add sysfs entry to set keyboard wake lid angle 2018-04-10 22:25:07 -07:00
misc * Fix 2032 time access issues and new compiler warnings 2018-04-12 10:21:19 -07:00
mmc MMC host: 2018-04-20 10:41:31 -07:00
mtd This pull request contains updates for both UBI and UBIFS: 2018-04-11 16:39:34 -07:00
mux
net bnxt_en: Fix memory fault in bnxt_ethtool_init() 2018-04-19 16:35:09 -04:00
nfc
ntb
nubus
nvdimm libnvdimm for 4.17 2018-04-10 10:25:57 -07:00
nvme nvme: expand nvmf_check_if_ready checks 2018-04-12 09:58:27 -06:00
nvmem Char/Misc patches for 4.17-rc1 2018-04-04 20:07:20 -07:00
of Kbuild updates for v4.17 (2nd) 2018-04-15 17:21:30 -07:00
opp
oprofile oprofilefs: don't oops on allocation failure 2018-03-29 15:07:48 -04:00
parisc parisc/pci: Switch LBA PCI bus from Hard Fail to Soft Fail mode 2018-03-27 18:52:22 +02:00
parport Char/Misc patches for 4.17-rc1 2018-04-04 20:07:20 -07:00
pci PCI: Remove messages about reassigning resources 2018-04-11 08:46:50 -05:00
pcmcia Merge branch 'for-linus-sa1100' of git://git.armlinux.org.uk/~rmk/linux-arm 2018-04-09 09:26:36 -07:00
perf ARM: SoC driver updates for 4.17 2018-04-05 21:29:35 -07:00
phy ARM: SoC platform updates for 4.17 2018-04-05 21:21:08 -07:00
pinctrl This is the bulk of GPIO changes for the v4.17 kernel cycle: 2018-04-05 09:51:41 -07:00
platform Changes to chrome-platform for v4.17 2018-04-13 16:20:36 -07:00
pnp
power ARM: SoC platform updates for 4.17 2018-04-05 21:21:08 -07:00
powercap
pps
ps3
ptp
pwm pwm: Changes for v4.17-rc1 2018-04-13 15:46:21 -07:00
rapidio rapidio: use a reference count for struct mport_dma_req 2018-04-11 10:28:37 -07:00
ras
regulator Merge remote-tracking branches 'regulator/topic/88pg86x', 'regulator/topic/dt', 'regulator/topic/formatting' and 'regulator/topic/gpio' into regulator-next 2018-03-28 10:33:53 +08:00
remoteproc remoteproc: fix null pointer dereference on glink only platforms 2018-04-05 22:53:16 -07:00
reset Merge branch 'reset/lookup' into reset/next 2018-03-27 11:03:43 +02:00
rpmsg rpmsg: smd: Use announce_create to process any receive work 2018-03-27 21:54:37 -07:00
rtc RTC for 4.17 2018-04-10 10:22:27 -07:00
s390 s390: remove couple of duplicate includes 2018-04-16 09:10:24 +02:00
sbus sparc64: Properly range check DAX completion index 2018-04-01 20:07:00 -04:00
scsi SCSI fixes on 20180415 2018-04-15 17:24:12 -07:00
sfi
sh
siox
slimbus
sn
soc The large diff this time around is from the addition of a new clk driver 2018-04-13 15:51:06 -07:00
soundwire
spi spi: SPI updates for v4.17 2018-04-03 12:06:21 -07:00
spmi
ssb
staging page cache: use xa_lock 2018-04-11 10:28:39 -07:00
target for-4.17/block-20180402 2018-04-05 14:27:02 -07:00
tc
tee
thermal Merge branches 'thermal-core' and 'thermal-soc' into next 2018-04-13 14:11:53 +08:00
thunderbolt
tty Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux 2018-04-09 09:04:10 -07:00
uio
usb Merge branch 'next-general' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2018-04-07 11:11:41 -07:00
uwb
vfio VFIO updates for v4.17-rc1 2018-04-06 19:44:27 -07:00
vhost vhost: return bool from *_access_ok() functions 2018-04-11 10:54:06 -04:00
video fbdev changes for v4.17: 2018-04-10 10:20:00 -07:00
virt
virtio virtio: feature 2018-04-11 18:58:27 -07:00
visorbus
vlynq
vme
w1
watchdog linux-watchdog 4.17-rc1 merge window tag 2018-04-13 15:43:50 -07:00
xen xen: fixes and one header update for 4.17-rc2 2018-04-20 08:36:04 -07:00
zorro
Kconfig hwtracing: Add HW tracing support menu 2018-03-29 13:38:10 +03:00
Makefile