linux/drivers/md
Tang Junhui c4dc2497d5 bcache: fix high CPU occupancy during journal
After running small-write I/O for a long time, we found that CPU occupancy
is very high and I/O performance has been reduced by about half:

[root@ceph151 internal]# top
top - 15:51:05 up 1 day,2:43,  4 users,  load average: 16.89, 15.15, 16.53
Tasks: 2063 total,   4 running, 2059 sleeping,   0 stopped,   0 zombie
%Cpu(s):4.3 us, 17.1 sy 0.0 ni, 66.1 id, 12.0 wa,  0.0 hi,  0.5 si,  0.0 st
KiB Mem : 65450044 total, 24586420 free, 38909008 used,  1954616 buff/cache
KiB Swap: 65667068 total, 65667068 free,        0 used. 25136812 avail Mem

  PID USER PR NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
 2023 root 20  0       0      0      0 S 55.1  0.0   0:04.42 kworker/11:191
14126 root 20  0       0      0      0 S 42.9  0.0   0:08.72 kworker/10:3
 9292 root 20  0       0      0      0 S 30.4  0.0   1:10.99 kworker/6:1
 8553 ceph 20  0 4242492 1.805g  18804 S 30.0  2.9 410:07.04 ceph-osd
12287 root 20  0       0      0      0 S 26.7  0.0   0:28.13 kworker/7:85
31019 root 20  0       0      0      0 S 26.1  0.0   1:30.79 kworker/22:1
 1787 root 20  0       0      0      0 R 25.7  0.0   5:18.45 kworker/8:7
32169 root 20  0       0      0      0 S 14.5  0.0   1:01.92 kworker/23:1
21476 root 20  0       0      0      0 S 13.9  0.0   0:05.09 kworker/1:54
 2204 root 20  0       0      0      0 S 12.5  0.0   1:25.17 kworker/9:10
16994 root 20  0       0      0      0 S 12.2  0.0   0:06.27 kworker/5:106
15714 root 20  0       0      0      0 R 10.9  0.0   0:01.85 kworker/19:2
 9661 ceph 20  0 4246876 1.731g  18800 S 10.6  2.8 403:00.80 ceph-osd
11460 ceph 20  0 4164692 2.206g  18876 S 10.6  3.5 360:27.19 ceph-osd
 9960 root 20  0       0      0      0 S 10.2  0.0   0:02.75 kworker/2:139
11699 ceph 20  0 4169244 1.920g  18920 S 10.2  3.1 355:23.67 ceph-osd
 6843 ceph 20  0 4197632 1.810g  18900 S  9.6  2.9 380:08.30 ceph-osd

The kernel workers consumed a lot of CPU, and I found that they were running
journal work. The journal was reclaiming resources and flushing btree nodes
with surprising frequency.

Through further analysis, we found that in btree_flush_write(), we try to
get the btree node with the smallest fifo index to flush by traversing all
the btree nodes in c->bucket_hash. Once we have it, since no lock protects
it, this btree node may already have been written to the cache device by
other work items, and if this has occurred, we retry the traversal of
c->bucket_hash and get another btree node. When the problem occurred, the
retry count was very high, and we consumed a lot of CPU looking for an
appropriate btree node.

In this patch, we try to record the 128 btree nodes with the smallest fifo
index in a heap, and pop them one by one when we need to flush a btree node.
This greatly reduces the time the loop takes to find an appropriate btree
node, and also reduces CPU occupancy.
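
The idea of keeping only the 128 smallest entries from one traversal can be
illustrated with a bounded max-heap. The following is a minimal user-space
sketch, not the code from this patch: struct node_ref, struct flush_heap and
all flush_heap_*() helpers are hypothetical names, the fifo index is treated
as a plain unsigned integer, and only the capacity of 128 comes from the
commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FLUSH_HEAP_CAP 128	/* capacity taken from the commit message */

/* Hypothetical stand-in for a dirty btree node: only its fifo index. */
struct node_ref {
	unsigned int fifo_idx;
};

struct flush_heap {
	struct node_ref data[FLUSH_HEAP_CAP];
	size_t used;	/* number of valid entries */
	size_t next;	/* pop cursor, valid after flush_heap_finish() */
};

static void flush_heap_init(struct flush_heap *h)
{
	h->used = 0;
	h->next = 0;
}

static void swap_ref(struct node_ref *a, struct node_ref *b)
{
	struct node_ref tmp = *a;

	*a = *b;
	*b = tmp;
}

/* Restore the max-heap property downwards from index i over data[0..n). */
static void sift_down(struct node_ref *data, size_t n, size_t i)
{
	for (;;) {
		size_t largest = i, l = 2 * i + 1, r = l + 1;

		if (l < n && data[l].fifo_idx > data[largest].fifo_idx)
			largest = l;
		if (r < n && data[r].fifo_idx > data[largest].fifo_idx)
			largest = r;
		if (largest == i)
			return;
		swap_ref(&data[i], &data[largest]);
		i = largest;
	}
}

/* Restore the max-heap property upwards from index i. */
static void sift_up(struct node_ref *data, size_t i)
{
	while (i > 0) {
		size_t parent = (i - 1) / 2;

		if (data[parent].fifo_idx >= data[i].fifo_idx)
			return;
		swap_ref(&data[parent], &data[i]);
		i = parent;
	}
}

/*
 * Offer one candidate during the traversal. While the heap has room the
 * candidate is always kept; once it is full, the candidate replaces the
 * current largest entry only if it is smaller.
 */
static void flush_heap_offer(struct flush_heap *h, struct node_ref n)
{
	assert(h->used <= FLUSH_HEAP_CAP);
	if (h->used < FLUSH_HEAP_CAP) {
		h->data[h->used] = n;
		sift_up(h->data, h->used);
		h->used++;
	} else if (n.fifo_idx < h->data[0].fifo_idx) {
		h->data[0] = n;
		sift_down(h->data, h->used, 0);
	}
}

/* After the traversal: heap-sort in place so data[] is ascending. */
static void flush_heap_finish(struct flush_heap *h)
{
	for (size_t n = h->used; n > 1; n--) {
		swap_ref(&h->data[0], &h->data[n - 1]);
		sift_down(h->data, n - 1, 0);
	}
	h->next = 0;
}

/* Pop the next smallest entry; returns false when none are left. */
static bool flush_heap_pop(struct flush_heap *h, struct node_ref *out)
{
	if (h->next >= h->used)
		return false;
	*out = h->data[h->next++];
	return true;
}
```

With this shape, one traversal fills the heap and each later flush request is
served by flush_heap_pop() without rescanning; a caller would still have to
check that a popped node is dirty before writing it, since another worker may
have flushed it in the meantime.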

[note by mpl: this triggers a checkpatch error because of adjacent,
pre-existing style violations]

Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
Reviewed-by: Michael Lyle <mlyle@lyle.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-02-07 12:50:01 -07:00
bcache bcache: fix high CPU occupancy during journal 2018-02-07 12:50:01 -07:00
persistent-data dm btree: fix serious bug in btree_split_beneath() 2018-01-17 09:07:55 -05:00
dm-bio-prison-v1.c dm bio prison: use rb_entry() rather than container_of() 2017-06-19 11:03:50 -04:00
dm-bio-prison-v1.h block: switch bios to blk_status_t 2017-06-09 09:27:32 -06:00
dm-bio-prison-v2.c dm bio prison: use rb_entry() rather than container_of() 2017-06-19 11:03:50 -04:00
dm-bio-prison-v2.h dm bio prison v2: new interface for the bio prison 2017-03-07 11:30:16 -05:00
dm-bio-record.h block: replace bi_bdev with a gendisk pointer and partitions index 2017-08-23 12:49:55 -06:00
dm-bufio.c dm bufio: eliminate unnecessary labels in dm_bufio_client_create() 2018-01-17 09:16:04 -05:00
dm-bufio.h dm integrity: optimize writing dm-bufio buffers that are partially changed 2017-08-28 11:47:17 -04:00
dm-builtin.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
dm-cache-background-tracker.c dm cache background tracker: limit amount of background work that may be issued at once 2017-11-10 15:45:03 -05:00
dm-cache-background-tracker.h dm cache: significant rework to leverage dm-bio-prison-v2 2017-03-07 13:28:31 -05:00
dm-cache-block-types.h linux: drop __bitwise__ everywhere 2016-12-16 00:13:41 +02:00
dm-cache-metadata.c dm cache: convert dm_cache_metadata.ref_count from atomic_t to refcount_t 2017-10-24 15:09:51 -04:00
dm-cache-metadata.h dm cache: significant rework to leverage dm-bio-prison-v2 2017-03-07 13:28:31 -05:00
dm-cache-policy-internal.h dm cache: significant rework to leverage dm-bio-prison-v2 2017-03-07 13:28:31 -05:00
dm-cache-policy-smq.c dm cache policy smq: allocate cache blocks in order 2017-11-10 15:45:05 -05:00
dm-cache-policy.c
dm-cache-policy.h dm cache: significant rework to leverage dm-bio-prison-v2 2017-03-07 13:28:31 -05:00
dm-cache-target.c dm: fix various targets to dm_register_target after module __init resources created 2017-12-04 10:23:10 -05:00
dm-core.h dm: various cleanups to md->queue initialization code 2018-01-29 13:44:55 -05:00
dm-crypt.c - DM core fixes to ensure that bio submission follows a depth-first tree 2018-01-31 11:05:47 -08:00
dm-delay.c dm: backfill missing calls to mutex_destroy() 2018-01-17 09:16:15 -05:00
dm-era-target.c dm: do not set 'discards_supported' in targets that do not need it 2017-11-16 16:33:54 -05:00
dm-exception-store.c
dm-exception-store.h
dm-flakey.c dm flakey: check for null arg_name in parse_features() 2018-01-17 09:16:13 -05:00
dm-integrity.c dm integrity: don't store cipher request on the stack 2018-01-17 09:08:57 -05:00
dm-io.c dm io: remove BIOSET_NEED_RESCUER flag from bios bioset 2017-12-13 12:15:56 -05:00
dm-ioctl.c the rest of drivers/*: annotate ->poll() instances 2017-11-28 11:06:58 -05:00
dm-kcopyd.c dm: backfill missing calls to mutex_destroy() 2018-01-17 09:16:15 -05:00
dm-linear.c - Some request-based DM core and DM multipath fixes and cleanups 2017-09-14 13:43:16 -07:00
dm-log-userspace-base.c
dm-log-userspace-transfer.c
dm-log-userspace-transfer.h
dm-log-writes.c dm log writes: fix max length used for kstrndup 2018-01-17 09:16:16 -05:00
dm-log.c block,fs: use REQ_* flags directly 2016-11-01 09:43:26 -06:00
dm-mpath.c - DM core fixes to ensure that bio submission follows a depth-first tree 2018-01-31 11:05:47 -08:00
dm-mpath.h
dm-path-selector.c
dm-path-selector.h
dm-queue-length.c dm mpath selector: more evenly distribute ties 2018-01-29 13:44:58 -05:00
dm-raid1.c md: Convert timers to use timer_setup() 2017-11-14 20:11:57 -07:00
dm-raid.c - DM core fixes to ensure that bio submission follows a depth-first tree 2018-01-31 11:05:47 -08:00
dm-region-hash.c block: rename bio bi_rw to bi_opf 2016-08-07 14:41:02 -06:00
dm-round-robin.c dm round robin: revert "use percpu 'repeat_count' and 'current_path'" 2017-02-17 00:54:09 -05:00
dm-rq.c for-linus-20180204 2018-02-04 11:16:35 -08:00
dm-rq.h dm rq: do not update rq partially in each ending bio 2017-08-28 10:23:28 -04:00
dm-service-time.c dm mpath selector: more evenly distribute ties 2018-01-29 13:44:58 -05:00
dm-snap-persistent.c dm: make flush bios explicitly sync 2017-05-31 10:50:23 -04:00
dm-snap-transient.c
dm-snap.c dm snapshot: use mutex instead of rw_semaphore 2018-01-17 09:16:14 -05:00
dm-stats.c dm: backfill missing calls to mutex_destroy() 2018-01-17 09:16:15 -05:00
dm-stats.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
dm-stripe.c - Some request-based DM core and DM multipath fixes and cleanups 2017-09-14 13:43:16 -07:00
dm-switch.c locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE() 2017-10-25 11:01:08 +02:00
dm-sysfs.c
dm-table.c dm table: fix NVMe bio-based dm_table_determine_type() validation 2018-01-29 13:44:56 -05:00
dm-target.c dm: don't return errnos from ->map 2017-06-09 09:27:32 -06:00
dm-thin-metadata.c dm thin metadata: THIN_MAX_CONCURRENT_LOCKS should be 6 2018-01-17 09:07:54 -05:00
dm-thin-metadata.h dm thin: fix a race condition between discarding and provisioning a block 2016-07-20 12:43:35 -04:00
dm-thin.c dm thin: fix trailing semicolon in __remap_and_issue_shared_cell 2018-01-29 13:44:57 -05:00
dm-uevent.c
dm-uevent.h
dm-unstripe.c dm unstripe: fix target length versus number of stripes size check 2018-01-29 13:44:58 -05:00
dm-verity-fec.c dm verity fec: fix GFP flags used with mempool_alloc() 2017-07-26 15:55:44 -04:00
dm-verity-fec.h dm verity fec: limit error correction recursion 2017-03-16 09:37:31 -04:00
dm-verity-target.c Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2017-11-14 10:52:09 -08:00
dm-verity.h dm: move dm-verity to generic async completion 2017-11-03 22:11:20 +08:00
dm-zero.c dm: don't return errnos from ->map 2017-06-09 09:27:32 -06:00
dm-zoned-metadata.c dm: backfill missing calls to mutex_destroy() 2018-01-17 09:16:15 -05:00
dm-zoned-reclaim.c dm zoned: use GFP_NOIO in I/O path 2017-07-26 15:55:43 -04:00
dm-zoned-target.c dm: backfill missing calls to mutex_destroy() 2018-01-17 09:16:15 -05:00
dm-zoned.h dm zoned: drive-managed zoned block device target 2017-06-19 11:05:20 -04:00
dm.c - DM core fixes to ensure that bio submission follows a depth-first tree 2018-01-31 11:05:47 -08:00
dm.h dm: move dm_table_destroy() to same header as dm_table_create() 2018-01-17 09:16:06 -05:00
Kconfig dm: add unstriped target 2018-01-17 09:16:00 -05:00
Makefile dm: add unstriped target 2018-01-17 09:16:00 -05:00
md-bitmap.c Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md 2017-11-14 16:07:26 -08:00
md-bitmap.h Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md 2017-11-14 16:07:26 -08:00
md-cluster.c md-cluster: update document for raid10 2017-11-01 21:32:25 -07:00
md-cluster.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
md-faulty.c md: rename some drivers/md/ files to have an "md-" prefix 2017-10-16 19:06:36 -07:00
md-linear.c md: rename some drivers/md/ files to have an "md-" prefix 2017-10-16 19:06:36 -07:00
md-linear.h Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md 2017-11-14 16:07:26 -08:00
md-multipath.c md: remove redundant variable q 2017-11-01 21:32:24 -07:00
md-multipath.h Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md 2017-11-14 16:07:26 -08:00
md.c Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md 2018-01-31 11:03:38 -08:00
md.h raid5-ppl: PPL support for disks with write-back cache enabled 2018-01-15 14:29:42 -08:00
raid0.c md: remove special meaning of ->quiesce(.., 2) 2017-11-01 21:32:20 -07:00
raid0.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
raid1-10.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
raid1.c md/raid1,raid10: silence warning about wait-within-wait 2017-12-11 08:52:34 -08:00
raid1.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
raid5-cache.c raid5-ppl: PPL support for disks with write-back cache enabled 2018-01-15 14:29:42 -08:00
raid5-log.h raid5-ppl: PPL support for disks with write-back cache enabled 2018-01-15 14:29:42 -08:00
raid5-ppl.c raid5-ppl: PPL support for disks with write-back cache enabled 2018-01-15 14:29:42 -08:00
raid5.c raid5-ppl: PPL support for disks with write-back cache enabled 2018-01-15 14:29:42 -08:00
raid5.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
raid10.c md/raid1,raid10: silence warning about wait-within-wait 2017-12-11 08:52:34 -08:00
raid10.h Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md 2017-11-14 16:07:26 -08:00