linux/drivers/md
Dennis Yang 583da48e38 md: update slab_cache before releasing new stripes when stripes resizing
When growing raid5 device on machine with small memory, there is chance that
mdadm will be killed and the following bug report can be observed. The same
bug could also be reproduced in linux-4.10.6.

[57600.075774] BUG: unable to handle kernel NULL pointer dereference at           (null)
[57600.083796] IP: [<ffffffff81a6aa87>] _raw_spin_lock+0x7/0x20
[57600.110378] PGD 421cf067 PUD 4442d067 PMD 0
[57600.114678] Oops: 0002 [#1] SMP
[57600.180799] CPU: 1 PID: 25990 Comm: mdadm Tainted: P           O    4.2.8 #1
[57600.187849] Hardware name: To be filled by O.E.M. To be filled by O.E.M./MAHOBAY, BIOS QV05AR66 03/06/2013
[57600.197490] task: ffff880044e47240 ti: ffff880043070000 task.ti: ffff880043070000
[57600.204963] RIP: 0010:[<ffffffff81a6aa87>]  [<ffffffff81a6aa87>] _raw_spin_lock+0x7/0x20
[57600.213057] RSP: 0018:ffff880043073810  EFLAGS: 00010046
[57600.218359] RAX: 0000000000000000 RBX: 000000000000000c RCX: ffff88011e296dd0
[57600.225486] RDX: 0000000000000001 RSI: ffffe8ffffcb46c0 RDI: 0000000000000000
[57600.232613] RBP: ffff880043073878 R08: ffff88011e5f8170 R09: 0000000000000282
[57600.239739] R10: 0000000000000005 R11: 28f5c28f5c28f5c3 R12: ffff880043073838
[57600.246872] R13: ffffe8ffffcb46c0 R14: 0000000000000000 R15: ffff8800b9706a00
[57600.253999] FS:  00007f576106c700(0000) GS:ffff88011e280000(0000) knlGS:0000000000000000
[57600.262078] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[57600.267817] CR2: 0000000000000000 CR3: 00000000428fe000 CR4: 00000000001406e0
[57600.274942] Stack:
[57600.276949]  ffffffff8114ee35 ffff880043073868 0000000000000282 000000000000eb3f
[57600.284383]  ffffffff81119043 ffff880043073838 ffff880043073838 ffff88003e197b98
[57600.291820]  ffffe8ffffcb46c0 ffff88003e197360 0000000000000286 ffff880043073968
[57600.299254] Call Trace:
[57600.301698]  [<ffffffff8114ee35>] ? cache_flusharray+0x35/0xe0
[57600.307523]  [<ffffffff81119043>] ? __page_cache_release+0x23/0x110
[57600.313779]  [<ffffffff8114eb53>] kmem_cache_free+0x63/0xc0
[57600.319344]  [<ffffffff81579942>] drop_one_stripe+0x62/0x90
[57600.324915]  [<ffffffff81579b5b>] raid5_cache_scan+0x8b/0xb0
[57600.330563]  [<ffffffff8111b98a>] shrink_slab.part.36+0x19a/0x250
[57600.336650]  [<ffffffff8111e38c>] shrink_zone+0x23c/0x250
[57600.342039]  [<ffffffff8111e4f3>] do_try_to_free_pages+0x153/0x420
[57600.348210]  [<ffffffff8111e851>] try_to_free_pages+0x91/0xa0
[57600.353959]  [<ffffffff811145b1>] __alloc_pages_nodemask+0x4d1/0x8b0
[57600.360303]  [<ffffffff8157a30b>] check_reshape+0x62b/0x770
[57600.365866]  [<ffffffff8157a4a5>] raid5_check_reshape+0x55/0xa0
[57600.371778]  [<ffffffff81583df7>] update_raid_disks+0xc7/0x110
[57600.377604]  [<ffffffff81592b73>] md_ioctl+0xd83/0x1b10
[57600.382827]  [<ffffffff81385380>] blkdev_ioctl+0x170/0x690
[57600.388307]  [<ffffffff81195238>] block_ioctl+0x38/0x40
[57600.393525]  [<ffffffff811731c5>] do_vfs_ioctl+0x2b5/0x480
[57600.399010]  [<ffffffff8115e07b>] ? vfs_write+0x14b/0x1f0
[57600.404400]  [<ffffffff811733cc>] SyS_ioctl+0x3c/0x70
[57600.409447]  [<ffffffff81a6ad97>] entry_SYSCALL_64_fastpath+0x12/0x6a
[57600.415875] Code: 00 00 00 00 55 48 89 e5 8b 07 85 c0 74 04 31 c0 5d c3 ba 01 00 00 00 f0 0f b1 17 85 c0 75 ef b0 01 5d c3 90 31 c0 ba 01 00 00 00 <f0> 0f b1 17 85 c0 75 01 c3 55 89 c6 48 89 e5 e8 85 d1 63 ff 5d
[57600.435460] RIP  [<ffffffff81a6aa87>] _raw_spin_lock+0x7/0x20
[57600.441208]  RSP <ffff880043073810>
[57600.444690] CR2: 0000000000000000
[57600.448000] ---[ end trace cbc6b5cc4bf9831d ]---

The problem is that resize_stripes() releases new stripe_heads before assigning new
slab cache to conf->slab_cache. If the shrinker function raid5_cache_scan() gets called
after resize_stripes() starting releasing new stripes but right before new slab cache
being assigned, it is possible that these new stripe_heads will be freed with the old
slab_cache which was already been destoryed and that triggers this bug.

Signed-off-by: Dennis Yang <dennisyang@qnap.com>
Fixes: edbe83ab4c ("md/raid5: allow the stripe_cache to grow and shrink.")
Cc: stable@vger.kernel.org (4.1+)
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
2017-04-10 09:27:12 -07:00
..
bcache drivers/md/bcache/util.h: remove duplicate inclusion of blkdev.h 2017-03-09 17:01:10 -08:00
persistent-data sched/headers: Prepare to move the get_task_struct()/put_task_struct() and related APIs from <linux/sched.h> to <linux/sched/task.h> 2017-03-02 08:42:40 +01:00
bitmap.c md: fix several trivial typos in comments 2017-03-23 22:54:57 -07:00
bitmap.h md: move bitmap_destroy to the beginning of __md_stop 2017-03-16 16:55:58 -07:00
dm-bio-prison.c
dm-bio-prison.h
dm-bio-record.h
dm-bufio.c sched/headers: Prepare to move the memalloc_noio_*() APIs to <linux/sched/mm.h> 2017-03-02 08:42:33 +01:00
dm-bufio.h
dm-builtin.c
dm-cache-block-types.h linux: drop __bitwise__ everywhere 2016-12-16 00:13:41 +02:00
dm-cache-metadata.c dm cache metadata: use cursor api in blocks_are_clean_separate_dirty() 2017-02-16 13:12:51 -05:00
dm-cache-metadata.h dm cache metadata: add "metadata2" feature 2017-02-16 13:12:47 -05:00
dm-cache-policy-cleaner.c dm cache: speed up writing of the hint array 2016-09-22 11:15:02 -04:00
dm-cache-policy-internal.h dm cache: speed up writing of the hint array 2016-09-22 11:15:02 -04:00
dm-cache-policy-smq.c dm cache policy smq: use hash_32() instead of hash_32_generic() 2016-12-08 19:42:37 -05:00
dm-cache-policy.c
dm-cache-policy.h dm cache: speed up writing of the hint array 2016-09-22 11:15:02 -04:00
dm-cache-target.c - Fix dm-raid transient device failure processing and other smaller 2017-02-21 12:11:41 -08:00
dm-core.h dm: always defer request allocation to the owner of the request_queue 2017-01-27 15:08:35 -07:00
dm-crypt.c KEYS: Differentiate uses of rcu_dereference_key() and user_key_payload() 2017-03-02 10:09:00 +11:00
dm-delay.c
dm-era-target.c block: Use pointer to backing_dev_info from request_queue 2017-02-02 08:20:48 -07:00
dm-exception-store.c
dm-exception-store.h
dm-flakey.c dm flakey: introduce "error_writes" feature 2016-12-13 15:01:31 -05:00
dm-io.c dm io: use bvec iterator helpers to implement .get_page and .next_page 2016-11-21 09:51:57 -05:00
dm-ioctl.c sched/headers: Prepare to move the memalloc_noio_*() APIs to <linux/sched/mm.h> 2017-03-02 08:42:33 +01:00
dm-kcopyd.c
dm-linear.c
dm-log-userspace-base.c
dm-log-userspace-transfer.c
dm-log-userspace-transfer.h
dm-log-writes.c Merge branch 'for-4.9/block' of git://git.kernel.dk/linux-block 2016-10-07 14:42:05 -07:00
dm-log.c block,fs: use REQ_* flags directly 2016-11-01 09:43:26 -06:00
dm-mpath.c Merge branch 'for-4.11/next' into for-4.11/linus-merge 2017-02-17 14:08:19 -07:00
dm-mpath.h
dm-path-selector.c
dm-path-selector.h
dm-queue-length.c
dm-raid1.c Merge branch 'for-4.10/block' of git://git.kernel.dk/linux-block 2016-12-13 10:19:16 -08:00
dm-raid.c dm raid: bump the target version 2017-02-28 16:47:52 -05:00
dm-region-hash.c block: rename bio bi_rw to bi_opf 2016-08-07 14:41:02 -06:00
dm-round-robin.c dm round robin: revert "use percpu 'repeat_count' and 'current_path'" 2017-02-17 00:54:09 -05:00
dm-rq.c dm-rq: don't dereference request payload after ending request 2017-02-24 13:19:32 -07:00
dm-rq.h dm: always defer request allocation to the owner of the request_queue 2017-01-27 15:08:35 -07:00
dm-service-time.c
dm-snap-persistent.c block,fs: use REQ_* flags directly 2016-11-01 09:43:26 -06:00
dm-snap-transient.c
dm-snap.c block: rename bio bi_rw to bi_opf 2016-08-07 14:41:02 -06:00
dm-stats.c dm stats: fix a leaked s->histogram_boundaries array 2017-02-16 14:17:07 -05:00
dm-stats.h
dm-stripe.c block: rename bio bi_rw to bi_opf 2016-08-07 14:41:02 -06:00
dm-switch.c
dm-sysfs.c
dm-table.c block: Use pointer to backing_dev_info from request_queue 2017-02-02 08:20:48 -07:00
dm-target.c dm: always defer request allocation to the owner of the request_queue 2017-01-27 15:08:35 -07:00
dm-thin-metadata.c
dm-thin-metadata.h
dm-thin.c block: Use pointer to backing_dev_info from request_queue 2017-02-02 08:20:48 -07:00
dm-uevent.c
dm-uevent.h
dm-verity-fec.c
dm-verity-fec.h
dm-verity-target.c dm verity: fix incorrect error message 2016-11-21 09:52:01 -05:00
dm-verity.h
dm-zero.c block: rename bio bi_rw to bi_opf 2016-08-07 14:41:02 -06:00
dm.c blk: Ensure users for current->bio_list can see the full list. 2017-03-11 15:31:37 -07:00
dm.h dm: always defer request allocation to the owner of the request_queue 2017-01-27 15:08:35 -07:00
faulty.c md: fast clone bio in bio_clone_mddev() 2017-02-15 11:24:54 -08:00
Kconfig dm block manager: make block locking optional 2016-11-14 15:17:47 -05:00
linear.c Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md 2017-02-24 14:42:19 -08:00
linear.h md linear: fix a race between linear_add() and linear_congested() 2017-02-13 09:17:50 -08:00
Makefile raid5-ppl: Partial Parity Log write logging implementation 2017-03-16 16:55:54 -07:00
md-cluster.c md-cluster: add the support for resize 2017-03-16 16:55:50 -07:00
md-cluster.h md-cluster: add the support for resize 2017-03-16 16:55:50 -07:00
md.c MD: use per-cpu counter for writes_pending 2017-03-22 19:18:56 -07:00
md.h md: prepare for managing resync I/O pages in clean way 2017-03-24 10:41:36 -07:00
multipath.c Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md 2017-02-24 14:42:19 -08:00
multipath.h
raid0.c md: superblock changes for PPL 2017-03-16 16:55:53 -07:00
raid0.h
raid1.c md: raid1: kill warning on powerpc_pseries 2017-03-28 08:49:52 -07:00
raid1.h md: raid1: improve write behind 2017-03-24 10:41:37 -07:00
raid5-cache.c md/raid5-cache: fix payload endianness problem in raid5-cache 2017-03-25 09:38:22 -07:00
raid5-log.h md/raid5: call bio_endio() directly rather than queueing for later. 2017-03-22 19:16:12 -07:00
raid5-ppl.c raid5-ppl: silence a misleading warning message 2017-03-23 22:38:46 -07:00
raid5.c md: update slab_cache before releasing new stripes when stripes resizing 2017-04-10 09:27:12 -07:00
raid5.h md/raid5: remove over-loading of ->bi_phys_segments. 2017-03-22 19:16:56 -07:00
raid10.c md: raid10: avoid direct access to bvec table in handle_reshape_read_error 2017-03-24 10:41:37 -07:00
raid10.h md/raid10: add failfast handling for reads. 2016-11-22 09:14:28 -08:00