linux/drivers/md
Artur Paszkiewicz cc57858831 md/raid10: fix data corruption and crash during resync
The commit c31df25f20 ("md/raid10: make sync_request_write() call
bio_copy_data()") replaced manual data copying with bio_copy_data() but
it doesn't work as intended. The source bio (fbio) is already processed,
so its bvec_iter has bi_size == 0 and bi_idx == bi_vcnt.  Because of
this, bio_copy_data() either does not copy anything, or worse, copies
data from the ->bi_next bio if it is set.  This causes wrong data to be
written to drives during resync and sometimes lockups/crashes in
bio_copy_data():

[  517.338478] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [md126_raid10:3319]
[  517.347324] Modules linked in: raid10 xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter ip_tables x86_pkg_temp_thermal coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul cryptd shpchp pcspkr ipmi_si ipmi_msghandler tpm_crb acpi_power_meter acpi_cpufreq ext4 mbcache jbd2 sr_mod cdrom sd_mod e1000e ax88179_178a usbnet mii ahci ata_generic crc32c_intel libahci ptp pata_acpi libata pps_core wmi sunrpc dm_mirror dm_region_hash dm_log dm_mod
[  517.440555] CPU: 0 PID: 3319 Comm: md126_raid10 Not tainted 4.3.0-rc6+ #1
[  517.448384] Hardware name: Intel Corporation PURLEY/PURLEY, BIOS PLYDCRB1.86B.0055.D14.1509221924 09/22/2015
[  517.459768] task: ffff880153773980 ti: ffff880150df8000 task.ti: ffff880150df8000
[  517.468529] RIP: 0010:[<ffffffff812e1888>]  [<ffffffff812e1888>] bio_copy_data+0xc8/0x3c0
[  517.478164] RSP: 0018:ffff880150dfbc98  EFLAGS: 00000246
[  517.484341] RAX: ffff880169356688 RBX: 0000000000001000 RCX: 0000000000000000
[  517.492558] RDX: 0000000000000000 RSI: ffffea0001ac2980 RDI: ffffea0000d835c0
[  517.500773] RBP: ffff880150dfbd08 R08: 0000000000000001 R09: ffff880153773980
[  517.508987] R10: ffff880169356600 R11: 0000000000001000 R12: 0000000000010000
[  517.517199] R13: 000000000000e000 R14: 0000000000000000 R15: 0000000000001000
[  517.525412] FS:  0000000000000000(0000) GS:ffff880174a00000(0000) knlGS:0000000000000000
[  517.534844] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  517.541507] CR2: 00007f8a044d5fed CR3: 0000000169504000 CR4: 00000000001406f0
[  517.549722] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  517.557929] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  517.566144] Stack:
[  517.568626]  ffff880174a16bc0 ffff880153773980 ffff880169356600 0000000000000000
[  517.577659]  0000000000000001 0000000000000001 ffff880153773980 ffff88016a61a800
[  517.586715]  ffff880150dfbcf8 0000000000000001 ffff88016dd209e0 0000000000001000
[  517.595773] Call Trace:
[  517.598747]  [<ffffffffa043ef95>] raid10d+0xfc5/0x1690 [raid10]
[  517.605610]  [<ffffffff816697ae>] ? __schedule+0x29e/0x8e2
[  517.611987]  [<ffffffff814ff206>] md_thread+0x106/0x140
[  517.618072]  [<ffffffff810c1d80>] ? wait_woken+0x80/0x80
[  517.624252]  [<ffffffff814ff100>] ? super_1_load+0x520/0x520
[  517.630817]  [<ffffffff8109ef89>] kthread+0xc9/0xe0
[  517.636506]  [<ffffffff8109eec0>] ? flush_kthread_worker+0x70/0x70
[  517.643653]  [<ffffffff8166d99f>] ret_from_fork+0x3f/0x70
[  517.649929]  [<ffffffff8109eec0>] ? flush_kthread_worker+0x70/0x70
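
The failure mode described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: struct model_iter, struct model_bio and every model_* function are hypothetical stand-ins, not the kernel's struct bio, struct bvec_iter or bio_copy_data(), and the ->bi_next chaining case is not modeled. It shows only that a copy loop driven by an already consumed iterator (bi_size == 0, bi_idx == vcnt) copies nothing, while a copy that walks the segments by index does not depend on the iterator state.

```c
#include <assert.h>
#include <string.h>

#define SEG_SIZE 4   /* bytes per segment (stand-in for a page) */
#define NSEGS    2   /* segments per model bio */

/* Hypothetical, simplified stand-in for an iterator over a bio. */
struct model_iter {
    unsigned int bi_size;  /* bytes still to be processed */
    unsigned int bi_idx;   /* index of the current segment */
};

/* Hypothetical, simplified stand-in for a bio with inline data. */
struct model_bio {
    struct model_iter iter;
    unsigned int vcnt;             /* number of segments in use */
    char segs[NSEGS][SEG_SIZE];    /* segment payloads */
};

/* Fill every segment with one byte value and rewind the iterator. */
static void model_init(struct model_bio *bio, char fill)
{
    memset(bio->segs, fill, sizeof(bio->segs));
    bio->vcnt = NSEGS;
    bio->iter.bi_size = NSEGS * SEG_SIZE;
    bio->iter.bi_idx = 0;
}

/* Mark the bio as fully processed, as after I/O completion. */
static void model_complete(struct model_bio *bio)
{
    bio->iter.bi_size = 0;
    bio->iter.bi_idx = bio->vcnt;
}

/* Iterator-driven copy: stops as soon as either side has no bytes left. */
static unsigned int model_copy(struct model_bio *dst, const struct model_bio *src)
{
    struct model_iter s = src->iter, d = dst->iter;
    unsigned int copied = 0;

    while (s.bi_size && d.bi_size) {
        assert(s.bi_idx < NSEGS && d.bi_idx < NSEGS);
        memcpy(dst->segs[d.bi_idx], src->segs[s.bi_idx], SEG_SIZE);
        s.bi_idx++;
        d.bi_idx++;
        s.bi_size -= SEG_SIZE;
        d.bi_size -= SEG_SIZE;
        copied += SEG_SIZE;
    }
    return copied;
}

/* Index-driven copy: ignores the iterators and walks the segments directly. */
static unsigned int model_copy_by_index(struct model_bio *dst, const struct model_bio *src)
{
    unsigned int j;

    assert(src->vcnt <= NSEGS);
    for (j = 0; j < src->vcnt; j++)
        memcpy(dst->segs[j], src->segs[j], SEG_SIZE);
    return src->vcnt * SEG_SIZE;
}
```

In this model, calling model_complete() on the source and then model_copy() returns 0 and leaves the destination untouched, whereas model_copy_by_index() still copies all NSEGS * SEG_SIZE bytes. The index-driven variant is shown only to contrast with the iterator-driven one; it is not taken from the actual diff of this commit.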

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Reviewed-by: Shaohua Li <shli@kernel.org>
Cc: stable@vger.kernel.org (v4.2+)
Fixes: c31df25f20 ("md/raid10: make sync_request_write() call bio_copy_data()")
Signed-off-by: NeilBrown <neilb@suse.com>
2015-12-18 15:19:16 +11:00
bcache Merge branch 'for-4.4/io-poll' of git://git.kernel.dk/linux-block 2015-11-10 17:23:49 -08:00
persistent-data dm btree: fix bufio buffer leaks in dm_btree_del() error path 2015-12-10 10:30:18 -05:00
bitmap.c md-cluster: Use a small window for resync 2015-10-12 01:32:05 -05:00
bitmap.h md-cluster: Use a small window for resync 2015-10-12 01:32:05 -05:00
dm-bio-prison.c block: add a bi_error field to struct bio 2015-07-29 08:55:15 -06:00
dm-bio-prison.h dm bio prison: add dm_cell_promote_or_release() 2015-05-29 14:19:06 -04:00
dm-bio-record.h
dm-bufio.c dm: convert ffs to __ffs 2015-10-31 19:06:01 -04:00
dm-bufio.h
dm-builtin.c
dm-cache-block-types.h
dm-cache-metadata.c - Revert a dm-multipath change that caused a regression for unprivledged 2015-11-04 21:19:53 -08:00
dm-cache-metadata.h dm cache: add fail io mode and needs_check flag 2015-06-11 17:13:00 -04:00
dm-cache-policy-cleaner.c - Revert a dm-multipath change that caused a regression for unprivledged 2015-11-04 21:19:53 -08:00
dm-cache-policy-internal.h dm cache: age and write back cache entries even without active IO 2015-06-11 17:13:01 -04:00
dm-cache-policy-mq.c dm: convert ffs to __ffs 2015-10-31 19:06:01 -04:00
dm-cache-policy-smq.c dm: convert ffs to __ffs 2015-10-31 19:06:01 -04:00
dm-cache-policy.c
dm-cache-policy.h dm cache: age and write back cache entries even without active IO 2015-06-11 17:13:01 -04:00
dm-cache-target.c dm: drop NULL test before kmem_cache_destroy() and mempool_destroy() 2015-10-31 19:06:00 -04:00
dm-crypt.c dm crypt: fix a possible hang due to race condition on exit 2015-11-19 13:38:30 -05:00
dm-delay.c dm delay: document that offsets are specified in sectors 2015-10-31 19:06:05 -04:00
dm-era-target.c dm persistent data: eliminate unnecessary return values 2015-10-31 19:06:02 -04:00
dm-exception-store.c - Revert a dm-multipath change that caused a regression for unprivledged 2015-11-04 21:19:53 -08:00
dm-exception-store.h dm snapshot: add new persistent store option to support overflow 2015-10-09 16:57:03 -04:00
dm-flakey.c dm: refactor ioctl handling 2015-10-31 19:05:59 -04:00
dm-io.c dm: drop NULL test before kmem_cache_destroy() and mempool_destroy() 2015-10-31 19:06:00 -04:00
dm-ioctl.c char: make misc_deregister a void function 2015-08-05 10:35:49 -07:00
dm-kcopyd.c mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd 2015-11-06 17:50:42 -08:00
dm-linear.c dm linear: remove redundant target name from error messages 2015-10-31 19:06:03 -04:00
dm-log-userspace-base.c dm: drop NULL test before kmem_cache_destroy() and mempool_destroy() 2015-10-31 19:06:00 -04:00
dm-log-userspace-transfer.c dm log userspace transfer: match wait_for_completion_timeout return type 2015-04-15 12:10:20 -04:00
dm-log-userspace-transfer.h
dm-log-writes.c dm: refactor ioctl handling 2015-10-31 19:05:59 -04:00
dm-log.c
dm-mpath.c dm mpath: fix infinite recursion in ioctl when no paths and !queue_if_no_path 2015-11-17 14:19:00 -05:00
dm-mpath.h
dm-path-selector.c
dm-path-selector.h
dm-queue-length.c
dm-raid1.c Merge tag 'dm-4.3-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm 2015-09-02 16:35:26 -07:00
dm-raid.c dm raid: fix round up of default region size 2015-10-02 12:02:31 -04:00
dm-region-hash.c dm: convert ffs to __ffs 2015-10-31 19:06:01 -04:00
dm-round-robin.c
dm-service-time.c
dm-snap-persistent.c - Revert a dm-multipath change that caused a regression for unprivledged 2015-11-04 21:19:53 -08:00
dm-snap-transient.c dm snapshot: add new persistent store option to support overflow 2015-10-09 16:57:03 -04:00
dm-snap.c dm snapshot: add new persistent store option to support overflow 2015-10-09 16:57:03 -04:00
dm-stats.c dm stats: report precise_timestamps and histogram in @stats_list output 2015-08-18 17:20:03 -04:00
dm-stats.h dm stats: support precise timestamps 2015-06-17 12:40:40 -04:00
dm-stripe.c Merge tag 'dm-4.3-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm 2015-09-02 16:35:26 -07:00
dm-switch.c dm switch: simplify conditional in alloc_region_table() 2015-10-31 19:06:06 -04:00
dm-sysfs.c dm: add 'use_blk_mq' module param and expose in per-device ro sysfs attr 2015-04-15 12:10:17 -04:00
dm-table.c block: Inline blk_integrity in struct gendisk 2015-10-21 14:42:42 -06:00
dm-target.c
dm-thin-metadata.c dm thin metadata: fix bug when taking a metadata snapshot 2015-12-09 13:18:12 -05:00
dm-thin-metadata.h dm thin metadata: add dm_thin_remove_range() 2015-06-11 17:13:04 -04:00
dm-thin.c dm thin: fix regression in advertised discard limits 2015-11-23 14:54:46 -05:00
dm-uevent.c
dm-uevent.h
dm-verity.c dm: refactor ioctl handling 2015-10-31 19:05:59 -04:00
dm-zero.c block: add a bi_error field to struct bio 2015-07-29 08:55:15 -06:00
dm.c dm: do not reuse dm_blk_ioctl block_device input as local variable 2015-11-17 14:18:49 -05:00
dm.h block: kill merge_bvec_fn() completely 2015-08-13 12:31:57 -06:00
faulty.c block: add a bi_error field to struct bio 2015-07-29 08:55:15 -06:00
Kconfig raid5-cache: add crc32c Kconfig dependency 2015-11-09 09:09:52 +11:00
linear.c block: kill merge_bvec_fn() completely 2015-08-13 12:31:57 -06:00
linear.h
Makefile raid5: add basic stripe log 2015-10-24 17:16:19 +11:00
md-cluster.c md-cluster: remove mddev arg from add_resync_info() 2015-10-24 17:16:18 +11:00
md-cluster.h md-cluster: Fix adding of new disk with new reload code 2015-10-12 03:35:30 -05:00
md.c block: change ->make_request_fn() and users to return a queue cookie 2015-11-07 10:40:46 -07:00
md.h MD: add new bit to indicate raid array with journal 2015-11-01 13:48:29 +11:00
multipath.c md: suspend i/o during runtime blk_integrity_unregister 2015-10-21 14:43:38 -06:00
multipath.h
raid0.c md/raid0: apply base queue limits *before* disk_stack_limits 2015-10-02 17:23:44 +10:00
raid0.h block: kill merge_bvec_fn() completely 2015-08-13 12:31:57 -06:00
raid1.c md updates for 4.4. 2015-11-04 21:12:47 -08:00
raid1.h md-cluster: Use a small window for resync 2015-10-12 01:32:05 -05:00
raid5-cache.c raid5-cache: start raid5 readonly if journal is missing 2015-11-01 13:48:29 +11:00
raid5.c md updates for 4.4. 2015-11-04 21:12:47 -08:00
raid5.h raid5-cache: IO error handling 2015-11-01 13:48:29 +11:00
raid10.c md/raid10: fix data corruption and crash during resync 2015-12-18 15:19:16 +11:00
raid10.h md/raid10: ensure device failure recorded before write request returns. 2015-08-31 19:43:45 +02:00