linux/drivers/md
Ilya Dryomov 5b1fe7bec8 dm cache metadata: set dirty on all cache blocks after a crash
Quoting Documentation/device-mapper/cache.txt:

  The 'dirty' state for a cache block changes far too frequently for us
  to keep updating it on the fly.  So we treat it as a hint.  In normal
  operation it will be written when the dm device is suspended.  If the
  system crashes all cache blocks will be assumed dirty when restarted.

This got broken in commit f177940a80 ("dm cache metadata: switch to
using the new cursor api for loading metadata") in 4.9, which removed
the code that consulted cmd->clean_when_opened (CLEAN_SHUTDOWN on-disk
flag) when loading cache blocks.  This results in data corruption on an
unclean shutdown with dirty cache blocks on the fast device.  After the
crash those blocks are considered clean and may get evicted from the
cache at any time.  This can be demonstrated by doing a lot of reads
to trigger individual evictions, but uncache is more predictable:

  ### Disable auto-activation in lvm.conf to be able to do uncache in
  ### time (i.e. see uncache doing flushing) when the fix is applied.

  # xfs_io -d -c 'pwrite -b 4M -S 0xaa 0 1G' /dev/vdb
  # vgcreate vg_cache /dev/vdb /dev/vdc
  # lvcreate -L 1G -n lv_slowdev vg_cache /dev/vdb
  # lvcreate -L 512M -n lv_cachedev vg_cache /dev/vdc
  # lvcreate -L 256M -n lv_metadev vg_cache /dev/vdc
  # lvconvert --type cache-pool --cachemode writeback vg_cache/lv_cachedev --poolmetadata vg_cache/lv_metadev
  # lvconvert --type cache vg_cache/lv_slowdev --cachepool vg_cache/lv_cachedev
  # xfs_io -d -c 'pwrite -b 4M -S 0xbb 0 512M' /dev/mapper/vg_cache-lv_slowdev
  # xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2
  0fe00000:  bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
  0fe00010:  bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
  # dmsetup status vg_cache-lv_slowdev
  0 2097152 cache 8 27/65536 128 8192/8192 1 100 0 0 0 8192 7065 2 metadata2 writeback 2 migration_threshold 2048 smq 0 rw -
                                                            ^^^^
                                7065 * 64k = 441M yet to be written to the slow device
  # echo b >/proc/sysrq-trigger

  # vgchange -ay vg_cache
  # xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2
  0fe00000:  bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
  0fe00010:  bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
  # lvconvert --uncache vg_cache/lv_slowdev
  Flushing 0 blocks for cache vg_cache/lv_slowdev.
  Logical volume "lv_cachedev" successfully removed
  Logical volume vg_cache/lv_slowdev is not cached.
  # xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2
  0fe00000:  aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa  ................
  0fe00010:  aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa  ................

This is the case with both v1 and v2 cache pool metatata formats.

After applying this patch:

  # vgchange -ay vg_cache
  # xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2
  0fe00000:  bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
  0fe00010:  bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
  # lvconvert --uncache vg_cache/lv_slowdev
  Flushing 3724 blocks for cache vg_cache/lv_slowdev.
  ...
  Flushing 71 blocks for cache vg_cache/lv_slowdev.
  Logical volume "lv_cachedev" successfully removed
  Logical volume vg_cache/lv_slowdev is not cached.
  # xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2
  0fe00000:  bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
  0fe00010:  bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................

Cc: stable@vger.kernel.org
Fixes: f177940a80 ("dm cache metadata: switch to using the new cursor api for loading metadata")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-08-09 12:14:32 -04:00
..
bcache docs: Fix some broken references 2018-06-15 18:10:01 -03:00
persistent-data dm bufio: move dm-bufio.h to include/linux/ 2018-04-03 15:04:23 -04:00
dm-bio-prison-v1.c dm: adjust structure members to improve alignment 2018-06-08 11:53:14 -04:00
dm-bio-prison-v1.h block: switch bios to blk_status_t 2017-06-09 09:27:32 -06:00
dm-bio-prison-v2.c dm: adjust structure members to improve alignment 2018-06-08 11:53:14 -04:00
dm-bio-prison-v2.h dm bio prison v2: new interface for the bio prison 2017-03-07 11:30:16 -05:00
dm-bio-record.h block: replace bi_bdev with a gendisk pointer and partitions index 2017-08-23 12:49:55 -06:00
dm-bufio.c dm bufio: fix buffer alignment 2018-04-30 11:51:39 -04:00
dm-builtin.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
dm-cache-background-tracker.c dm cache background tracker: fix sparse warning 2018-04-30 15:40:40 -04:00
dm-cache-background-tracker.h dm cache: significant rework to leverage dm-bio-prison-v2 2017-03-07 13:28:31 -05:00
dm-cache-block-types.h linux: drop __bitwise__ everywhere 2016-12-16 00:13:41 +02:00
dm-cache-metadata.c dm cache metadata: set dirty on all cache blocks after a crash 2018-08-09 12:14:32 -04:00
dm-cache-metadata.h dm cache: significant rework to leverage dm-bio-prison-v2 2017-03-07 13:28:31 -05:00
dm-cache-policy-internal.h dm cache: significant rework to leverage dm-bio-prison-v2 2017-03-07 13:28:31 -05:00
dm-cache-policy-smq.c treewide: Use array_size() in vzalloc() 2018-06-12 16:19:22 -07:00
dm-cache-policy.c
dm-cache-policy.h dm cache: significant rework to leverage dm-bio-prison-v2 2017-03-07 13:28:31 -05:00
dm-cache-target.c dm kcopyd: return void from dm_kcopyd_copy() 2018-07-31 17:33:21 -04:00
dm-core.h dm: adjust structure members to improve alignment 2018-06-08 11:53:14 -04:00
dm-crypt.c dm crypt: convert essiv from ahash to shash 2018-07-27 15:24:28 -04:00
dm-delay.c dm delay: add flush as a third class of IO 2018-07-27 15:24:19 -04:00
dm-era-target.c dm: allow targets to return output from messages they are sent 2018-04-03 15:04:10 -04:00
dm-exception-store.c
dm-exception-store.h
dm-flakey.c dm: remove fmode_t argument from .prepare_ioctl hook 2018-04-04 12:12:39 -04:00
dm-integrity.c dm integrity: recalculate checksums on creation 2018-07-27 15:24:27 -04:00
dm-io.c dm: Use kzalloc for all structs with embedded biosets/mempools 2018-06-05 08:47:43 -06:00
dm-ioctl.c dm: report which conflicting type caused error during table_load() 2018-06-08 09:50:15 -04:00
dm-kcopyd.c dm kcopyd: avoid softlockup in run_complete_job 2018-08-08 09:16:24 -04:00
dm-linear.c dax: Introduce a ->copy_to_iter dax operation 2018-05-22 23:18:31 -07:00
dm-log-userspace-base.c dm: convert to bioset_init()/mempool_init() 2018-05-30 15:33:32 -06:00
dm-log-userspace-transfer.c
dm-log-userspace-transfer.h
dm-log-writes.c dax: Introduce a ->copy_to_iter dax operation 2018-05-22 23:18:31 -07:00
dm-log.c block,fs: use REQ_* flags directly 2016-11-01 09:43:26 -06:00
dm-mpath.c block: sanitize blk_get_request calling conventions 2018-05-14 08:55:12 -06:00
dm-mpath.h
dm-path-selector.c
dm-path-selector.h
dm-queue-length.c dm mpath selector: more evenly distribute ties 2018-01-29 13:44:58 -05:00
dm-raid1.c dm kcopyd: return void from dm_kcopyd_copy() 2018-07-31 17:33:21 -04:00
dm-raid.c dm raid: don't use 'const' in function return 2018-06-22 14:51:12 -04:00
dm-region-hash.c - Error path bug fix for overflow tests (Dan) 2018-06-12 18:28:00 -07:00
dm-round-robin.c dm round robin: revert "use percpu 'repeat_count' and 'current_path'" 2017-02-17 00:54:09 -05:00
dm-rq.c dm: convert to bioset_init()/mempool_init() 2018-05-30 15:33:32 -06:00
dm-rq.h dm rq: do not update rq partially in each ending bio 2017-08-28 10:23:28 -04:00
dm-service-time.c dm mpath selector: more evenly distribute ties 2018-01-29 13:44:58 -05:00
dm-snap-persistent.c dm bufio: move dm-bufio.h to include/linux/ 2018-04-03 15:04:23 -04:00
dm-snap-transient.c
dm-snap.c dm snapshot: remove stale FIXME in snapshot_map() 2018-08-08 20:50:58 -04:00
dm-stats.c treewide: kmalloc() -> kmalloc_array() 2018-06-12 16:19:22 -07:00
dm-stats.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
dm-stripe.c dax: Introduce a ->copy_to_iter dax operation 2018-05-22 23:18:31 -07:00
dm-switch.c treewide: Use array_size() in vmalloc() 2018-06-12 16:19:22 -07:00
dm-sysfs.c
dm-table.c dm: prevent DAX mounts if not supported 2018-06-28 16:06:14 -04:00
dm-target.c dm: remove unused macro DM_MOD_NAME_SIZE 2018-04-03 15:04:15 -04:00
dm-thin-metadata.c dm thin metadata: remove needless work from __commit_transaction 2018-06-22 14:51:11 -04:00
dm-thin-metadata.h dm thin: fix a race condition between discarding and provisioning a block 2016-07-20 12:43:35 -04:00
dm-thin.c dm thin: stop no_space_timeout worker when switching to write-mode 2018-08-07 14:30:29 -04:00
dm-uevent.c
dm-uevent.h
dm-unstripe.c dm unstripe: remove unnecessary header includes 2018-04-03 15:04:15 -04:00
dm-verity-fec.c Refactors rslib and callers to provide a per-instance allocation area 2018-06-05 10:48:05 -07:00
dm-verity-fec.h dm: convert to bioset_init()/mempool_init() 2018-05-30 15:33:32 -06:00
dm-verity-target.c treewide: kvzalloc() -> kvcalloc() 2018-06-12 16:19:22 -07:00
dm-verity.h dm verity: add 'check_at_most_once' option to only validate hashes once 2018-04-03 15:04:29 -04:00
dm-writecache.c dm writecache: report start_sector in status line 2018-07-27 15:28:58 -04:00
dm-zero.c dm: don't return errnos from ->map 2017-06-09 09:27:32 -06:00
dm-zoned-metadata.c dm: backfill missing calls to mutex_destroy() 2018-01-17 09:16:15 -05:00
dm-zoned-reclaim.c dm kcopyd: return void from dm_kcopyd_copy() 2018-07-31 17:33:21 -04:00
dm-zoned-target.c dm zoned: avoid triggering reclaim from inside dmz_map() 2018-06-22 14:51:12 -04:00
dm-zoned.h dm zoned: drive-managed zoned block device target 2017-06-19 11:05:20 -04:00
dm.c dm: prevent DAX mounts if not supported 2018-06-28 16:06:14 -04:00
dm.h dm: move dm_table_destroy() to same header as dm_table_create() 2018-01-17 09:16:06 -05:00
Kconfig dm: add writecache target 2018-06-08 11:59:51 -04:00
Makefile dm: add writecache target 2018-06-08 11:59:51 -04:00
md-bitmap.c treewide: kzalloc() -> kcalloc() 2018-06-12 16:19:22 -07:00
md-bitmap.h Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md 2017-11-14 16:07:26 -08:00
md-cluster.c treewide: kzalloc() -> kcalloc() 2018-06-12 16:19:22 -07:00
md-cluster.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
md-faulty.c md: convert to bioset_init()/mempool_init() 2018-05-30 15:33:32 -06:00
md-linear.c md: convert to bioset_init()/mempool_init() 2018-05-30 15:33:32 -06:00
md-linear.h Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md 2017-11-14 16:07:26 -08:00
md-multipath.c treewide: kzalloc() -> kcalloc() 2018-06-12 16:19:22 -07:00
md-multipath.h md: convert to bioset_init()/mempool_init() 2018-05-30 15:33:32 -06:00
md.c MD: cleanup resources in failure 2018-06-18 09:46:13 -07:00
md.h Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md 2018-06-09 12:01:36 -07:00
raid0.c treewide: kzalloc() -> kcalloc() 2018-06-12 16:19:22 -07:00
raid0.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
raid1-10.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
raid1.c treewide: kzalloc() -> kcalloc() 2018-06-12 16:19:22 -07:00
raid1.h md: convert to bioset_init()/mempool_init() 2018-05-30 15:33:32 -06:00
raid5-cache.c md: convert to bioset_init()/mempool_init() 2018-05-30 15:33:32 -06:00
raid5-log.h raid5-ppl: fix handling flush requests 2018-02-21 09:40:40 -08:00
raid5-ppl.c md: convert to bioset_init()/mempool_init() 2018-05-30 15:33:32 -06:00
raid5.c treewide: kzalloc() -> kcalloc() 2018-06-12 16:19:22 -07:00
raid5.h Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md 2018-06-09 12:01:36 -07:00
raid10.c md/raid10: fix that replacement cannot complete recovery after reassemble 2018-06-28 13:04:49 -07:00
raid10.h md: convert to bioset_init()/mempool_init() 2018-05-30 15:33:32 -06:00