linux/drivers/block
Ilya Dryomov 811c668877 rbd: fix rbd map vs notify races
A while ago, commit 9875201e10 ("rbd: fix use-after free of
rbd_dev->disk") fixed rbd unmap vs notify race by introducing
an exported wrapper for flushing notifies and sticking it into
do_rbd_remove().

A similar problem exists on the rbd map path, though: the watch is
registered in rbd_dev_image_probe(), while the disk is set up quite
a few steps later, in rbd_dev_device_setup().  Nothing prevents
a notify from coming in and crashing on a NULL rbd_dev->disk:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
    Call Trace:
     [<ffffffffa0508344>] rbd_watch_cb+0x34/0x180 [rbd]
     [<ffffffffa04bd290>] do_event_work+0x40/0xb0 [libceph]
     [<ffffffff8109d5db>] process_one_work+0x17b/0x470
     [<ffffffff8109e3ab>] worker_thread+0x11b/0x400
     [<ffffffff8109e290>] ? rescuer_thread+0x400/0x400
     [<ffffffff810a5acf>] kthread+0xcf/0xe0
     [<ffffffff810b41b3>] ? finish_task_switch+0x53/0x170
     [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
     [<ffffffff81645dd8>] ret_from_fork+0x58/0x90
     [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
    RIP  [<ffffffffa050828a>] rbd_dev_refresh+0xfa/0x180 [rbd]

If an error occurs during rbd map, we have to error out, potentially
tearing down a watch.  Just like on rbd unmap, notifies have to be
flushed, otherwise rbd_watch_cb() may end up trying to read in the
image header after rbd_dev_image_release() has run:

    Assertion failure in rbd_dev_header_info() at line 4722:

     rbd_assert(rbd_image_format_valid(rbd_dev->image_format));

    Call Trace:
     [<ffffffff81cccee0>] ? rbd_parent_request_create+0x150/0x150
     [<ffffffff81cd4e59>] rbd_dev_refresh+0x59/0x390
     [<ffffffff81cd5229>] rbd_watch_cb+0x69/0x290
     [<ffffffff81fde9bf>] do_event_work+0x10f/0x1c0
     [<ffffffff81107799>] process_one_work+0x689/0x1a80
     [<ffffffff811076f7>] ? process_one_work+0x5e7/0x1a80
     [<ffffffff81132065>] ? finish_task_switch+0x225/0x640
     [<ffffffff81107110>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
     [<ffffffff81108c69>] worker_thread+0xd9/0x1320
     [<ffffffff81108b90>] ? process_one_work+0x1a80/0x1a80
     [<ffffffff8111b02d>] kthread+0x21d/0x2e0
     [<ffffffff8111ae10>] ? kthread_stop+0x550/0x550
     [<ffffffff82022802>] ret_from_fork+0x22/0x40
     [<ffffffff8111ae10>] ? kthread_stop+0x550/0x550
    RIP  [<ffffffff81ccd8f9>] rbd_dev_header_info+0xa19/0x1e30

To fix this, a) check if RBD_DEV_FLAG_EXISTS is set before calling
revalidate_disk(), b) move ceph_osdc_flush_notifies() call into
rbd_dev_header_unwatch_sync() to cover rbd map error paths and c) turn
header read-in into a critical section.  The latter also happens to
take care of rbd map foo@bar vs rbd snap rm foo@bar race.

Fixes: http://tracker.ceph.com/issues/15490

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2016-04-28 10:07:22 +02:00
..
aoe mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros 2016-04-04 10:41:08 -07:00
drbd mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage 2016-04-04 10:41:08 -07:00
mtip32xx mtip32xx: fix checks for dma mapping errors 2016-03-18 18:10:59 -07:00
paride paride: make 'verbose' parameter an 'int' again 2016-03-15 16:55:16 -07:00
rsxx rsxx: don't open-code memdup_user() 2016-01-06 08:25:24 -05:00
xen-blkback xen/blback: Fit the important information of the thread in 17 characters 2016-03-03 14:45:54 -07:00
zram zram: don't call idr_remove() from zram_remove() 2016-01-15 17:56:32 -08:00
amiflop.c block: drop owner assignment from platform_drivers 2014-10-20 16:20:18 +02:00
ataflop.c Merge branch 'for-3.16/core' of git://git.kernel.dk/linux-block into next 2014-06-02 09:29:34 -07:00
brd.c mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros 2016-04-04 10:41:08 -07:00
cciss_cmd.h
cciss_scsi.c scsi: Do not set cmd_per_lun to 1 in the host template 2015-05-31 18:06:28 -07:00
cciss_scsi.h
cciss.c SCSI misc on 20160113 2016-01-13 19:37:36 -08:00
cciss.h
cryptoloop.c block: cryptoloop - Use new skcipher interface 2016-01-27 20:35:43 +08:00
DAC960.c block: use pci_zalloc_consistent 2014-08-08 15:57:28 -07:00
DAC960.h
floppy.c floppy: refactor open() flags handling 2016-02-06 23:00:22 +01:00
hd.c block: hd: remove deprecated IRQF_DISABLED 2014-10-01 08:16:07 -06:00
Kconfig cpqarray: remove it from the kernel 2016-03-14 09:06:01 -06:00
loop.c block: loop: fix filesystem corruption in case of aio/dio 2016-04-15 08:25:56 -06:00
loop.h block: loop: support DIO & AIO 2015-09-23 11:01:16 -06:00
Makefile drivers:block: cpqarray clean up 2016-03-15 15:59:47 -07:00
mg_disk.c block: drop owner assignment from platform_drivers 2014-10-20 16:20:18 +02:00
nbd.c nbd: use correct div_s64 helper 2016-03-04 17:20:12 -07:00
null_blk.c null_blk: add lightnvm null_blk device to the nullb_list 2016-03-18 18:10:37 -07:00
osdblk.c mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd 2015-11-06 17:50:42 -08:00
pktcdvd.c Merge branch 'for-4.4/io-poll' of git://git.kernel.dk/linux-block 2015-11-10 17:23:49 -08:00
ps3disk.c
ps3vram.c block: change ->make_request_fn() and users to return a queue cookie 2015-11-07 10:40:46 -07:00
rbd_types.h
rbd.c rbd: fix rbd map vs notify races 2016-04-28 10:07:22 +02:00
skd_main.c block: have drivers use blk_queue_max_discard_sectors() 2015-07-17 08:41:53 -06:00
skd_s1120.h
smart1,2.h
sunvdc.c sunvdc: reconnect ldc after vds service domain restarts 2014-12-11 18:52:45 -08:00
swim3.c powerpc: Move Power Macintosh drivers to generic byteswappers 2015-03-23 14:29:40 +11:00
swim_asm.S
swim.c block: drop owner assignment from platform_drivers 2014-10-20 16:20:18 +02:00
sx8.c sx8: use real time for the command seconds 2015-12-23 08:42:59 -07:00
umem.c block: change ->make_request_fn() and users to return a queue cookie 2015-11-07 10:40:46 -07:00
umem.h
virtio_blk.c virtio_blk: VIRTIO_BLK_F_WCE->VIRTIO_BLK_F_FLUSH 2016-03-02 17:01:59 +02:00
xen-blkfront.c Merge branch 'for-4.6/drivers' of git://git.kernel.dk/linux-block 2016-03-18 17:13:31 -07:00
xsysace.c block: systemace: Remove .owner field for driver 2014-08-21 20:37:54 -05:00
z2ram.c block: remove struct request buffer member 2014-04-15 14:03:02 -06:00