If the dm_resume method is called on a device that is not suspended, the
method will suspend the device briefly, before resuming it (so that the
table will be swapped).
However, there was a bug that the return value of dm_suspended_md was not
checked. dm_suspended_md may return an error when it is interrupted by a
signal. In this case, do_resume would call dm_swap_table, which would
return -EINVAL.
This commit fixes the logic, so that error returned by dm_suspend is
checked and the resume operation is undone.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
Cc: stable@vger.kernel.org
The kvmalloc function fails with a warning if the size is larger than
INT_MAX. The warning was triggered by a syscall testing robot.
In order to avoid the warning, this commit limits the number of targets to
1048576 and the size of the parameter area to 1073741824.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
`strncpy` is deprecated for use on NUL-terminated destination strings
[1] and as such we should prefer more robust and less ambiguous string
interfaces.
We expect `spec->target_type` to be NUL-terminated based on its use with
a format string after `dm_table_add_target()` is called
| r = dm_table_add_target(table, spec->target_type,
| (sector_t) spec->sector_start,
| (sector_t) spec->length,
| target_params);
... wherein `spec->target_type` is passed as parameter `type` and later
printed with DMERR:
| DMERR("%s: %s: unknown target type", dm_device_name(t->md), type);
It appears that `spec` is not zero-allocated and thus NUL-padding may be
required in this ioctl context.
Considering the above, a suitable replacement is `strscpy_pad` due to
the fact that it guarantees NUL-termination whilst maintaining the
NUL-padding behavior that strncpy provides.
Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1]
Link: https://github.com/KSPP/linux/issues/90
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Justin Stitt <justinstitt@google.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
There's a race condition in the multipath target when retrieve_deps
races with multipath_message calling dm_get_device and dm_put_device.
retrieve_deps walks the list of open devices without holding any lock
but multipath may add or remove devices to the list while it is
running. The end result may be memory corruption or use-after-free
memory access.
See this description of a UAF with multipath_message():
https://listman.redhat.com/archives/dm-devel/2022-October/052373.html
Fix this bug by introducing a new rw semaphore "devices_lock". We grab
devices_lock for read in retrieve_deps and we grab it for write in
dm_get_device and dm_put_device.
Reported-by: Luo Meng <luomeng12@huawei.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Tested-by: Li Lingfeng <lilingfeng3@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
- Fix DM crypt target's crypt_ctr_cipher_new return value on invalid
AEAD cipher.
- Fix DM flakey testing target's write bio corruption feature to
corrupt the data of a cloned bio instead of the original.
- Add random_read_corrupt and random_write_corrupt features to DM
flakey target.
- Fix ABBA deadlock in DM thin metadata by resetting associated bufio
client rather than destroying and recreating it.
- A couple other small DM thinp cleanups.
- Update DM core to support disabling block core IO stats accounting
and optimize away code that isn't needed if stats are disabled.
- Other small DM core cleanups.
- Improve DM integrity target to not require so much memory on 32 bit
systems. Also only allocate the recalculate buffer as needed (and
increasingly reduce its size on allocation failure).
- Update DM integrity to use %*ph for printing hexdump of a small
buffer. Also update DM integrity documentation.
- Various DM core ioctl interface hardening. Now more careful about
alignment of structures and processing of input passed to the kernel
from userspace. Also disallow the creation of DM devices named
"control", "." or ".."
- Eliminate GFP_NOIO workarounds for __vmalloc and kvmalloc in DM
core's ioctl and bufio code.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEJfWUX4UqZ4x1O2wixSPxCi2dA1oFAmScyjAACgkQxSPxCi2d
A1pilwgAucNIAB6uN4ke4WZrMVxSFntUkqDTkCs2Ycw+W4Tf1Mtrj/4WeBzFdJaA
oZMK04LUGaFFXn+halsCDzB354yT9C7V/KfXW8pCM1c9BRz4e8272i2HSN4WwD5n
BU4gVaOV5BwxynfF3Z5siRraad1AwmdoRGGsqzVRESAKaObXU//1tnO42UhxRVhn
nzFqhIm0xRcLAd8xIBlMsZQGIloicdDP9wZdWzTEDspQiwR2dFRmH9bUF8OmsS+h
KwhtDty7aZO+4gJ1ccBImijzQCmbAo7dmFhDfoLXaA5Jt6UwTXMeBHm4aUPMnvQe
NVXoRZJodDemwM642Q/Tx1SpsX6QmA==
=4R7u
-----END PGP SIGNATURE-----
Merge tag 'for-6.5/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer:
- Update DM crypt to allocate compound pages if possible
- Fix DM crypt target's crypt_ctr_cipher_new return value on invalid
AEAD cipher
- Fix DM flakey testing target's write bio corruption feature to
corrupt the data of a cloned bio instead of the original
- Add random_read_corrupt and random_write_corrupt features to DM
flakey target
- Fix ABBA deadlock in DM thin metadata by resetting associated bufio
client rather than destroying and recreating it
- A couple other small DM thinp cleanups
- Update DM core to support disabling block core IO stats accounting
and optimize away code that isn't needed if stats are disabled
- Other small DM core cleanups
- Improve DM integrity target to not require so much memory on 32 bit
systems. Also only allocate the recalculate buffer as needed (and
increasingly reduce its size on allocation failure)
- Update DM integrity to use %*ph for printing hexdump of a small
buffer. Also update DM integrity documentation
- Various DM core ioctl interface hardening. Now more careful about
alignment of structures and processing of input passed to the kernel
from userspace.
Also disallow the creation of DM devices named "control", "." or ".."
- Eliminate GFP_NOIO workarounds for __vmalloc and kvmalloc in DM
core's ioctl and bufio code
* tag 'for-6.5/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (28 commits)
dm: get rid of GFP_NOIO workarounds for __vmalloc and kvmalloc
dm integrity: scale down the recalculate buffer if memory allocation fails
dm integrity: only allocate recalculate buffer when needed
dm integrity: reduce vmalloc space footprint on 32-bit architectures
dm ioctl: Refuse to create device named "." or ".."
dm ioctl: Refuse to create device named "control"
dm ioctl: Avoid double-fetch of version
dm ioctl: structs and parameter strings must not overlap
dm ioctl: Avoid pointer arithmetic overflow
dm ioctl: Check dm_target_spec is sufficiently aligned
Documentation: dm-integrity: Document an example of how the tunables relate.
Documentation: dm-integrity: Document default values.
Documentation: dm-integrity: Document the meaning of "buffer".
Documentation: dm-integrity: Fix minor grammatical error.
dm integrity: Use %*ph for printing hexdump of a small buffer
dm thin: disable discards for thin-pool if no_discard_passdown
dm: remove stale/redundant dm_internal_{suspend,resume} prototypes in dm.h
dm: skip dm-stats work in alloc_io() unless needed
dm: avoid needless dm_io access if all IO accounting is disabled
dm: support turning off block-core's io stats accounting
...
In the past, the function __vmalloc didn't respect the GFP flags - it
allocated memory with the provided gfp flags, but it allocated page tables
with GFP_KERNEL. This was fixed in commit 451769ebb7 ("mm/vmalloc:
alloc GFP_NO{FS,IO} for vmalloc") so the memalloc_noio_{save,restore}
workaround is no longer needed.
The function kvmalloc didn't like flags different from GFP_KERNEL. This
was fixed in commit a421ef3030 ("mm: allow !GFP_KERNEL allocations
for kvmalloc"), so kvmalloc can now be called with GFP_NOIO.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmSV8dwQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpilGD/9Yys1oxIXJpRf00fzrylAlBthRxMjFQVWw
zAut106hAQiBHvU8IkmGA3MvEFVHxtzwYhHI7IR8K3aZBIqscweCqmVI9JyogJw9
U9Twnzel47VmuKdM94FeoN+hbj1fP8EWTjzmy67/zEEfFCdmHvNlMi3lSrGYIpFy
39LxTB99Y4UarM5PtWbes37GYYljzMSWKuo4AfBkvq1eQa+sZ0Vq2xAABKq3UM7f
apqhgHtkJooRePDP0eQp+kAyyVMgW2jIK+oIdJDxNF3CKTu2w40RzaYz6fp+jVSU
H4R/xS59GW4/xql+VBJDh/qJg9K62DPPYjlW8BmSR8+IjvfFpsyH3/MacE50CD3P
20fs/Mnj49H79fDrQEHJI53cOOb2EmUitbwLbvOcColNTPpt8loBtdQxjF2RMU8R
Nyort9DJPFclYCxky1LYg1CNEC2Ln4Zy/jD47wPvqRmOQphOoVlV/hPnOEqvjaZC
49Vn70W2DeE9cXvYI7ha+XIg6/oj+Gs3iusEbV08Ci7EAtXgI+ZUUsQ97K8UNiUh
h2lqSJtuI7lBpYP9sf+BeCch5UCC+xGYyTdoM5f58lehWBBPtbs0g7S9RyRyOYxe
n+yxEUo3dAGzJ/xsKAjinbZfeWIpr0b1TkAh4w3Cq/BKzRr9Bp8lBAxYuancbQ+Y
1ADPteUOTA==
=zP4Y
-----END PGP SIGNATURE-----
Merge tag 'for-6.5/block-2023-06-23' of git://git.kernel.dk/linux
Pull block updates from Jens Axboe:
- NVMe pull request via Keith:
- Various cleanups all around (Irvin, Chaitanya, Christophe)
- Better struct packing (Christophe JAILLET)
- Reduce controller error logs for optional commands (Keith)
- Support for >=64KiB block sizes (Daniel Gomez)
- Fabrics fixes and code organization (Max, Chaitanya, Daniel
Wagner)
- bcache updates via Coly:
- Fix a race at init time (Mingzhe Zou)
- Misc fixes and cleanups (Andrea, Thomas, Zheng, Ye)
- use page pinning in the block layer for dio (David)
- convert old block dio code to page pinning (David, Christoph)
- cleanups for pktcdvd (Andy)
- cleanups for rnbd (Guoqing)
- use the unchecked __bio_add_page() for the initial single page
additions (Johannes)
- fix overflows in the Amiga partition handling code (Michael)
- improve mq-deadline zoned device support (Bart)
- keep passthrough requests out of the IO schedulers (Christoph, Ming)
- improve support for flush requests, making them less special to deal
with (Christoph)
- add bdev holder ops and shutdown methods (Christoph)
- fix the name_to_dev_t() situation and use cases (Christoph)
- decouple the block open flags from fmode_t (Christoph)
- ublk updates and cleanups, including adding user copy support (Ming)
- BFQ sanity checking (Bart)
- convert brd from radix to xarray (Pankaj)
- constify various structures (Thomas, Ivan)
- more fine grained persistent reservation ioctl capability checks
(Jingbo)
- misc fixes and cleanups (Arnd, Azeem, Demi, Ed, Hengqi, Hou, Jan,
Jordy, Li, Min, Yu, Zhong, Waiman)
* tag 'for-6.5/block-2023-06-23' of git://git.kernel.dk/linux: (266 commits)
scsi/sg: don't grab scsi host module reference
ext4: Fix warning in blkdev_put()
block: don't return -EINVAL for not found names in devt_from_devname
cdrom: Fix spectre-v1 gadget
block: Improve kernel-doc headers
blk-mq: don't insert passthrough request into sw queue
bsg: make bsg_class a static const structure
ublk: make ublk_chr_class a static const structure
aoe: make aoe_class a static const structure
block/rnbd: make all 'class' structures const
block: fix the exclusive open mask in disk_scan_partitions
block: add overflow checks for Amiga partition support
block: change all __u32 annotations to __be32 in affs_hardblocks.h
block: fix signed int overflow in Amiga partition support
block: add capacity validation in bdev_add_partition()
block: fine-granular CAP_SYS_ADMIN for Persistent Reservation
block: disallow Persistent Reservation on partitions
reiserfs: fix blkdev_put() warning from release_journal_dev()
block: fix wrong mode for blkdev_get_by_dev() from disk_scan_partitions()
block: document the holder argument to blkdev_get_by_path
...
Using either of these is going to greatly confuse userspace, as they are
not valid symlink names and so creating the usual /dev/mapper/NAME
symlink will not be possible. As creating a device with either of these
names is almost certainly a userspace bug, just error out.
Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Typical userspace setups create a symlink under /dev/mapper with the
name of the device, but /dev/mapper/control is reserved for DM's control
device. Therefore, trying to create such a device is almost certain to
be a userspace bug.
Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
The version is fetched once in check_version(), which then does some
validation and then overwrites the version in userspace with the API
version supported by the kernel. copy_params() then fetches the version
from userspace *again*, and this time no validation is done. The result
is that the kernel's version number is completely controllable by
userspace, provided that userspace can win a race condition.
Fix this flaw by not copying the version back to the kernel the second
time. This is not exploitable as the version is not further used in the
kernel. However, it could become a problem if future patches start
relying on the version field.
Cc: stable@vger.kernel.org
Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
The NUL terminator for each target parameter string must precede the
following 'struct dm_target_spec'. Otherwise, dm_split_args() might
corrupt this struct. Furthermore, the first 'struct dm_target_spec'
must come after the 'struct dm_ioctl', as if it overlaps too much
dm_split_args() could corrupt the 'struct dm_ioctl'.
Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Especially on 32-bit systems, it is possible for the pointer
arithmetic to overflow and cause a userspace pointer to be
dereferenced in the kernel.
Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Otherwise subsequent code, if given malformed input, could dereference
a misaligned 'struct dm_target_spec *'.
Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> # use %zu
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
As described in commit 38d11da522 ("dm: don't lock fs when the map is
NULL in process of resume"), a deadlock may be triggered between
do_resume() and do_mount().
This commit preserves the fix from commit 38d11da522 but moves it to
where it also serves to fix a similar deadlock between do_suspend()
and do_mount(). It does so, if the active map is NULL, by clearing
DM_SUSPEND_LOCKFS_FLAG in dm_suspend() which is called by both
do_suspend() and do_resume().
Fixes: 38d11da522 ("dm: don't lock fs when the map is NULL in process of resume")
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
The only overlap between the block open flags mapped into the fmode_t and
other uses of fmode_t are FMODE_READ and FMODE_WRITE. Define a new
blk_mode_t instead for use in blkdev_get_by_{dev,path}, ->open and
->ioctl and stop abusing fmode_t.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd]
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/r/20230608110258.189493-28-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Commit fa247089de ("dm: requeue IO if mapping table not yet available")
added a detection of whether the mapping table is available in the IO
submission process. If the mapping table is unavailable, it returns
BLK_STS_RESOURCE and requeues the IO.
This can lead to the following deadlock problem:
dm create mount
ioctl(DM_DEV_CREATE_CMD)
ioctl(DM_TABLE_LOAD_CMD)
do_mount
vfs_get_tree
ext4_get_tree
get_tree_bdev
sget_fc
alloc_super
// got &s->s_umount
down_write_nested(&s->s_umount, ...);
ext4_fill_super
ext4_load_super
ext4_read_bh
submit_bio
// submit and wait io end
ioctl(DM_DEV_SUSPEND_CMD)
dev_suspend
do_resume
dm_suspend
__dm_suspend
lock_fs
freeze_bdev
get_active_super
grab_super
// wait for &s->s_umount
down_write(&s->s_umount);
dm_swap_table
__bind
// set md->map(can't get here)
IO will be continuously requeued while holding the lock since mapping
table is NULL. At the same time, mapping table won't be set since the
lock is not available.
Like request-based DM, bio-based DM also has the same problem.
It's not proper to just abort IO if the mapping table not available.
So clear DM_SKIP_LOCKFS_FLAG when the mapping table is NULL, this
allows the DM table to be loaded and the IO submitted upon resume.
Fixes: fa247089de ("dm: requeue IO if mapping table not yet available")
Cc: stable@vger.kernel.org
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
syzkaller found the following problematic rwsem locking (with write
lock already held):
down_read+0x9d/0x450 kernel/locking/rwsem.c:1509
dm_get_inactive_table+0x2b/0xc0 drivers/md/dm-ioctl.c:773
__dev_status+0x4fd/0x7c0 drivers/md/dm-ioctl.c:844
table_clear+0x197/0x280 drivers/md/dm-ioctl.c:1537
In table_clear, it first acquires a write lock
https://elixir.bootlin.com/linux/v6.2/source/drivers/md/dm-ioctl.c#L1520
down_write(&_hash_lock);
Then before the lock is released at L1539, there is a path shown above:
table_clear -> __dev_status -> dm_get_inactive_table -> down_read
https://elixir.bootlin.com/linux/v6.2/source/drivers/md/dm-ioctl.c#L773
down_read(&_hash_lock);
It tries to acquire the same read lock again, resulting in the deadlock
problem.
Fix this by moving table_clear()'s __dev_status() call to after its
up_write(&_hash_lock);
Cc: stable@vger.kernel.org
Reported-by: Zheng Zhang <zheng.zhang@email.ucr.edu>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
__hash_remove() removes hash_cell with _hash_lock locked, so acquiring
_hash_lock can guarantee no-NULL hc returned from dm_get_mdptr() must
have not been removed and hc->md must still be md.
__hash_remove() also acquires dm_hash_cells_mutex before setting mdptr
as NULL. So in dm_copy_name_and_uuid(), after acquiring
dm_hash_cells_mutex and ensuring returned hc is not NULL, the returned
hc must still be alive and hc->md must still be md.
Remove the unnecessary hc->md != md checks when using dm_get_mdptr()
with _hash_lock or dm_hash_cells_mutex acquired.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Also update dm_early_create() to take _hash_lock when calling both
__get_name_cell and __hash_remove -- given dm_early_create()'s early
boot usecase this locking isn't about correctness but it allows
lockdep_assert_held() to be added to __hash_remove.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
'GPL-2.0-only' is used instead of 'GPL-2.0' because SPDX has
deprecated its use.
Suggested-by: John Wiele <jwiele@redhat.com>
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Device mapper sends an uevent when the device is suspended, using the
function set_capacity_and_notify. However, this causes a race condition
with udev.
Udev skips scanning dm devices that are suspended. If we send an uevent
while we are suspended, udev will be racing with device mapper resume
code. If the device mapper resume code wins the race, udev will process
the uevent after the device is resumed and it will properly scan the
device.
However, if udev wins the race, it will receive the uevent, find out that
the dm device is suspended and skip scanning the device. This causes bugs
such as systemd unmounting the device - see
https://bugzilla.redhat.com/show_bug.cgi?id=2158628
This commit fixes this race.
We replace the function set_capacity_and_notify with set_capacity, so that
the uevent is not sent at this point. In do_resume, we detect if the
capacity has changed and we pass a boolean variable need_resize_uevent to
dm_kobject_uevent. dm_kobject_uevent adds "RESIZE=1" to the uevent if
need_resize_uevent is set.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Tested-by: Peter Rajnoha <prajnoha@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
The expression 'indata[3] > ULONG_MAX' always evaluates to false since
indata[] is declared as an array of *unsigned long* elements and #define
ULONG_MAX represents the max value of that exact type...
Note that gcc seems to be able to detect the dead code here and eliminate
this check anyway...
Found by Linux Verification Center (linuxtesting.org) with the SVACE static
analysis tool.
Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Change the ioctl codes from DM_DEV_ARM_POLL to DM_DEV_ARM_POLL_CMD and
from DM_GET_TARGET_VERSION to DM_GET_TARGET_VERSION_CMD.
Note that the "cmd" field of "struct _ioctls" is never used, thus this
commit doesn't fix any bug, it just makes the code consistent.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
No need to modify info->vers if we overwrite it immediately.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
__list_versions will first estimate the required space using the
"dm_target_iterate(list_version_get_needed, &needed)" call and then will
fill the space using the "dm_target_iterate(list_version_get_info,
&iter_info)" call. Each of these calls locks the targets using the
"down_read(&_lock)" and "up_read(&_lock)" calls, however between the first
and second "dm_target_iterate" there is no lock held and the target
modules can be loaded at this point, so the second "dm_target_iterate"
call may need more space than what was the first "dm_target_iterate"
returned.
The code tries to handle this overflow (see the beginning of
list_version_get_info), however this handling is incorrect.
The code sets "param->data_size = param->data_start + needed" and
"iter_info.end = (char *)vers+len" - "needed" is the size returned by the
first dm_target_iterate call; "len" is the size of the buffer allocated by
userspace.
"len" may be greater than "needed"; in this case, the code will write up
to "len" bytes into the buffer, however param->data_size is set to
"needed", so it may write data past the param->data_size value. The ioctl
interface copies only up to param->data_size into userspace, thus part of
the result will be truncated.
Fix this bug by setting "iter_info.end = (char *)vers + needed;" - this
guarantees that the second "dm_target_iterate" call will write only up to
the "needed" buffer and it will exit with "DM_BUFFER_FULL_FLAG" if it
overflows the "needed" space - in this case, userspace will allocate a
larger buffer and retry.
Note that there is also a bug in list_version_get_needed - we need to add
"strlen(tt->name) + 1" to the needed size, not "strlen(tt->name)".
Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Change DMWARN to DMERR in cases when there is an unrecoverable error.
Change DMWARN to DMCRIT when handling of a case is unimplemented.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
More efficient and readable to just access table->num_targets directly.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
This will help triage bugs when userspace is passing invalid ioctl
structure to the kernel.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
[snitzer: log errors using DMERR instead of DMWARN]
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
It appears like cmd could be a Spectre v1 gadget as it's supplied by a
user and used as an array index. Prevent the contents of kernel memory
from being leaked to userspace via speculative execution by using
array_index_nospec.
Signed-off-by: Jordy Zomer <jordy@pwning.systems>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
A given block device is identified by it's name and UUID. However, both
these parameters can be renamed. For an external attestation service to
correctly attest a given device, it needs to keep track of these rename
events.
Update the device data with the new values for IMA measurements. Measure
both old and new device name/UUID parameters in the same IMA measurement
event, so that the old and the new values can be connected later.
Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
For a given block device, an inactive table slot contains the parameters
to configure the device with. The inactive table can be cleared
multiple times, accidentally or maliciously, which may impact the
functionality of the device, and compromise the system. Therefore it is
important to measure and log the event when a table is cleared.
Measure device parameters, and table hashes when the inactive table slot
is cleared.
Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Presence of an active block-device, configured with expected parameters,
is important for an external attestation service to determine if a system
meets the attestation requirements. Therefore it is important for DM to
measure the device remove events.
Measure device parameters and table hashes when the device is removed,
using either remove or remove_all.
Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
A given block device can load a table multiple times, with different
input parameters, before eventually resuming it. Further, a device may
be suspended and then resumed. The device may never resume after a
table-load. Because of the above valid scenarios for a given device,
it is important to measure and log the device resume event using IMA.
Also, if the table is large, measuring it in clear-text each time the
device changes state, will unnecessarily increase the size of IMA log.
Since the table clear-text is already measured during table-load event,
measuring the hash during resume should be sufficient to validate the
table contents.
Measure the device parameters, and hash of the active table, when the
device is resumed.
Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
DM configures a block device with various target specific attributes
passed to it as a table. DM loads the table, and calls each target’s
respective constructors with the attributes as input parameters.
Some of these attributes are critical to ensure the device meets
certain security bar. Thus, IMA should measure these attributes, to
ensure they are not tampered with, during the lifetime of the device.
So that the external services can have high confidence in the
configuration of the block-devices on a given system.
Some devices may have large tables. And a given device may change its
state (table-load, suspend, resume, rename, remove, table-clear etc.)
many times. Measuring these attributes each time when the device
changes its state will significantly increase the size of the IMA logs.
Further, once configured, these attributes are not expected to change
unless a new table is loaded, or a device is removed and recreated.
Therefore the clear-text of the attributes should only be measured
during table load, and the hash of the active/inactive table should be
measured for the remaining device state changes.
Export IMA function ima_measure_critical_data() to allow measurement
of DM device parameters, as well as target specific attributes, during
table load. Compute the hash of the inactive table and store it for
measurements during future state change. If a load is called multiple
times, update the inactive table hash with the hash of the latest
populated table. So that the correct inactive table hash is measured
when the device transitions to different states like resume, remove,
rename, etc.
Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com> # leak fix
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Move setting md->type from both callers into dm_setup_md_queue.
This ensures that md->type is only set to a valid value after the queue
has been fully setup, something we'll rely on future changes.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20210804094147.459763-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If we set non-empty param->name or param->uuid on the DM_LIST_DEVICES_CMD
ioctl, the set values are considered filter prefixes. The ioctl will only
return entries with matching name or uuid prefix.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
When LVM needs to find a device with a particular UUID it needs to ask for
UUID for each device. This patch returns UUID directly in the list of
devices, so that LVM doesn't have to query all the devices with an ioctl.
The UUID is returned if the flag DM_UUID_FLAG is set in the parameters.
Returning UUID is done in backward-compatible way. There's one unused
32-bit word value after the event number. This patch sets the bit
DM_NAME_LIST_FLAG_HAS_UUID if UUID is present and
DM_NAME_LIST_FLAG_DOESNT_HAVE_UUID if it isn't (if none of these bits is
set, then we have an old kernel that doesn't support returning UUIDs). The
UUID is stored after this word. The 'next' value is updated to point after
the UUID, so that old version of libdevmapper will skip the UUID without
attempting to interpret it.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
For high numbers of DM devices the 64-entry hash table has non-trivial
overhead. Fix this by replacing the hash table with a red-black tree.
Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
If there are not any dm devices, we need to zero the "dev" argument in
the first structure dm_name_list. However, this can cause out of
bounds write, because the "needed" variable is zero and len may be
less than eight.
Fix this bug by reporting DM_BUFFER_FULL_FLAG if the result buffer is
too small to hold the "nl->dev" value.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.
Fixes: 2ca4c92f58 ("dm ioctl: prevent empty message")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>