mainlining shenanigans
Go to file
Liu Bo 02a3307aa9 btrfs: fix reading stale metadata blocks after degraded raid1 mounts
If a btree block, aka. extent buffer, is not available in the extent
buffer cache, it'll be read out from the disk instead, i.e.

btrfs_search_slot()
  read_block_for_search()  # hold parent and its lock, go to read child
    btrfs_release_path()
    read_tree_block()  # read child

Unfortunately, the parent lock got released before reading child, so
commit 5bdd3536cb ("Btrfs: Fix block generation verification race") had
used 0 as parent transid to read the child block.  It forces
read_tree_block() not to check if parent transid is different with the
generation id of the child that it reads out from disk.

A simple PoC is included in btrfs/124,

0. A two-disk raid1 btrfs,

1. Right after mkfs.btrfs, block A is allocated to be device tree's root.

2. Mount this filesystem and put it in use, after a while, device tree's
   root got COW but block A hasn't been allocated/overwritten yet.

3. Umount it and reload the btrfs module to remove both disks from the
   global @fs_devices list.

4. mount -odegraded dev1 and write some data, so now block A is allocated
   to be a leaf in checksum tree.  Note that only dev1 has the latest
   metadata of this filesystem.

5. Umount it and mount it again normally (with both disks), since raid1
   can pick up one disk by the writer task's pid, if btrfs_search_slot()
   needs to read block A, dev2 which does NOT have the latest metadata
   might be read for block A, then we got a stale block A.

6. As parent transid is not checked, block A is marked as uptodate and
   put into the extent buffer cache, so the future search won't bother
   to read disk again, which means it'll make changes on this stale
   one and make it dirty and flush it onto disk.

To avoid the problem, parent transid needs to be passed to
read_tree_block().

In order to get a valid parent transid, we need to hold the parent's
lock until finishing reading child.

This patch needs to be slightly adapted for stable kernels, the
&first_key parameter added to read_tree_block() is from 4.16+
(581c176041). The fix is to replace 0 by 'gen'.

Fixes: 5bdd3536cb ("Btrfs: Fix block generation verification race")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-17 14:18:25 +02:00
arch Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-03-25 07:36:02 -10:00
block for-linus-20180302 2018-03-02 09:35:36 -08:00
certs certs/blacklist_nohashes.c: fix const confusion in certs blacklist 2018-02-21 15:35:43 -08:00
crypto X.509: fix NULL dereference when restricting key with unsupported_sig 2018-02-22 14:38:34 +00:00
Documentation Staging/IIO fixes for 4.16-rc7 2018-03-23 11:11:32 -07:00
drivers dmaengine fixes for v4.16-rc7 2018-03-25 07:45:10 -10:00
firmware kbuild: remove all dummy assignments to obj- 2017-11-18 11:46:06 +09:00
fs btrfs: fix reading stale metadata blocks after degraded raid1 mounts 2018-05-17 14:18:25 +02:00
include btrfs: qgroup: Update trace events for metadata reservation 2018-03-31 02:01:05 +02:00
init jump_label: Disable jump labels in __exit code 2018-03-20 08:57:17 +01:00
ipc Revert "mqueue: switch to on-demand creation of internal mount" 2018-03-24 19:34:23 -05:00
kernel Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-03-25 07:34:50 -10:00
lib libcrc32c: Add crc32c_impl function 2018-03-26 15:09:38 +02:00
LICENSES LICENSES: Add MPL-1.1 license 2018-01-06 10:59:44 -07:00
mm mm, thp: do not cause memcg oom for thp 2018-03-22 17:07:02 -07:00
net Two more fixes (in three patches): 2018-03-22 13:19:10 -04:00
samples - do not build samples when cross compiling (Michal Hocko) 2018-02-27 10:39:29 -08:00
scripts kbuild: Handle builtin dtb file names containing hyphens 2018-03-09 01:14:38 +09:00
security integrity/security: fix digsig.c build error with header file 2018-02-22 20:09:08 -08:00
sound ALSA: aloop: Fix access to not-yet-ready substream via cable 2018-03-22 10:40:27 +01:00
tools Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-03-25 07:36:02 -10:00
usr initramfs: fix initramfs rebuilds w/ compression after disabling 2017-11-03 07:39:19 -07:00
virt kvm/arm fixes for 4.16, take 2 2018-03-15 21:45:37 +01:00
.cocciconfig
.get_maintainer.ignore
.gitattributes .gitattributes: set git diff driver for C source code files 2016-10-07 18:46:30 -07:00
.gitignore .gitignore: ignore ASN.1 auto generated files 2018-02-14 21:05:38 +01:00
.mailmap mailmap: update Mark Yao's email address 2018-01-04 16:45:09 -08:00
COPYING
CREDITS MAINTAINERS: update TPM driver infrastructure changes 2017-11-09 17:58:40 -08:00
Kbuild Kbuild updates for v4.15 2017-11-17 17:45:29 -08:00
Kconfig License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
MAINTAINERS MAINTAINERS: update Mark Fasheh's e-mail 2018-03-22 17:07:01 -07:00
Makefile Linux 4.16-rc7 2018-03-25 12:44:30 -10:00
README README: add a new README file, pointing to the Documentation/ 2016-10-24 08:12:35 -02:00

Linux kernel
============

This file was moved to Documentation/admin-guide/README.rst

Please notice that there are several guides for kernel developers and users.
These guides can be rendered in a number of formats, like HTML and PDF.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.
See Documentation/00-INDEX for a list of what is contained in each file.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.