mainlining shenanigans
Go to file
Filipe Manana 609e804d77 Btrfs: fix file corruption after snapshotting due to mix of buffered/DIO writes
When we are mixing buffered writes with direct IO writes against the same
file and snapshotting is happening concurrently, we can end up with a
corrupt file content in the snapshot. Example:

1) Inode/file is empty.

2) Snapshotting starts.

2) Buffered write at offset 0 length 256Kb. This updates the i_size of the
   inode to 256Kb, disk_i_size remains zero. This happens after the task
   doing the snapshot flushes all existing delalloc.

3) DIO write at offset 256Kb length 768Kb. Once the ordered extent
   completes it sets the inode's disk_i_size to 1Mb (256Kb + 768Kb) and
   updates the inode item in the fs tree with a size of 1Mb (which is
   the value of disk_i_size).

4) The dealloc for the range [0, 256Kb[ did not start yet.

5) The transaction used in the DIO ordered extent completion, which updated
   the inode item, is committed by the snapshotting task.

6) Snapshot creation completes.

7) Dealloc for the range [0, 256Kb[ is flushed.

After that when reading the file from the snapshot we always get zeroes for
the range [0, 256Kb[, the file has a size of 1Mb and the data written by
the direct IO write is found. From an application's point of view this is
a corruption, since in the source subvolume it could never read a version
of the file that included the data from the direct IO write without the
data from the buffered write included as well. In the snapshot's tree,
file extent items are missing for the range [0, 256Kb[.

The issue, obviously, does not happen when using the -o flushoncommit
mount option.

Fix this by flushing delalloc for all the roots that are about to be
snapshotted when committing a transaction. This guarantees total ordering
when updating the disk_i_size of an inode since the flush for dealloc is
done when a transaction is in the TRANS_STATE_COMMIT_START state and wait
is done once no more external writers exist. This is similar to what we
do when using the flushoncommit mount option, but we do it only if the
transaction has snapshots to create and only for the roots of the
subvolumes to be snapshotted. The bulk of the dealloc is flushed in the
snapshot creation ioctl, so the flush work we do inside the transaction
is minimized.

This issue, involving buffered and direct IO writes with snapshotting, is
often triggered by fstest btrfs/078, and got reported by fsck when not
using the NO_HOLES features, for example:

  $ cat results/btrfs/078.full
  (...)
  _check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent
  *** fsck.btrfs output ***
  [1/7] checking root items
  [2/7] checking extents
  [3/7] checking free space cache
  [4/7] checking fs roots
  root 258 inode 264 errors 100, file extent discount
  Found file extent holes:
        start: 524288, len: 65536
  ERROR: errors found in fs roots

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-03-13 17:13:48 +01:00
arch Bug fixes. 2019-02-24 09:47:07 -08:00
block for-linus-20190215 2019-02-15 09:12:28 -08:00
certs kbuild: remove redundant target cleaning on failure 2019-01-06 09:46:51 +09:00
crypto net: crypto set sk to NULL when af_alg_release. 2019-02-18 12:01:24 -08:00
Documentation Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-02-24 09:28:26 -08:00
drivers Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-02-24 09:28:26 -08:00
firmware kbuild: change filechk to surround the given command with { } 2019-01-06 09:46:51 +09:00
fs Btrfs: fix file corruption after snapshotting due to mix of buffered/DIO writes 2019-03-13 17:13:48 +01:00
include btrfs: qgroup: Move reserved data accounting from btrfs_delayed_ref_head to btrfs_qgroup_extent_record 2019-02-25 14:13:39 +01:00
init revert "initramfs: cleanup incomplete rootfs" 2019-02-21 09:00:59 -08:00
ipc ipc: IPCMNI limit check for semmni 2018-10-31 08:54:14 -07:00
kernel Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-02-24 09:28:26 -08:00
lib Merge branch 'fixes-v5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2019-02-20 09:09:33 -08:00
LICENSES This is a fairly typical cycle for documentation. There's some welcome 2018-10-24 18:01:11 +01:00
mm mm, memory_hotplug: fix off-by-one in is_pageblock_removable 2019-02-21 09:01:01 -08:00
net Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-02-24 09:28:26 -08:00
samples samples: mei: use /dev/mei0 instead of /dev/mei 2019-01-30 15:24:45 +01:00
scripts kallsyms: Handle too long symbols in kallsyms.c 2019-01-28 13:02:09 +09:00
security Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-02-24 09:28:26 -08:00
sound sound fixes for 5.0 2019-02-20 09:42:52 -08:00
tools selftests: fib_tests: sleep after changing carrier. again. 2019-02-23 18:34:20 -08:00
usr user/Makefile: Fix typo and capitalization in comment section 2018-12-11 00:18:03 +09:00
virt KVM/ARM fixes for 5.0: 2019-02-13 19:39:24 +01:00
.clang-format clang-format: Update .clang-format with the latest for_each macro list 2019-01-19 19:26:06 +01:00
.cocciconfig
.get_maintainer.ignore
.gitattributes
.gitignore kbuild: Add support for DT binding schema checks 2018-12-13 09:41:32 -06:00
.mailmap A few early MIPS fixes for 4.21: 2019-01-05 12:48:25 -08:00
COPYING
CREDITS CREDITS/MAINTAINERS: Retire parisc-linux.org email domain 2019-02-21 20:16:10 +01:00
Kbuild kbuild: use assignment instead of define ... endef for filechk_* rules 2019-01-06 10:22:35 +09:00
Kconfig kconfig: move the "Executable file formats" menu to fs/Kconfig.binfmt 2018-08-02 08:06:55 +09:00
MAINTAINERS Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-02-24 09:28:26 -08:00
Makefile Linux 5.0-rc8 2019-02-24 16:46:45 -08:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.