mainlining shenanigans
Go to file
Jan Kara 142bbdfccc cfq: Disable writeback throttling by default
Writeback throttling does not play well with CFQ since that also tries
to throttle async writes. As a result async writeback can get starved in
presence of readers. As an example take a benchmark simulating
postgreSQL database running over a standard rotating SATA drive. There
are 16 processes doing random reads from a huge file (2*machine memory),
1 process doing random writes to the huge file and calling fsync once
per 50000 writes and 1 process doing sequential 8k writes to a
relatively small file wrapping around at the end of the file and calling
fsync every 5 writes. Under this load read latency easily exceeds the
target latency of 75 ms (just because there are so many reads happening
against a relatively slow disk) and thus writeback is throttled to a
point where only 1 write request is allowed at a time. Blktrace data
then looks like:

  8,0    1        0     8.347751764     0  m   N cfq workload slice:40000000
  8,0    1        0     8.347755256     0  m   N cfq293A  / set_active wl_class: 0 wl_type:0
  8,0    1        0     8.347784100     0  m   N cfq293A  / Not idling.  st->count:1
  8,0    1     3814     8.347763916  5839 UT   N [kworker/u9:2] 1
  8,0    0        0     8.347777605     0  m   N cfq293A  / Not idling.  st->count:1
  8,0    1        0     8.347784100     0  m   N cfq293A  / Not idling.  st->count:1
  8,0    3     1596     8.354364057     0  C   R 156109528 + 8 (6906954) [0]
  8,0    3        0     8.354383193     0  m   N cfq6196SN / complete rqnoidle 0
  8,0    3        0     8.354386476     0  m   N cfq schedule dispatch
  8,0    3        0     8.354399397     0  m   N cfq293A  / Not idling.  st->count:1
  8,0    3        0     8.354404705     0  m   N cfq293A  / dispatch_insert
  8,0    3        0     8.354409454     0  m   N cfq293A  / dispatched a request
  8,0    3        0     8.354412527     0  m   N cfq293A  / activate rq, drv=1
  8,0    3     1597     8.354414692     0  D   W 145961400 + 24 (6718452) [swapper/0]
  8,0    3        0     8.354484184     0  m   N cfq293A  / Not idling.  st->count:1
  8,0    3        0     8.354487536     0  m   N cfq293A  / slice expired t=0
  8,0    3        0     8.354498013     0  m   N / served: vt=5888102466265088 min_vt=5888074869387264
  8,0    3        0     8.354502692     0  m   N cfq293A  / sl_used=6737519 disp=1 charge=6737519 iops=0 sect=24
  8,0    3        0     8.354505695     0  m   N cfq293A  / del_from_rr
...
  8,0    0     1810     8.354728768     0  C   W 145961400 + 24 (314076) [0]
  8,0    0        0     8.354746927     0  m   N cfq293A  / complete rqnoidle 0
...
  8,0    1     3829     8.389886102  5839  G   W 145962968 + 24 [kworker/u9:2]
  8,0    1     3830     8.389888127  5839  P   N [kworker/u9:2]
  8,0    1     3831     8.389908102  5839  A   W 145978336 + 24 <- (8,4) 44000
  8,0    1     3832     8.389910477  5839  Q   W 145978336 + 24 [kworker/u9:2]
  8,0    1     3833     8.389914248  5839  I   W 145962968 + 24 (28146) [kworker/u9:2]
  8,0    1        0     8.389919137     0  m   N cfq293A  / insert_request
  8,0    1        0     8.389924305     0  m   N cfq293A  / add_to_rr
  8,0    1     3834     8.389933175  5839 UT   N [kworker/u9:2] 1
...
  8,0    0        0     9.455290997     0  m   N cfq workload slice:40000000
  8,0    0        0     9.455294769     0  m   N cfq293A  / set_active wl_class:0 wl_type:0
  8,0    0        0     9.455303499     0  m   N cfq293A  / fifo=ffff880003166090
  8,0    0        0     9.455306851     0  m   N cfq293A  / dispatch_insert
  8,0    0        0     9.455311251     0  m   N cfq293A  / dispatched a request
  8,0    0        0     9.455314324     0  m   N cfq293A  / activate rq, drv=1
  8,0    0     2043     9.455316210  6204  D   W 145962968 + 24 (1065401962) [pgioperf]
  8,0    0        0     9.455392407     0  m   N cfq293A  / Not idling.  st->count:1
  8,0    0        0     9.455395969     0  m   N cfq293A  / slice expired t=0
  8,0    0        0     9.455404210     0  m   N / served: vt=5888958194597888 min_vt=5888941810597888
  8,0    0        0     9.455410077     0  m   N cfq293A  / sl_used=4000000 disp=1 charge=4000000 iops=0 sect=24
  8,0    0        0     9.455416851     0  m   N cfq293A  / del_from_rr
...
  8,0    0     2045     9.455648515     0  C   W 145962968 + 24 (332305) [0]
  8,0    0        0     9.455668350     0  m   N cfq293A  / complete rqnoidle 0
...
  8,0    1     4371     9.455710115  5839  G   W 145978336 + 24 [kworker/u9:2]
  8,0    1     4372     9.455712350  5839  P   N [kworker/u9:2]
  8,0    1     4373     9.455730159  5839  A   W 145986616 + 24 <- (8,4) 52280
  8,0    1     4374     9.455732674  5839  Q   W 145986616 + 24 [kworker/u9:2]
  8,0    1     4375     9.455737563  5839  I   W 145978336 + 24 (27448) [kworker/u9:2]
  8,0    1        0     9.455742871     0  m   N cfq293A  / insert_request
  8,0    1        0     9.455747550     0  m   N cfq293A  / add_to_rr
  8,0    1     4376     9.455756629  5839 UT   N [kworker/u9:2] 1

So we can see a Q event for a write request, then IO is blocked by
writeback throttling and G and I events for the request happen only once
other writeback IO is completed. Thus CFQ always sees only one write
request. When it sees it, it queues the async queue behind all the read
queues and the async queue gets scheduled after about one second. When
it is scheduled, that one request gets dispatched and async queue is
expired as it has no more requests to submit. Overall we submit about
one write request per second.

Although this scheduling is beneficial for read latency, writes are
heavily starved and this causes large delays all over the system (due to
processes blocking on page lock, transaction starts, etc.). When
writeback throttling is disabled, write throughput is about one fifth of
a read throughput which roughly matches readers/writers ratio and
overall the system stalls are much shorter.

Mixing writeback throttling logic with CFQ throttling logic is always a
recipe for surprises as CFQ assumes it sees the big part of the picture
which is not necessarily true when writeback throttling is blocking
requests. So disable writeback throttling logic by default when CFQ is
used as an IO scheduler.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-05 08:15:08 -06:00
arch powerpc fixes for 4.11 #5 2017-03-19 18:49:28 -07:00
block cfq: Disable writeback throttling by default 2017-04-05 08:15:08 -06:00
certs certs: Add a secondary system keyring that can be added to dynamically 2016-04-11 22:48:09 +01:00
crypto net: Work around lockdep limitation in sockets that use sockets 2017-03-09 18:23:27 -08:00
Documentation blk-throttle: make throtl_slice tunable 2017-03-28 08:02:20 -06:00
drivers nvme: factor request completion code into a common helper 2017-04-04 09:48:23 -06:00
firmware WHENCE: use https://linuxtv.org for LinuxTV URLs 2015-12-04 10:35:11 -02:00
fs block: Fix oops in locked_inode_to_wb_and_lock_list() 2017-03-22 20:11:33 -06:00
include nvme_fc: Clean up host fcpio done status handling 2017-04-04 09:48:23 -06:00
init Change get_random_{int,log} to use the CRNG used by /dev/urandom and 2017-03-11 09:08:47 -08:00
ipc Merge branch 'WIP.sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-03-03 10:16:38 -08:00
kernel Merge branch 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-03-18 08:33:44 -07:00
lib kobject: Export kobject_get_unless_zero() 2017-03-22 20:11:35 -06:00
mm bdi: Rename cgwb_bdi_destroy() to cgwb_bdi_unregister() 2017-03-22 20:11:31 -06:00
net net/utils: generic inet_pton_with_scope helper 2017-04-04 09:48:23 -06:00
samples Merge branch 'rebased-statx' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-03-03 11:38:56 -08:00
scripts Merge branch 'akpm' (patches from Andrew) 2017-03-10 08:34:42 -08:00
security Merge branch 'WIP.sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-03-03 10:16:38 -08:00
sound scripts/spelling.txt: add "disble(d)" pattern and fix typo instances 2017-03-09 17:01:09 -08:00
tools Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-03-17 13:59:52 -07:00
usr kbuild: initramfs cleanup, set target from Kconfig 2017-01-05 09:40:16 -08:00
virt KVM: arm/arm64: VGIC: Fix command handling while ITS being disabled 2017-03-07 15:44:08 +00:00
.cocciconfig scripts: add Linux .cocciconfig for coccinelle 2016-07-22 12:13:39 +02:00
.get_maintainer.ignore Add hch to .get_maintainer.ignore 2015-08-21 14:30:10 -07:00
.gitattributes .gitattributes: set git diff driver for C source code files 2016-10-07 18:46:30 -07:00
.gitignore Merge branch 'misc' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild 2016-08-02 16:48:52 -04:00
.mailmap mailmap: add codeaurora.org names for nameless email commits 2017-01-10 18:31:55 -08:00
COPYING
CREDITS MAINTAINERS: Remove old e-mail address 2017-02-13 12:24:56 -05:00
Kbuild scripts/gdb: provide linux constants 2016-05-23 17:04:14 -07:00
Kconfig
MAINTAINERS MAINTAINERS: usb251xb: remove reference inexistent file 2017-03-09 10:34:16 +01:00
Makefile Linux 4.11-rc3 2017-03-19 19:09:39 -07:00
README README: add a new README file, pointing to the Documentation/ 2016-10-24 08:12:35 -02:00

Linux kernel
============

This file was moved to Documentation/admin-guide/README.rst

Please notice that there are several guides for kernel developers and users.
These guides can be rendered in a number of formats, like HTML and PDF.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.
See Documentation/00-INDEX for a list of what is contained in each file.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.