Pull btrfs updates from David Sterba:
"The changes range through all types: cleanups, core changes, sanity
checks, fixes, other user visible changes, detailed list below:
- deprecated: user transaction ioctl
- mount option ssd does not change allocation alignments
- degraded read-write mount is allowed if all the raid profile
constraints are met, now based on more accurate check
- defrag: do not reset compression afterwards; the NOCOMPRESS flag
can now be overridden by defrag
- prep work for better extent reference tracking (related to the
qgroup slowness with balance)
- prep work for compression heuristics
- memory allocation reductions (may help latencies on a loaded
system)
- better accounting for io waiting states
- error handling improvements (removed BUGs)
- added more sanity checks for shared refs
- fix readdir vs pagefault deadlock under some circumstances
- fix for 'no-hole' mode, certain combination of compressed and
inline extents
- send: fix emission of invalid clone operations
- fixup file mode if setting acls fails
- more fixes from fuzzing
- other cleanups"
* 'for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (104 commits)
btrfs: submit superblock io with REQ_META and REQ_PRIO
btrfs: remove unnecessary memory barrier in btrfs_direct_IO
btrfs: remove superfluous chunk_tree argument from btrfs_alloc_dev_extent
btrfs: Remove chunk_objectid parameter of btrfs_alloc_dev_extent
btrfs: pass fs_info to btrfs_del_root instead of tree_root
Btrfs: add one more sanity check for shared ref type
Btrfs: remove BUG_ON in __add_tree_block
Btrfs: remove BUG() in add_data_reference
Btrfs: remove BUG() in print_extent_item
Btrfs: remove BUG() in btrfs_extent_inline_ref_size
Btrfs: convert to use btrfs_get_extent_inline_ref_type
Btrfs: add a helper to retrive extent inline ref type
btrfs: scrub: simplify scrub worker initialization
btrfs: scrub: clean up division in scrub_find_csum
btrfs: scrub: clean up division in __scrub_mark_bitmap
btrfs: scrub: use bool for flush_all_writes
btrfs: preserve i_mode if __btrfs_set_acl() fails
btrfs: Remove extraneous chunk_objectid variable
btrfs: Remove chunk_objectid argument from btrfs_make_block_group
btrfs: Remove extra parentheses from condition in copy_items()
...
Merge tag 'mmc-v4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC updates from Ulf Hansson:
"MMC core:
- Continue to refactor the mmc block code to prepare for blkmq
- Move mmc block debugfs into block module
- Next step for eMMC CMDQ by adding a new mmc host interface for it
- Move Kconfig option MMC_DEBUG from core to host
- Some additional minor improvements
MMC host:
- Declare structs as const when applicable
- Explicitly request exclusive reset control when applicable
- Improve some error paths and other various cleanups
- sdhci: Preparations to support SDHCI OMAP
- sdhci: Improve some PM related code
- sdhci: Re-factoring and modernizations
- sdhci-xenon: Add runtime PM and system sleep support
- sdhci-xenon: Add support for eMMC HS400 Enhanced Strobe
- sdhci-cadence: Add system sleep support
- sdhci-of-at91: Improve system sleep support
- dw_mmc: Add support for Hisilicon hi3660
- sunxi: Add support for A83T eMMC
- sunxi: Add support for DDR52 mode
- meson-gx: Add support for UHS-I SD-cards
- meson-gx: Cleanups and improvements
- tmio: Fix CMD12 (STOP) handling
- tmio: Cleanups and improvements
- renesas_sdhi: Add r8a7743/5 support
- renesas-sdhi: Add support for R-Car Gen3 SDHI DMAC
- renesas_sdhi: Cleanups and improvements"
* tag 'mmc-v4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: (145 commits)
mmc: renesas_sdhi: Add r8a7743/5 support
mmc: meson-gx: fix __ffsdi2 undefined on arm32
mmc: sdhci-xenon: add runtime pm support and reimplement standby
mmc: core: Move mmc_start_areq() declaration
mmc: mmci: stop building qcom dml as module
mmc: sunxi: Reset the device at probe time
clk: sunxi-ng: Provide a default reset hook
mmc: meson-gx: rework tuning function
mmc: meson-gx: change default tx phase
mmc: meson-gx: implement voltage switch callback
mmc: meson-gx: use CCF to handle the clock phases
mmc: meson-gx: implement card_busy callback
mmc: meson-gx: simplify interrupt handler
mmc: meson-gx: work around clk-stop issue
mmc: meson-gx: fix dual data rate mode frequencies
mmc: meson-gx: rework clock init function
mmc: meson-gx: rework clk_set function
mmc: meson-gx: rework set_ios function
mmc: meson-gx: cfg init overwrite values
mmc: meson-gx: initialize sane clk default before clock register
...
Pull block layer updates from Jens Axboe:
"This is the first pull request for 4.14, containing most of the code
changes. It's a quiet series this round, which I think we needed after
the churn of the last few series. This contains:
- Fix for a registration race in loop, from Anton Volkov.
- Overflow complaint fix from Arnd for DAC960.
- Series of drbd changes from the usual suspects.
- Conversion of the stec/skd driver to blk-mq. From Bart.
- A few BFQ improvements/fixes from Paolo.
- CFQ improvement from Ritesh, allowing idling for group idle.
- A few fixes found by Dan's smatch, courtesy of Dan.
- A warning fixup for a race between changing the IO scheduler and
device removal. From David Jeffery.
- A few nbd fixes from Josef.
- Support for cgroup info in blktrace, from Shaohua.
- Also from Shaohua, new features in the null_blk driver to allow it
to actually hold data, among other things.
- Various corner cases and error handling fixes from Weiping Zhang.
- Improvements to the IO stats tracking for blk-mq from me. Can
drastically improve performance for fast devices and/or big
machines.
- Series from Christoph removing bi_bdev as being needed for IO
submission, in preparation for nvme multipathing code.
- Series from Bart, including various cleanups and fixes for switch
fall through case complaints"
* 'for-4.14/block' of git://git.kernel.dk/linux-block: (162 commits)
kernfs: checking for IS_ERR() instead of NULL
drbd: remove BIOSET_NEED_RESCUER flag from drbd_{md_,}io_bio_set
drbd: Fix allyesconfig build, fix recent commit
drbd: switch from kmalloc() to kmalloc_array()
drbd: abort drbd_start_resync if there is no connection
drbd: move global variables to drbd namespace and make some static
drbd: rename "usermode_helper" to "drbd_usermode_helper"
drbd: fix race between handshake and admin disconnect/down
drbd: fix potential deadlock when trying to detach during handshake
drbd: A single dot should be put into a sequence.
drbd: fix rmmod cleanup, remove _all_ debugfs entries
drbd: Use setup_timer() instead of init_timer() to simplify the code.
drbd: fix potential get_ldev/put_ldev refcount imbalance during attach
drbd: new disk-option disable-write-same
drbd: Fix resource role for newly created resources in events2
drbd: mark symbols static where possible
drbd: Send P_NEG_ACK upon write error in protocol != C
drbd: add explicit plugging when submitting batches
drbd: change list_for_each_safe to while(list_first_entry_or_null)
drbd: introduce drbd_recv_header_maybe_unplug
...
Merge tag 'for-linus-4.14b-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen updates from Juergen Gross:
- the new pvcalls backend for routing socket calls from a guest to dom0
- some cleanups of Xen code
- a fix for wrong usage of {get,put}_cpu()
* tag 'for-linus-4.14b-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (27 commits)
xen/mmu: set MMU_NORMAL_PT_UPDATE in remap_area_mfn_pte_fn
xen: Don't try to call xen_alloc_p2m_entry() on autotranslating guests
xen/events: events_fifo: Don't use {get,put}_cpu() in xen_evtchn_fifo_init()
xen/pvcalls: use WARN_ON(1) instead of __WARN()
xen: remove not used trace functions
xen: remove unused function xen_set_domain_pte()
xen: remove tests for pvh mode in pure pv paths
xen-platform: constify pci_device_id.
xen: cleanup xen.h
xen: introduce a Kconfig option to enable the pvcalls backend
xen/pvcalls: implement write
xen/pvcalls: implement read
xen/pvcalls: implement the ioworker functions
xen/pvcalls: disconnect and module_exit
xen/pvcalls: implement release command
xen/pvcalls: implement poll command
xen/pvcalls: implement accept command
xen/pvcalls: implement listen command
xen/pvcalls: implement bind command
xen/pvcalls: implement connect command
...
Merge updates from Andrew Morton:
- various misc bits
- DAX updates
- OCFS2
- most of MM
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (119 commits)
mm,fork: introduce MADV_WIPEONFORK
x86,mpx: make mpx depend on x86-64 to free up VMA flag
mm: add /proc/pid/smaps_rollup
mm: hugetlb: clear target sub-page last when clearing huge page
mm: oom: let oom_reap_task and exit_mmap run concurrently
swap: choose swap device according to numa node
mm: replace TIF_MEMDIE checks by tsk_is_oom_victim
mm, oom: do not rely on TIF_MEMDIE for memory reserves access
z3fold: use per-cpu unbuddied lists
mm, swap: don't use VMA based swap readahead if HDD is used as swap
mm, swap: add sysfs interface for VMA based swap readahead
mm, swap: VMA based swap readahead
mm, swap: fix swap readahead marking
mm, swap: add swap readahead hit statistics
mm/vmalloc.c: don't reinvent the wheel but use existing llist API
mm/vmstat.c: fix wrong comment
selftests/memfd: add memfd_create hugetlbfs selftest
mm/shmem: add hugetlbfs support to memfd_create()
mm, devm_memremap_pages: use multi-order radix for ZONE_DEVICE lookups
mm/vmalloc.c: halve the number of comparisons performed in pcpu_get_vm_areas()
...
Introduce MADV_WIPEONFORK semantics, which result in a VMA being empty
in the child process after fork. This differs from MADV_DONTFORK in one
important way.
If a child process accesses memory that was MADV_WIPEONFORK, it will get
zeroes. The address ranges are still valid, they are just empty.
If a child process accesses memory that was MADV_DONTFORK, it will get a
segmentation fault, since those address ranges are no longer valid in
the child after fork.
Since MADV_DONTFORK also seems to be used to allow very large programs
to fork in systems with strict memory overcommit restrictions, changing
the semantics of MADV_DONTFORK might break existing programs.
MADV_WIPEONFORK only works on private, anonymous VMAs.
The use case is libraries that store or cache information, and want to
know that they need to regenerate it in the child process after fork.
Examples of this would be:
- systemd/pulseaudio API checks (fail after fork) (replacing a getpid
check, which is too slow without a PID cache)
- PKCS#11 API reinitialization check (mandated by specification)
- glibc's upcoming PRNG (reseed after fork)
- OpenSSL PRNG (reseed after fork)
The security benefits of a forking server having a re-initialized PRNG in
every child process are pretty obvious. However, due to libraries
having all kinds of internal state, and programs getting compiled with
many different versions of each library, it is unreasonable to expect
calling programs to re-initialize everything manually after fork.
A further complication is the proliferation of clone flags, programs
bypassing glibc's functions to call clone directly, and programs calling
unshare, causing the glibc pthread_atfork hook to not get called.
It would be better to have the kernel take care of this automatically.
The patch also adds MADV_KEEPONFORK, to undo the effects of a prior
MADV_WIPEONFORK.
This is similar to the OpenBSD minherit syscall with MAP_INHERIT_ZERO:
https://man.openbsd.org/minherit.2
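The semantics described above can be observed from userspace with a
minimal sketch like the following. It assumes Python 3.8 or later on
Linux, where the mmap module exposes madvise() and MADV_WIPEONFORK; the
helper name child_view_after_fork is hypothetical and only serves the
illustration. It returns None when the running kernel or Python build
does not support the flag.

```python
import mmap
import os


def child_view_after_fork(secret: bytes = b"seed-material"):
    """Return (bytes seen by the child, parent still intact?) or None if unsupported."""
    wipe = getattr(mmap, "MADV_WIPEONFORK", None)
    if wipe is None or not hasattr(os, "fork"):
        return None

    # MADV_WIPEONFORK only works on private, anonymous mappings.
    buf = mmap.mmap(-1, mmap.PAGESIZE, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    buf[:len(secret)] = secret
    try:
        buf.madvise(wipe)
    except OSError:
        buf.close()
        return None  # e.g. a kernel that predates this patch

    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: the range is still mapped, but its contents were wiped.
        try:
            os.close(read_fd)
            os.write(write_fd, buf[:len(secret)])
        finally:
            os._exit(0)

    os.close(write_fd)
    seen_by_child = os.read(read_fd, len(secret))
    os.close(read_fd)
    os.waitpid(pid, 0)

    # Parent: the original contents are untouched.
    parent_intact = buf[:len(secret)] == secret
    buf.close()
    return seen_by_child, parent_intact
```

On a kernel with this patch the child is expected to read zero bytes of
the same length as the secret, while the parent keeps its data. A
library could use the same pattern to detect that cached state must be
regenerated after fork.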
[akpm@linux-foundation.org: numerically order arch/parisc/include/uapi/asm/mman.h #defines]
Link: http://lkml.kernel.org/r/20170811212829.29186-3-riel@redhat.com
Signed-off-by: Rik van Riel <riel@redhat.com>
Reported-by: Florian Weimer <fweimer@redhat.com>
Reported-by: Colm MacCártaigh <colm@allcosts.net>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Cc: <linux-api@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When servicing mmap() reads from file holes the current DAX code
allocates a page cache page of all zeroes and places the struct page
pointer in the mapping->page_tree radix tree.
This has three major drawbacks:
1) It consumes memory unnecessarily. For every 4k page that is read via
a DAX mmap() over a hole, we allocate a new page cache page. This
means that if you read 1GiB worth of pages, you end up using 1GiB of
zeroed memory. This is easily visible by looking at the overall
memory consumption of the system or by looking at /proc/[pid]/smaps:
7f62e72b3000-7f63272b3000 rw-s 00000000 103:00 12 /root/dax/data
Size: 1048576 kB
Rss: 1048576 kB
Pss: 1048576 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 1048576 kB
Private_Dirty: 0 kB
Referenced: 1048576 kB
Anonymous: 0 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Locked: 0 kB
2) It is slower than using a common zero page because each page fault
has more work to do. Instead of just inserting a common zero page we
have to allocate a page cache page, zero it, and then insert it. Here
are the average latencies of dax_load_hole() as measured by ftrace on
a random test box:
Old method, using zeroed page cache pages: 3.4 us
New method, using the common 4k zero page: 0.8 us
This was the average latency over 1 GiB of sequential reads done by
this simple fio script:
[global]
size=1G
filename=/root/dax/data
fallocate=none
[io]
rw=read
ioengine=mmap
3) The fact that we had to check for both DAX exceptional entries and
for page cache pages in the radix tree made the DAX code more
complex.
Solve these issues by following the lead of the DAX PMD code and using a
common 4k zero page instead. As with the PMD code we will now insert a
DAX exceptional entry into the radix tree instead of a struct page
pointer which allows us to remove all the special casing in the DAX
code.
Note that we do still pretty aggressively check for regular pages in the
DAX radix tree, especially where we take action based on the bits set in
the page. If we ever find a regular page in our radix tree now that
most likely means that someone besides DAX is inserting pages (which has
happened lots of times in the past), and we want to find that out early
and fail loudly.
This solution also removes the extra memory consumption. Here is that
same /proc/[pid]/smaps after 1GiB of reading from a hole with the new
code:
7f2054a74000-7f2094a74000 rw-s 00000000 103:00 12 /root/dax/data
Size: 1048576 kB
Rss: 0 kB
Pss: 0 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 0 kB
Referenced: 0 kB
Anonymous: 0 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Locked: 0 kB
Overall system memory consumption is similarly improved.
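To compare the two smaps excerpts above programmatically, a small
parser is enough. This is a sketch only: the function name
parse_smaps_fields is hypothetical, and it assumes the "Name: <n> kB"
line layout shown in the excerpts.

```python
def parse_smaps_fields(text: str) -> dict:
    """Map each 'Name: <n> kB' line of one smaps entry to its size in kB."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only "<number> kB" lines; the mapping header line is skipped.
        if sep and len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


# Abbreviated copies of the two excerpts quoted in the text.
OLD_ENTRY = """7f62e72b3000-7f63272b3000 rw-s 00000000 103:00 12 /root/dax/data
Size: 1048576 kB
Rss: 1048576 kB
"""
NEW_ENTRY = """7f2054a74000-7f2094a74000 rw-s 00000000 103:00 12 /root/dax/data
Size: 1048576 kB
Rss: 0 kB
"""

if __name__ == "__main__":
    saved_kb = parse_smaps_fields(OLD_ENTRY)["Rss"] - parse_smaps_fields(NEW_ENTRY)["Rss"]
    print(f"resident memory no longer consumed: {saved_kb} kB")
```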
Another major change is that we remove dax_pfn_mkwrite() from our fault
flow, and instead rely on the page fault itself to make the PTE dirty
and writeable. The following description from the patch adding the
vm_insert_mixed_mkwrite() call explains this a little more:
"To be able to use the common 4k zero page in DAX we need to have our
PTE fault path look more like our PMD fault path where a PTE entry
can be marked as dirty and writeable as it is first inserted rather
than waiting for a follow-up dax_pfn_mkwrite() =>
finish_mkwrite_fault() call.
Right now we can rely on having a dax_pfn_mkwrite() call because we
can distinguish between these two cases in do_wp_page():
case 1: 4k zero page => writable DAX storage
case 2: read-only DAX storage => writeable DAX storage
This distinction is made via vm_normal_page(). vm_normal_page()
returns false for the common 4k zero page, though, just as it does
for DAX ptes. Instead of special casing the DAX + 4k zero page case
we will simplify our DAX PTE page fault sequence so that it matches
our DAX PMD sequence, and get rid of the dax_pfn_mkwrite() helper.
We will instead use dax_iomap_fault() to handle write-protection
faults.
This means that insert_pfn() needs to follow the lead of
insert_pfn_pmd() and allow us to pass in a 'mkwrite' flag. If
'mkwrite' is set insert_pfn() will do the work that was previously
done by wp_page_reuse() as part of the dax_pfn_mkwrite() call path"
Link: http://lkml.kernel.org/r/20170724170616.25810-4-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull networking updates from David Miller:
1) Support ipv6 checksum offload in sunvnet driver, from Shannon
Nelson.
2) Move to RB-tree instead of custom AVL code in inetpeer, from Eric
Dumazet.
3) Allow generic XDP to work on virtual devices, from John Fastabend.
4) Add bpf device maps and XDP_REDIRECT, which can be used to build
arbitrary switching frameworks using XDP. From John Fastabend.
5) Remove UFO offloads from the tree, gave us little other than bugs.
6) Remove the IPSEC flow cache, from Florian Westphal.
7) Support ipv6 route offload in mlxsw driver.
8) Support VF representors in bnxt_en, from Sathya Perla.
9) Add support for forward error correction modes to ethtool, from
Vidya Sagar Ravipati.
10) Add time filter for packet scheduler action dumping, from Jamal Hadi
Salim.
11) Extend the zerocopy sendmsg() used by virtio and tap to regular
sockets via MSG_ZEROCOPY. From Willem de Bruijn.
12) Significantly rework value tracking in the BPF verifier, from Edward
Cree.
13) Add new jump instructions to eBPF, from Daniel Borkmann.
14) Rework rtnetlink plumbing so that operations can be run without
taking the RTNL semaphore. From Florian Westphal.
15) Support XDP in tap driver, from Jason Wang.
16) Add 32-bit eBPF JIT for ARM, from Shubham Bansal.
17) Add Huawei hinic ethernet driver.
18) Allow to report MD5 keys in TCP inet_diag dumps, from Ivan
Delalande.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1780 commits)
i40e: point wb_desc at the nvm_wb_desc during i40e_read_nvm_aq
i40e: avoid NVM acquire deadlock during NVM update
drivers: net: xgene: Remove return statement from void function
drivers: net: xgene: Configure tx/rx delay for ACPI
drivers: net: xgene: Read tx/rx delay for ACPI
rocker: fix kcalloc parameter order
rds: Fix non-atomic operation on shared flag variable
net: sched: don't use GFP_KERNEL under spin lock
vhost_net: correctly check tx avail during rx busy polling
net: mdio-mux: add mdio_mux parameter to mdio_mux_init()
rxrpc: Make service connection lookup always check for retry
net: stmmac: Delete dead code for MDIO registration
gianfar: Fix Tx flow control deactivation
cxgb4: Ignore MPS_TX_INT_CAUSE[Bubble] for T6
cxgb4: Fix pause frame count in t4_get_port_stats
cxgb4: fix memory leak
tun: rename generic_xdp to skb_xdp
tun: reserve extra headroom only when XDP is set
net: dsa: bcm_sf2: Configure IMP port TC2QOS mapping
net: dsa: bcm_sf2: Advertise number of egress queues
...
This extends bridge fdb table tracepoints to also cover
learned fdb entries in the br_fdb_update path. Note that
unlike other tracepoints I have moved this to when the fdb
is modified because this is in the datapath and can generate
a lot of noise in the trace output. br_fdb_update is also called
from added_by_user context in the NTF_USE case which is already
traced, hence the !added_by_user check.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are some Xen specific trace functions defined in
include/trace/events/xen.h. Remove them.
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
The function xen_set_domain_pte() is used nowhere in the kernel.
Remove it.
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Most of the information needed to issue requests to a CQE is already in
struct mmc_request and struct mmc_data. Add data block address, some flags,
and the task id (tag), and allow for cmd being NULL which it is for CQE
tasks.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
A few useful tracepoints to trace bridge forwarding
database updates.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Creating a specific xdp_redirect_map variant of the xdp tracepoints
allows users to write simpler/faster BPF progs that get attached to
these tracepoints.
The goal is still to keep the tracepoints in xdp_redirect and xdp_redirect_map
similar enough that a tool can read the top part of the TP_STRUCT and
produce similar monitor statistics.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a need to separate the xdp_redirect tracepoint into two
tracepoints, for separating the error case from the normal forward
case.
Due to the extreme speeds XDP operates at, loading a tracepoint
has a measurable impact. Single-core XDP REDIRECT (ethtool tuned
rx-usecs 25) can do 13.7 Mpps forwarding, but loading a simple
bpf_prog at the tracepoint (with a return 0) reduces performance to 10.2 Mpps
(CPU E5-1650 v4 @ 3.60GHz, driver: ixgbe).
The overhead of loading a bpf-based tracepoint can be calculated to
cost 25 nanosec per packet ((1/13782002-1/10267937)*10^9 = -24.83 ns).
Using perf record on the tracepoint event, with a non-matching --filter
expression, the overhead is much larger. Performance drops to 8.3 Mpps,
a cost of 48 nanosec per packet ((1/13782002-1/8312497)*10^9 = -47.74 ns).
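The arithmetic above can be reproduced with a short sketch. The helper
names are hypothetical; the packet rates are the ones quoted in the
text. Note that the helper subtracts in the opposite order from the
formulas above, so it reports the cost as a positive number.

```python
def per_packet_cost_ns(pps: float) -> float:
    """Average time spent per packet, in nanoseconds, at a given packet rate."""
    return 1e9 / pps


def tracepoint_overhead_ns(baseline_pps: float, loaded_pps: float) -> float:
    """Extra nanoseconds per packet once a tracepoint consumer is attached."""
    return per_packet_cost_ns(loaded_pps) - per_packet_cost_ns(baseline_pps)


# Packet rates quoted in the text.
BASELINE_PPS = 13782002     # no tracepoint consumer attached
BPF_PROG_PPS = 10267937     # simple bpf_prog attached (return 0)
PERF_FILTER_PPS = 8312497   # perf record with a non-matching --filter

if __name__ == "__main__":
    print(f"bpf_prog:    {tracepoint_overhead_ns(BASELINE_PPS, BPF_PROG_PPS):.2f} ns")
    print(f"perf filter: {tracepoint_overhead_ns(BASELINE_PPS, PERF_FILTER_PPS):.2f} ns")
```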
Having a separate tracepoint for err cases, which should be less
frequent, allows running a continuous monitor for errors while not
affecting the redirect forward performance (this has also been
verified by measurements).
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Given that the previous patch exposes the map_id, it seems natural to
also report the bpf prog id.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To make sense of the map index, the tracepoint user also needs to know
which map we are talking about. Supply the map pointer but only expose
the map->id.
The 'to_index' is renamed 'to_ifindex'. In the xdp_redirect_map case,
this is the result of the devmap lookup. The map lookup key is exposed
as map_index, which is needed to troubleshoot in case the lookup failed.
The 'to_ifindex' is placed after 'err' to keep TP_STRUCT as common as
possible.
This also keeps the TP_STRUCT similar enough, that userspace can write
a monitor program, that doesn't need to care about whether
bpf_redirect or bpf_redirect_map were used.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Supplying the action argument XDP_REDIRECT to the tracepoint xdp_redirect
is redundant as it is only called in case this action was specified.
Remove the argument, but keep "act" member of the tracepoint struct and
populate it with XDP_REDIRECT. This makes it easier to write a common bpf_prog
processing events.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the net_device string name from the xdp_exception tracepoint,
like the xdp_redirect tracepoint.
Align the TP_STRUCT to have common entries between these two
tracepoints.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is too much overhead in the current trace_xdp_redirect
tracepoint as it does strcpy and strlen on the net_device names.
Besides, exposing the ifindex/index is actually the information that
is needed in the tracepoint to diagnose issues. When a lookup fails
(either ifindex or devmap index), there is a need to say which
to_index has issues.
V2: Adjust args to be aligned with trace_xdp_exception.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This way we don't need a block_device structure to submit I/O. The
block_device has different life time rules from the gendisk and
request_queue and is usually only available when the block device node
is open. Other callers need to explicitly create one (e.g. the lightnvm
passthrough code, or the new nvme multipathing code).
For the actual I/O path all that we need is the gendisk, which exists
once per block device. But given that the block layer also does
partition remapping we additionally need a partition index, which is
used for said remapping in generic_make_request.
Note that all the block drivers generally want request_queue or
sometimes the gendisk, so this removes a layer of indirection all
over the stack.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The return error code needs to be included in the tracepoint
xdp:xdp_redirect, otherwise it is not possible to distinguish successful
from failed XDP_REDIRECT transmits.
XDP has no queuing mechanism. Thus, it is fairly easy to overrun a
NIC transmit queue. The eBPF program invoking helpers (bpf_redirect
or bpf_redirect_map) to redirect a packet doesn't get any feedback
on whether the packet was actually transmitted.
Info on failed transmits in the tracepoint xdp:xdp_redirect is
interesting, as it opens the way to providing a feedback loop to the
receiving XDP program.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The main purpose of this tracepoint is to monitor bulk dequeue
in the network qdisc layer, as it cannot be deduced from the
existing qdisc stats.
The txq_state can be used for determining the reason for zero packet
dequeues, see enum netdev_queue_state_t.
Note that not all packets necessarily activate this tracepoint, as
qdiscs with the TCQ_F_CAN_BYPASS flag can directly invoke
sch_direct_xmit() when qdisc_qlen is zero.
Remember that perf record supports filters like:
perf record -e qdisc:qdisc_dequeue \
--filter 'ifindex == 4 && (packets > 1 || txq_state > 0)'
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All callers of flush_space pass the same number for orig/num_bytes
arguments. Let's remove one of the numbers and also modify the trace
point to show only a single number - bytes requested.
It seems the last point where the two parameters were treated
differently was before the ticketed enospc rework.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This patch adds a tracepoint event for prelim_ref insertion and
merging. For each, the ref being inserted or merged and the count
of tree nodes is issued.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Tracepoint arguments are all read-only. If we mark the arguments
as const, we're able to keep or convert those arguments to const
where appropriate.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When comparing signed and unsigned variables, the compiler converts the
signed value to an unsigned one. As a result, {in,de}crease_sleep_time
may return an overflowed result.
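The conversion can be emulated outside of C to see why such a
comparison misbehaves. The sketch below is not the f2fs code: the helper
names are hypothetical, and it assumes a 32-bit int and unsigned int.

```python
UINT_BITS = 32  # assumption: 32-bit int / unsigned int


def to_unsigned(value: int, bits: int = UINT_BITS) -> int:
    """Reinterpret a signed integer the way C converts int to unsigned int."""
    return value & ((1 << bits) - 1)


def c_mixed_less_than(signed_value: int, unsigned_value: int) -> bool:
    """Emulate `signed_value < unsigned_value` in C.

    The signed operand is converted to unsigned before the comparison.
    """
    return to_unsigned(signed_value) < unsigned_value


if __name__ == "__main__":
    # A negative intermediate result becomes a huge unsigned value,
    # so a "is it below the limit?" check gives the wrong answer.
    print(to_unsigned(-1))
    print(c_mixed_less_than(-1, 10))
```

With these assumptions a negative value such as -1 compares as larger
than any small unsigned limit, which is the kind of surprise the commit
message describes.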
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Merge tag 'mlx5-shared-2017-08-07' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Saeed Mahameed says:
====================
mlx5-shared-2017-08-07
This series includes some mlx5 updates for both net-next and rdma trees.
From Saeed,
Core driver updates to allow selectively building the driver with
or without some large driver components, such as
- E-Switch (Ethernet SRIOV support).
- Multi-Physical Function Switch (MPFs) support.
For that we split E-Switch and MPFs functionalities into separate files.
From Erez,
Delay mlx5_core events while registration of mlx5 interfaces, namely mlx5_ib,
is taking place, until it completes.
From Rabie,
Increase the maximum supported flow counters.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Two variables in ext4_inode_info, i_reserved_meta_blocks and
i_allocated_meta_blocks, are unused. Removing them saves a little
memory per in-memory inode and cleans up clutter in several tracepoints.
Adjust tracepoint output from ext4_alloc_da_blocks() for consistency
and fix a typo and whitespace near these changes.
Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
The inode number and generation together identify a kernfs node. We are
going to export this identification through exportfs operations, so put ino
and generation into a separate structure. This is convenient for later
patches that use the identification.
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Strings used in event tracing need to be specially handled, for example,
being copied to the trace buffer instead of being pointed to by the trace
buffer. Although the TPS() macro can be used to "launder" pointed-to
strings, this might not be all that effective within a loadable module.
This commit therefore copies rcutorture's strings to the trace buffer.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
This adds a trace event for xdp redirect which may help when debugging
XDP programs that use redirect bpf commands.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
__GFP_REPEAT was designed to give the page allocator a
retry-but-eventually-fail semantic. This has been true, but only for
allocation requests larger than PAGE_ALLOC_COSTLY_ORDER. It has always been
ignored for smaller sizes. This is a bit unfortunate because there is
no way to express the same semantic for those requests, and they are
considered too important to fail, so they might end up looping in the
page allocator forever, similarly to GFP_NOFAIL requests.
Now that the whole tree has been cleaned up and accidental or misled
usage of __GFP_REPEAT flag has been removed for !costly requests we can
give the original flag a better name and more importantly a more useful
semantic. Let's rename it to __GFP_RETRY_MAYFAIL which tells the user
that the allocator would try really hard but there is no promise of a
success. This will work independent of the order and overrides the
default allocator behavior. Page allocator users have several levels of
guarantee vs. cost options (take GFP_KERNEL as an example):
- GFP_KERNEL & ~__GFP_RECLAIM - optimistic allocation without _any_
attempt to free memory at all. The most lightweight mode, which doesn't
even kick the background reclaim. Should be used carefully because
it might deplete the memory and the next user might hit the more
aggressive reclaim.
- GFP_KERNEL & ~__GFP_DIRECT_RECLAIM (or GFP_NOWAIT) - optimistic
allocation without any attempt to free memory from the current
context but can wake kswapd to reclaim memory if the zone is below
the low watermark. Can be used from either atomic contexts or when
the request is a performance optimization and there is another
fallback for a slow path.
- (GFP_KERNEL|__GFP_HIGH) & ~__GFP_DIRECT_RECLAIM (aka GFP_ATOMIC) -
non sleeping allocation with an expensive fallback so it can access
some portion of memory reserves. Usually used from interrupt/bh
context with an expensive slow path fallback.
- GFP_KERNEL - both background and direct reclaim are allowed and the
_default_ page allocator behavior is used. That means that !costly
allocation requests are basically nofail but there is no guarantee of
that behavior so failures have to be checked properly by callers
(e.g. OOM killer victim is allowed to fail currently).
- GFP_KERNEL | __GFP_NORETRY - overrides the default allocator behavior
and all allocation requests fail early rather than cause disruptive
reclaim (one round of reclaim in this implementation). The OOM killer
is not invoked.
- GFP_KERNEL | __GFP_RETRY_MAYFAIL - overrides the default allocator
behavior and all allocation requests try really hard. The request
will fail if the reclaim cannot make any progress. The OOM killer
won't be triggered.
- GFP_KERNEL | __GFP_NOFAIL - overrides the default allocator behavior
and all allocation requests will loop endlessly until they succeed.
This might be really dangerous especially for larger orders.
Existing users of __GFP_REPEAT are changed to __GFP_RETRY_MAYFAIL
because they already had this semantic. No new users are added.
__alloc_pages_slowpath is changed to bail out for __GFP_RETRY_MAYFAIL if
there is no progress and we have already passed the OOM point.
This means that all the reclaim opportunities have been exhausted except
the most disruptive one (the OOM killer), and a user-defined fallback
behavior is more sensible than to keep retrying in the page allocator.
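The difference between the "fail early" and "try hard but may fail" levels
can be pictured with a toy userspace model. Everything below is an
assumption made for illustration (the enum, demo_model_alloc and the idea of
feeding reclaim results in as an array); it is not the page allocator's
code. The __GFP_NOFAIL level is left out because, by definition, it would
never return on failure:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum demo_policy {
	DEMO_NORETRY,		/* models GFP_KERNEL | __GFP_NORETRY */
	DEMO_RETRY_MAYFAIL,	/* models GFP_KERNEL | __GFP_RETRY_MAYFAIL */
};

/*
 * freed[i] is the number of pages that reclaim round i manages to free.
 * Returns true once "need" pages have been gathered, false when the
 * policy gives up.
 */
bool demo_model_alloc(enum demo_policy policy, size_t need,
		      const size_t *freed, size_t nr_rounds)
{
	size_t avail = 0;

	assert(freed != NULL || nr_rounds == 0);
	for (size_t round = 0; round < nr_rounds; round++) {
		avail += freed[round];
		if (avail >= need)
			return true;
		if (policy == DEMO_NORETRY)
			return false;	/* one round of reclaim only */
		if (freed[round] == 0)
			return false;	/* no progress: let the caller fall back */
	}
	return false;
}
```

In this model a caller using the "may fail" level keeps going as long as
each round frees something, and returns to its own fallback as soon as a
round makes no progress.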
[akpm@linux-foundation.org: fix arch/sparc/kernel/mdesc.c]
[mhocko@suse.com: semantic fix]
Link: http://lkml.kernel.org/r/20170626123847.GM11534@dhcp22.suse.cz
[mhocko@kernel.org: address other thing spotted by Vlastimil]
Link: http://lkml.kernel.org/r/20170626124233.GN11534@dhcp22.suse.cz
Link: http://lkml.kernel.org/r/20170623085345.11304-3-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Alex Belits <alex.belits@cavium.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: David Daney <david.daney@cavium.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: NeilBrown <neilb@suse.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull i2c updates from Wolfram Sang:
"This pull request contains:
- i2c core reorganization. One source file became too monolithic. It
is now split up, yet we still have the same named object as the
final output. This should ease maintenance.
- new drivers: ZTE ZX2967 family, ASPEED 24XX/25XX
- designware driver gained slave mode support
- xgene-slimpro driver gained ACPI support
- bigger overhaul for pca-platform driver
- the algo-bit module now supports messages with enforced STOP
- slightly bigger than usual set of driver updates and improvements
and with much appreciated quality assurance from Andy Shevchenko"
* 'i2c/for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (51 commits)
i2c: Provide a stub for i2c_detect_slave_mode()
i2c: designware: Let slave adapter support be optional
i2c: designware: Make HW init functions static
i2c: designware: fix spelling mistakes
i2c: pca-platform: propagate error from i2c_pca_add_numbered_bus
i2c: pca-platform: correctly set algo_data.reset_chip
i2c: acpi: Do not create i2c-clients for LNXVIDEO ACPI devices
i2c: designware: enable SLAVE in platform module
i2c: designware: add SLAVE mode functions
i2c: zx2967: drop COMPILE_TEST dependency
i2c: zx2967: always use the same device when printing errors
i2c: pca-platform: use dev_warn/dev_info instead of printk
i2c: pca-platform: use device managed allocations
i2c: pca-platform: add devicetree awareness
i2c: pca-platform: switch to struct gpio_desc
dt-bindings: add bindings for i2c-pca-platform
i2c: cadance: fix ctrl/addr reg write order
i2c: zx2967: add i2c controller driver for ZTE's zx2967 family
dt: bindings: add documentation for zx2967 family i2c controller
i2c: algo-bit: add support for I2C_M_STOP
...
Merge more updates from Andrew Morton:
- most of the rest of MM
- KASAN updates
- lib/ updates
- checkpatch updates
- some binfmt_elf changes
- various misc bits
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (115 commits)
kernel/exit.c: avoid undefined behaviour when calling wait4()
kernel/signal.c: avoid undefined behaviour in kill_something_info
binfmt_elf: safely increment argv pointers
s390: reduce ELF_ET_DYN_BASE
powerpc: move ELF_ET_DYN_BASE to 4GB / 4MB
arm64: move ELF_ET_DYN_BASE to 4GB / 4MB
arm: move ELF_ET_DYN_BASE to 4MB
binfmt_elf: use ELF_ET_DYN_BASE only for PIE
fs, epoll: short circuit fetching events if thread has been killed
checkpatch: improve multi-line alignment test
checkpatch: improve macro reuse test
checkpatch: change format of --color argument to --color[=WHEN]
checkpatch: silence perl 5.26.0 unescaped left brace warnings
checkpatch: improve tests for multiple line function definitions
checkpatch: remove false warning for commit reference
checkpatch: fix stepping through statements with $stat and ctx_statement_block
checkpatch: [HLP]LIST_HEAD is also declaration
checkpatch: warn when a MAINTAINERS entry isn't [A-Z]:\t
checkpatch: improve the unnecessary OOM message test
lib/bsearch.c: micro-optimize pivot position calculation
...
During the debugging of the problem described in
https://lkml.org/lkml/2017/5/17/542 and fixed by Tetsuo Handa in
https://lkml.org/lkml/2017/5/19/383 , I've found that the existing debug
output is not really useful to understand issues related to the oom
reaper.
So I assume that adding some tracepoints might help with debugging
similar issues.
Trace the following events:
1) a process is marked as an oom victim,
2) a process is added to the oom reaper list,
3) the oom reaper starts reaping process's mm,
4) the oom reaper finishes reaping,
5) the oom reaper skips reaping.
How does it work in practice? Below is an example which shows how the
problem mentioned above can be found: one process is added twice to the
oom_reaper list:
$ cd /sys/kernel/debug/tracing
$ echo "oom:mark_victim" > set_event
$ echo "oom:wake_reaper" >> set_event
$ echo "oom:skip_task_reaping" >> set_event
$ echo "oom:start_task_reaping" >> set_event
$ echo "oom:finish_task_reaping" >> set_event
$ cat trace_pipe
allocate-502 [001] .... 91.836405: mark_victim: pid=502
allocate-502 [001] .N.. 91.837356: wake_reaper: pid=502
allocate-502 [000] .N.. 91.871149: wake_reaper: pid=502
oom_reaper-23 [000] .... 91.871177: start_task_reaping: pid=502
oom_reaper-23 [000] .N.. 91.879511: finish_task_reaping: pid=502
oom_reaper-23 [000] .... 91.879580: skip_task_reaping: pid=502
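Spotting the duplicate wake_reaper entry by eye works for a short trace; for
longer captures a small helper can count events per pid. The sketch below is
a hypothetical userspace helper (demo_count_event is not part of the patch)
and assumes the "event: pid=N" line format shown in the output above:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Count how many lines of captured trace text report "<event>: pid=<pid>".
 * Returns -1 if the event name does not fit the local buffer.
 */
int demo_count_event(const char *trace, const char *event, int pid)
{
	char needle[96];
	int count = 0;
	int len;

	assert(trace != NULL && event != NULL);
	len = snprintf(needle, sizeof(needle), " %s: pid=%d", event, pid);
	if (len < 0 || (size_t)len >= sizeof(needle))
		return -1;

	for (const char *p = strstr(trace, needle); p != NULL;
	     p = strstr(p + len, needle)) {
		char next = p[len];

		/* Do not count pid=5020 when asked about pid=502. */
		if (next < '0' || next > '9')
			count++;
	}
	return count;
}
```

A result greater than one for wake_reaper and a single pid points at the
double-queueing problem described above.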
Link: http://lkml.kernel.org/r/20170530185231.GA13412@castle
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After enabling CONFIG_TRACE_ENUM_MAP_FILE (which will soon be renamed to
CONFIG_TRACE_EVAL_MAP_FILE), I am able to examine the enums that have
been evaluated:
# cat /sys/kernel/debug/tracing/enum_map
(which will soon be renamed to eval_map)
And it showed some interesting results:
[..]
ZONE_MOVABLE 3 (oom)
ZONE_NORMAL 2 (oom)
ZONE_DMA32 1 (oom)
ZONE_DMA 0 (oom)
3 3 (oom)
2 2 (oom)
1 1 (oom)
COMPACT_PRIO_ASYNC 2 (oom)
COMPACT_PRIO_SYNC_LIGHT 1 (oom)
COMPACT_PRIO_SYNC_FULL 0 (oom)
[..]
ZONE_DMA 0 (vmscan)
3 3 (vmscan)
2 2 (vmscan)
1 1 (vmscan)
COMPACT_PRIO_ASYNC 2 (vmscan)
[..]
ZONE_DMA 0 (kmem)
3 3 (kmem)
2 2 (kmem)
1 1 (kmem)
COMPACT_PRIO_ASYNC 2 (kmem)
[..]
ZONE_DMA 0 (compaction)
3 3 (compaction)
2 2 (compaction)
1 1 (compaction)
COMPACT_PRIO_ASYNC 2 (compaction)
[..]
The names within the parentheses are the trace systems that the enum/eval
maps are associated with. When there's a number evaluated to another
number, that tells me that the TRACE_DEFINE_ENUM() was used on a #define
and not an enum. As #defines get converted normally, they do not need
to be evaluated.
Each of the above trace systems with the number to number evaluation
included the file include/trace/events/mmflags.h which has:
/* High-level compaction status feedback */
#define COMPACTION_FAILED 1
#define COMPACTION_WITHDRAWN 2
#define COMPACTION_PROGRESS 3
[..]
#define COMPACTION_FEEDBACK \
EM(COMPACTION_FAILED, "failed") \
EM(COMPACTION_WITHDRAWN, "withdrawn") \
EMe(COMPACTION_PROGRESS, "progress")
This is still needed for the __print_symbolic() usage in the
trace_event, but it does not need to be evaluated.
Removing the evaluation part removes the unnecessary evaluations of
numbers to numbers.
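Why a #define "gets converted normally" while an enum does not can be shown
with plain preprocessor stringification. This is a userspace sketch under
the assumption that the event format text is produced by a two-level
stringify macro; the DEMO_* names are made up for the example:

```c
#include <assert.h>
#include <string.h>

#define DEMO_STRINGIFY_1(x) #x
#define DEMO_STRINGIFY(x) DEMO_STRINGIFY_1(x)	/* expands macros first */

#define DEMO_COMPACTION_FAILED 1		/* a #define */
enum demo_zone { DEMO_ZONE_DMA, DEMO_ZONE_NORMAL };	/* an enum */

/* The preprocessor has already replaced the macro with its value. */
const char *demo_define_as_text(void)
{
	return DEMO_STRINGIFY(DEMO_COMPACTION_FAILED);
}

/* The enumerator is invisible to the preprocessor: the name survives. */
const char *demo_enum_as_text(void)
{
	return DEMO_STRINGIFY(DEMO_ZONE_NORMAL);
}

/* Only text that is still a symbolic name needs a name-to-value map. */
int demo_needs_eval_map(const char *text)
{
	assert(text != NULL);
	return !(text[0] >= '0' && text[0] <= '9');
}
```

Here the #define ends up as the text "1", whereas the enumerator remains the
text "DEMO_ZONE_NORMAL", which a userspace parser cannot resolve without a
map.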
Link: http://lkml.kernel.org/r/20170615074944.7be9a647@gandalf.local.home
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In this round, we've added new features such as disk quota and statx, and
modified internal bio management flow to merge more IOs depending on block
types. We've also made internal threads freezeable for Android battery life.
In addition to them, there are some patches to avoid lock contention as well
as a couple of deadlock conditions.
= Enhancement
- support usrquota, grpquota, and statx
- manage DATA/NODE typed bios separately to serialize more IOs
- modify f2fs_lock_op/wio_mutex to avoid lock contention
- prevent lock contention in migratepage
= Bug fix
- fix missing load of written inode flag
- fix worst case victim selection in GC
- freezeable GC and discard threads for Android battery life
- sanitize f2fs metadata to deal with security hole
- clean up sysfs-related code and docs
Merge tag 'for-f2fs-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this round, we've added new features such as disk quota and statx,
and modified internal bio management flow to merge more IOs depending
on block types. We've also made internal threads freezeable for
Android battery life. In addition to them, there are some patches to
avoid lock contention as well as a couple of deadlock conditions.
Enhancements:
- support usrquota, grpquota, and statx
- manage DATA/NODE typed bios separately to serialize more IOs
- modify f2fs_lock_op/wio_mutex to avoid lock contention
- prevent lock contention in migratepage
Bug fixes:
- fix missing load of written inode flag
- fix worst case victim selection in GC
- freezeable GC and discard threads for Android battery life
- sanitize f2fs metadata to deal with security hole
- clean up sysfs-related code and docs"
* tag 'for-f2fs-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (59 commits)
f2fs: support plain user/group quota
f2fs: avoid deadlock caused by lock order of page and lock_op
f2fs: use spin_{,un}lock_irq{save,restore}
f2fs: relax migratepage for atomic written page
f2fs: don't count inode block in in-memory inode.i_blocks
Revert "f2fs: fix to clean previous mount option when remount_fs"
f2fs: do not set LOST_PINO for renamed dir
f2fs: do not set LOST_PINO for newly created dir
f2fs: skip ->writepages for {mete,node}_inode during recovery
f2fs: introduce __check_sit_bitmap
f2fs: stop gc/discard thread in prior during umount
f2fs: introduce reserved_blocks in sysfs
f2fs: avoid redundant f2fs_flush after remount
f2fs: report # of free inodes more precisely
f2fs: add ioctl to do gc with target block address
f2fs: don't need to check encrypted inode for partial truncation
f2fs: measure inode.i_blocks as generic filesystem
f2fs: set CP_TRIMMED_FLAG correctly
f2fs: require key for truncate(2) of encrypted file
f2fs: move sysfs code from super.c to fs/f2fs/sysfs.c
...
Merge tag 'for-linus-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux
Pull Writeback error handling updates from Jeff Layton:
"This pile represents the bulk of the writeback error handling fixes
that I have for this cycle. Some of the earlier patches in this pile
may look trivial but they are prerequisites for later patches in the
series.
The aim of this set is to improve how we track and report writeback
errors to userland. Most applications that care about data integrity
will periodically call fsync/fdatasync/msync to ensure that their
writes have made it to the backing store.
For a very long time, we have tracked writeback errors using two flags
in the address_space: AS_EIO and AS_ENOSPC. Those flags are set when a
writeback error occurs (via mapping_set_error) and are cleared as a
side-effect of filemap_check_errors (as you noted yesterday). This
model really sucks for userland.
Only the first task to call fsync (or msync or fdatasync) will see the
error. Any subsequent task calling fsync on a file will get back 0
(unless another writeback error occurs in the interim). If I have
several tasks writing to a file and calling fsync to ensure that their
writes got stored, then I need to have them coordinate with one
another. That's difficult enough, but in a world of containerized
setups that coordination may not even be possible.
But wait...it gets worse!
The calls to filemap_check_errors can be buried pretty far down in the
call stack, and there are internal callers of filemap_write_and_wait
and the like that also end up clearing those errors. Many of those
callers ignore the error return from that function or return it to
userland at nonsensical times (e.g. truncate() or stat()). If I get
back -EIO on a truncate, there is no reason to think that it was
because some previous writeback failed, and a subsequent fsync() will
(incorrectly) return 0.
This pile aims to do three things:
1) ensure that when a writeback error occurs that that error will be
reported to userland on a subsequent fsync/fdatasync/msync call,
regardless of what internal callers are doing
2) report writeback errors on all file descriptions that were open at
the time that the error occurred. This is a user-visible change,
but I think most applications are written to assume this behavior
anyway. Those that aren't are unlikely to be hurt by it.
3) document what filesystems should do when there is a writeback
error. Today, there is very little consistency between them, and a
lot of cargo-cult copying. We need to make it very clear what
filesystems should do in this situation.
To achieve this, the set adds a new data type (errseq_t) and then
builds new writeback error tracking infrastructure around that. Once
all of that is in place, we change the filesystems to use the new
infrastructure for reporting wb errors to userland.
Note that this is just the initial foray into cleaning up this mess.
There is a lot of work remaining here:
1) convert the rest of the filesystems in a similar fashion. Once the
initial set is in, then I think most other fs' will be fairly
simple to convert. Hopefully most of those can go in via individual
filesystem trees.
2) convert internal waiters on writeback to use errseq_t for
detecting errors instead of relying on the AS_* flags. I have some
draft patches for this for ext4, but they are not quite ready for
prime time yet.
This was a discussion topic this year at LSF/MM too. If you're
interested in the gory details, LWN has some good articles about this:
https://lwn.net/Articles/718734/
https://lwn.net/Articles/724307/"
* tag 'for-linus-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
btrfs: minimal conversion to errseq_t writeback error reporting on fsync
xfs: minimal conversion to errseq_t writeback error reporting
ext4: use errseq_t based error handling for reporting data writeback errors
fs: convert __generic_file_fsync to use errseq_t based reporting
block: convert to errseq_t based writeback error tracking
dax: set errors in mapping when writeback fails
Documentation: flesh out the section in vfs.txt on storing and reporting writeback errors
mm: set both AS_EIO/AS_ENOSPC and errseq_t in mapping_set_error
fs: new infrastructure for writeback error handling and reporting
lib: add errseq_t type and infrastructure for handling it
mm: don't TestClearPageError in __filemap_fdatawait_range
mm: clear AS_EIO/AS_ENOSPC when writeback initiation fails
jbd2: don't clear and reset errors after waiting on writeback
buffer: set errors in mapping at the time that the error occurs
fs: check for writeback errors after syncing out buffers in generic_file_fsync
buffer: use mapping_set_error instead of setting the flag
mm: fix mapping_set_error call in me_pagecache_dirty
Merge tag 'trace-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"The new features of this release:
- Added TRACE_DEFINE_SIZEOF() which allows trace events that use
sizeof() in the TP_printk() to be converted to the actual size such
that trace-cmd and perf can parse them correctly.
- Some rework of the TRACE_DEFINE_ENUM() such that the above
TRACE_DEFINE_SIZEOF() could reuse the same code.
- Recording of tgid (Thread Group ID). This is similar to how task
COMMs are recorded (cached at sched_switch), where it is in a table
and used on output of the trace and trace_pipe files.
- Have ":mod:<module>" be cached when written into set_ftrace_filter.
Then the functions of the module will be traced at module load.
- Some random clean ups and small fixes"
* tag 'trace-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (26 commits)
ftrace: Test for NULL iter->tr in regex for stack_trace_filter changes
ftrace: Decrement count for dyn_ftrace_total_info for init functions
ftrace: Unlock hash mutex on failed allocation in process_mod_list()
tracing: Add support for display of tgid in trace output
tracing: Add support for recording tgid of tasks
ftrace: Decrement count for dyn_ftrace_total_info file
ftrace: Remove unused function ftrace_arch_read_dyn_info()
sh/ftrace: Remove only user of ftrace_arch_read_dyn_info()
ftrace: Have cached module filters be an active filter
ftrace: Implement cached modules tracing on module load
ftrace: Have the cached module list show in set_ftrace_filter
ftrace: Add :mod: caching infrastructure to trace_array
tracing: Show address when function names are not found
ftrace: Add missing comment for FTRACE_OPS_FL_RCU
tracing: Rename update the enum_map file
tracing: Add TRACE_DEFINE_SIZEOF() macros
tracing: define TRACE_DEFINE_SIZEOF() macro to map sizeof's to their values
tracing: Rename enum_replace to eval_replace
trace: rename enum_map functions
trace: rename trace.c enum functions
...
Pull percpu updates from Tejun Heo:
"These are the percpu changes for the v4.13-rc1 merge window. There are
a couple visibility related changes - tracepoints and allocator stats
through debugfs, along with __ro_after_init markings and a cosmetic
rename in percpu_counter.
Please note that the simple O(#elements_in_the_chunk) area allocator
used by the percpu allocator is again showing scalability issues,
primarily with bpf allocating and freeing a large number of counters.
Dennis is working on the replacement allocator and the percpu
allocator will be seeing increased churn in the coming cycles"
* 'for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
percpu: fix static checker warnings in pcpu_destroy_chunk
percpu: fix early calls for spinlock in pcpu_stats
percpu: resolve err may not be initialized in pcpu_alloc
percpu_counter: Rename __percpu_counter_add to percpu_counter_add_batch
percpu: add tracepoint support for percpu memory
percpu: expose statistics about percpu memory via debugfs
percpu: migrate percpu data structures to internal header
percpu: add missing lockdep_assert_held to func pcpu_free_area
mark most percpu globals as __ro_after_init
Most filesystems currently use mapping_set_error and
filemap_check_errors for setting and reporting/clearing writeback errors
at the mapping level. filemap_check_errors is indirectly called from
most of the filemap_fdatawait_* functions and from
filemap_write_and_wait*. These functions are called from all sorts of
contexts to wait on writeback to finish -- e.g. mostly in fsync, but
also in truncate calls, getattr, etc.
The non-fsync callers are problematic. We should be reporting writeback
errors during fsync, but many places spread over the tree clear out
errors before they can be properly reported, or report errors at
nonsensical times.
If I get -EIO on a stat() call, there is no reason for me to assume that
it is because some previous writeback failed. The fact that it also
clears out the error such that a subsequent fsync returns 0 is a bug,
and a nasty one since that's potentially silent data corruption.
This patch adds a small bit of new infrastructure for setting and
reporting errors during address_space writeback. While the above was my
original impetus for adding this, I think it's also the case that
current fsync semantics are just problematic for userland. Most
applications that call fsync do so to ensure that the data they wrote
has hit the backing store.
In the case where there are multiple writers to the file at the same
time, this is really hard to determine. The first one to call fsync will
see any stored error, and the rest get back 0. The processes with open
fds may not be associated with one another in any way. They could even
be in different containers, so ensuring coordination between all fsync
callers is not really an option.
One way to remedy this would be to track what file descriptor was used
to dirty the file, but that's rather cumbersome and would likely be
slow. However, there is a simpler way to improve the semantics here
without incurring too much overhead.
This set adds an errseq_t to struct address_space, and a corresponding
one is added to struct file. Writeback errors are recorded in the
mapping's errseq_t, and the one in struct file is used as the "since"
value.
This changes the semantics of the Linux fsync implementation such that
applications can now use it to determine whether there were any
writeback errors since fsync(fd) was last called (or since the file was
opened in the case of fsync having never been called).
Note that those writeback errors may have occurred when writing data
that was dirtied via an entirely different fd, but that's the case now
with the current mapping_set_error/filemap_check_errors infrastructure.
This will at least prevent you from getting a false report of success.
The new behavior is still consistent with the POSIX spec, and is more
reliable for application developers. This patch just adds some basic
infrastructure for doing this, and ensures that the f_wb_err "cursor"
is properly set when a file is opened. Later patches will change the
existing code to use this new infrastructure for reporting errors at
fsync time.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Pull btrfs updates from David Sterba:
"The core updates improve error handling (mostly related to bios), with
the usual incremental work on the GFP_NOFS (mis)use removal,
refactoring or cleanups. Except the two top patches, all have been in
for-next for an extensive amount of time.
User visible changes:
- statx support
- quota override tunable
- improved compression thresholds
- obsoleted mount option alloc_start
Core updates:
- bio-related updates:
- faster bio cloning
- no allocation failures
- preallocated flush bios
- more kvzalloc use, memalloc_nofs protections, GFP_NOFS updates
- prep work for btree_inode removal
- dir-item validation
- qgroup fixes and updates
- cleanups:
- removed unused struct members, unused code, refactoring
- argument refactoring (fs_info/root, caller -> callee sink)
- SEARCH_TREE ioctl docs"
* 'for-4.13-part1' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (115 commits)
btrfs: Remove false alert when fiemap range is smaller than on-disk extent
btrfs: Don't clear SGID when inheriting ACLs
btrfs: fix integer overflow in calc_reclaim_items_nr
btrfs: scrub: fix target device intialization while setting up scrub context
btrfs: qgroup: Fix qgroup reserved space underflow by only freeing reserved ranges
btrfs: qgroup: Introduce extent changeset for qgroup reserve functions
btrfs: qgroup: Fix qgroup reserved space underflow caused by buffered write and quotas being enabled
btrfs: qgroup: Return actually freed bytes for qgroup release or free data
btrfs: qgroup: Cleanup btrfs_qgroup_prepare_account_extents function
btrfs: qgroup: Add quick exit for non-fs extents
Btrfs: rework delayed ref total_bytes_pinned accounting
Btrfs: return old and new total ref mods when adding delayed refs
Btrfs: always account pinned bytes when dropping a tree block ref
Btrfs: update total_bytes_pinned when pinning down extents
Btrfs: make BUG_ON() in add_pinned_bytes() an ASSERT()
Btrfs: make add_pinned_bytes() take an s64 num_bytes instead of u64
btrfs: fix validation of XATTR_ITEM dir items
btrfs: Verify dir_item in iterate_object_props
btrfs: Check name_len before in btrfs_del_root_ref
btrfs: Check name_len before reading btrfs_get_name
...
Pull networking updates from David Miller:
"Reasonably busy this cycle, but perhaps not as busy as in the 4.12
merge window:
1) Several optimizations for UDP processing under high load from
Paolo Abeni.
2) Support pacing internally in TCP when using the sch_fq packet
scheduler for this is not practical. From Eric Dumazet.
3) Support multiple filter chains per qdisc, from Jiri Pirko.
4) Move to 1ms TCP timestamp clock, from Eric Dumazet.
5) Add batch dequeueing to vhost_net, from Jason Wang.
6) Flesh out more completely SCTP checksum offload support, from
Davide Caratti.
7) More plumbing of extended netlink ACKs, from David Ahern, Pablo
Neira Ayuso, and Matthias Schiffer.
8) Add devlink support to nfp driver, from Simon Horman.
9) Add RTM_F_FIB_MATCH flag to RTM_GETROUTE queries, from Roopa
Prabhu.
10) Add stack depth tracking to BPF verifier and use this information
in the various eBPF JITs. From Alexei Starovoitov.
11) Support XDP on qed device VFs, from Yuval Mintz.
12) Introduce BPF PROG ID for better introspection of installed BPF
programs. From Martin KaFai Lau.
13) Add bpf_set_hash helper for TC bpf programs, from Daniel Borkmann.
14) For loads, allow narrower accesses in bpf verifier checking, from
Yonghong Song.
15) Support MIPS in the BPF selftests and samples infrastructure, the
MIPS eBPF JIT will be merged in via the MIPS GIT tree. From David
Daney.
16) Support kernel based TLS, from Dave Watson and others.
17) Remove completely DST garbage collection, from Wei Wang.
18) Allow installing TCP MD5 rules using prefixes, from Ivan
Delalande.
19) Add XDP support to Intel i40e driver, from Björn Töpel.
20) Add support for TC flower offload in nfp driver, from Simon
Horman, Pieter Jansen van Vuuren, Benjamin LaHaise, Jakub
Kicinski, and Bert van Leeuwen.
21) IPSEC offloading support in mlx5, from Ilan Tayari.
22) Add HW PTP support to macb driver, from Rafal Ozieblo.
23) Networking refcount_t conversions, from Elena Reshetova.
24) Add sock_ops support to BPF, from Lawrence Brakmo. This is useful
for tuning the TCP sockopt settings of a group of applications,
currently via CGROUPs"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1899 commits)
net: phy: dp83867: add workaround for incorrect RX_CTRL pin strap
dt-bindings: phy: dp83867: provide a workaround for incorrect RX_CTRL pin strap
cxgb4: Support for get_ts_info ethtool method
cxgb4: Add PTP Hardware Clock (PHC) support
cxgb4: time stamping interface for PTP
nfp: default to chained metadata prepend format
nfp: remove legacy MAC address lookup
nfp: improve order of interfaces in breakout mode
net: macb: remove extraneous return when MACB_EXT_DESC is defined
bpf: add missing break in for the TCP_BPF_SNDCWND_CLAMP case
bpf: fix return in load_bpf_file
mpls: fix rtm policy in mpls_getroute
net, ax25: convert ax25_cb.refcount from atomic_t to refcount_t
net, ax25: convert ax25_route.refcount from atomic_t to refcount_t
net, ax25: convert ax25_uid_assoc.refcount from atomic_t to refcount_t
net, sctp: convert sctp_ep_common.refcnt from atomic_t to refcount_t
net, sctp: convert sctp_transport.refcnt from atomic_t to refcount_t
net, sctp: convert sctp_chunk.refcnt from atomic_t to refcount_t
net, sctp: convert sctp_datamsg.refcnt from atomic_t to refcount_t
net, sctp: convert sctp_auth_bytes.refcnt from atomic_t to refcount_t
...
Merge tag 'spi-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi updates from Mark Brown:
"There's only one big change in this release but it's a very big
change: Geert Uytterhoeven has implemented support for SPI slave mode.
This feature has been on the cards since the subsystem was originally
merged back in the mists of time so it's great that Geert stepped up
and finally implemented it.
- SPI slave support, together with wholesale renaming of SPI
controllers from master to controller which went surprisingly
smoothly. This is already used with Renesas SoCs and support is in
the works for i.MX too.
- New drivers for Meson SPICC and ST STM32"
* tag 'spi-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (57 commits)
spi: loopback-test: Fix kfree() NULL pointer error.
spi: loopback-test: fix spelling mistake: "reruning" -> "rerunning"
spi: sirf: fix spelling mistake: "registerred" -> "registered"
spi: stm32: fix potential dereference null return value
spi: stm32: enhance DMA error management
spi: stm32: add runtime PM support
spi: stm32: use normal conditional statements instead of ternary operator
spi: stm32: replace st, spi-midi with st, spi-midi-ns to fit bindings
spi: stm32: fix example with st, spi-midi-ns property
spi: stm32: fix compatible to fit with new bindings
spi: stm32: use SoC specific compatible
spi: rockchip: Disable Runtime PM when chip select is asserted
spi: rockchip: Set GPIO_SS flag to enable Slave Select with GPIO CS
spi: atmel: fix corrupted data issue on SAM9 family SoCs
spi: stm32: fix error check on mbr being -ve
spi: add driver for STM32 SPI controller
spi: Document the STM32 SPI bindings
spi/bcm63xx: Fix checkpatch warnings
spi: imx: Check for allocation failure earlier
spi: mediatek: add spi support for mt2712 IC
...
Here is the "big" char/misc driver patchset for 4.13-rc1.
Lots of stuff in here, a large thunderbolt update, w1 driver header
reorg, the new mux driver subsystem, google firmware driver updates, and
a raft of other smaller things. Full details in the shortlog.
All of these have been in linux-next for a while with the only reported
issue being a merge problem with this tree and the jc-docs tree in the
w1 documentation area. The fix should be obvious when the conflict
happens; if not, we can send a follow-up patch for it afterward.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Merge tag 'char-misc-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc updates from Greg KH:
"Here is the "big" char/misc driver patchset for 4.13-rc1.
Lots of stuff in here, a large thunderbolt update, w1 driver header
reorg, the new mux driver subsystem, google firmware driver updates,
and a raft of other smaller things. Full details in the shortlog.
All of these have been in linux-next for a while with the only
reported issue being a merge problem with this tree and the jc-docs
tree in the w1 documentation area"
* tag 'char-misc-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (147 commits)
misc: apds990x: Use sysfs_match_string() helper
mei: drop unreachable code in mei_start
mei: validate the message header only in first fragment.
DocBook: w1: Update W1 file locations and names in DocBook
mux: adg792a: always require I2C support
nvmem: rockchip-efuse: add support for rk322x-efuse
nvmem: core: add locking to nvmem_find_cell
nvmem: core: Call put_device() in nvmem_unregister()
nvmem: core: fix leaks on registration errors
nvmem: correct Broadcom OTP controller driver writes
w1: Add subsystem kernel public interface
drivers/fsi: Add module license to core driver
drivers/fsi: Use asynchronous slave mode
drivers/fsi: Add hub master support
drivers/fsi: Add SCOM FSI client device driver
drivers/fsi/gpio: Add tracepoints for GPIO master
drivers/fsi: Add GPIO based FSI master
drivers/fsi: Document FSI master sysfs files in ABI
drivers/fsi: Add error handling for slave
drivers/fsi: Add tracepoints for low-level operations
...
Add support for tracepoints to the following events: chunk allocation,
chunk free, area allocation, area free, and area allocation failure.
This should let us replay percpu memory requests and evaluate
corresponding decisions.
Signed-off-by: Dennis Zhou <dennisz@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Commit 81fb6f77a0 (btrfs: qgroup: Add new trace point for
qgroup data reserve) added the following events which aren't used.
btrfs__qgroup_data_map
btrfs_qgroup_init_data_rsv_map
btrfs_qgroup_free_data_rsv_map
So remove them.
CC: quwenruo@cn.fujitsu.com
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There are a few places in the kernel where sizeof() is already
being used. Update those locations with TRACE_DEFINE_SIZEOF.
Link: http://lkml.kernel.org/r/20170531215653.3240-12-jeremy.linton@arm.com
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Perf has a problem: if sizeof() is used within TRACE_EVENT() macros,
it ends up in userspace as "sizeof(kernel structure)", which cannot be
parsed properly. Add a macro which can forward this data through the
eval_map for userspace utilization.
Link: http://lkml.kernel.org/r/20170531215653.3240-10-jeremy.linton@arm.com
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Each enum is loaded into the trace_enum_map. As we are now using
this for more than enums, rename it.
Link: http://lkml.kernel.org/r/20170531215653.3240-3-jeremy.linton@arm.com
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The kernel and its modules have sections containing the enum
string to value conversions. Rename this section because we
intend to store more than enums in it.
Link: http://lkml.kernel.org/r/20170531215653.3240-2-jeremy.linton@arm.com
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Now that struct spi_master is used for both SPI master and slave controllers,
it makes sense to rename it to struct spi_controller, and replace
"master" by "controller" where appropriate.
For now this conversion is done for SPI core infrastructure only.
Wrappers are provided for backwards compatibility, until all SPI drivers
have been converted.
Noteworthy details:
- SPI_MASTER_GPIO_SS is retained, as it only makes sense for SPI
master controllers,
- spi_busnum_to_master() is retained, as it looks up masters only,
- A new field spi_device.controller is added, but spi_device.master is
retained for compatibility (both are always initialized by
spi_alloc_device()),
- spi_flash_read() is used by SPI masters only.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Mark Brown <broonie@kernel.org>
Trace low level read and write FSI bus operations.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Christopher Bostic <cbostic@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently rcu_barrier() uses call_rcu() to enqueue new callbacks
on each CPU with a non-empty callback list. This works, but means
that rcu_barrier() forces grace periods that are not otherwise needed.
The key point is that rcu_barrier() never needs to wait for a grace
period, but instead only for all pre-existing callbacks to be invoked.
This means that rcu_barrier()'s new callbacks should be placed in
the callback-list segment containing the last pre-existing callback.
This commit makes this change using the new rcu_segcblist_entrain()
function.
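The placement rule can be illustrated with a toy segmented list. This is a simplified Python model rather than the kernel's pointer-based structure: the `SegCbList` class is hypothetical, and the four segment names are an assumption modelled on the usual done / wait / next-ready / next split.

```python
# Segments ordered from "ready to invoke" to "needs a future grace period".
SEGMENTS = ("DONE", "WAIT", "NEXT_READY", "NEXT")


class SegCbList:
    """Toy segmented callback list (lists instead of tail pointers)."""

    def __init__(self):
        self.segs = {name: [] for name in SEGMENTS}

    def enqueue(self, cb):
        # What call_rcu() would do: the callback waits for a new grace period.
        self.segs["NEXT"].append(cb)

    def entrain(self, cb):
        # Append cb right behind the last pre-existing callback, i.e. in the
        # last non-empty segment. Return False if there is nothing to follow.
        for name in reversed(SEGMENTS):
            if self.segs[name]:
                self.segs[name].append(cb)
                return True
        return False


cbl = SegCbList()
cbl.segs["WAIT"] = ["cb1", "cb2"]   # pre-existing callbacks
entrained = cbl.entrain("barrier_cb")

empty = SegCbList()
entrained_empty = empty.entrain("barrier_cb")
```

Here the barrier callback rides along with `cb1` and `cb2` instead of landing in the `NEXT` segment, so no additional grace period is requested on its behalf.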
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Make it possible for a client to use AuriStor's service upgrade facility.
The client does this by adding an RXRPC_UPGRADE_SERVICE control message to
the first sendmsg() of a call. This takes no parameters.
When recvmsg() starts returning data from the call, the service ID field in
the returned msg_name will reflect the result of the upgrade attempt. If
the upgrade was ignored, srx_service will match what was set in the
sendmsg(); if the upgrade happened the srx_service will be altered to
indicate the service the server upgraded to.
Note that:
(1) The choice of upgrade service is up to the server.
(2) Further client calls to the same server that would share a connection
are blocked if an upgrade probe is in progress.
(3) This should only be used to probe the service. Clients should then
use the returned service ID in all subsequent communications with that
server (and not set the upgrade). Note that the kernel will not
retain this information should the connection expire from its cache.
(4) If a server that supports upgrading is replaced by one that doesn't,
whilst a connection is live, and if the replacement is running, say,
OpenAFS 1.6.4 or older or an older IBM AFS, then the replacement
server will not respond to packets sent to the upgraded connection.
At this point, calls will time out and the server must be reprobed.
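The client-side policy in notes (3) and (4) can be sketched as a small cache. This is an illustrative Python model only: it does not use the AF_RXRPC socket API, and `UpgradeCache`, the `call` transport function and the service numbers are all hypothetical.

```python
class UpgradeCache:
    """Probe once per server, then reuse the service ID the server chose."""

    def __init__(self, call):
        # call(server, service, upgrade) -> service ID that answered.
        # Hypothetical transport; it raises TimeoutError when a call times out.
        self._call = call
        self._known = {}

    def request(self, server, default_service):
        if server in self._known:
            try:
                # Note (3): use the returned ID and do not set the upgrade.
                return self._call(server, self._known[server], False)
            except TimeoutError:
                # Note (4): the server may have been replaced; reprobe.
                del self._known[server]
        answered = self._call(server, default_service, True)
        self._known[server] = answered
        return answered


log = []
server_upgrades = True


def fake_call(server, service, upgrade):
    """Fake server: upgrades service 1 to 2, unless it has been 'replaced'."""
    log.append((server, service, upgrade))
    if server_upgrades:
        return 2 if (upgrade and service == 1) else service
    if service != 1:
        raise TimeoutError("no response on upgraded service")
    return 1


cache = UpgradeCache(fake_call)
first = cache.request("srv", 1)    # probe with the upgrade request set
second = cache.request("srv", 1)   # reuse the upgraded service ID
server_upgrades = False            # server replaced by one without upgrade
third = cache.request("srv", 1)    # times out, then reprobes
```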
Signed-off-by: David Howells <dhowells@redhat.com>
Break out the exported SMBus functions and the emulation layer into a
separate file. This also involved splitting up the tracing header into
an I2C and an SMBus part.
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Split the DATA/NODE bio caches by temperature, so that write IOs with
the same temperature can be merged in the corresponding bio cache as
much as possible. Otherwise, write IOs of different temperatures
submitted into one bio cache will always cause the bio to be split.
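The effect can be seen in a toy model. The Python sketch below assumes, purely for illustration, that a cached bio can only grow when the next write targets the block right after the previous one, and that hot and cold data live in separate block ranges; `count_bios` and the block numbers are hypothetical.

```python
def count_bios(writes, split_by_temp):
    """Count the bios needed for a stream of (temperature, block) writes.

    A cached bio is extended only when the new block directly follows the
    last block added to that cache; otherwise a new bio is started.
    """
    last_block = {}   # cache key -> last block added to that cache's bio
    bios = 0
    for temp, block in writes:
        key = temp if split_by_temp else "shared"
        if last_block.get(key) != block - 1:
            bios += 1   # not contiguous: the current bio is split
        last_block[key] = block
    return bios


# Interleaved hot and cold writes, each contiguous within its own range.
writes = [("HOT", 100), ("COLD", 500), ("HOT", 101),
          ("COLD", 501), ("HOT", 102), ("COLD", 502)]

shared_cache_bios = count_bios(writes, split_by_temp=False)
per_temp_cache_bios = count_bios(writes, split_by_temp=True)
```

With a single shared cache every write breaks contiguity, while per-temperature caches let each stream merge into one bio.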
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Merged IO flow doesn't need to care about read IOs.
f2fs_submit_merged_bio -> f2fs_submit_merged_write
f2fs_submit_merged_bios -> f2fs_submit_merged_writes
f2fs_submit_merged_bio_cond -> f2fs_submit_merged_write_cond
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Pull thermal management updates from Zhang Rui:
- Fix a problem where orderly_shutdown() is called multiple times
due to multiple critical overheating events raised in a short period
by the platform thermal driver. (Keerthy)
- Introduce a backup thermal shutdown mechanism, which invokes
kernel_power_off()/emergency_restart() directly once a certain
amount of time (specified via Kconfig) has passed since
orderly_shutdown() was issued. This is useful in conditions where
userspace may be unable to power off the system cleanly and leaves
the system in a critical state, such as in the middle of the driver
probing phase. (Keerthy)
- Introduce a new interface in thermal devfreq_cooling code so that the
driver can provide more precise data regarding actual power to the
thermal governor every time the power budget is calculated. (Lukasz
Luba)
- Introduce BCM 2835 soc thermal driver and northstar thermal driver,
within a new sub-folder. (Rafał Miłecki)
- Introduce DA9062/61 thermal driver. (Steve Twiss)
- Remove non-DT booting on TI-SoC driver. Also add support for fetching
coefficients from DT. (Keerthy)
- Refactor RCAR Gen3 thermal driver. (Niklas Söderlund)
- Small fixes for the MTK and intel-soc-dts thermal drivers. (Dawei Chien,
Brian Bian)
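The first two items (a once-only orderly shutdown plus a timed fallback) can be sketched as a small state machine. This is a Python model with a simulated clock, not the thermal core code: `ThermalShutdown`, `delay_ms` and the callbacks are hypothetical names.

```python
class ThermalShutdown:
    """Request an orderly shutdown once; force power-off if it takes too long."""

    def __init__(self, delay_ms, orderly, forced):
        self.delay_ms = delay_ms   # 0 disables the backup path in this model
        self.orderly = orderly     # asks userspace to shut down
        self.forced = forced       # powers off directly
        self.requested = False
        self.deadline = None

    def critical_event(self, now_ms):
        if self.requested:
            return   # later critical events must not repeat the request
        self.requested = True
        self.orderly()
        if self.delay_ms > 0:
            self.deadline = now_ms + self.delay_ms

    def tick(self, now_ms):
        # Backup path: userspace did not finish in time.
        if self.deadline is not None and now_ms >= self.deadline:
            self.deadline = None
            self.forced()


calls = []
shutdown = ThermalShutdown(
    delay_ms=5000,
    orderly=lambda: calls.append("orderly"),
    forced=lambda: calls.append("forced"),
)
for t in (0, 10, 20):       # three critical events in quick succession
    shutdown.critical_event(t)
shutdown.tick(4999)         # still inside the grace window
calls_before_deadline = list(calls)
shutdown.tick(5000)         # window expired: take the backup path
shutdown.tick(6000)         # nothing further happens
```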
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: (25 commits)
thermal: core: Add a back up thermal shutdown mechanism
thermal: core: Allow orderly_poweroff to be called only once
Thermal: Intel SoC DTS: Change interrupt request behavior
trace: thermal: add another parameter 'power' to the tracing function
thermal: devfreq_cooling: add new interface for direct power read
thermal: devfreq_cooling: refactor code and add get_voltage function
thermal: mt8173: minor mtk_thermal.c cleanups
thermal: bcm2835: move to the broadcom subdirectory
thermal: broadcom: ns: specify myself as MODULE_AUTHOR
thermal: da9062/61: Thermal junction temperature monitoring driver
Documentation: devicetree: thermal: da9062/61 TJUNC temperature binding
thermal: broadcom: add Northstar thermal driver
dt-bindings: thermal: add support for Broadcom's Northstar thermal
thermal: bcm2835: add thermal driver for bcm2835 SoC
dt-bindings: Add thermal zone to bcm2835-thermal example
thermal: rcar_gen3_thermal: add suspend and resume support
thermal: rcar_gen3_thermal: store device match data in private structure
thermal: rcar_gen3_thermal: enable hardware interrupts for trip points
thermal: rcar_gen3_thermal: record and check number of TSCs found
thermal: rcar_gen3_thermal: check that TSC exists before memory allocation
...
Pull btrfs updates from Chris Mason:
"This has fixes and cleanups Dave Sterba collected for the merge
window.
The biggest functional fixes are between btrfs raid5/6 and scrub, and
raid5/6 and device replacement. Some of our pending qgroup fixes are
included as well while I bash on the rest in testing.
We also have the usual set of cleanups, including one that makes
__btrfs_map_block() much more maintainable, and conversions from
atomic_t to refcount_t"
* 'for-linus-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (71 commits)
btrfs: fix the gfp_mask for the reada_zones radix tree
Btrfs: fix reported number of inode blocks
Btrfs: send, fix file hole not being preserved due to inline extent
Btrfs: fix extent map leak during fallocate error path
Btrfs: fix incorrect space accounting after failure to insert inline extent
Btrfs: fix invalid attempt to free reserved space on failure to cow range
btrfs: Handle delalloc error correctly to avoid ordered extent hang
btrfs: Fix metadata underflow caused by btrfs_reloc_clone_csum error
btrfs: check if the device is flush capable
btrfs: delete unused member nobarriers
btrfs: scrub: Fix RAID56 recovery race condition
btrfs: scrub: Introduce full stripe lock for RAID56
btrfs: Use ktime_get_real_ts for root ctime
Btrfs: handle only applicable errors returned by btrfs_get_extent
btrfs: qgroup: Fix qgroup corruption caused by inode_cache mount option
btrfs: use q which is already obtained from bdev_get_queue
Btrfs: switch to div64_u64 if with a u64 divisor
Btrfs: update scrub_parity to use u64 stripe_len
Btrfs: enable repair during read for raid56 profile
btrfs: use clear_page where appropriate
...
This includes:
* Some code optimizations for the Intel VT-d driver
* Code to switch off a previously enabled Intel IOMMU
* Support for 'struct iommu_device' for OMAP, Rockchip and
Mediatek IOMMUs
* Some header optimizations for IOMMU core code headers and a
few fixes that became necessary in other parts of the kernel
because of that
* ACPI/IORT updates and fixes
* Some Exynos IOMMU optimizations
* Code updates for the IOMMU dma-api code to bring it closer to
using per-cpu iova caches
* New command-line option to set default domain type allocated
by the iommu core code
* Another command line option to allow the Intel IOMMU to be
switched off in a tboot environment
* ARM/SMMU: TLB sync optimisations for SMMUv2, Support for using
an IDENTITY domain in conjunction with DMA ops, Support for
SMR masking, Support for 16-bit ASIDs (was previously broken)
* Various other small fixes and improvements
Merge tag 'iommu-updates-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU updates from Joerg Roedel:
- code optimizations for the Intel VT-d driver
- ability to switch off a previously enabled Intel IOMMU
- support for 'struct iommu_device' for OMAP, Rockchip and Mediatek
IOMMUs
- header optimizations for IOMMU core code headers and a few fixes that
became necessary in other parts of the kernel because of that
- ACPI/IORT updates and fixes
- Exynos IOMMU optimizations
- updates for the IOMMU dma-api code to bring it closer to using
per-cpu iova caches
- new command-line option to set default domain type allocated by the
iommu core code
- another command line option to allow the Intel IOMMU to be switched
off in a tboot environment
- ARM/SMMU: TLB sync optimisations for SMMUv2, Support for using an
IDENTITY domain in conjunction with DMA ops, Support for SMR masking,
Support for 16-bit ASIDs (was previously broken)
- various other small fixes and improvements
* tag 'iommu-updates-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (63 commits)
soc/qbman: Move dma-mapping.h include to qman_priv.h
soc/qbman: Fix implicit header dependency now causing build fails
iommu: Remove trace-events include from iommu.h
iommu: Remove pci.h include from trace/events/iommu.h
arm: dma-mapping: Don't override dma_ops in arch_setup_dma_ops()
ACPI/IORT: Fix CONFIG_IOMMU_API dependency
iommu/vt-d: Don't print the failure message when booting non-kdump kernel
iommu: Move report_iommu_fault() to iommu.c
iommu: Include device.h in iommu.h
x86, iommu/vt-d: Add an option to disable Intel IOMMU force on
iommu/arm-smmu: Return IOVA in iova_to_phys when SMMU is bypassed
iommu/arm-smmu: Correct sid to mask
iommu/amd: Fix incorrect error handling in amd_iommu_bind_pasid()
iommu: Make iommu_bus_notifier return NOTIFY_DONE rather than error code
omap3isp: Remove iommu_group related code
iommu/omap: Add iommu-group support
iommu/omap: Make use of 'struct iommu_device'
iommu/omap: Store iommu_dev pointer in arch_data
iommu/omap: Move data structures to omap-iommu.h
iommu/omap: Drop legacy-style device support
...
Merge more updates from Andrew Morton:
- the rest of MM
- various misc things
- procfs updates
- lib/ updates
- checkpatch updates
- kdump/kexec updates
- add kvmalloc helpers, use them
- time helper updates for Y2038 issues. We're almost ready to remove
current_fs_time() but that awaits a btrfs merge.
- add tracepoints to DAX
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (114 commits)
drivers/staging/ccree/ssi_hash.c: fix build with gcc-4.4.4
selftests/vm: add a test for virtual address range mapping
dax: add tracepoint to dax_insert_mapping()
dax: add tracepoint to dax_writeback_one()
dax: add tracepoints to dax_writeback_mapping_range()
dax: add tracepoints to dax_load_hole()
dax: add tracepoints to dax_pfn_mkwrite()
dax: add tracepoints to dax_iomap_pte_fault()
mtd: nand: nandsim: convert to memalloc_noreclaim_*()
treewide: convert PF_MEMALLOC manipulations to new helpers
mm: introduce memalloc_noreclaim_{save,restore}
mm: prevent potential recursive reclaim due to clearing PF_MEMALLOC
mm/huge_memory.c: deposit a pgtable for DAX PMD faults when required
mm/huge_memory.c: use zap_deposited_table() more
time: delete CURRENT_TIME_SEC and CURRENT_TIME
gfs2: replace CURRENT_TIME with current_time
apparmorfs: replace CURRENT_TIME with current_time()
lustre: replace CURRENT_TIME macro
fs: ubifs: replace CURRENT_TIME_SEC with current_time
fs: ufs: use ktime_get_real_ts64() for birthtime
...
Add a tracepoint to dax_insert_mapping(), following the same logging
conventions as the rest of DAX. This tracepoint, along with the one in
dax_load_hole(), lets us know how a DAX PTE fault was serviced.
Here is an example DAX fault that inserts a PTE mapping:
small-1126 [007] .... 145.451604: dax_pte_fault: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 pgoff 0x220
small-1126 [007] .... 145.452317: dax_insert_mapping: dev 259:0 ino 0x1003 shared write address 0x10420000 radix_entry 0x100006
small-1126 [007] .... 145.452399: dax_pte_fault_done: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 pgoff 0x220 MAJOR|NOPAGE
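Trace records like the ones above are easy to post-process. The Python sketch below pulls the common fields out of one record, assuming each record has been joined onto a single line; `parse_dax_record` and the regular expressions are illustrative helpers, not part of any kernel or perf tooling.

```python
import re

# task-pid [cpu] flags timestamp: event: details
TRACE_RE = re.compile(
    r"^(?P<comm>.+)-(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+(?P<flags>\S+)\s+"
    r"(?P<ts>\d+\.\d+):\s+(?P<event>\w+):\s+(?P<detail>.*)$"
)
# Hexadecimal "name value" pairs found in the DAX events shown above.
HEX_FIELD_RE = re.compile(r"\b(ino|address|pgoff)\s+(0x[0-9a-fA-F]+)")


def parse_dax_record(line):
    """Return a dict for one trace record, or None if it does not match."""
    m = TRACE_RE.match(line.strip())
    if m is None:
        return None
    record = {
        "comm": m.group("comm"),
        "pid": int(m.group("pid")),
        "cpu": int(m.group("cpu")),
        "timestamp": float(m.group("ts")),
        "event": m.group("event"),
    }
    for name, value in HEX_FIELD_RE.findall(m.group("detail")):
        record[name] = int(value, 16)
    return record


sample = ("small-1126 [007] .... 145.451604: dax_pte_fault: dev 259:0 "
          "ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER "
          "address 0x10420000 pgoff 0x220")
record = parse_dax_record(sample)
```

Grouping parsed records by `pid` and `address` would then show whether a given fault was serviced by `dax_insert_mapping` or by `dax_load_hole`.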
Link: http://lkml.kernel.org/r/20170221195116.13278-7-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a tracepoint to dax_writeback_one(), following the same logging
conventions as the rest of DAX.
Here is an example range writeback which ends up flushing one PMD and
one PTE:
test-1265 [003] .... 496.615250: dax_writeback_range: dev 259:0 ino 0x1003 pgoff 0x0-0x7ffffffffffff
test-1265 [003] .... 496.616263: dax_writeback_one: dev 259:0 ino 0x1003 pgoff 0x0 pglen 0x200
test-1265 [003] .... 496.616270: dax_writeback_one: dev 259:0 ino 0x1003 pgoff 0x305 pglen 0x1
test-1265 [003] .... 496.616272: dax_writeback_range_done: dev 259:0 ino 0x1003 pgoff 0x0-0x7ffffffffffff
[akpm@linux-foundation.org: struct blk_dax_ctl has disappeared]
Link: http://lkml.kernel.org/r/20170221195116.13278-6-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add tracepoints to dax_writeback_mapping_range(), following the same
logging conventions as the rest of DAX.
Here is an example writeback call:
msync-1085 [006] .... 200.902565: dax_writeback_range: dev 259:0 ino 0x1003 pgoff 0x200-0x2ff
msync-1085 [006] .... 200.902579: dax_writeback_range_done: dev 259:0 ino 0x1003 pgoff 0x200-0x2ff
[ross.zwisler@linux.intel.com: fix regression in dax_writeback_mapping_range()]
Link: http://lkml.kernel.org/r/20170314215358.31451-1-ross.zwisler@linux.intel.com
Link: http://lkml.kernel.org/r/20170221195116.13278-5-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add tracepoints to dax_load_hole(), following the same logging conventions
as the rest of DAX.
Here is the logging generated by a PTE read from a hole:
read-1075 [002] .... 62.362108: dax_pte_fault: dev 259:0 ino 0x1003 shared ALLOW_RETRY|KILLABLE|USER address 0x10480000 pgoff 0x280
read-1075 [002] .... 62.362140: dax_load_hole: dev 259:0 ino 0x1003 shared ALLOW_RETRY|KILLABLE|USER address 0x10480000 pgoff 0x280 NOPAGE
read-1075 [002] .... 62.362141: dax_pte_fault_done: dev 259:0 ino 0x1003 shared ALLOW_RETRY|KILLABLE|USER address 0x10480000 pgoff 0x280 NOPAGE
Link: http://lkml.kernel.org/r/20170221195116.13278-4-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add tracepoints to dax_pfn_mkwrite(), following the same logging
conventions as the rest of DAX.
Here is an example PTE fault followed by a pfn_mkwrite:
small_aligned-1094 [002] .... 374.084998: dax_pte_fault: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10400000 pgoff 0x200
small_aligned-1094 [002] .... 374.085145: dax_pte_fault_done: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10400000 pgoff 0x200 MAJOR|NOPAGE
small_aligned-1094 [002] .... 374.085165: dax_pfn_mkwrite: dev 259:0 ino 0x1003 shared WRITE|MKWRITE|ALLOW_RETRY|KILLABLE|USER address 0x10400000 pgoff 0x200 NOPAGE
Link: http://lkml.kernel.org/r/20170221195116.13278-3-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "second round of tracepoints for DAX".
This second round of DAX tracepoint patches adds tracing to the PTE
fault path (dax_iomap_pte_fault(), dax_pfn_mkwrite(), dax_load_hole(),
dax_insert_mapping()) and to the writeback path
(dax_writeback_mapping_range(), dax_writeback_one()).
The purpose of this tracing is to give us a high level view of what DAX
is doing, whether faults are being serviced by PMDs or PTEs, and by real
storage or by zero pages covering holes.
I do have some patches nearly ready which also add tracing to
grab_mapping_entry() and dax_insert_mapping_entry(). These are more
targeted at logging how we are interacting with the radix tree, how we
use empty entries for locking, whether we "downgrade" huge zero pages to
4k PTE sized allocations, etc. In the end it seemed to me that this
might be too detailed to have as constantly present tracepoints, but if
anyone sees value in having tracepoints like this in the DAX code
permanently (Jan?), please let me know and I'll add those last two
patches.
All these tracepoints were done to be consistent with the style of the
XFS tracepoints and with the existing DAX PMD tracepoints.
This patch (of 6):
Add tracepoints to dax_iomap_pte_fault(), following the same logging
conventions as the rest of DAX.
Here is an example fault that initially tries to be serviced by the PMD
fault handler but which falls back to PTEs because the VMA isn't large
enough to hold a PMD:
small-1086 [005] .... 71.140014: xfs_filemap_huge_fault: dev 259:0 ino 0x1003
small-1086 [005] .... 71.140027: dax_pmd_fault: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 vm_start 0x10200000 vm_end 0x10500000 pgoff 0x220 max_pgoff 0x1400
small-1086 [005] .... 71.140028: dax_pmd_fault_done: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 vm_start 0x10200000 vm_end 0x10500000 pgoff 0x220 max_pgoff 0x1400 FALLBACK
small-1086 [005] .... 71.140035: dax_pte_fault: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 pgoff 0x220
small-1086 [005] .... 71.140396: dax_pte_fault_done: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 pgoff 0x220 MAJOR|NOPAGE
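The fallback in this trace can be checked by hand from the logged address, vm_start and vm_end. A minimal sketch, assuming 2 MiB PMDs (x86-64 with 4 KiB pages) and assuming the handler falls back whenever the PMD-aligned range around the faulting address does not lie entirely inside the VMA. pmd_fits_in_vma is a hypothetical helper, not a kernel function.

```python
PMD_SIZE = 2 * 1024 * 1024  # assumption: 2 MiB PMD (x86-64, 4 KiB pages)


def pmd_fits_in_vma(address, vm_start, vm_end, pmd_size=PMD_SIZE):
    """Return True if the PMD-aligned range around address is inside the VMA."""
    # Round the faulting address down to a PMD boundary.
    pmd_addr = address & ~(pmd_size - 1)
    # The whole [pmd_addr, pmd_addr + pmd_size) range must be covered.
    return pmd_addr >= vm_start and pmd_addr + pmd_size <= vm_end
```

For the traced fault, 0x10420000 rounds down to 0x10400000, and 0x10400000 + 0x200000 = 0x10600000 lies past vm_end 0x10500000. Under these assumptions the 3 MiB VMA cannot hold that PMD even though the VMA itself is larger than 2 MiB.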
Link: http://lkml.kernel.org/r/20170221195116.13278-2-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In this round, we've focused on enhancing performance with regards to block
allocation, GC, and discard/in-place-update IO controls. There are a bunch
of clean-ups as well as minor bug fixes.
= Enhancement
- disable heap-based allocation by default
- issue small-sized discard commands by default
- change the policy of data hotness for logging
- distinguish IOs in terms of size and wbc type
- start SSR earlier to avoid foreground GC
- enhance data structures managing discard commands
- enhance in-place update flow
- add some more fault injection routines
- secure one more xattr entry
= Bug fix
- calculate victim cost for GC correctly
- keep the correct victim segment number for GC
- race condition in nid allocator and initializer
- stale pointer produced by atomic_writes
- fix missing REQ_SYNC for flush commands
- handle missing errors in more corner cases
Merge tag 'for-f2fs-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this round, we've focused on enhancing performance with regards to
block allocation, GC, and discard/in-place-update IO controls. There
are a bunch of clean-ups as well as minor bug fixes.
Enhancements:
- disable heap-based allocation by default
- issue small-sized discard commands by default
- change the policy of data hotness for logging
- distinguish IOs in terms of size and wbc type
- start SSR earlier to avoid foreground GC
- enhance data structures managing discard commands
- enhance in-place update flow
- add some more fault injection routines
- secure one more xattr entry
Bug fixes:
- calculate victim cost for GC correctly
- keep the correct victim segment number for GC
- race condition in nid allocator and initializer
- stale pointer produced by atomic_writes
- fix missing REQ_SYNC for flush commands
- handle missing errors in more corner cases"
* tag 'for-f2fs-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (111 commits)
f2fs: fix a mount fail for wrong next_scan_nid
f2fs: enhance scalability of trace macro
f2fs: relocate inode_{,un}lock in F2FS_IOC_SETFLAGS
f2fs: Make flush bios explicitely sync
f2fs: show available_nids in f2fs/status
f2fs: flush dirty nats periodically
f2fs: introduce CP_TRIMMED_FLAG to avoid unneeded discard
f2fs: allow cpc->reason to indicate more than one reason
f2fs: release cp and dnode lock before IPU
f2fs: shrink size of struct discard_cmd
f2fs: don't hold cmd_lock during waiting discard command
f2fs: nullify fio->encrypted_page for each writes
f2fs: sanity check segment count
f2fs: introduce valid_ipu_blkaddr to clean up
f2fs: lookup extent cache first under IPU scenario
f2fs: reconstruct code to write a data page
f2fs: introduce __wait_discard_cmd
f2fs: introduce __issue_discard_cmd
f2fs: enable small discard by default
f2fs: delay awaking discard thread
...
file systems and for random write workloads into a preallocated file;
bug fixes and cleanups.
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
- add GETFSMAP support
- some performance improvements for very large file systems and for
random write workloads into a preallocated file
- bug fixes and cleanups.
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
jbd2: cleanup write flags handling from jbd2_write_superblock()
ext4: mark superblock writes synchronous for nobarrier mounts
ext4: inherit encryption xattr before other xattrs
ext4: replace BUG_ON with WARN_ONCE in ext4_end_bio()
ext4: avoid unnecessary transaction stalls during writeback
ext4: preload block group descriptors
ext4: make ext4_shutdown() static
ext4: support GETFSMAP ioctls
vfs: add common GETFSMAP ioctl definitions
ext4: evict inline data when writing to memory map
ext4: remove ext4_xattr_check_entry()
ext4: rename ext4_xattr_check_names() to ext4_xattr_check_entries()
ext4: merge ext4_xattr_list() into ext4_listxattr()
ext4: constify static data that is never modified
ext4: trim return value and 'dir' argument from ext4_insert_dentry()
jbd2: fix dbench4 performance regression for 'nobarrier' mounts
jbd2: Fix lockdep splat with generic/270 test
mm: retry writepages() on ENOMEM when doing an data integrity writeback
Merge tag 'media/v4.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media updates from Mauro Carvalho Chehab:
"Media updates for v4.12-rc1:
- new driver to support mediatek jpeg in hardware codec
- rc-lirc, s5p-cec and st-cec staging drivers got promoted
- hardware histogram support for vsp1 driver
- added Virtual Media Controller driver, to make it easier to test the
media controller
- added a new CEC driver (rainshadow-cec)
- removed two staging LIRC drivers for obscure hardware that are too
obsolete
- added support for Intel SR300 Depth camera
- some improvements at CEC and RC core
- lots of driver cleanups, improvements all over the tree
With this series, we're finally getting rid of the LIRC staging
drivers. There's just one left (lirc_zilog), which requires more care,
as part of its functionality (IR RX) is already provided by another
driver. Work is in progress to convert it the proper way"
* tag 'media/v4.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (304 commits)
[media] ov2640: print error if devm_*_optional*() fails
[media] atmel-isc: Fix the static checker warning
[media] ov2640: add support for MEDIA_BUS_FMT_YVYU8_2X8 and MEDIA_BUS_FMT_VYUY8_2X8
[media] ov2640: fix vflip control
[media] ov2640: fix duplicate width+height returning from ov2640_select_win()
[media] ov2640: add missing write to size change preamble
[media] ov2640: add information about DSP register 0xc7
[media] ov2640: improve banding filter register definitions/documentation
[media] ov2640: fix init sequence alignment
[media] ov2640: make GPIOLIB an optional dependency
[media] xc5000: fix spelling mistake: "calibration"
[media] vidioc-queryctrl.rst: fix menu/int menu references
[media] media-entity: only call dev_dbg_obj if mdev is not NULL
[media] pixfmt-meta-vsp1-hgo.rst: remove spurious '-'
[media] mtk-vcodec: avoid warnings because of empty macros
[media] coda: bump maximum number of internal framebuffers to 17
[media] media: mtk-vcodec: remove informative log
[media] subdev-formats.rst: remove spurious '-'
[media] dw2102: limit messages to buffer size
[media] ttusb2: limit messages to buffer size
...
This patch adds another parameter to the trace function:
trace_thermal_power_devfreq_get_power().
When we call the driver's code directly to get the real power,
we do not have static/dynamic_power values. Instead we get the total
power in the '*power' value, and 'static_power' and
'dynamic_power' are set to 0.
Therefore, we have to trace that '*power' value in this scenario.
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Ingo Molnar <mingo@redhat.com>
CC: Zhang Rui <rui.zhang@intel.com>
CC: Eduardo Valentin <edubezval@gmail.com>
Acked-by: Javi Merino <javi.merino@kernel.org>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Use __print_flags in show_bio_op_flags and show_cpreason instead of
__print_symbolic, so that the tracer function traverses and shows all
bits set in the flag.
Additionally, add missing REQ_FUA into F2FS_OP_FLAGS.
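The difference between the two helpers can be modelled in a few lines. This is a simplified sketch of the decoding behaviour, not the kernel macros themselves: print_symbolic and print_flags are Python stand-ins, and the bit values in FLAGS are hypothetical, chosen only for illustration.

```python
# Hypothetical bit values, for illustration only; not the kernel's definitions.
FLAGS = [(0x1, "SYNC"), (0x2, "META"), (0x4, "FUA")]


def print_symbolic(value, table):
    """Name the entry whose value equals value exactly, else show hex."""
    for entry_value, name in table:
        if value == entry_value:
            return name
    return hex(value)


def print_flags(value, table, delim="|"):
    """Name every entry whose bits are all set; show leftover bits as hex."""
    names = []
    for mask, name in table:
        if mask and (value & mask) == mask:
            names.append(name)
            value &= ~mask  # clear the bits that were just named
    if value:
        names.append(hex(value))
    return delim.join(names)
```

With two bits set (0x5), the exact-match lookup finds no entry and falls back to the raw number, while the bitwise walk names both flags.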
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Introduce CP_TRIMMED_FLAG to indicate that all invalid blocks were trimmed
before umount. When we mount an image that contains the flag, we don't
record invalid blocks as undiscarded ones, so when fstrim is triggered
we can avoid issuing redundant discard commands.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Pull networking updates from David Miller:
"Here are some highlights from the 2065 networking commits that
happened this development cycle:
1) XDP support for IXGBE (John Fastabend) and thunderx (Sunil Kowuri)
2) Add a generic XDP driver, so that anyone can test XDP even if they
lack a networking device whose driver has explicit XDP support
(me).
3) Sparc64 now has an eBPF JIT too (me)
4) Add a BPF program testing framework via BPF_PROG_TEST_RUN (Alexei
Starovoitov)
5) Make netfilter network namespace teardown less expensive (Florian
Westphal)
6) Add symmetric hashing support to nft_hash (Laura Garcia Liebana)
7) Implement NAPI and GRO in netvsc driver (Stephen Hemminger)
8) Support TC flower offload statistics in mlxsw (Arkadi Sharshevsky)
9) Multiqueue support in stmmac driver (Joao Pinto)
10) Remove TCP timewait recycling, it never really could possibly work
well in the real world and timestamp randomization really zaps any
hint of usability this feature had (Soheil Hassas Yeganeh)
11) Support level3 vs level4 ECMP route hashing in ipv4 (Nikolay
Aleksandrov)
12) Add socket busy poll support to epoll (Sridhar Samudrala)
13) Netlink extended ACK support (Johannes Berg, Pablo Neira Ayuso,
and several others)
14) IPSEC hw offload infrastructure (Steffen Klassert)"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2065 commits)
tipc: refactor function tipc_sk_recv_stream()
tipc: refactor function tipc_sk_recvmsg()
net: thunderx: Optimize page recycling for XDP
net: thunderx: Support for XDP header adjustment
net: thunderx: Add support for XDP_TX
net: thunderx: Add support for XDP_DROP
net: thunderx: Add basic XDP support
net: thunderx: Cleanup receive buffer allocation
net: thunderx: Optimize CQE_TX handling
net: thunderx: Optimize RBDR descriptor handling
net: thunderx: Support for page recycling
ipx: call ipxitf_put() in ioctl error path
net: sched: add helpers to handle extended actions
qed*: Fix issues in the ptp filter config implementation.
qede: Fix concurrency issue in PTP Tx path processing.
stmmac: Add support for SIMATIC IOT2000 platform
net: hns: fix ethtool_get_strings overflow in hns driver
tcp: fix wraparound issue in tcp_lp
bpf, arm64: fix jit branch offset related to ldimm64
bpf, arm64: implement jiting of BPF_XADD
...
Pull x86 mm updates from Ingo Molnar:
"The main x86 MM changes in this cycle were:
- continued native kernel PCID support preparation patches to the TLB
flushing code (Andy Lutomirski)
- various fixes related to 32-bit compat syscalls returning addresses
over 4GB in applications launched from 64-bit binaries - motivated
by C/R frameworks such as Virtuozzo. (Dmitry Safonov)
- continued Intel 5-level paging enablement: in particular the
conversion of x86 GUP to the generic GUP code. (Kirill A. Shutemov)
- x86/mpx ABI corner case fixes/enhancements (Joerg Roedel)
- ... plus misc updates, fixes and cleanups"
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (62 commits)
mm, zone_device: Replace {get, put}_zone_device_page() with a single reference to fix pmem crash
x86/mm: Fix flush_tlb_page() on Xen
x86/mm: Make flush_tlb_mm_range() more predictable
x86/mm: Remove flush_tlb() and flush_tlb_current_task()
x86/vm86/32: Switch to flush_tlb_mm_range() in mark_screen_rdonly()
x86/mm/64: Fix crash in remove_pagetable()
Revert "x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation"
x86/boot/e820: Remove a redundant self assignment
x86/mm: Fix dump pagetables for 4 levels of page tables
x86/mpx, selftests: Only check bounds-vs-shadow when we keep shadow
x86/mpx: Correctly report do_mpx_bt_fault() failures to user-space
Revert "x86/mm/numa: Remove numa_nodemask_from_meminfo()"
x86/espfix: Add support for 5-level paging
x86/kasan: Extend KASAN to support 5-level paging
x86/mm: Add basic defines/helpers for CONFIG_X86_5LEVEL=y
x86/paravirt: Add 5-level support to the paravirt code
x86/mm: Define virtual memory map for 5-level paging
x86/asm: Remove __VIRTUAL_MASK_SHIFT==47 assert
x86/boot: Detect 5-level paging support
x86/mm/numa: Remove numa_nodemask_from_meminfo()
...
Pull locking updates from Ingo Molnar:
"The main changes in this cycle were:
- a big round of FUTEX_UNLOCK_PI improvements, fixes, cleanups and
general restructuring
- lockdep updates such as new checks for lock_downgrade()
- introduce the new atomic_try_cmpxchg() locking API and use it to
optimize refcount code generation
- ... plus misc fixes, updates and cleanups"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits)
MAINTAINERS: Add FUTEX SUBSYSTEM
futex: Clarify mark_wake_futex memory barrier usage
futex: Fix small (and harmless looking) inconsistencies
futex: Avoid freeing an active timer
rtmutex: Plug preempt count leak in rt_mutex_futex_unlock()
rtmutex: Fix more prio comparisons
rtmutex: Fix PI chain order integrity
sched,tracing: Update trace_sched_pi_setprio()
sched/rtmutex: Refactor rt_mutex_setprio()
rtmutex: Clean up
sched/deadline/rtmutex: Dont miss the dl_runtime/dl_period update
sched/rtmutex/deadline: Fix a PI crash for deadline tasks
rtmutex: Deboost before waking up the top waiter
locking/ww-mutex: Limit stress test to 2 seconds
locking/atomic: Fix atomic_try_cmpxchg() semantics
lockdep: Fix per-cpu static objects
futex: Drop hb->lock before enqueueing on the rtmutex
futex: Futex_unlock_pi() determinism
futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock()
futex,rt_mutex: Restructure rt_mutex_finish_proxy_lock()
...
Support the GETFSMAP ioctls so that we can use the xfs free space
management tools to probe ext4 as well. Note that this is a partial
implementation -- we only report fixed-location metadata and free space;
everything else is reported as "unknown".
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The include file does not need any PCI specifics, so remove
that include. Also fix the places that relied on it.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
When iterating through a map, we need to find a key that does not exist
in the map so map_get_next_key will give us the first key of the map.
This often requires a lot of guessing in production systems.
This patch makes map_get_next_key return the first key when the key
pointer in the parameter is NULL.
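The resulting iteration pattern can be sketched as follows. ToyMap is a hypothetical in-memory stand-in, not the BPF map API: None plays the role of the NULL key pointer, and a None return stands for "no more keys".

```python
class ToyMap:
    """Hypothetical stand-in for a map with a get-next-key iterator."""

    def __init__(self, items):
        self._items = dict(items)

    def get_next_key(self, key=None):
        keys = list(self._items)
        # No key given, or a key that is not in the map: return the first key.
        if key is None or key not in self._items:
            return keys[0] if keys else None
        idx = keys.index(key)
        # Otherwise return the key that follows, or None at the end.
        return keys[idx + 1] if idx + 1 < len(keys) else None


def all_keys(m):
    """Walk the whole map without guessing a key that does not exist."""
    out = []
    key = m.get_next_key(None)
    while key is not None:
        out.append(key)
        key = m.get_next_key(key)
    return out
```

Starting from None removes the need to find a key that is guaranteed to be absent, which is the guessing step described above.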
Signed-off-by: Teng Qin <qinteng@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add an event class f2fs_discard for introducing f2fs_queue_discard, then
use f2fs_{queue,issue}_discard to trace __{queue,submit}_discard_cmd.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Introduce the following trace points:
qgroup_update_reserve
qgroup_meta_reserve
These trace points are handy to trace qgroup reserve space related
problems.
Also export btrfs_qgroup structure, as now we directly pass btrfs_qgroup
structure to trace points, so that structure needs to be exported.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
While debugging truncate problems, I found that these tracepoints could
help us quickly know what went wrong.
Two sets of tracepoints are created to track regular/prealloc file items
and inline file items respectively. I put inline in a separate one since
inline file items care about far fewer fields than regular ones.
This adds four tracepoints:
- btrfs_get_extent_show_fi_regular
- btrfs_get_extent_show_fi_inline
- btrfs_truncate_show_fi_regular
- btrfs_truncate_show_fi_inline
Cc: David Sterba <dsterba@suse.cz>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ formatting adjustments ]
Signed-off-by: David Sterba <dsterba@suse.com>
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This helps avoid accidental
refcounter overflows that might lead to use-after-free
situations.
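The overflow hazard can be shown with a simplified model. The sketch below treats the counter as an unsigned 32-bit value and uses saturation as one way an overflow-checked counter can behave; both functions are illustrative models and not the kernel's atomic_t or refcount_t implementation.

```python
UINT_MAX = 0xFFFFFFFF


def wrapping_inc(count):
    """Plain 32-bit counter: incrementing past the maximum wraps to 0."""
    return (count + 1) & UINT_MAX


def saturating_inc(count):
    """Overflow-checked counter: once at the maximum it stays there."""
    return count if count == UINT_MAX else count + 1
```

A wrapped counter reads as zero while references are still held, so a later put can free an object that is still in use. A saturated counter never returns to zero this way, which trades a possible leak for avoiding the use-after-free.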
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The metadata buffer type is used to transfer metadata between userspace
and kernelspace through a V4L2 buffers queue. It comes with a new
metadata capture capability and format description.
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Tested-by: Guennadi Liakhovetski <guennadi.liakhovetski@intel.com>
Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Acked-by: Hans Verkuil <hans.verkuil@cisco.com>
[hans.verkuil@cisco.com: removed left-over 'experimental' note]
[hans.verkuil@cisco.com: add newline after _v4l2-meta-format label]
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Add a tracepoint (rxrpc_connect_call) to log the combination of rxrpc_call
pointer, afs_call pointer/user data and wire call parameters to make it
easier to match the tracebuffer contents to captured network packets.
Signed-off-by: David Howells <dhowells@redhat.com>
Add a tracepoint (rxrpc_rx_rwind_change) to log changes in a call's receive
window size as imposed by the peer through an ACK packet.
Signed-off-by: David Howells <dhowells@redhat.com>
Add a tracepoint (rxrpc_rx_proto) to record protocol errors in received
packets. The following changes are made:
(1) Add a function, __rxrpc_abort_eproto(), to note a protocol error on a
call and mark the call aborted. This is wrapped by
rxrpc_abort_eproto() that makes the why string usable in trace.
(2) Add trace_rxrpc_rx_proto() or rxrpc_abort_eproto() to protocol error
generation points, replacing rxrpc_abort_call() with the latter.
(3) Only send an abort packet in rxkad_verify_packet*() if we actually
managed to abort the call.
Note that a trace event is also emitted if a kernel user (e.g. afs) tries
to send data through a call when it's not in the transmission phase, though
it's not technically a receive event.
Signed-off-by: David Howells <dhowells@redhat.com>
This patch converts x86 to use proper folding of a new (fifth) page table level
with <asm-generic/pgtable-nop4d.h>.
That's a bit of a kitchen sink patch, but I don't see how to split it further
without hurting bisectability.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170317185515.8636-7-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The issue flush trace for the root device is missing;
add it and trace the result from submit.
Fixes: d50aaeec90 ("f2fs: show actual device info in tracepoints")
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
window. Namely powerpc broke as jump labels use the two LSB bits as flags
in initialization. A check was added to make sure that all jump label
entries were 4 bytes aligned, but powerpc didn't work that way for modules.
Adding an alignment in the module linker script appeared to be the best
solution.
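Why alignment matters for flags kept in the low bits can be sketched with plain integers. This is an illustration of the general pointer-tagging technique under the 4-byte alignment described above; pack, unpack and FLAG_MASK are hypothetical names, not the jump label code.

```python
FLAG_MASK = 0x3  # the two least-significant bits


def pack(addr, flags):
    """Store two flag bits in the low bits of a 4-byte aligned address."""
    if addr & FLAG_MASK:
        # A misaligned address already uses the low bits, so flags and
        # address would be indistinguishable after packing.
        raise ValueError("address is not 4-byte aligned")
    if flags & ~FLAG_MASK:
        raise ValueError("flags do not fit in two bits")
    return addr | flags


def unpack(word):
    """Split a packed word back into (address, flags)."""
    return word & ~FLAG_MASK, word & FLAG_MASK
```

With an aligned address such as 0x1000 the round trip is lossless. An address such as 0x1002 is rejected here, which corresponds to the case the alignment check is meant to catch.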
Jump labels also added an anonymous union to access those LSB bits as a
normal long. But because this structure had static initialization, it broke
older compilers that could not statically initialize anonymous unions
without brackets.
The command line parameter for setting function graph filter broke the
"EMPTY_HASH" descriptor by modifying it instead of creating a new hash to
hold the entries.
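The class of bug described here, modifying a shared "empty" sentinel in place, can be modelled as follows. The names are hypothetical and the set stands in for the hash; this is a sketch of the pattern, not the ftrace code.

```python
EMPTY_HASH = set()  # shared sentinel meaning "no filter entries"


class Filter:
    def __init__(self):
        # Every new filter starts out pointing at the shared sentinel.
        self.entries = EMPTY_HASH


def add_entry_buggy(flt, name):
    # Mutates the shared sentinel, so every other filter sees the entry.
    flt.entries.add(name)


def add_entry_fixed(flt, name):
    # Allocate a private container before the first modification.
    if flt.entries is EMPTY_HASH:
        flt.entries = set()
    flt.entries.add(name)
```

With the fixed variant the sentinel stays empty and each filter owns its entries. With the buggy variant a single add leaks into everything that still references the sentinel.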
The command line parameter ftrace_graph_max_depth was added to allow its
setting at boot time. It uses existing code and only the command line hook
was added. This is not really a fix, but as it uses existing code without
affecting anything else, I added it to this release. It was ready before the
merge window closed, but I wanted to let it sit in linux-next for a couple
of days first.
Merge tag 'trace-v4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"There was some breakage with the changes for jump labels in the 4.11
merge window:
- powerpc broke as jump labels use the two LSB bits as flags in
initialization.
A check was added to make sure that all jump label entries were 4
bytes aligned, but powerpc didn't work that way for modules. Adding
an alignment in the module linker script appeared to be the best
solution.
- Jump labels also added an anonymous union to access those LSB bits
as a normal long. But because this structure had static
initialization, it broke older compilers that could not statically
initialize anonymous unions without brackets.
- The command line parameter for setting function graph filter broke
the "EMPTY_HASH" descriptor by modifying it instead of creating a
new hash to hold the entries.
- The command line parameter ftrace_graph_max_depth was added to
allow its setting at boot time. It uses existing code and only the
command line hook was added.
This is not really a fix, but as it uses existing code without
affecting anything else, I added it to this release. It was ready
before the merge window closed, but I wanted to let it sit in
linux-next for a couple of days first"
* tag 'trace-v4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ftrace/graph: Add ftrace_graph_max_depth kernel parameter
tracing: Add #undef to fix compile error
jump_label: Add comment about initialization order for anonymous unions
jump_label: Fix anonymous union initialization
module: set __jump_table alignment to 8
ftrace/graph: Do not modify the EMPTY_HASH for the function_graph filter
tracing: Fix code comment for ftrace_ops_get_func()
Pull networking fixes from David Miller:
1) Fix double-free in batman-adv, from Sven Eckelmann.
2) Fix packet stats for fast-RX path, from Johannes Berg.
3) Netfilter's ip_route_me_harder() doesn't handle request sockets
properly, fix from Florian Westphal.
4) Fix sendmsg deadlock in rxrpc, from David Howells.
5) Add missing RCU locking to transport hashtable scan, from Xin Long.
6) Fix potential packet loss in mlxsw driver, from Ido Schimmel.
7) Fix race in NAPI handling between poll handlers and busy polling,
from Eric Dumazet.
8) TX path in vxlan and geneve needs proper RCU locking, from Jakub
Kicinski.
9) SYN processing in DCCP and TCP needs to disable BH, from Eric
Dumazet.
10) Properly handle net_enable_timestamp() being invoked from IRQ
context, also from Eric Dumazet.
11) Fix crash on device-tree systems in xgene driver, from Alban Bedel.
12) Do not call sk_free() on a locked socket, from Arnaldo Carvalho de
Melo.
13) Fix use-after-free in netvsc driver, from Dexuan Cui.
14) Fix max MTU setting in bonding driver, from WANG Cong.
15) xen-netback hash table can be allocated from softirq context, so use
GFP_ATOMIC. From Anoob Soman.
16) Fix MAC address change bug in bgmac driver, from Hari Vyas.
17) strparser needs to destroy strp_wq on module exit, from WANG Cong.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (69 commits)
strparser: destroy workqueue on module exit
sfc: fix IPID endianness in TSOv2
sfc: avoid max() in array size
rds: remove unnecessary returned value check
rxrpc: Fix potential NULL-pointer exception
nfp: correct DMA direction in XDP DMA sync
nfp: don't tell FW about the reserved buffer space
net: ethernet: bgmac: mac address change bug
net: ethernet: bgmac: init sequence bug
xen-netback: don't vfree() queues under spinlock
xen-netback: keep a local pointer for vif in backend_disconnect()
netfilter: nf_tables: don't call nfnetlink_set_err() if nfnetlink_send() fails
netfilter: nft_set_rbtree: incorrect assumption on lower interval lookups
netfilter: nf_conntrack_sip: fix wrong memory initialisation
can: flexcan: fix typo in comment
can: usb_8dev: Fix memory leak of priv->cmd_msg_buffer
can: gs_usb: fix coding style
can: gs_usb: Don't use stack memory for USB transfers
ixgbe: Limit use of 2K buffers on architectures with 256B or larger cache lines
ixgbe: update the rss key on h/w, when ethtool ask for it
...
There are several trace include files that define TRACE_INCLUDE_FILE.
Include several of them in the same .c file (as I currently have in
some code I am working on), and the compile will blow up with a
"warning: "TRACE_INCLUDE_FILE" redefined #define TRACE_INCLUDE_FILE syscalls"
Every other include file in include/trace/events/ avoids that issue
by having a #undef TRACE_INCLUDE_FILE before the #define; syscalls.h
should have one, too.
Link: http://lkml.kernel.org/r/20160928225554.13bd7ac6@annuminas.surriel.com
Cc: stable@vger.kernel.org
Fixes: b8007ef742 ("tracing: Separate raw syscall from syscall tracer")
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
We are going to split <linux/sched/numa_balancing.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.
Create a trivial placeholder <linux/sched/numa_balancing.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.
Include the new header in the files that are going to need it.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This round introduces several interesting features such as on-disk NAT bitmaps,
IO alignment, and a discard thread. And it includes a couple of major bug fixes
as below.
== Enhancement ==
- introduce on-disk bitmaps to avoid scanning NAT blocks when getting free nids
- support IO alignment to prepare open-channel SSD integration in future
- introduce a discard thread to avoid long latency during checkpoint and fstrim
- use SSR for warm node and enable inline_xattr by default
- introduce in-memory bitmaps to check FS consistency for debugging
- improve write_begin by avoiding needless read IO
== Bug fix ==
- fix broken zone_reset behavior for SMR drive
- fix wrong victim selection policy during GC
- fix missing behavior when preparing discard commands
- fix bugs in atomic write support and fiemap
- workaround to handle multiple f2fs_add_link calls having same name
And it includes a bunch of clean-up patches as well.
Merge tag 'for-f2fs-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"This round introduces several interesting features such as on-disk NAT
bitmaps, IO alignment, and a discard thread. And it includes a couple
of major bug fixes as below.
Enhancements:
- introduce on-disk bitmaps to avoid scanning NAT blocks when getting
free nids
- support IO alignment to prepare open-channel SSD integration in
future
- introduce a discard thread to avoid long latency during checkpoint
and fstrim
- use SSR for warm node and enable inline_xattr by default
- introduce in-memory bitmaps to check FS consistency for debugging
- improve write_begin by avoiding needless read IO
Bug fixes:
- fix broken zone_reset behavior for SMR drive
- fix wrong victim selection policy during GC
- fix missing behavior when preparing discard commands
- fix bugs in atomic write support and fiemap
- workaround to handle multiple f2fs_add_link calls having same name
... and it includes a bunch of clean-up patches as well"
* tag 'for-f2fs-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (97 commits)
f2fs: avoid to flush nat journal entries
f2fs: avoid to issue redundant discard commands
f2fs: fix a plint compile warning
f2fs: add f2fs_drop_inode tracepoint
f2fs: Fix zoned block device support
f2fs: remove redundant set_page_dirty()
f2fs: fix to enlarge size of write_io_dummy mempool
f2fs: fix memory leak of write_io_dummy mempool during umount
f2fs: fix to update F2FS_{CP_}WB_DATA count correctly
f2fs: use MAX_FREE_NIDS for the free nids target
f2fs: introduce free nid bitmap
f2fs: new helper cur_cp_crc() getting crc in f2fs_checkpoint
f2fs: update the comment of default nr_pages to skipping
f2fs: drop the duplicate pval in f2fs_getxattr
f2fs: Don't update the xattr data that same as the exist
f2fs: kill __is_extent_same
f2fs: avoid bggc->fggc when enough free segments are avaliable after cp
f2fs: select target segment with closer temperature in SSR mode
f2fs: show simple call stack in fault injection message
f2fs: no need lock_op in f2fs_write_inline_data
...
All the routines by which rxrpc is accessed from the outside are serialised
by means of the socket lock (sendmsg, recvmsg, bind,
rxrpc_kernel_begin_call(), ...) and this presents a problem:
(1) If a number of calls on the same socket are in the process of
connecting to the same peer, a maximum of four concurrent live calls
are permitted before further calls need to wait for a slot.
(2) If a call is waiting for a slot, it is deep inside sendmsg() or
rxrpc_kernel_begin_call() and the entry function is holding the socket
lock.
(3) sendmsg() and recvmsg() or the in-kernel equivalents are prevented
from servicing the other calls as they need to take the socket lock to
do so.
(4) The socket is stuck until a call is aborted and makes its slot
available to the waiter.
Fix this by:
(1) Provide each call with a mutex ('user_mutex') that arbitrates access
by the users of rxrpc separately for each specific call.
(2) Make rxrpc_sendmsg() and rxrpc_recvmsg() unlock the socket as soon as
they've got a call and taken its mutex.
Note that I'm returning EWOULDBLOCK from recvmsg() if MSG_DONTWAIT is
set but someone else has the lock. Should I instead only return
EWOULDBLOCK if there's nothing currently to be done on a socket, and
sleep in this particular instance because there is something to be
done, but we appear to be blocked by the interrupt handler doing its
ping?
(3) Make rxrpc_new_client_call() unlock the socket after allocating a new
call, locking its user mutex and adding it to the socket's call tree.
The call is returned locked so that sendmsg() can add data to it
immediately.
From the moment the call is in the socket tree, it is subject to
access by sendmsg() and recvmsg() - even if it isn't connected yet.
(4) Lock new service calls in the UDP data_ready handler (in
rxrpc_new_incoming_call()) because they may already be in the socket's
tree and the data_ready handler makes them live immediately if a user
ID has already been preassigned.
Note that the new call is locked before any notifications are sent
that it is live, so doing mutex_trylock() *ought* to always succeed.
Userspace is prevented from doing sendmsg() on calls that are in a
too-early state in rxrpc_do_sendmsg().
(5) Make rxrpc_new_incoming_call() return the call with the user mutex
held so that a ping can be scheduled immediately under it.
Note that it might be worth moving the ping call into
rxrpc_new_incoming_call() and then we can drop the mutex there.
(6) Make rxrpc_accept_call() take the lock on the call it is accepting and
release the socket after adding the call to the socket's tree. This
is slightly tricky as we've dequeued the call by that point and have
to requeue it.
Note that requeuing emits a trace event.
(7) Make rxrpc_kernel_send_data() and rxrpc_kernel_recv_data() take the
new mutex immediately and don't bother with the socket mutex at all.
This patch has the nice bonus that calls on the same socket are now to some
extent parallelisable.
Note that we might want to move rxrpc_service_prealloc() calls out from the
socket lock and give it its own lock, so that we don't hang progress in
other calls because we're waiting for the allocator.
We probably also want to avoid calling rxrpc_notify_socket() from within
the socket lock (rxrpc_accept_call()).
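The locking order described above can be sketched in userspace as follows. This is only an illustration, not the rxrpc code: the Socket and Call classes and the queued list are hypothetical stand-ins, and sendmsg() here always uses a trylock (the MSG_DONTWAIT behaviour), whereas a blocking caller would sleep on the call mutex instead.

```python
import threading


class Call:
    """Hypothetical stand-in for one rxrpc call."""

    def __init__(self, call_id):
        self.call_id = call_id
        self.user_mutex = threading.Lock()  # per-call lock ('user_mutex' in the text)
        self.queued = []                    # data handed to this call so far


class Socket:
    """Hypothetical stand-in for an rxrpc socket and its call tree."""

    def __init__(self):
        self.lock = threading.Lock()  # the socket lock
        self.calls = {}               # plays the role of the socket's call tree

    def new_client_call(self, call_id):
        # Allocate the call, lock its user mutex and add it to the tree,
        # all under the socket lock. The call is returned still locked.
        with self.lock:
            call = Call(call_id)
            call.user_mutex.acquire()
            self.calls[call_id] = call
        return call

    def sendmsg(self, call_id, data):
        # Hold the socket lock only long enough to find the call and try
        # to take its mutex, then drop it before doing the real work.
        with self.lock:
            call = self.calls[call_id]
            if not call.user_mutex.acquire(blocking=False):
                raise BlockingIOError("call is busy")  # EWOULDBLOCK analogue
        try:
            call.queued.append(data)
        finally:
            call.user_mutex.release()
```

With this arrangement a call that is held up only blocks users of that one call; sendmsg() on a different call of the same socket still goes through because the socket lock is no longer held across the wait.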
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Marc Dionne <marc.c.dionne@auristor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'trace-v4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"This release has no new tracing features, just clean ups, minor fixes
and small optimizations"
* tag 'trace-v4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (25 commits)
tracing: Remove outdated ring buffer comment
tracing/probes: Fix a warning message to show correct maximum length
tracing: Fix return value check in trace_benchmark_reg()
tracing: Use modern function declaration
jump_label: Reduce the size of struct static_key
tracing/probe: Show subsystem name in messages
tracing/hwlat: Update old comment about migration
timers: Make flags output in the timer_start tracepoint useful
tracing: Have traceprobe_probes_write() not access userspace unnecessarily
tracing: Have COMM event filter key be treated as a string
ftrace: Have set_graph_function handle multiple functions in one write
ftrace: Do not hold references of ftrace_graph_{notrace_}hash out of graph_lock
tracing: Reset parser->buffer to allow multiple "puts"
ftrace: Have set_graph_functions handle write with RDWR
ftrace: Reset fgd->hash in ftrace_graph_write()
ftrace: Replace (void *)1 with a meaningful macro name FTRACE_GRAPH_EMPTY
ftrace: Create a slight optimization on searching the ftrace_hash
tracing: Add ftrace_hash_key() helper function
ftrace: Convert graph filter to use hash tables
ftrace: Expose ftrace_hash_empty and ftrace_lookup_ip
...
Pull btrfs updates from Chris Mason:
"This has a series of fixes and cleanups that Dave Sterba has been
collecting.
There is a pretty big variety here, cleaning up internal APIs and
fixing corner cases"
* 'for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (124 commits)
Btrfs: use the correct type when creating cow dio extent
Btrfs: fix deadlock between dedup on same file and starting writeback
btrfs: use btrfs_debug instead of pr_debug in transaction abort
btrfs: btrfs_truncate_free_space_cache always allocates path
btrfs: free-space-cache, clean up unnecessary root arguments
btrfs: convert btrfs_inc_block_group_ro to accept fs_info
btrfs: flush_space always takes fs_info->fs_root
btrfs: pass fs_info to (more) routines that are only called with extent_root
btrfs: qgroup: Move half of the qgroup accounting time out of commit trans
btrfs: remove unused parameter from adjust_slots_upwards
btrfs: remove unused parameters from __btrfs_write_out_cache
btrfs: remove unused parameter from cleanup_write_cache_enospc
btrfs: remove unused parameter from __add_inode_ref
btrfs: remove unused parameter from clone_copy_inline_extent
btrfs: remove unused parameters from btrfs_cmp_data
btrfs: remove unused parameter from __add_inline_refs
btrfs: remove unused parameters from scrub_setup_wr_ctx
btrfs: remove unused parameter from create_snapshot
btrfs: remove unused parameter from init_first_rw_device
btrfs: remove unused parameter from __btrfs_alloc_chunk
...
Memory pressure can put dirty pages at the end of the LRU without
anybody running into dirty limits. Don't start writing individual pages
from kswapd while the flushers might be asleep.
Unlike the old direct reclaim flusher wakeup (removed in the next patch)
that flushes the number of pages just scanned, this patch wakes the
flushers for all outstanding dirty pages. That seemed to perform better
in a synthetic test that pushes dirty pages to the end of the LRU and
into reclaim, because we know LRU aging outstrips writeback already, and
this way we give younger dirty pages a headstart rather than wait until
reclaim runs into them as well. It also means less plugging and risk of
exhausting the struct request pool from reclaim.
There is a concern that this will cause temporary files that used to get
dirtied and truncated before writeback to now get written to disk under
memory pressure. If this turns out to be a real problem, we'll have to
revisit this and tame the reclaim flusher wakeups.
[hannes@cmpxchg.org: mention dirty expiration as a condition]
Link: http://lkml.kernel.org/r/20170126174739.GA30636@cmpxchg.org
Link: http://lkml.kernel.org/r/20170123181641.23938-3-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We need to flush data writes before flushing the last node block write, by using
FUA with PREFLUSH. We don't need to guarantee the preceding node writes since, if
those are not written, we can't reach the last node block when scanning
the node block chain during roll-forward recovery.
Afterwards f2fs_wait_on_page_writeback guarantees all the IO submission to
disk, which builds a valid node block chain.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Currently we have tracepoints for both active and inactive LRU lists
reclaim but we do not have any which would tell us why we decided to
age the active list. Without that it is quite hard to diagnose
active/inactive lists balancing. Add mm_vmscan_inactive_list_is_low
tracepoint to tell us this information.
Link: http://lkml.kernel.org/r/20170104101942.4860-8-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mm_vmscan_lru_shrink_inactive will currently report the number of
scanned and reclaimed pages. This doesn't give us an idea how the
reclaim went except for the overall effectiveness though. Export and
show other counters which will tell us why we couldn't reclaim some
pages.
- nr_dirty, nr_writeback, nr_congested and nr_immediate tell
us how many pages are blocked due to IO
- nr_activate tells us how many pages were moved to the active
list
- nr_ref_keep reports how many pages are kept on the LRU due
to references (mostly for the file pages which are about to
go for another round through the inactive list)
- nr_unmap_fail - how many pages failed to unmap
All these are rather low level so they might change in future but the
tracepoint is already implementation specific so no tools should be
depending on its stability.
Link: http://lkml.kernel.org/r/20170104101942.4860-7-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mm_vmscan_lru_isolate currently prints only whether the LRU we isolate
from is file or anonymous but we do not know which LRU this is.
It is useful to know whether the list is active or inactive, since we
are using the same function to isolate pages from both of them and it's
hard to distinguish otherwise.
Link: http://lkml.kernel.org/r/20170104101942.4860-5-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mm_vmscan_lru_isolate shows the number of requested, scanned and taken
pages. This is mostly OK but on 32-bit systems the number of scanned pages
is quite misleading because it includes both the scanned and skipped
pages. Moreover the skipped part is scaled based on the number of taken
pages. Let's report the exact numbers without any additional logic and
add the number of skipped pages.
This should make the reported data much easier to interpret.
Link: http://lkml.kernel.org/r/20170104101942.4860-4-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Our reclaim process has several tracepoints to tell us more about how
things are progressing. We are, however, missing a tracepoint to track
active list aging. Introduce mm_vmscan_lru_shrink_active which reports:
- nr_taken: the number of pages isolated from the active list
- nr_referenced: the number of referenced pages which are
deactivated. If this is a large part of the
reported nr_deactivated pages then we might be cutting into
the active list too early because they might still be part of
the working set. This might help to debug performance issues.
- nr_active: the number of pages kept on the
active list, mostly exec file backed pages. A high number can
indicate that we might be thrashing on executables.
[mhocko@suse.com: update]
Link: http://lkml.kernel.org/r/20170104135244.GJ25453@dhcp22.suse.cz
Link: http://lkml.kernel.org/r/20170104101942.4860-3-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "vm, vmscan: enhance vmscan tracepoints", v2.
While debugging [2] I've realized that there is some room for
improvement in the set of tracepoints we currently offer. I had a hard
time drawing any conclusions from the existing ones. The resulting
problem turned out to be active list aging [3] and we are missing at
least two tracepoints to debug such a problem.
Some existing tracepoints could export more information to show _why_
reclaim progress cannot be made, not only _how much_ we could reclaim.
The latter can already be seen quite reasonably from the vmstat
counters. It can be argued that we are showing too many implementation
details in those tracepoints but I consider them way too low-level
already to be usable by any kernel independent userspace. I would be
_really_ surprised if anything but debugging tools have used them.
Any feedback is highly appreciated.
[1] http://lkml.kernel.org/r/20161228153032.10821-1-mhocko@kernel.org
[2] http://lkml.kernel.org/r/20161215225702.GA27944@boerne.fritz.box
[3] http://lkml.kernel.org/r/20161223105157.GB23109@dhcp22.suse.cz
This patch (of 8):
The trace point is not used since 925b7673cc ("mm: make per-memcg LRU
lists exclusive") so it can be removed.
Link: http://lkml.kernel.org/r/20170104101942.4860-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Debugging OOMs for higher-order requests is currently quite hard. We do have
some compaction tracepoints which can tell us how the compaction is operating
but there is no tracepoint to tell us about the compaction retry logic.
This patch adds one, which has the following format:
bash-3126 [001] .... 1498.220001: compact_retry: order=9 priority=COMPACT_PRIO_SYNC_LIGHT compaction_result=withdrawn retries=0 max_retries=16 should_retry=0
We can see that the order-9 request is not retried even though we are in
the highest compaction priority mode because the last compaction attempt
was withdrawn. This means that compaction_zonelist_suitable must have
returned false and there is no suitable zone to compact for this request
and so no need to retry further.
Another example would be:
<...>-3137 [001] .... 81.501689: compact_retry: order=9 priority=COMPACT_PRIO_SYNC_LIGHT compaction_result=failed retries=0 max_retries=16 should_retry=0
In this case the order-9 compaction failed to find any suitable block.
We do not retry anymore because this is a costly request and those do
not go below COMPACT_PRIO_SYNC_LIGHT priority.
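For post-processing, a line in the format shown above can be split into its header and its key=value fields. The following is a minimal sketch that assumes the line layout matches the two examples quoted in this message; parse_trace_line() and the regular expression are illustrative helpers, not part of any kernel tool.

```python
import re

# Layout assumed from the examples above:
#   <task>-<pid> [<cpu>] <flags> <timestamp>: <event>: key=value ...
TRACE_RE = re.compile(
    r"^\s*(?P<task>\S+)-(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+(?P<flags>\S+)\s+"
    r"(?P<ts>\d+\.\d+):\s+(?P<event>\w+):\s+(?P<fields>.*)$"
)


def parse_trace_line(line):
    """Return a dict for one trace line, or None if it does not match."""
    m = TRACE_RE.match(line)
    if m is None:
        return None
    rec = {
        "task": m["task"],
        "pid": int(m["pid"]),
        "cpu": int(m["cpu"]),
        "timestamp": float(m["ts"]),
        "event": m["event"],
    }
    # Each remaining token is key=value; numeric values become ints.
    for token in m["fields"].split():
        key, sep, value = token.partition("=")
        if sep:
            rec[key] = int(value) if value.isdigit() else value
    return rec
```

Filtering the parsed records on should_retry == 0 and grouping them by compaction_result gives a quick view of why retries stopped for each request.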
Link: http://lkml.kernel.org/r/20161220130135.15719-4-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
should_reclaim_retry is the central decision point for declaring the
OOM. It might be really useful to expose the data used for this decision
when debugging unexpected OOM situations.
Say we have an OOM report:
[ 52.264001] mem_eater invoked oom-killer: gfp_mask=0x24280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=0, order=0, oom_score_adj=0
[ 52.267549] CPU: 3 PID: 3148 Comm: mem_eater Tainted: G W 4.8.0-oomtrace3-00006-gb21338b386d2 #1024
Now we can check the tracepoint data to see how we have ended up in this
situation:
mem_eater-3148 [003] .... 52.432801: reclaim_retry_zone: node=0 zone=DMA32 order=0 reclaimable=51 available=11134 min_wmark=11084 no_progress_loops=1 wmark_check=1
mem_eater-3148 [003] .... 52.433269: reclaim_retry_zone: node=0 zone=DMA32 order=0 reclaimable=51 available=11103 min_wmark=11084 no_progress_loops=1 wmark_check=1
mem_eater-3148 [003] .... 52.433712: reclaim_retry_zone: node=0 zone=DMA32 order=0 reclaimable=51 available=11100 min_wmark=11084 no_progress_loops=2 wmark_check=1
mem_eater-3148 [003] .... 52.434067: reclaim_retry_zone: node=0 zone=DMA32 order=0 reclaimable=51 available=11097 min_wmark=11084 no_progress_loops=3 wmark_check=1
mem_eater-3148 [003] .... 52.434414: reclaim_retry_zone: node=0 zone=DMA32 order=0 reclaimable=51 available=11094 min_wmark=11084 no_progress_loops=4 wmark_check=1
mem_eater-3148 [003] .... 52.434761: reclaim_retry_zone: node=0 zone=DMA32 order=0 reclaimable=51 available=11091 min_wmark=11084 no_progress_loops=5 wmark_check=1
mem_eater-3148 [003] .... 52.435108: reclaim_retry_zone: node=0 zone=DMA32 order=0 reclaimable=51 available=11087 min_wmark=11084 no_progress_loops=6 wmark_check=1
mem_eater-3148 [003] .... 52.435478: reclaim_retry_zone: node=0 zone=DMA32 order=0 reclaimable=51 available=11084 min_wmark=11084 no_progress_loops=7 wmark_check=0
mem_eater-3148 [003] .... 52.435478: reclaim_retry_zone: node=0 zone=DMA order=0 reclaimable=0 available=1126 min_wmark=179 no_progress_loops=7 wmark_check=0
The above shows that we can quickly deduce that the reclaim stopped
making any progress (see no_progress_loops increased in each round) and
while there were still some 51 reclaimable pages they couldn't be
dropped for some reason (vmscan trace points would tell us more about
that part). available will represent reclaimable + free_pages scaled
down per no_progress_loops factor. This is essentially an optimistic
estimate of how much memory we would have when reclaiming everything.
This can be compared to min_wmark to get a rough idea but the
wmark_check tells the result of the watermark check which is more
precise (includes lowmem reserves, considers the order etc.). As we can
see no zone is eligible in the end and that is why we have triggered the
oom in this situation.
Please note that higher order requests might fail on the wmark_check
even when there is much more memory available than min_wmark - e.g.
when the memory is fragmented. A follow up tracepoint will help to
debug those situations.
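The reading of the trace described above can be automated. The sketch below assumes the key=value layout of the reclaim_retry_zone lines quoted in this message; parse_fields(), last_round() and oom_expected() are hypothetical helper names, and the code only restates what the trace says rather than re-implementing should_reclaim_retry.

```python
def parse_fields(line, event="reclaim_retry_zone"):
    """Return the key=value fields following '<event>:', or None."""
    _, found, rest = line.partition(event + ":")
    if not found:
        return None
    fields = {}
    for token in rest.split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = int(value) if value.isdigit() else value
    return fields


def last_round(lines):
    """Return {zone: fields} for the highest no_progress_loops seen."""
    records = [f for f in map(parse_fields, lines) if f]
    if not records:
        return {}
    final = max(r["no_progress_loops"] for r in records)
    return {r["zone"]: r for r in records if r["no_progress_loops"] == final}


def oom_expected(lines):
    """True if no zone passed wmark_check in the last retry round."""
    final = last_round(lines)
    return bool(final) and not any(r["wmark_check"] for r in final.values())
```

On the trace above, the last round covers DMA32 and DMA, both with wmark_check=0. The DMA line has available well above min_wmark and still fails the check, which is why wmark_check rather than the raw comparison should be used.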
Link: http://lkml.kernel.org/r/20161220130135.15719-3-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
COMPACTION_STATUS resp. ZONE_TYPE are currently used to translate enum
compact_result resp. struct zone index into their symbolic names for an
easier post processing. The follow up patch would like to reuse this as
well. The code involves some preprocessor black magic which is better not
duplicated elsewhere so move it to a common mm tracing related header.
Link: http://lkml.kernel.org/r/20161220130135.15719-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
pmd_fault() and related functions really only need the vmf parameter since
the additional parameters are all included in the vmf struct. Remove the
additional parameter and simplify pmd_fault() and friends.
Link: http://lkml.kernel.org/r/1484085142-2297-8-git-send-email-ross.zwisler@linux.intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Instead of passing in multiple parameters in the pmd_fault() handler,
a vmf can be passed in just like a fault() handler. This will simplify
code and remove the need for the actual pmd fault handlers to allocate a
vmf. Related functions are also modified to do the same.
[dave.jiang@intel.com: fix issue with xfs_tests stall when DAX option is off]
Link: http://lkml.kernel.org/r/148469861071.195597.3619476895250028518.stgit@djiang5-desk3.ch.intel.com
Link: http://lkml.kernel.org/r/1484085142-2297-7-git-send-email-ross.zwisler@linux.intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tracepoints are the standard way to capture debugging and tracing
information in many parts of the kernel, including the XFS and ext4
filesystems. Create a tracepoint header for FS DAX and add the first DAX
tracepoints to the PMD fault handler. This allows the tracing for DAX to
be done in the same way as the filesystem tracing so that developers can
look at them together and get a coherent idea of what the system is doing.
I added both an entry and exit tracepoint because future patches will add
tracepoints to child functions of dax_iomap_pmd_fault() like
dax_pmd_load_hole() and dax_pmd_insert_mapping(). We want those messages
to be wrapped by the parent function tracepoints so the code flow is more
easily understood. Having entry and exit tracepoints for faults also
allows us to easily see what filesystems functions were called during the
fault. These filesystem functions get executed via iomap_begin() and
iomap_end() calls, for example, and will have their own tracepoints.
For PMD faults we primarily want to understand the type of mapping, the
fault flags, the faulting address and whether it fell back to 4k faults.
If it fell back to 4k faults the tracepoints should let us understand why.
I named the new tracepoint header file "fs_dax.h" to allow for device DAX
to have its own separate tracing header in the same directory at some
point.
Here is an example output for these events from a successful PMD fault:
big-1441 [005] .... 32.582758: xfs_filemap_pmd_fault: dev 259:0 ino 0x1003
big-1441 [005] .... 32.582776: dax_pmd_fault: dev 259:0 ino 0x1003
shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10505000 vm_start 0x10200000 vm_end 0x10700000 pgoff 0x200 max_pgoff 0x1400
big-1441 [005] .... 32.583292: dax_pmd_fault_done: dev 259:0 ino 0x1003
shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10505000 vm_start 0x10200000 vm_end 0x10700000 pgoff 0x200 max_pgoff 0x1400 NOPAGE
Link: http://lkml.kernel.org/r/1484085142-2297-3-git-send-email-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Suggested-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "DAX tracepoints, mm argument simplification", v4.
This contains both my DAX tracepoint code and Dave Jiang's MM argument
simplifications. Dave's code was written with my tracepoint code as a
baseline, so it seemed simplest to keep them together in a single series.
This patch (of 7):
Add __print_flags_u64() and the helper trace_print_flags_seq_u64() in the
same spirit as __print_symbolic_u64() and trace_print_symbols_seq_u64().
These functions allow us to print symbols associated with flags that are
64 bits wide even on 32 bit machines.
These will be used by the DAX code so that we can print the flags set in a
pfn_t such as PFN_SG_CHAIN, PFN_SG_LAST, PFN_DEV and PFN_MAP.
Without this new function I was getting errors like the following when
compiling for i386:
include/linux/pfn_t.h:13:22: warning: large integer implicitly truncated to unsigned type [-Woverflow]
#define PFN_SG_CHAIN (1ULL << (BITS_PER_LONG_LONG - 1))
^
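The truncation problem and the flag-to-name translation can be illustrated outside the kernel. In the sketch below only the PFN_SG_CHAIN value (bit 63) is taken from the warning above; the positions of PFN_SG_LAST, PFN_DEV and PFN_MAP in the next lower bits are an assumption, and print_flags() is a hypothetical userspace analogue, not the kernel helper itself.

```python
BITS_PER_LONG_LONG = 64

# PFN_SG_CHAIN follows the #define quoted above; the other three bit
# positions are assumed for illustration.
PFN_FLAGS = [
    (1 << (BITS_PER_LONG_LONG - 1), "PFN_SG_CHAIN"),
    (1 << (BITS_PER_LONG_LONG - 2), "PFN_SG_LAST"),
    (1 << (BITS_PER_LONG_LONG - 3), "PFN_DEV"),
    (1 << (BITS_PER_LONG_LONG - 4), "PFN_MAP"),
]


def print_flags(value, flag_names, delim="|"):
    """Join the names of all flags set in value; leftovers print as hex."""
    parts = []
    for mask, name in flag_names:
        if mask and (value & mask) == mask:
            parts.append(name)
            value &= ~mask  # clear the bits already accounted for
    if value:
        parts.append(hex(value))
    return delim.join(parts)


def truncate_to_32_bits(value):
    """What happens when a 64-bit flag word is stored in a 32-bit long."""
    return value & 0xFFFFFFFF
```

With these definitions a value that has PFN_DEV and PFN_MAP set prints as "PFN_DEV|PFN_MAP" when handled as 64 bits, while the same value passed through truncate_to_32_bits() prints as an empty string because all of the flag bits sit in the upper half.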
Link: http://lkml.kernel.org/r/1484085142-2297-2-git-send-email-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Here are the "small" driver core patches for 4.11-rc1.
Not much here, some firmware documentation and self-test updates, a
debugfs code formatting issue, and a new feature for call_usermodehelper
to make it more robust on systems that want to lock it down in a more
secure way.
All of these have been in linux-next for a while now with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Merge tag 'driver-core-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here are the "small" driver core patches for 4.11-rc1.
Not much here, some firmware documentation and self-test updates, a
debugfs code formatting issue, and a new feature for call_usermodehelper
to make it more robust on systems that want to lock it down in a more
secure way.
All of these have been in linux-next for a while now with no reported
issues"
* tag 'driver-core-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
kernfs: handle null pointers while printing node name and path
Introduce STATIC_USERMODEHELPER to mediate call_usermodehelper()
Make static usermode helper binaries constant
kmod: make usermodehelper path a const string
firmware: revamp firmware documentation
selftests: firmware: send expected errors to /dev/null
selftests: firmware: only modprobe if driver is missing
platform: Print the resource range if device failed to claim
kref: prefer atomic_inc_not_zero to atomic_add_unless
debugfs: improve formatting of debugfs_real_fops()
Pull networking updates from David Miller:
"Highlights:
1) Support TX_RING in AF_PACKET TPACKET_V3 mode, from Sowmini
Varadhan.
2) Simplify classifier state on sk_buff in order to shrink it a bit.
From Willem de Bruijn.
3) Introduce SIPHASH and its usage for secure sequence numbers and
syncookies. From Jason A. Donenfeld.
4) Reduce CPU usage for ICMP replies we are going to limit or
suppress, from Jesper Dangaard Brouer.
5) Introduce Shared Memory Communications socket layer, from Ursula
Braun.
6) Add RACK loss detection and allow it to actually trigger fast
recovery instead of just assisting after other algorithms have
triggered it. From Yuchung Cheng.
7) Add xmit_more and BQL support to mvneta driver, from Simon Guinot.
8) skb_cow_data avoidance in esp4 and esp6, from Steffen Klassert.
9) Export MPLS packet stats via netlink, from Robert Shearman.
10) Significantly improve inet port bind conflict handling, especially
when an application is restarted and changes its setting of
reuseport. From Josef Bacik.
11) Implement TX batching in vhost_net, from Jason Wang.
12) Extend the dummy device so that VF (virtual function) features,
such as configuration, can be more easily tested. From Phil
Sutter.
13) Avoid two atomic ops per page on x86 in bnx2x driver, from Eric
Dumazet.
14) Add new bpf MAP, implementing a longest prefix match trie. From
Daniel Mack.
15) Packet sample offloading support in mlxsw driver, from Yotam Gigi.
16) Add new aquantia driver, from David VomLehn.
17) Add bpf tracepoints, from Daniel Borkmann.
18) Add support for port mirroring to b53 and bcm_sf2 drivers, from
Florian Fainelli.
19) Remove custom busy polling in many drivers; it has been done in the
core networking code since the 4.5 timeframe. From Eric Dumazet.
20) Support XDP adjust_head in virtio_net, from John Fastabend.
21) Fix several major holes in neighbour entry confirmation, from
Julian Anastasov.
22) Add XDP support to bnxt_en driver, from Michael Chan.
23) VXLAN offloads for enic driver, from Govindarajulu Varadarajan.
24) Add IPVTAP driver (IP-VLAN based tap driver) from Sainath Grandhi.
25) Support GRO in IPSEC protocols, from Steffen Klassert"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1764 commits)
Revert "ath10k: Search SMBIOS for OEM board file extension"
net: socket: fix recvmmsg not returning error from sock_error
bnxt_en: use eth_hw_addr_random()
bpf: fix unlocking of jited image when module ronx not set
arch: add ARCH_HAS_SET_MEMORY config
net: napi_watchdog() can use napi_schedule_irqoff()
tcp: Revert "tcp: tcp_probe: use spin_lock_bh()"
net/hsr: use eth_hw_addr_random()
net: mvpp2: enable building on 64-bit platforms
net: mvpp2: switch to build_skb() in the RX path
net: mvpp2: simplify MVPP2_PRS_RI_* definitions
net: mvpp2: fix indentation of MVPP2_EXT_GLOBAL_CTRL_DEFAULT
net: mvpp2: remove unused register definitions
net: mvpp2: simplify mvpp2_bm_bufs_add()
net: mvpp2: drop useless fields in mvpp2_bm_pool and related code
net: mvpp2: remove unused 'tx_skb' field of 'struct mvpp2_tx_queue'
net: mvpp2: release reference to txq_cpu[] entry after unmapping
net: mvpp2: handle too large value in mvpp2_rx_time_coal_set()
net: mvpp2: handle too large value handling in mvpp2_rx_pkts_coal_set()
net: mvpp2: remove useless arguments in mvpp2_rx_{pkts, time}_coal_set
...
This update includes the usual round of major driver updates (ncr5380,
ufs, lpfc, be2iscsi, hisi_sas, storvsc, cxlflash, aacraid,
megaraid_sas, ...). There's also an assortment of minor fixes and the
major update of switching a bunch of drivers to pci_alloc_irq_vectors
from Christoph.
Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"This update includes the usual round of major driver updates (ncr5380,
ufs, lpfc, be2iscsi, hisi_sas, storvsc, cxlflash, aacraid,
megaraid_sas, ...).
There's also an assortment of minor fixes and the major update of
switching a bunch of drivers to pci_alloc_irq_vectors from Christoph"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (188 commits)
scsi: megaraid_sas: handle dma_addr_t right on 32-bit
scsi: megaraid_sas: array overflow in megasas_dump_frame()
scsi: snic: switch to pci_irq_alloc_vectors
scsi: megaraid_sas: driver version upgrade
scsi: megaraid_sas: Change RAID_1_10_RMW_CMDS to RAID_1_PEER_CMDS and set value to 2
scsi: megaraid_sas: Indentation and smatch warning fixes
scsi: megaraid_sas: Cleanup VD_EXT_DEBUG and SPAN_DEBUG related debug prints
scsi: megaraid_sas: Increase internal command pool
scsi: megaraid_sas: Use synchronize_irq to wait for IRQs to complete
scsi: megaraid_sas: Bail out the driver load if ld_list_query fails
scsi: megaraid_sas: Change build_mpt_mfi_pass_thru to return void
scsi: megaraid_sas: During OCR, if get_ctrl_info fails do not continue with OCR
scsi: megaraid_sas: Do not set fp_possible if TM capable for non-RW syspdIO, change fp_possible to bool
scsi: megaraid_sas: Remove unused pd_index from megasas_build_ld_nonrw_fusion
scsi: megaraid_sas: megasas_return_cmd does not memset IO frame to zero
scsi: megaraid_sas: max_fw_cmds are decremented twice, remove duplicate
scsi: megaraid_sas: update can_queue only if the new value is less
scsi: megaraid_sas: Change max_cmd from u32 to u16 in all functions
scsi: megaraid_sas: set pd_after_lb from MR_BuildRaidContext and initialize pDevHandle to MR_DEVHANDLE_INVALID
scsi: megaraid_sas: latest controller OCR capability from FW before sending shutdown DCMD
...
Merge tag 'for-4.11/linus-merge-signed' of git://git.kernel.dk/linux-block
Pull block layer updates from Jens Axboe:
- blk-mq scheduling framework from me and Omar, with a port of the
deadline scheduler for this framework. A port of BFQ from Paolo is in
the works, and should be ready for 4.12.
- Various fixups and improvements to the above scheduling framework
from Omar, Paolo, Bart, me, others.
- Cleanup of the exported sysfs blk-mq data into debugfs, from Omar.
This allows us to export more information that helps debug hangs or
performance issues, without cluttering or abusing the sysfs API.
- Fixes for the sbitmap code, the scalable bitmap code that was
migrated from blk-mq, from Omar.
- Removal of the BLOCK_PC support in struct request, and refactoring of
carrying SCSI payloads in the block layer. This cleans up the code
nicely, and enables us to kill the SCSI specific parts of struct
request, shrinking it down nicely. From Christoph mainly, with help
from Hannes.
- Support for ranged discard requests and discard merging, also from
Christoph.
- Support for OPAL in the block layer, and for NVMe as well. Mainly
from Scott Bauer, with fixes/updates from various other folks.
- Error code fixup for gdrom from Christophe.
- cciss pci irq allocation cleanup from Christoph.
- Making the cdrom device operations read only, from Kees Cook.
- Fixes for duplicate bdi registrations and bdi/queue life time
problems from Jan and Dan.
- Set of fixes and updates for lightnvm, from Matias and Javier.
- A few fixes for nbd from Josef, using idr to name devices and a
workqueue deadlock fix on receive. Also marks Josef as the current
maintainer of nbd.
- Fix from Josef, overwriting queue settings when the number of
hardware queues is updated for a blk-mq device.
- NVMe fix from Keith, ensuring that we don't repeatedly mark an IO
aborted, if we didn't end up aborting it.
- SG gap merging fix from Ming Lei for block.
- Loop fix also from Ming, fixing a race and crash between setting loop
status and IO.
- Two block race fixes from Tahsin, fixing request list iteration and
fixing a race between device registration and udev device add
notifications.
- Double free fix from cgroup writeback, from Tejun.
- Another double free fix in blkcg, from Hou Tao.
- Partition overflow fix for EFI from Alden Tondettar.
* tag 'for-4.11/linus-merge-signed' of git://git.kernel.dk/linux-block: (156 commits)
nvme: Check for Security send/recv support before issuing commands.
block/sed-opal: allocate struct opal_dev dynamically
block/sed-opal: tone down not supported warnings
block: don't defer flushes on blk-mq + scheduling
blk-mq-sched: ask scheduler for work, if we failed dispatching leftovers
blk-mq: don't special case flush inserts for blk-mq-sched
blk-mq-sched: don't add flushes to the head of requeue queue
blk-mq: have blk_mq_dispatch_rq_list() return if we queued IO or not
block: do not allow updates through sysfs until registration completes
lightnvm: set default lun range when no luns are specified
lightnvm: fix off-by-one error on target initialization
Maintainers: Modify SED list from nvme to block
Move stack parameters for sed_ioctl to prevent oversized stack with CONFIG_KASAN
uapi: sed-opal fix IOW for activate lsp to use correct struct
cdrom: Make device operations read-only
elevator: fix loading wrong elevator type for blk-mq devices
cciss: switch to pci_irq_alloc_vectors
block/loop: fix race between I/O and set_status
blk-mq-sched: don't hold queue_lock when calling exit_icq
block: set make_request_fn manually in blk_mq_update_nr_hw_queues
...
Pull scheduler updates from Ingo Molnar:
"The main changes in this (fairly busy) cycle were:
- There was a class of scheduler bugs related to forgetting to update
the rq-clock timestamp which can cause weird and hard to debug
problems, so there's a new debug facility for this. It uncovered
a whole lot of bugs, which convinced us that we want to keep the
debug facility.
(Peter Zijlstra, Matt Fleming)
- Various cputime related updates: eliminate cputime and use u64
nanoseconds directly, simplify and improve the arch interfaces,
implement delayed accounting more widely, etc. - (Frederic
Weisbecker)
- Move code around for better structure plus cleanups (Ingo Molnar)
- Move IO schedule accounting deeper into the scheduler plus related
changes to improve the situation (Tejun Heo)
- ... plus a round of sched/rt and sched/deadline fixes, plus other
fixes, updates and cleanups"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (85 commits)
sched/core: Remove unlikely() annotation from sched_move_task()
sched/autogroup: Rename auto_group.[ch] to autogroup.[ch]
sched/topology: Split out scheduler topology code from core.c into topology.c
sched/core: Remove unnecessary #include headers
sched/rq_clock: Consolidate the ordering of the rq_clock methods
delayacct: Include <uapi/linux/taskstats.h>
sched/core: Clean up comments
sched/rt: Show the 'sched_rr_timeslice' SCHED_RR timeslice tuning knob in milliseconds
sched/clock: Add dummy clear_sched_clock_stable() stub function
sched/cputime: Remove generic asm headers
sched/cputime: Remove unused nsec_to_cputime()
s390, sched/cputime: Remove unused cputime definitions
powerpc, sched/cputime: Remove unused cputime definitions
s390, sched/cputime: Make arch_cpu_idle_time() to return nsecs
ia64, sched/cputime: Remove unused cputime definitions
ia64: Convert vtime to use nsec units directly
ia64, sched/cputime: Move the nsecs based cputime headers to the last arch using it
sched/cputime: Remove jiffies based cputime
sched/cputime, vtime: Return nsecs instead of cputime_t to account
sched/cputime: Complete nsec conversion of tick based accounting
...
The timer flags in the timer_start trace event contain lots of useful
information, but the meaning is not clear in the trace output. Making tools
rely on the bit positions is bad as they might change over time.
Decode the flags in the print out. Tools can retrieve the bits and their
meaning from the trace format file.
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1702101639290.4036@nanos
Requested-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Currently btrfs_ino takes a struct inode and this causes a lot of
internal btrfs functions which consume this ino to take a VFS inode,
rather than btrfs' own struct btrfs_inode. In order to fix this "leak"
of VFS structs into the internals of btrfs first it's necessary to
eliminate all uses of struct inode for the purpose of inode. This patch
does that by using BTRFS_I to convert an inode to btrfs_inode. With
this problem eliminated subsequent patches will start eliminating the
passing of struct inode altogether, eventually resulting in a lot cleaner
code.
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
[ fix btrfs_get_extent tracepoint prototype ]
Signed-off-by: David Sterba <dsterba@suse.com>
Null kernfs nodes could be found at cgroups during construction.
It seems safer to handle these null pointers right in kernfs in
the same way as printf prints "(null)" for null pointer string.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Steven suggested to improve trace_print_hex_seq() a bit after commit
2acae0d5b0 ("trace: add variant without spacing in trace_print_hex_seq")
in two ways: i) by adding a kdoc comment for the helper function
itself and ii) by renaming 'spacing' argument into 'concatenate'
to better denote that we don't add spaces between the hex bytes.
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
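The effect of the 'concatenate' argument described above can be sketched in
userspace C. This is only an illustration under assumptions: hex_seq() is a
hypothetical helper, not the kernel's trace_print_hex_seq(), and its signature
and error handling are made up for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical userspace helper: format len bytes of buf as hex into out.
 * With concatenate == false, bytes are separated by a single space;
 * with concatenate == true, they are emitted back to back.
 * Returns the number of characters written, or -1 if out is too small.
 */
static int hex_seq(char *out, size_t out_len,
                   const unsigned char *buf, size_t len, bool concatenate)
{
    size_t pos = 0;

    if (out_len == 0)
        return -1;
    out[0] = '\0';

    for (size_t i = 0; i < len; i++) {
        /* No separator before the first byte or in concatenate mode. */
        const char *sep = (i == 0 || concatenate) ? "" : " ";
        int n = snprintf(out + pos, out_len - pos, "%s%02x", sep, buf[i]);

        if (n < 0 || (size_t)n >= out_len - pos)
            return -1;  /* output would have been truncated */
        pos += (size_t)n;
    }
    return (int)pos;
}
```

Calling hex_seq() on the same bytes with concatenate set and unset shows the
difference between a compact tag-style string and a spaced byte dump.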
Use the new nsec based cputime accessors as part of the whole cputime
conversion from cputime_t to nsecs.
Also convert itimers to use nsec based internal counters. This simplifies
it and removes the whole game with error/inc_error which served to deal
with cputime_t random granularity.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1485832191-26889-20-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch fixes wrong tracepoints in terms of op and op_flags.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
A couple tweaks to the tracing code:
- trace the request size for all requests
- trace request sector and nr_sectors only for fs requests, enforced by
helpers
- drop SCSI CDB tracing - we have SCSI tracing for this and are going
to move the CDB out of the generic struct request soon.
With this the tracing code no longer knows about BLOCK_PC requests at all;
it's just FS vs passthrough requests now, where the latter includes any
driver-private requests.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
This work adds a number of tracepoints to paths that are either
considered slow-path or exception-like states, where monitoring or
inspecting them would be desirable.
For bpf(2) syscall, tracepoints have been placed for main commands
when they succeed. In XDP case, tracepoint is for exceptions, that
is, f.e. on abnormal BPF program exit such as unknown or XDP_ABORTED
return code, or when error occurs during XDP_TX action and the packet
could not be forwarded.
Both have been split into separate event headers, and can be further
extended. Worst case, if they unexpectedly should get into our way in
future, they can also be removed [1]. Of course, these tracepoints (like
any other) can be analyzed by eBPF itself, etc. Example output:
# ./perf record -a -e bpf:* sleep 10
# ./perf script
sock_example 6197 [005] 283.980322: bpf:bpf_map_create: map type=ARRAY ufd=4 key=4 val=8 max=256 flags=0
sock_example 6197 [005] 283.980721: bpf:bpf_prog_load: prog=a5ea8fa30ea6849c type=SOCKET_FILTER ufd=5
sock_example 6197 [005] 283.988423: bpf:bpf_prog_get_type: prog=a5ea8fa30ea6849c type=SOCKET_FILTER
sock_example 6197 [005] 283.988443: bpf:bpf_map_lookup_elem: map type=ARRAY ufd=4 key=[06 00 00 00] val=[00 00 00 00 00 00 00 00]
[...]
sock_example 6197 [005] 288.990868: bpf:bpf_map_lookup_elem: map type=ARRAY ufd=4 key=[01 00 00 00] val=[14 00 00 00 00 00 00 00]
swapper 0 [005] 289.338243: bpf:bpf_prog_put_rcu: prog=a5ea8fa30ea6849c type=SOCKET_FILTER
[1] https://lwn.net/Articles/705270/
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
For upcoming tracepoint support for BPF, we want to dump the program's
tag. Format should be similar to __print_hex(), but without spacing.
Add a __print_hex_str() variant for exactly that purpose that reuses
trace_print_hex_seq().
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 4a81e8328d ("rcu: Reduce overhead of cond_resched() checks
for RCU") moved quiescent-state generation out of cond_resched()
and commit bde6c3aa99 ("rcu: Provide cond_resched_rcu_qs() to force
quiescent states in long loops") introduced cond_resched_rcu_qs(), and
commit 5cd37193ce ("rcu: Make cond_resched_rcu_qs() apply to normal RCU
flavors") introduced the per-CPU rcu_qs_ctr variable, which is frequently
polled by the RCU core state machine.
This frequent polling can increase grace-period rate, which in turn
increases grace-period overhead, which is visible in some benchmarks
(for example, the "open1" benchmark in Anton Blanchard's "will it scale"
suite). This commit therefore reduces the rate at which rcu_qs_ctr
is polled by moving that polling into the force-quiescent-state (FQS)
machinery, and by further polling it only after the grace period has
been in effect for at least jiffies_till_sched_qs jiffies.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Pull btrfs fixes from Chris Mason:
"These are all over the place.
The tracepoint part of the pull fixes a crash and adds a little more
information to two tracepoints, while the rest are good old fashioned
fixes"
* 'for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
btrfs: make tracepoint format strings more compact
Btrfs: add truncated_len for ordered extent tracepoints
Btrfs: add 'inode' for extent map tracepoint
btrfs: fix crash when tracepoint arguments are freed by wq callbacks
Btrfs: adjust outstanding_extents counter properly when dio write is split
Btrfs: fix lockdep warning about log_mutex
Btrfs: use down_read_nested to make lockdep silent
btrfs: fix locking when we put back a delayed ref that's too new
btrfs: fix error handling when run_delayed_extent_op fails
btrfs: return the actual error value from btrfs_uuid_tree_iterate
The flag was introduced by commit 78afd5612d ("mm: add
__GFP_OTHER_NODE flag") to allow proper accounting of remote node
allocations done by kernel daemons on behalf of a process - e.g.
khugepaged.
After "mm: fix remote numa hits statistics" we neither need nor actually
use the flag, so we can safely remove it because all allocations which
are satisfied from their "home" node are accounted properly.
[mhocko@suse.com: fix build]
Link: http://lkml.kernel.org/r/20170106122225.GK5556@dhcp22.suse.cz
Link: http://lkml.kernel.org/r/20170102153057.9451-3-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge tag 'rxrpc-rewrite-20170109' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
David Howells says:
====================
afs: Refcount afs_call struct
These patches provide some tracepoints for AFS and fix a potential leak by
adding refcounting to the afs_call struct.
The patches are:
(1) Add some tracepoints for logging incoming calls and monitoring
notifications from AF_RXRPC and data reception.
(2) Get rid of afs_wait_mode as it didn't turn out to be as useful as
initially expected. It can be brought back later if needed. This
clears some stuff out that I don't then need to fix up in (4).
(3) Allow listen(..., 0) to be used to disable listening. This makes
shutting down the AFS cache manager server in the kernel much easier
and the accounting simpler as we can then be sure that (a) all
preallocated afs_call structs are released and (b) no new incoming
calls are going to be started.
For the moment, listening cannot be reenabled.
(4) Add refcounting to the afs_call struct to fix a potential multiple
release detected by static checking and add a tracepoint to follow the
lifecycle of afs_call objects.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
A static checker warning occurs in the AFS filesystem:
fs/afs/cmservice.c:155 SRXAFSCB_CallBack()
error: dereferencing freed memory 'call'
due to the reply being sent before we access the server it points to. The
act of sending the reply causes the call to be freed if an error occurs
(but not if it doesn't).
On top of this, the lifetime handling of afs_call structs is fragile
because they get passed around through workqueues without any sort of
refcounting.
Deal with the issues by:
(1) Fix the maybe/maybe not nature of the reply sending functions with
regards to whether they release the call struct.
(2) Refcount the afs_call struct and sort out places that need to get/put
references.
(3) Pass a ref through the work queue and release (or pass on) that ref in
the work function. Care has to be taken because a work queue may
already own a ref to the call.
(4) Do the cleaning up in the put function only.
(5) Simplify module cleanup by always incrementing afs_outstanding_calls
whenever a call is allocated.
(6) Set the backlog to 0 with kernel_listen() at the beginning of the
process of closing the socket to prevent new incoming calls from
occurring and to remove the contribution of preallocated calls from
afs_outstanding_calls before we wait on it.
A tracepoint is also added to monitor the afs_call refcount and lifetime.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Fixes: 08e0e7c82e: "[AF_RXRPC]: Make the in-kernel AFS filesystem use AF_RXRPC."
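The get/put discipline listed above, with all cleanup confined to the put
function, can be sketched in userspace C with C11 atomics. This is an
illustration only: struct call and its helpers are hypothetical stand-ins,
not the real afs_call code, and the freed_flag member exists purely so the
example can observe when cleanup runs.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted call object. */
struct call {
    atomic_int usage;   /* number of outstanding references */
    int *freed_flag;    /* set to 1 when cleanup runs (demo only) */
};

static struct call *call_alloc(int *freed_flag)
{
    struct call *c = malloc(sizeof(*c));

    if (!c)
        return NULL;
    atomic_init(&c->usage, 1);  /* the caller owns the first reference */
    c->freed_flag = freed_flag;
    return c;
}

/* Take an extra reference, e.g. before handing the call to a work item. */
static void call_get(struct call *c)
{
    atomic_fetch_add(&c->usage, 1);
}

/* Drop a reference; cleanup happens here and only on the last put. */
static void call_put(struct call *c)
{
    if (atomic_fetch_sub(&c->usage, 1) == 1) {
        *c->freed_flag = 1;
        free(c);
    }
}
```

A work function in this scheme would receive a reference taken with
call_get() and end with call_put(), so the object cannot be freed while the
work item still uses it.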
We've recently added the fsid to trace events, this makes the line quite
long. To shorten it again, remove extra spaces around = and remove
",".
Signed-off-by: David Sterba <dsterba@suse.com>
This can help us monitor truncated ordered extents.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
'inode' is an important field for btrfs_get_extent, let's trace it.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Enabling btrfs tracepoints leads to an instant crash, as reported. The wq
callbacks could free the memory and the tracepoints started to
dereference the members to get to fs_info.
The proposed fix https://marc.info/?l=linux-btrfs&m=148172436722606&w=2
removed the tracepoints but we could preserve them by passing only the
required data in a safe way.
Fixes: bc074524e1 ("btrfs: prefix fsid to all trace events")
CC: stable@vger.kernel.org # 4.8+
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add three tracepoints to the AFS filesystem:
(1) The afs_recv_data tracepoint logs data segments that are extracted
from the data received from the peer through afs_extract_data().
(2) The afs_notify_call tracepoint logs notification from AF_RXRPC of data
coming in to an asynchronous call.
(3) The afs_cb_call tracepoint logs incoming calls that have had their
operation ID extracted and mapped into a supported cache manager
service call.
To make (3) work, the name strings in the afs_call_type struct objects have
to be annotated with __tracepoint_string. This is done with the CM_NAME()
macro.
Further, the AFS call state enum needs a name so that it can be used to
declare parameter types.
Signed-off-by: David Howells <dhowells@redhat.com>
Pull swiotlb fixes from Konrad Rzeszutek Wilk:
"This has one fix to make i915 work when using Xen SWIOTLB, and a
feature from Geert to aid in debugging of devices that can't do DMA
outside the 32-bit address space.
The feature from Geert is on top of v4.10 merge window commit
(specifically you pulling my previous branch), as his changes were
dependent on the Documentation/ movement patches.
I figured it would just be easier than me trying to cherry-pick the
Documentation patches to satisfy git.
The patches have been soaking since 12/20, albeit I updated the last
patch due to linux-next catching a compiler error and adding a
Tested-and-Reported-by tag"
* 'stable/for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb:
swiotlb: Export swiotlb_max_segment to users
swiotlb: Add swiotlb=noforce debug option
swiotlb: Convert swiotlb_force from int to enum
x86, swiotlb: Simplify pci_swiotlb_detect_override()
Use the ftrace infrastructure to conditionally trace ufs command events.
New trace event is created, which samples the following ufs command data:
- device name
- optional identification string
- task tag
- doorbell register
- number of transfer bytes
- interrupt status register
- request start LBA
- command opcode
Currently we only fully trace read(10) and write(10) commands.
All other commands which pass through ufshcd_send_command() will be
printed with "-1" in the lba and transfer_len fields.
Usage:
echo 1 > /sys/kernel/debug/tracing/events/ufs/enable
cat /sys/kernel/debug/tracing/trace_pipe
Signed-off-by: Lee Susman <lsusman@codeaurora.org>
Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
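The rule above (real values for read(10) and write(10), "-1" for everything
else) can be illustrated by decoding a 10-byte SCSI CDB in userspace C. This
is a sketch under assumptions, not the driver's code: decode_cdb() and
struct cmd_trace are hypothetical, the driver may obtain its values from
other sources, and the length decoded here is the CDB's block count rather
than a byte count.

```c
#include <stdint.h>

#define READ_10  0x28   /* SCSI READ(10) opcode */
#define WRITE_10 0x2a   /* SCSI WRITE(10) opcode */

/* Hypothetical record of the two fields that are only traced for 10-byte I/O. */
struct cmd_trace {
    int64_t lba;
    int64_t transfer_len;
};

static struct cmd_trace decode_cdb(const uint8_t *cdb)
{
    /* Default for commands that are not fully traced. */
    struct cmd_trace t = { .lba = -1, .transfer_len = -1 };

    if (cdb[0] == READ_10 || cdb[0] == WRITE_10) {
        /* Bytes 2..5: big-endian logical block address. */
        t.lba = ((int64_t)cdb[2] << 24) | ((int64_t)cdb[3] << 16) |
                ((int64_t)cdb[4] << 8) | (int64_t)cdb[5];
        /* Bytes 7..8: big-endian transfer length in logical blocks. */
        t.transfer_len = ((int64_t)cdb[7] << 8) | (int64_t)cdb[8];
    }
    return t;
}
```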
This patch adds the profiling support for some of the time critical
operations like hibern8 enter/exit, clock gating & clock scaling.
Reviewed-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This change adds the ftrace support for following:
1. UFS initialization time
2. Clock gating states
3. Clock scaling states
4. Power management APIs latency
5. BKOPs enable/disable
Usage:
echo 1 > /sys/kernel/debug/tracing/events/ufs/enable
cat /sys/kernel/debug/tracing/trace_pipe
Reviewed-by: Sahitya Tummala <stummala@codeaurora.org>
Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add the following extra tracing information:
(1) Modify the rxrpc_transmit tracepoint to record the Tx window size as
this is varied by the slow-start algorithm.
(2) Modify the rxrpc_rx_ack tracepoint to record more information from
received ACK packets.
(3) Add an rxrpc_rx_data tracepoint to record the information in DATA
packets.
(4) Add an rxrpc_disconnect_call tracepoint to record call disconnection,
including the reason the call was disconnected.
(5) Add an rxrpc_improper_term tracepoint to record implicit termination
of a call by a client, by starting a new call on a particular
connection channel without first transmitting the final ACK for the
previous call.
Signed-off-by: David Howells <dhowells@redhat.com>
Fix the way enum values are translated into strings in AF_RXRPC
tracepoints. The problem with just doing a lookup in a normal flat array
of strings or chars is that external tracing infrastructure can't find it.
Rather, TRACE_DEFINE_ENUM must be used.
Also sort the enums and string tables to make it easier to keep them in
order so that a future patch to __print_symbolic() can be optimised to try
a direct lookup into the table first before iterating over it.
A couple of _proto() macro calls are removed because they referred to tables
that got moved to the tracing infrastructure. The relevant data can be
found by way of tracing.
Signed-off-by: David Howells <dhowells@redhat.com>
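The optimisation hinted at above, a direct index into a sorted table before
falling back to iteration, might look like the following userspace sketch.
It is not __print_symbolic(): struct sym and sym_lookup() are hypothetical
names, and the fast path only helps when the enum values are dense and start
at zero.

```c
#include <stddef.h>

/* Hypothetical value-to-name table entry. */
struct sym {
    int value;
    const char *name;
};

/*
 * Look up the name for value. If the table is sorted and dense from 0,
 * the entry sits at index == value and the fast path finds it at once;
 * otherwise fall back to a linear scan. Returns NULL if not found.
 */
static const char *sym_lookup(const struct sym *tbl, size_t n, int value)
{
    if (value >= 0 && (size_t)value < n && tbl[value].value == value)
        return tbl[value].name;

    for (size_t i = 0; i < n; i++)
        if (tbl[i].value == value)
            return tbl[i].name;

    return NULL;
}
```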
Pull timer type cleanups from Thomas Gleixner:
"This series does a tree wide cleanup of types related to
timers/timekeeping.
- Get rid of cycles_t and use a plain u64. The type is not really
helpful and caused more confusion than clarity
- Get rid of the ktime union. The union has become useless as we use
the scalar nanoseconds storage unconditionally now. The 32bit
timespec alike storage got removed due to the Y2038 limitations
some time ago.
That leaves the odd union access around for no reason. Clean it up.
Both changes have been done with coccinelle and a small amount of
manual mopping up"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
ktime: Get rid of ktime_equal()
ktime: Cleanup ktime_set() usage
ktime: Get rid of the union
clocksource: Use a plain u64 instead of cycle_t
Add a new page flag, PageWaiters, to indicate the page waitqueue has
tasks waiting. This can be tested rather than testing waitqueue_active
which requires another cacheline load.
This bit is always set when the page has tasks on page_waitqueue(page),
and is set and cleared under the waitqueue lock. It may be set when
there are no tasks on the waitqueue, which will cause a harmless extra
wakeup check that will clear the bit.
The generic bit-waitqueue infrastructure is no longer used for pages.
Instead, waitqueues are used directly with a custom key type. The
generic code was not flexible enough to have PageWaiters manipulation
under the waitqueue lock (which simplifies concurrency).
This improves the performance of page lock intensive microbenchmarks by
2-3%.
Putting two bits in the same word opens the opportunity to remove the
memory barrier between clearing the lock bit and testing the waiters
bit, after some work on the arch primitives (e.g., ensuring memory
operand widths match and cover both bits).
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Bob Peterson <rpeterso@redhat.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Andrew Lutomirski <luto@kernel.org>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
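The idea of keeping the lock bit and the waiters bit in the same word can be
sketched in userspace C with C11 atomics. This is only an illustration under
assumptions: the bit positions, struct page layout and helper names are
hypothetical, and the waitqueue lock that protects the real PageWaiters bit
is left out.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define PG_LOCKED  (1UL << 0)   /* hypothetical bit positions */
#define PG_WAITERS (1UL << 1)

struct page {
    atomic_ulong flags;
};

/* Try to take the page lock; returns true on success. */
static bool trylock_page(struct page *p)
{
    unsigned long old = atomic_fetch_or(&p->flags, PG_LOCKED);

    return !(old & PG_LOCKED);
}

/* A sleeper marks the page so that the unlocker knows to wake someone. */
static void mark_waiter(struct page *p)
{
    atomic_fetch_or(&p->flags, PG_WAITERS);
}

/*
 * Clear the lock bit and learn, from the same word, whether anyone is
 * waiting. Returns true if the caller needs to go on to the wakeup path;
 * no separate waitqueue cacheline has to be touched to decide that.
 */
static bool unlock_page_needs_wake(struct page *p)
{
    unsigned long old = atomic_fetch_and(&p->flags, ~PG_LOCKED);

    return (old & PG_WAITERS) != 0;
}
```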
A page is not added to the swap cache without being swap backed,
so PageSwapBacked mappings can use PG_owner_priv_1 for PageSwapCache.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Bob Peterson <rpeterso@redhat.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Andrew Lutomirski <luto@kernel.org>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ktime is a union because the initial implementation stored the time in
scalar nanoseconds on 64-bit machines and in an endianness-optimized timespec
variant on 32-bit machines. The Y2038 cleanup removed the timespec variant
and switched everything to scalar nanoseconds. The union remained, but
became completely pointless.
Get rid of the union and just keep ktime_t as a simple typedef of type s64.
The conversion was done with coccinelle and some manual mopping up.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
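To show what "scalar nanoseconds" means in practice, here is a minimal Python sketch of a time value kept as one signed 64-bit nanosecond count. The saturation rule in `ktime_set()` below is an assumption made for illustration, not a quote of the kernel source.

```python
NSEC_PER_SEC = 1_000_000_000
KTIME_MAX = (1 << 63) - 1   # largest value a signed 64-bit integer can hold


def ktime_set(secs, nsecs):
    """Build a scalar-nanosecond time value, saturating at KTIME_MAX."""
    # Assumed rule: seconds that would overflow s64 clamp to the maximum.
    if secs >= KTIME_MAX // NSEC_PER_SEC:
        return KTIME_MAX
    return secs * NSEC_PER_SEC + nsecs


# With a plain integer there is nothing to unwrap: comparison is "==" and
# "zero time" is the literal 0, so helpers such as ktime_equal() or
# ktime_set(0, 0) add nothing over the built-in operators.
a = ktime_set(1, 500)
b = ktime_set(1, 500)
same = (a == b)
```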
On architectures like arm64, swiotlb is tied intimately to the core
architecture DMA support. In addition, ZONE_DMA cannot be disabled.
To aid debugging and catch devices not supporting DMA to memory outside
the 32-bit address space, add a kernel command line option
"swiotlb=noforce", which disables the use of bounce buffers.
If specified, trying to map memory that cannot be used with DMA will
fail, and a rate-limited warning will be printed.
Note that io_tlb_nslabs is set to 1, which is the minimal supported
value.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
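The mapping decision described above can be sketched as a small decision function. This is a toy model in Python under stated assumptions: the enum member names, `map_single()`, and its string results are hypothetical and only mirror the behaviour the changelog describes.

```python
from enum import Enum


class SwiotlbForce(Enum):
    NORMAL = 0     # bounce only when the device cannot reach the buffer
    FORCE = 1      # always bounce (swiotlb=force)
    NO_FORCE = 2   # never bounce (swiotlb=noforce)


def map_single(addr, size, dma_mask, mode):
    """Return 'direct', 'bounce', or 'error' for one mapping request."""
    reachable = addr + size - 1 <= dma_mask
    if mode is SwiotlbForce.FORCE:
        return "bounce"
    if reachable:
        return "direct"
    if mode is SwiotlbForce.NO_FORCE:
        # Mapping fails; the kernel would also print a rate-limited warning.
        return "error"
    return "bounce"
```

With a 32-bit DMA mask, a buffer above 4 GiB is bounced in the normal mode and rejected with `noforce`, which is what makes the option useful for catching such devices.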
Convert the flag swiotlb_force from an int to an enum, to prepare for
the advent of more possible values.
Suggested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Pull btrfs updates from Chris Mason:
"Jeff Mahoney and Dave Sterba have a really nice set of cleanups in
here, and Christoph pitched in corrections/improvements to make btrfs
use proper helpers for bio walking instead of doing it by hand.
There are some key fixes as well, including some long standing bugs
that took forever to track down in btrfs_drop_extents and during
balance"
* 'for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (77 commits)
btrfs: limit async_work allocation and worker func duration
Revert "Btrfs: adjust len of writes if following a preallocated extent"
Btrfs: don't WARN() in btrfs_transaction_abort() for IO errors
btrfs: opencode chunk locking, remove helpers
btrfs: remove root parameter from transaction commit/end routines
btrfs: split btrfs_wait_marked_extents into normal and tree log functions
btrfs: take an fs_info directly when the root is not used otherwise
btrfs: simplify btrfs_wait_cache_io prototype
btrfs: convert extent-tree tracepoints to use fs_info
btrfs: root->fs_info cleanup, access fs_info->delayed_root directly
btrfs: root->fs_info cleanup, add fs_info convenience variables
btrfs: root->fs_info cleanup, update_block_group{,flags}
btrfs: root->fs_info cleanup, lock/unlock_chunks
btrfs: root->fs_info cleanup, btrfs_calc_{trans,trunc}_metadata_size
btrfs: pull node/sector/stripe sizes out of root and into fs_info
btrfs: root->fs_info cleanup, io_ctl_init
btrfs: root->fs_info cleanup, use fs_info->dev_root everywhere
btrfs: struct reada_control.root -> reada_control.fs_info
btrfs: struct btrfsic_state->root should be an fs_info
btrfs: alloc_reserved_file_extent trace point should use extent_root
...
o STM can hook into the function tracer
o Function filtering now supports more advanced glob matching
o Ftrace selftests updates and added tests
o Softirq tag in traces now shows only softirqs
o ARM nop added to non traced locations at compile time
o New trace_marker_raw file that allows for binary input
o Optimizations to the ring buffer
o Removal of kmap in trace_marker
o Wakeup and irqsoff tracers now adhere to the set_graph_notrace file
o Other various fixes and clean ups
Note, there are two patches marked for stable. These were discovered
near the end of the 4.9 rc release cycle. By the time I had them tested
it was just a matter of days before 4.9 would be released, and I
figured I would just submit them in the merge window. They are old
bugs and not critical. Nothing non-root could abuse.
Merge tag 'trace-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"This release has a few updates:
- STM can hook into the function tracer
- Function filtering now supports more advanced glob matching
- Ftrace selftests updates and added tests
- Softirq tag in traces now shows only softirqs
- ARM nop added to non traced locations at compile time
- New trace_marker_raw file that allows for binary input
- Optimizations to the ring buffer
- Removal of kmap in trace_marker
- Wakeup and irqsoff tracers now adhere to the set_graph_notrace file
- Other various fixes and clean ups"
* tag 'trace-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (42 commits)
selftests: ftrace: Shift down default message verbosity
kprobes/trace: Fix kprobe selftest for newer gcc
tracing/kprobes: Add a helper method to return number of probe hits
tracing/rb: Init the CPU mask on allocation
tracing: Use SOFTIRQ_OFFSET for softirq dectection for more accurate results
tracing/fgraph: Have wakeup and irqsoff tracers ignore graph functions too
fgraph: Handle a case where a tracer ignores set_graph_notrace
tracing: Replace kmap with copy_from_user() in trace_marker writing
ftrace/x86_32: Set ftrace_stub to weak to prevent gcc from using short jumps to it
tracing: Allow benchmark to be enabled at early_initcall()
tracing: Have system enable return error if one of the events fail
tracing: Do not start benchmark on boot up
tracing: Have the reg function allow to fail
ring-buffer: Force rb_end_commit() and rb_set_commit_to_write() inline
ring-buffer: Froce rb_update_write_stamp() to be inlined
ring-buffer: Force inline of hotpath helper functions
tracing: Make __buffer_unlock_commit() always_inline
tracing: Make tracepoint_printk a static_key
ring-buffer: Always inline rb_event_data()
ring-buffer: Make rb_reserve_next_event() always inlined
...
Merge tag 'for-f2fs-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"This patch series contains several performance tuning patches
regarding the IO submission flow, in addition to supporting new
features such as ZBC-based drives and multiple devices.
It also includes some major bug fixes such as:
- checkpoint version control
- fdatasync-related roll-forward recovery routine
- memory boundary or null-pointer access in corner cases
- missing error cases
It has various minor clean-up patches as well"
* tag 'for-f2fs-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (66 commits)
f2fs: fix a missing size change in f2fs_setattr
f2fs: fix to access nullified flush_cmd_control pointer
f2fs: free meta pages if sanity check for ckpt is failed
f2fs: detect wrong layout
f2fs: call sync_fs when f2fs is idle
Revert "f2fs: use percpu_counter for # of dirty pages in inode"
f2fs: return AOP_WRITEPAGE_ACTIVATE for writepage
f2fs: do not activate auto_recovery for fallocated i_size
f2fs: fix to determine start_cp_addr by sbi->cur_cp_pack
f2fs: fix 32-bit build
f2fs: set ->owner for debugfs status file's file_operations
f2fs: fix incorrect free inode count in ->statfs
f2fs: drop duplicate header timer.h
f2fs: fix wrong AUTO_RECOVER condition
f2fs: do not recover i_size if it's valid
f2fs: fix fdatasync
f2fs: fix to account total free nid correctly
f2fs: fix an infinite loop when flush nodes in cp
f2fs: don't wait writeback for datas during checkpoint
f2fs: fix wrong written_valid_blocks counting
...
Pull block layer updates from Jens Axboe:
"This is the main block pull request this series. Contrary to previous
releases, I've kept the core and driver changes in the same branch. We
always ended up having dependencies between the two for obvious
reasons, so it makes more sense to keep them together. That said, I'll
probably try and keep more topical branches going forward, especially
for cycles that end up being as busy as this one.
The major parts of this pull request are:
- Improved support for O_DIRECT on block devices, with a small
private implementation instead of using the pig that is
fs/direct-io.c. From Christoph.
- Request completion tracking in a scalable fashion. This is utilized
by two components in this pull, the new hybrid polling and the
writeback queue throttling code.
- Improved support for polling with O_DIRECT, adding a hybrid mode
that combines pure polling with an initial sleep. From me.
- Support for automatic throttling of writeback queues on the block
side. This uses feedback from the device completion latencies to
scale the queue on the block side up or down. From me.
- Support for SMR drives in the block layer and for SD. From Hannes
and Shaun.
- Multi-connection support for nbd. From Josef.
- Cleanup of request and bio flags, so we have a clear split between
which are bio (or rq) private, and which ones are shared. From
Christoph.
- A set of patches from Bart, that improve how we handle queue
stopping and starting in blk-mq.
- Support for WRITE_ZEROES from Chaitanya.
- Lightnvm updates from Javier/Matias.
- Support for FC for the nvme-over-fabrics code. From James Smart.
- A bunch of fixes from a whole slew of people, too many to name
here"
* 'for-4.10/block' of git://git.kernel.dk/linux-block: (182 commits)
blk-stat: fix a few cases of missing batch flushing
blk-flush: run the queue when inserting blk-mq flush
elevator: make the rqhash helpers exported
blk-mq: abstract out blk_mq_dispatch_rq_list() helper
blk-mq: add blk_mq_start_stopped_hw_queue()
block: improve handling of the magic discard payload
blk-wbt: don't throttle discard or write zeroes
nbd: use dev_err_ratelimited in io path
nbd: reset the setup task for NBD_CLEAR_SOCK
nvme-fabrics: Add FC LLDD loopback driver to test FC-NVME
nvme-fabrics: Add target support for FC transport
nvme-fabrics: Add host support for FC transport
nvme-fabrics: Add FC transport LLDD api definitions
nvme-fabrics: Add FC transport FC-NVME definitions
nvme-fabrics: Add FC transport error codes to nvme.h
Add type 0x28 NVME type code to scsi fc headers
nvme-fabrics: patch target code in prep for FC transport support
nvme-fabrics: set sqe.command_id in core not transports
parser: add u64 number parser
nvme-rdma: align to generic ib_event logging helper
...
Merge tag 'drm-for-v4.10' of git://people.freedesktop.org/~airlied/linux
Pull drm updates from Dave Airlie:
"This is the main pull request for drm for 4.10 kernel.
New drivers:
- ZTE VOU display driver (zxdrm)
- Amlogic Meson Graphic Controller GXBB/GXL/GXM SoCs (meson)
- MXSFB support (mxsfb)
Core:
- Format handling has been reworked
- Better atomic state debugging
- drm_mm leak debugging
- Atomic explicit fencing support
- fbdev helper ops
- Documentation updates
- MST fbcon fixes
Bridge:
- Silicon Image SiI8620 driver
Panel:
- Add support for new simple panels
i915:
- GVT Device model
- Better HDMI2.0 support on skylake
- More watermark fixes
- GPU idling rework for suspend/resume
- DP Audio workarounds
- Scheduler prep-work
- Opregion CADL handling
- GPU scheduler and priority boosting
amdgfx/radeon:
- Support for virtual devices
- New VM manager for non-contig VRAM buffers
- UVD powergating
- SI register header cleanup
- Cursor fixes
- Power management fixes
nouveau:
- Power management reworks for better voltage/clock changes
- Atomic modesetting support
- DisplayPort Multistream (MST) support.
- GP102/104 hang and cursor fixes
- GP106 support
hisilicon:
- hibmc support (BMC chip for aarch64 servers)
armada:
- add tracing support for overlay change
- refactor plane support
- de-midlayer the driver
omapdrm:
- Timing code cleanups
rcar-du:
- R8A7792/R8A7796 support
- Misc fixes.
sunxi:
- A31 SoC display engine support
imx-drm:
- YUV format support
- Cleanup plane atomic update
mali-dp:
- Misc fixes
dw-hdmi:
- Add support for HDMI i2c master controller
tegra:
- IOMMU support fixes
- Error handling fixes
tda998x:
- Fix connector registration
- Improved robustness
- Fix infoframe/audio compliance
virtio:
- fix busid issues
- allocate more vbufs
qxl:
- misc fixes and cleanups.
vc4:
- Fragment shader threading
- ETC1 support
- VEC (tv-out) support
msm:
- A5XX GPU support
- Lots of atomic changes
tilcdc:
- Misc fixes and cleanups.
etnaviv:
- Fix dma-buf export path
- DRAW_INSTANCED support
- fix driver on i.MX6SX
exynos:
- HDMI refactoring
fsl-dcu:
- fbdev changes"
* tag 'drm-for-v4.10' of git://people.freedesktop.org/~airlied/linux: (1343 commits)
drm/nouveau/kms/nv50: fix atomic regression on original G80
drm/nouveau/bl: Do not register interface if Apple GMUX detected
drm/nouveau/bl: Assign different names to interfaces
drm/nouveau/bios/dp: fix handling of LevelEntryTableIndex on DP table 4.2
drm/nouveau/ltc: protect clearing of comptags with mutex
drm/nouveau/gr/gf100-: handle GPC/TPC/MPC trap
drm/nouveau/core: recognise GP106 chipset
drm/nouveau/ttm: wait for bo fence to signal before unmapping vmas
drm/nouveau/gr/gf100-: FECS intr handling is not relevant on proprietary ucode
drm/nouveau/gr/gf100-: properly ack all FECS error interrupts
drm/nouveau/fifo/gf100-: recover from host mmu faults
drm: Add fake controlD* symlinks for backwards compat
drm/vc4: Don't use drm_put_dev
drm/vc4: Document VEC DT binding
drm/vc4: Add support for the VEC (Video Encoder) IP
drm: Add TV connector states to drm_connector_state
drm: Turn DRM_MODE_SUBCONNECTOR_xx definitions into an enum
drm/vc4: Fix ->clock_select setting for the VEC encoder
drm/amdgpu/dce6: Set MASTER_UPDATE_MODE to 0 in resume_mc_access as well
drm/amdgpu: use pin rather than pin_restricted in a few cases
...
Pull timer updates from Thomas Gleixner:
"The time/timekeeping/timer folks deliver with this update:
- Fix a reintroduced signed/unsigned issue and clean up the whole
signed/unsigned mess in the timekeeping core so this won't happen
accidentally again.
- Add a new trace clock based on boot time
- Prevent injection of random sleep times when PM tracing abuses the
RTC for storage
- Make posix timers configurable for real tiny systems
- Add tracepoints for the alarm timer subsystem so timer based
suspend wakeups can be instrumented
- The usual pile of fixes and updates to core and drivers"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
timekeeping: Use mul_u64_u32_shr() instead of open coding it
timekeeping: Get rid of pointless typecasts
timekeeping: Make the conversion call chain consistently unsigned
timekeeping_Force_unsigned_clocksource_to_nanoseconds_conversion
alarmtimer: Add tracepoints for alarm timers
trace: Update documentation for mono, mono_raw and boot clock
trace: Add an option for boot clock as trace clock
timekeeping: Add a fast and NMI safe boot clock
timekeeping/clocksource_cyc2ns: Document intended range limitation
timekeeping: Ignore the bogus sleep time if pm_trace is enabled
selftests/timers: Fix spelling mistake "Asyncrhonous" -> "Asynchronous"
clocksource/drivers/bcm2835_timer: Unmap region obtained by of_iomap
clocksource/drivers/arm_arch_timer: Map frame with of_io_request_and_map()
arm64: dts: rockchip: Arch counter doesn't tick in system suspend
clocksource/drivers/arm_arch_timer: Don't assume clock runs in suspend
posix-timers: Make them configurable
posix_cpu_timers: Move the add_device_randomness() call to a proper place
timer: Move sys_alarm from timer.c to itimer.c
ptp_clock: Allow for it to be optional
Kconfig: Regenerate *.c_shipped files after previous changes
...
Pull RCU updates from Ingo Molnar:
"The main RCU changes in this development cycle were:
- Miscellaneous fixes, including a change to call_rcu()'s rcu_head
alignment check.
- Security-motivated list consistency checks, which are disabled by
default behind DEBUG_LIST.
- Torture-test updates.
- Documentation updates, yet again just simple changes"
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
torture: Prevent jitter from delaying build-only runs
torture: Remove obsolete files from rcutorture .gitignore
rcu: Don't kick unless grace period or request
rcu: Make expedited grace periods recheck dyntick idle state
torture: Trace long read-side delays
rcu: RCU_TRACE enables event tracing as well as debugfs
rcu: Remove obsolete comment from __call_rcu()
rcu: Remove obsolete rcu_check_callbacks() header comment
rcu: Tighten up __call_rcu() rcu_head alignment check
Documentation/RCU: Fix minor typo
documentation: Present updated RCU guarantee
bug: Avoid Kconfig warning for BUG_ON_DATA_CORRUPTION
lib/Kconfig.debug: Fix typo in select statement
lkdtm: Add tests for struct list corruption
bug: Provide toggle for BUG on data corruption
list: Split list_del() debug checking into separate function
rculist: Consolidate DEBUG_LIST for list_add_rcu()
list: Split list_add() debug checking into separate function
Some tracepoints have a registration function that gets called when the
tracepoint is enabled. There may be cases where the registration function
must fail (for example, can't allocate enough memory). In this case, the
tracepoint should also fail to register, otherwise the user would not know
why the tracepoint is not working.
Cc: David Howells <dhowells@redhat.com>
Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
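The error propagation described above can be sketched as follows. This is a hypothetical Python model, not the kernel's tracepoint API: the `Tracepoint` class, `register_probe()`, and the convention of returning a negative errno are assumptions for illustration.

```python
import errno


class Tracepoint:
    """Toy tracepoint with an optional registration hook that may fail."""

    def __init__(self, name, regfunc=None):
        self.name = name
        self.regfunc = regfunc   # returns 0 on success, negative errno on failure
        self.probes = []

    def register_probe(self, probe):
        # The hook runs when the first probe is attached. If it fails, the
        # registration fails too, so the caller learns why nothing is traced.
        if not self.probes and self.regfunc is not None:
            ret = self.regfunc()
            if ret < 0:
                return ret
        self.probes.append(probe)
        return 0


def failing_reg():
    # Stand-in for a hook that could not allocate the memory it needs.
    return -errno.ENOMEM
```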
The extent-tree tracepoints all operate on the extent root, regardless of
which root is passed in. Let's just use the extent root objectid instead.
If it turns out that nobody is depending on the format of this tracepoint,
we can drop the root printing entirely.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There are many functions that are always called with the same root
argument. Rather than passing the same root every time, we can
pass an fs_info pointer instead and have the function get the root
pointer itself.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
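The shape of this cleanup can be shown with a toy before/after pair. The Python classes and function names below (`FsInfo`, `Root`, `find_extent_old`, `find_extent_new`) are hypothetical stand-ins, not btrfs structures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FsInfo:
    extent_root: Optional["Root"] = None


@dataclass
class Root:
    name: str
    fs_info: FsInfo


def find_extent_old(root, bytenr):
    # Old style: callers pass some root, which is used only to reach fs_info.
    extent_root = root.fs_info.extent_root
    return (extent_root.name, bytenr)


def find_extent_new(fs_info, bytenr):
    # New style: take fs_info directly and look up the root that is needed.
    return (fs_info.extent_root.name, bytenr)
```

Both variants operate on the same extent root; the second just makes it obvious that the caller's root never mattered.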
Alarm timers are one of the mechanisms to wake up a system from suspend,
but there exist no tracepoints to analyse which process/thread armed an
alarmtimer.
Add tracepoints for start/cancel/expire of individual alarm timers and one
for tracing the suspend time decision when to resume the system.
The following trace excerpt illustrates the new mechanism:
Binder:3292_2-3304 [000] d..2 149.981123: alarmtimer_cancel:
alarmtimer:ffffffc1319a7800 type:REALTIME
expires:1325463120000000000 now:1325376810370370245
Binder:3292_2-3304 [000] d..2 149.981136: alarmtimer_start:
alarmtimer:ffffffc1319a7800 type:REALTIME
expires:1325376840000000000 now:1325376810370384591
Binder:3292_9-3953 [000] d..2 150.212991: alarmtimer_cancel:
alarmtimer:ffffffc1319a5a00 type:BOOTTIME
expires:179552000000 now:150154008122
Binder:3292_9-3953 [000] d..2 150.213006: alarmtimer_start:
alarmtimer:ffffffc1319a5a00 type:BOOTTIME
expires:179551000000 now:150154025622
system_server-3000 [002] ...1 162.701940: alarmtimer_suspend:
alarmtimer type:REALTIME expires:1325376840000000000
The wakeup time selected at suspend time can be mapped back to the task
that armed the timer: Binder:3292_2.
[ tglx: Store alarm timer expiry time instead of some useless RTC relative
information, add proper type information for wakeups which are
handled via the clock_nanosleep/freezer and massage the changelog. ]
Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Link: http://lkml.kernel.org/r/1480372524-15181-5-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
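Mapping the suspend wakeup back to the arming task, as done by eye above, can also be scripted. The following is a minimal Python sketch that assumes each trace event has been joined onto a single line in the layout of the excerpt; the regular expressions and `find_wakeup_owner()` are illustrative, not an existing tool.

```python
import re

# task-pid [cpu] flags timestamp: event: key:value ...
EVENT_RE = re.compile(
    r"^\s*(?P<task>\S+)-(?P<pid>\d+)\s+\[\d+\]\s+\S+\s+[\d.]+:\s+"
    r"(?P<event>\w+):\s+(?P<rest>.*)$"
)
FIELD_RE = re.compile(r"(\w+):(\S+)")

TRACE = [
    "Binder:3292_2-3304 [000] d..2 149.981123: alarmtimer_cancel: "
    "alarmtimer:ffffffc1319a7800 type:REALTIME "
    "expires:1325463120000000000 now:1325376810370370245",
    "Binder:3292_2-3304 [000] d..2 149.981136: alarmtimer_start: "
    "alarmtimer:ffffffc1319a7800 type:REALTIME "
    "expires:1325376840000000000 now:1325376810370384591",
    "Binder:3292_9-3953 [000] d..2 150.212991: alarmtimer_cancel: "
    "alarmtimer:ffffffc1319a5a00 type:BOOTTIME "
    "expires:179552000000 now:150154008122",
    "Binder:3292_9-3953 [000] d..2 150.213006: alarmtimer_start: "
    "alarmtimer:ffffffc1319a5a00 type:BOOTTIME "
    "expires:179551000000 now:150154025622",
    "system_server-3000 [002] ...1 162.701940: alarmtimer_suspend: "
    "alarmtimer type:REALTIME expires:1325376840000000000",
]


def find_wakeup_owner(lines):
    """Return the task whose alarmtimer_start matches the suspend wakeup."""
    starts = {}
    for line in lines:
        m = EVENT_RE.match(line)
        if not m:
            continue
        fields = dict(FIELD_RE.findall(m["rest"]))
        key = (fields.get("type"), fields.get("expires"))
        if m["event"] == "alarmtimer_start":
            starts[key] = m["task"]       # remember who armed this expiry
        elif m["event"] == "alarmtimer_suspend":
            return starts.get(key)        # match on (type, expires)
    return None
```

Matching on the `(type, expires)` pair is a simplification; two timers armed for the same expiry would need the `alarmtimer:` pointer as well.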
Rename btrfs_qgroup_insert_dirty_extent(_nolock) to
btrfs_qgroup_trace_extent(_nolock), according to the new
reserve/trace/account naming schema.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-and-Tested-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Make it possible to generate trace events for mdio read and write accesses.
Signed-off-by: Uwe Kleine-König <uwe@kleine-koenig.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similarly to the regular discard, trace zone reset events.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Although rcutorture will occasionally do a 50-millisecond grace-period
delay, these delays are quite rare. And rightly so, because otherwise
the read rate would be quite low. This means that it can be important
to identify whether or not a given run contained a long-delay read.
This commit therefore inserts a trace_rcu_torture_read() event to flag
runs containing long delays.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
We can hook this up to the block layer, to help throttle buffered
writes.
wbt registers a few trace points that can be used to track what is
happening in the system:
wbt_lat: 259:0: latency 2446318
wbt_stat: 259:0: rmean=2446318, rmin=2446318, rmax=2446318, rsamples=1,
wmean=518866, wmin=15522, wmax=5330353, wsamples=57
wbt_step: 259:0: step down: step=1, window=72727272, background=8, normal=16, max=32
This shows a sync issue event (wbt_lat) that exceeded its time. wbt_stat
dumps the current read/write stats for that window, and wbt_step shows a
step down event where we now scale back writes. Each trace includes the
device, 259:0 in this case.
Signed-off-by: Jens Axboe <axboe@fb.com>
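For post-processing, the key=value payload of these trace points is easy to pull apart. A small Python sketch, assuming each event sits on one line in the format shown above; `parse_wbt_event()` is a hypothetical helper, not part of any kernel tooling.

```python
import re


def parse_wbt_event(line):
    """Split 'name: device: payload' and collect the key=value integers."""
    name, device, payload = line.split(": ", 2)
    stats = {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", payload)}
    return name, device, stats


stat_line = (
    "wbt_stat: 259:0: rmean=2446318, rmin=2446318, rmax=2446318, rsamples=1, "
    "wmean=518866, wmin=15522, wmax=5330353, wsamples=57"
)
step_line = (
    "wbt_step: 259:0: step down: step=1, window=72727272, "
    "background=8, normal=16, max=32"
)
```

Events without key=value pairs, such as wbt_lat, come back with an empty dictionary and would need their own handling.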
Remove the WRITE_* and READ_SYNC wrappers, and just use the flags
directly. Where applicable this also drops usage of the
bio_set_op_attrs wrapper.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
Noidle should be the default for writes, as seen by all the compound
definitions in fs.h using it. In fact only direct I/O really should
be using NOIDLE, so turn the whole flag around to get the defaults
right, which will make our life much easier, especially once the
WRITE_* defines go away.
This assumes all the existing "raw" users of REQ_SYNC for writes
want noidle behavior, which seems to be spot on from a quick audit.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
Now that we don't need the common flags to overflow outside the range
of a 32-bit type we can encode them the same way for both the bio and
request fields. This in addition allows us to place the operation
first (and make some room for more ops while we're at it) and to
stop having to shift around the operation values.
In addition this allows passing around only one value in the block layer
instead of two (and eventually also in the file systems, but we can do
that later) and thus clean up a lot of code.
Last but not least this allows decreasing the size of the cmd_flags
field in struct request to 32-bits. Various functions passing this
value could also be updated, but I'd like to avoid the churn for now.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
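The combined encoding can be pictured as one 32-bit word with the operation in the low bits and the flags above it. The sketch below is illustrative Python: the 8-bit operation width and the flag bit positions are assumptions, not values taken from this changelog.

```python
REQ_OP_BITS = 8                        # assumed width of the operation field
REQ_OP_MASK = (1 << REQ_OP_BITS) - 1

REQ_OP_READ = 0
REQ_OP_WRITE = 1

# Flags start right above the operation; these positions are hypothetical.
REQ_SYNC = 1 << (REQ_OP_BITS + 3)
REQ_META = 1 << (REQ_OP_BITS + 4)


def req_op(cmd_flags):
    """Extract the operation without any shifting."""
    return cmd_flags & REQ_OP_MASK


def req_flags(cmd_flags):
    """Extract the flag bits that sit above the operation."""
    return cmd_flags & ~REQ_OP_MASK


# One value carries both pieces, so only one argument needs passing around.
opf = REQ_OP_WRITE | REQ_SYNC | REQ_META
```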
Pull cgroup updates from Tejun Heo:
- tracepoints for basic cgroup management operations added
- kernfs and cgroup path formatting functions updated to behave in the
style of strlcpy()
- non-critical bug fixes
* 'for-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
blkcg: Unlock blkcg_pol_mutex only once when cpd == NULL
cgroup: fix error handling regressions in proc_cgroup_show() and cgroup_release_agent()
cpuset: fix error handling regression in proc_cpuset_show()
cgroup: add tracepoints for basic operations
cgroup: make cgroup_path() and friends behave in the style of strlcpy()
kernfs: remove kernfs_path_len()
kernfs: make kernfs_path*() behave in the style of strlcpy()
kernfs: add dummy implementation of kernfs_path_from_node()
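"In the style of strlcpy()" means that the function always NUL-terminates what it writes, truncates to fit the buffer, and returns the length the full string would have needed. A minimal Python sketch of that contract follows; `strlcpy_style()` is a hypothetical helper operating on a `bytearray`, not the kernfs code.

```python
def strlcpy_style(buf, path):
    """Copy path into buf, NUL-terminated, and return the full path length."""
    data = path.encode()
    if len(buf) > 0:
        n = min(len(data), len(buf) - 1)   # leave room for the terminator
        buf[:n] = data[:n]
        buf[n] = 0
    return len(data)


def was_truncated(ret, buf):
    # Callers detect truncation by comparing the return value to the size.
    return ret >= len(buf)
```

Because the return value does not depend on the buffer size, a caller can also use it to size a retry buffer, which is why a separate length helper becomes unnecessary.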
COMPACT_PARTIAL has historically meant that compaction returned after
doing some work without fully compacting a zone. It however didn't
distinguish if compaction terminated because it succeeded in creating
the requested high-order page. This has changed recently and now we
only return COMPACT_PARTIAL when compaction thinks it succeeded, or the
high-order watermark check in compaction_suitable() passes and no
compaction needs to be done.
So at this point we can make the return value clearer by renaming it to
COMPACT_SUCCESS. The next patch will remove some redundant tests for
success where compaction just returned COMPACT_SUCCESS.
Link: http://lkml.kernel.org/r/20160810091226.6709-4-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull HID updates from Jiri Kosina:
- Integrated Sensor Hub support (Cherrytrail+) from Srinivas Pandruvada
- Big cleanup of Wacom driver; namely it's now using devres, and the
standardized LED API so that libinput doesn't need to have root
access any more, with substantial amount of other cleanups
piggy-backing on top. All this from Benjamin Tissoires
- Report descriptor parsing now ignores any out-of-range System
controls when the application is actually System Control.
This fixes quite some issues with several devices, and allows us to
remove a few ->report_fixup callbacks. From Benjamin Tissoires
- ... a lot of other assorted small fixes and device ID additions
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (76 commits)
HID: add missing \n to end of dev_warn messages
HID: alps: fix multitouch cursor issue
HID: hid-logitech: Documentation updates/corrections
HID: hid-logitech: Improve Wingman Formula Force GP support
HID: hid-logitech: Rewrite of descriptor for all DF wheels
HID: hid-logitech: Compute combined pedals value
HID: hid-logitech: Add combined pedal support Logitech wheels
HID: hid-logitech: Introduce control for combined pedals feature
HID: sony: Update copyright and add Dualshock 4 rate control note
HID: sony: Defer the initial USB Sixaxis output report
HID: sony: Relax duplicate checking for USB-only devices
Revert "HID: microsoft: fix invalid rdesc for 3k kbd"
HID: alps: fix error return code in alps_input_configured()
HID: alps: fix stick device not working after resume
HID: support for keyboard - Corsair STRAFE
HID: alps: Fix memory leak
HID: uclogic: Add support for UC-Logic TWHA60 v3
HID: uclogic: Override constant descriptors
HID: uclogic: Support UGTizer GP0610 partially
HID: uclogic: Add support for several more tablets
...
Merge tag 'for-f2fs-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this round, we've investigated how f2fs deals with errors given by
our fault injection facility. With this, we could fix several corner
cases. And, in order to improve the performance, we set inline_dentry
by default and enhance the existing discard issue flow. In addition,
we added f2fs_migrate_page for better memory management.
Enhancements:
- set inline_dentry by default
- improve discard issue flow
- add more fault injection cases in f2fs
- allow block preallocation for encrypted files
- introduce migrate_page callback function
- avoid truncating the next direct node block at every checkpoint
Bug fixes:
- set page flag correctly between write_begin and write_end
- missing error handling cases detected by fault injection
- preallocate blocks correctly with regard to 4KB alignment
- dentry and filename handling of encryption
- lost xattrs of directories"
* tag 'for-f2fs-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (69 commits)
f2fs: introduce update_ckpt_flags to clean up
f2fs: don't submit irrelevant page
f2fs: fix to commit bio cache after flushing node pages
f2fs: introduce get_checkpoint_version for cleanup
f2fs: remove dead variable
f2fs: remove redundant io plug
f2fs: support checkpoint error injection
f2fs: fix to recover old fault injection config in ->remount_fs
f2fs: do fault injection initialization in default_options
f2fs: remove redundant value definition
f2fs: support configuring fault injection per superblock
f2fs: adjust display format of segment bit
f2fs: remove dirty inode pages in error path
f2fs: do not unnecessarily null-terminate encrypted symlink data
f2fs: handle errors during recover_orphan_inodes
f2fs: avoid gc in cp_error case
f2fs: should put_page for summary page
f2fs: assign return value in f2fs_gc
f2fs: add customized migrate_page callback
f2fs: introduce cp_lock to protect updating of ckpt_flags
...
Pull networking updates from David Miller:
1) BBR TCP congestion control, from Neal Cardwell, Yuchung Cheng and
co. at Google. https://lwn.net/Articles/701165/
2) Do TCP Small Queues for retransmits, from Eric Dumazet.
3) Support collect_md mode for all IPV4 and IPV6 tunnels, from Alexei
Starovoitov.
4) Allow cls_flower to classify packets in ip tunnels, from Amir Vadai.
5) Support DSA tagging in older mv88e6xxx switches, from Andrew Lunn.
6) Support GMAC protocol in iwlwifi mvm, from Ayala Beker.
7) Support ndo_poll_controller in mlx5, from Calvin Owens.
8) Move VRF processing to an output hook and allow l3mdev to be
loopback, from David Ahern.
9) Support SOCK_DESTROY for UDP sockets. Also from David Ahern.
10) Congestion control in RXRPC, from David Howells.
11) Support geneve RX offload in ixgbe, from Emil Tantilov.
12) When hitting pressure for new incoming TCP data SKBs, perform a
partial rather than a full purge of the OFO queue (which could be
huge). From Eric Dumazet.
13) Convert XFRM state and policy lookups to RCU, from Florian Westphal.
14) Support RX network flow classification to igb, from Gangfeng Huang.
15) Hardware offloading of eBPF in nfp driver, from Jakub Kicinski.
16) New skbmod packet action, from Jamal Hadi Salim.
17) Remove some inefficiencies in snmp proc output, from Jia He.
18) Add FIB notifications to properly propagate route changes to
hardware which is doing forwarding offloading. From Jiri Pirko.
19) New dsa driver for qca8xxx chips, from John Crispin.
20) Implement RFC7559 ipv6 router solicitation backoff, from Maciej
Żenczykowski.
21) Add L3 mode to ipvlan, from Mahesh Bandewar.
22) Support 802.1ad in mlx4, from Moshe Shemesh.
23) Support hardware LRO in mediatek driver, from Nelson Chang.
24) Add TC offloading to mlx5, from Or Gerlitz.
25) Convert various drivers to ethtool ksettings interfaces, from
Philippe Reynes.
26) TX max rate limiting for cxgb4, from Rahul Lakkireddy.
27) NAPI support for ath10k, from Rajkumar Manoharan.
28) Support XDP in mlx5, from Rana Shahout and Saeed Mahameed.
29) UDP replicast support in TIPC, from Richard Alpe.
30) Per-queue statistics for qed driver, from Sudarsana Reddy Kalluru.
31) Support BQL in thunderx driver, from Sunil Goutham.
32) TSO support in alx driver, from Tobias Regnery.
33) Add stream parser engine and use it in kcm.
34) Support async DHCP replies in ipconfig module, from Uwe
Kleine-König.
35) DSA port fast aging for mv88e6xxx driver, from Vivien Didelot.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1715 commits)
mlxsw: switchx2: Fix misuse of hard_header_len
mlxsw: spectrum: Fix misuse of hard_header_len
net/faraday: Stop NCSI device on shutdown
net/ncsi: Introduce ncsi_stop_dev()
net/ncsi: Rework the channel monitoring
net/ncsi: Allow to extend NCSI request properties
net/ncsi: Rework request index allocation
net/ncsi: Don't probe on the reserved channel ID (0x1f)
net/ncsi: Introduce NCSI_RESERVED_CHANNEL
net/ncsi: Avoid unused-value build warning from ia64-linux-gcc
net: Add netdev all_adj_list refcnt propagation to fix panic
net: phy: Add Edge-rate driver for Microsemi PHYs.
vmxnet3: Wake queue from reset work
i40e: avoid NULL pointer dereference and recursive errors on early PCI error
qed: Add RoCE ll2 & GSI support
qed: Add support for memory registeration verbs
qed: Add support for QP verbs
qed: PD,PKEY and CQ verb support
qed: Add support for RoCE hw init
qede: Add qedr framework
...
Pull CPU hotplug updates from Thomas Gleixner:
"Yet another batch of cpu hotplug core updates and conversions:
- Provide core infrastructure for multi instance drivers so the
drivers do not have to keep custom lists.
- Convert custom lists to the new infrastructure. The block-mq custom
list conversion comes through the block tree and makes the diffstat
tip over to more lines removed than added.
- Handle unbalanced hotplug enable/disable calls more gracefully.
- Remove the obsolete CPU_STARTING/DYING notifier support.
- Convert another batch of notifier users.
The relayfs changes which conflicted with the conversion have been
shipped to me by Andrew.
The remaining lot is targeted for 4.10 so that we finally can remove
the rest of the notifiers"
* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits)
cpufreq: Fix up conversion to hotplug state machine
blk/mq: Reserve hotplug states for block multiqueue
x86/apic/uv: Convert to hotplug state machine
s390/mm/pfault: Convert to hotplug state machine
mips/loongson/smp: Convert to hotplug state machine
mips/octeon/smp: Convert to hotplug state machine
fault-injection/cpu: Convert to hotplug state machine
padata: Convert to hotplug state machine
cpufreq: Convert to hotplug state machine
ACPI/processor: Convert to hotplug state machine
virtio scsi: Convert to hotplug state machine
oprofile/timer: Convert to hotplug state machine
block/softirq: Convert to hotplug state machine
lib/irq_poll: Convert to hotplug state machine
x86/microcode: Convert to hotplug state machine
sh/SH-X3 SMP: Convert to hotplug state machine
ia64/mca: Convert to hotplug state machine
ARM/OMAP/wakeupgen: Convert to hotplug state machine
ARM/shmobile: Convert to hotplug state machine
arm64/FP/SIMD: Convert to hotplug state machine
...
Pull RAS updates from Ingo Molnar:
"The main changes were:
- Lots of enhancements for AMD SMCA (Scalable MCA
features/extensions) systems: extract, decode and print more
hardware error information and add matching support on the
injection/testing side as well. (Yazen Ghannam)
- Various MCE handling improvements on modern Intel Xeons. (Tony
Luck)
- Plus misc fixes and enhancements"
* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
x86/RAS/mce_amd_inj: Remove debugfs dir recursively on exit
x86/RAS/mce_amd_inj: Fix signed wrap around when decrementing index 'i'
x86/RAS/mce_amd_inj: Fix some W= warnings
x86/MCE/AMD, EDAC: Handle reserved bank 4 on Fam17h properly
x86/mce/AMD: Extract the error address on SMCA systems
x86/mce, EDAC/mce_amd: Print MCA_SYND and MCA_IPID during MCE on SMCA systems
x86/mce/AMD: Save MCA_IPID in MCE struct on SMCA systems
x86/mce/AMD: Ensure the deferred error interrupt is of type APIC on SMCA systems
x86/mce/AMD: Update sysfs bank names for SMCA systems
x86/mce/AMD, EDAC/mce_amd: Define and use tables for known SMCA IP types
EDAC/mce_amd: Use SMCA prefix for error descriptions arrays
EDAC/mce_amd: Add missing SMCA error descriptions
x86/mce/AMD: Read MSRs on the CPU allocating the threshold blocks
x86/RAS: Add syndrome support to mce_amd_inj
EDAC/mce_amd: Print syndrome register value on SMCA systems
x86/mce: Add support for new MCA_SYND register
x86/mce/AMD: Use msr_ops.misc() in allocate_threshold_blocks()
x86/mce: Drop X86_FEATURE_MCE_RECOVERY and the related model string test
x86/mce: Improve memcpy_mcsafe()
x86/mce: Add PCI quirks to identify Xeons with machine check recovery
...
Keep the call timeouts as ktimes rather than jiffies so that they can be
expressed as functions of RTT.
Signed-off-by: David Howells <dhowells@redhat.com>
In rxrpc_send_data_packet() make the loss-injection path return through the
same code as the transmission path so that the RTT determination is
initiated and any future timer shuffling will be done, despite the packet
having been binned.
Whilst we're at it:
(1) Add to the tx_data tracepoint an indication of whether or not we're
retransmitting a data packet.
(2) When we're deciding whether or not to request an ACK, rather than
checking if we're in fast-retransmit mode check instead if we're
retransmitting.
(3) Don't invoke the lose_skb tracepoint when losing a Tx packet as we're
not altering the sk_buff refcount nor are we just seeing it after
getting it off the Tx list.
(4) The rxrpc_skb_tx_lost note is then no longer used so remove it.
(5) rxrpc_lose_skb() no longer needs to deal with rxrpc_skb_tx_lost.
Signed-off-by: David Howells <dhowells@redhat.com>
This patch fixes a spelling typo found in DocBook/tracepoint.xml.
Because that file is generated from comments in the source,
the typo has to be fixed in include/trace/events/irq.h
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Implement RxRPC slow-start, which is similar to RFC 5681 for TCP. A
tracepoint is added to log the state of the congestion management algorithm
and the decisions it makes.
Notes:
(1) Since we send fixed-size DATA packets (apart from the final packet in
each phase), counters and calculations are in terms of packets rather
than bytes.
(2) The ACK packet carries the equivalent of TCP SACK.
(3) The FLIGHT_SIZE calculation in RFC 5681 doesn't seem particularly
suited to SACK of a small number of packets. It seems that, almost
inevitably, by the time three 'duplicate' ACKs have been seen, we have
narrowed the loss down to one or two missing packets, and the
FLIGHT_SIZE calculation ends up as 2.
(4) In rxrpc_resend(), if there was no data that apparently needed
retransmission, we transmit a PING ACK to ask the peer to tell us what
its Rx window state is.
Signed-off-by: David Howells <dhowells@redhat.com>
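The packet-counted slow-start described in the commit above can be sketched roughly as follows. This is a minimal illustration that assumes RFC 5681-style rules applied to packet counts instead of bytes; the names `cong_state`, `cong_on_ack` and `cong_on_loss` are hypothetical and are not taken from the rxrpc code, and the loss handling is simplified to a single case.

```c
#include <stdint.h>

/* Hypothetical congestion state, counted in packets rather than bytes. */
struct cong_state {
	uint32_t cwnd;       /* congestion window, in packets (must be >= 1) */
	uint32_t ssthresh;   /* slow-start threshold, in packets */
	uint32_t cumul_acks; /* ACKed packets accumulated in congestion avoidance */
};

/* Called when n_acked previously unacknowledged packets are ACKed. */
static void cong_on_ack(struct cong_state *c, uint32_t n_acked)
{
	if (c->cwnd < c->ssthresh) {
		/* Slow start: grow by one packet per packet ACKed. */
		c->cwnd += n_acked;
		return;
	}

	/* Congestion avoidance: grow by one packet per cwnd packets ACKed. */
	c->cumul_acks += n_acked;
	while (c->cumul_acks >= c->cwnd) {
		c->cumul_acks -= c->cwnd;
		c->cwnd++;
	}
}

/* Called when loss is detected; flight is the number of unACKed packets. */
static void cong_on_loss(struct cong_state *c, uint32_t flight)
{
	uint32_t half = flight / 2;

	/* RFC 5681 uses max(FlightSize / 2, 2 * SMSS); in packets that is 2. */
	c->ssthresh = half > 2 ? half : 2;
	c->cwnd = c->ssthresh; /* simplification: no fast-recovery inflation */
	c->cumul_acks = 0;
}
```

With only a handful of packets in flight, `flight / 2` falls below 2 and the threshold is clamped to 2, which matches the observation in note (3).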
Add a tracepoint to log in rxrpc_resend() which packets will be
retransmitted. Note that if a positive ACK comes in whilst we have dropped
the lock to retransmit another packet, the actual retransmission may not
happen, though some of the effects will (such as altering the congestion
management).
Signed-off-by: David Howells <dhowells@redhat.com>
Add a tracepoint to log proposed ACKs, including whether the proposal is
used to update a pending ACK or is discarded in favour of an earlier,
higher priority ACK.
Whilst we're at it, get rid of the rxrpc_acks() function and access the
name array directly. We do, however, need to validate the ACK reason
number given to trace_rxrpc_rx_ack() to make sure we don't overrun the
array.
Signed-off-by: David Howells <dhowells@redhat.com>
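The validation step mentioned in the commit above matters because the reason number arrives from the network and is then used as an array index. A minimal sketch of that pattern, with hypothetical reason codes and names (the real rxrpc ACK reasons and name table are not reproduced here):

```c
/* Hypothetical ACK reason codes; ACK__INVALID doubles as the catch-all slot. */
enum ack_reason {
	ACK_NONE = 0,
	ACK_REQUESTED,
	ACK_DUPLICATE,
	ACK_DELAY,
	ACK__INVALID
};

/* One extra slot so that every clamped value has a printable name. */
static const char *const ack_names[ACK__INVALID + 1] = {
	[ACK_NONE]      = "---",
	[ACK_REQUESTED] = "REQ",
	[ACK_DUPLICATE] = "DUP",
	[ACK_DELAY]     = "DLY",
	[ACK__INVALID]  = "-?-",
};

/* Clamp an untrusted reason number before it is used as an index. */
static const char *ack_name(unsigned int reason)
{
	if (reason > ACK__INVALID)
		reason = ACK__INVALID;
	return ack_names[reason];
}
```

Clamping once at the entry point lets every later user index the array directly without repeating the check.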
Add a tracepoint to log transmission of DATA packets (including loss
injection).
Adjust the ACK transmission tracepoint to include the packet serial number
and to line this up with the DATA transmission display.
Signed-off-by: David Howells <dhowells@redhat.com>
Add a function to track the average RTT for a peer. Sources of RTT data
will be added in subsequent patches.
The RTT data will be useful in the future for determining resend timeouts
and for handling the slow-start part of the Rx protocol.
Also add a pair of tracepoints, one to log transmissions to elicit a
response for RTT purposes and one to log responses that contribute RTT
data.
Signed-off-by: David Howells <dhowells@redhat.com>
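One simple way to track an average RTT as described above is a sliding window of recent samples with a running sum. The sketch below assumes that approach; the window size, the nanosecond unit and the names `rtt_tracker` and `rtt_add_sample` are illustrative assumptions, not the function added by the commit.

```c
#include <stdint.h>

#define RTT_CACHE_SIZE 8 /* hypothetical window size */

/* Hypothetical per-peer RTT state; zero-initialise before first use. */
struct rtt_tracker {
	uint64_t samples[RTT_CACHE_SIZE]; /* most recent RTT samples, in ns */
	uint64_t sum;                     /* sum of the samples in the window */
	unsigned int cursor;              /* next slot to overwrite */
	unsigned int count;               /* number of valid samples */
};

/* Record one RTT sample and return the updated average. */
static uint64_t rtt_add_sample(struct rtt_tracker *t, uint64_t rtt_ns)
{
	if (t->count == RTT_CACHE_SIZE)
		t->sum -= t->samples[t->cursor]; /* evict the oldest sample */
	else
		t->count++;

	t->samples[t->cursor] = rtt_ns;
	t->sum += rtt_ns;
	t->cursor = (t->cursor + 1) % RTT_CACHE_SIZE;

	return t->sum / t->count;
}
```

Keeping the running sum makes each update constant time, which suits a path that may be hit for every acknowledged packet.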
Improve sk_buff tracing within AF_RXRPC by the following means:
(1) Use an enum to note the event type rather than plain integers and use
an array of event names rather than a big multi ?: list.
(2) Distinguish Rx from Tx packets and account them separately. This
requires the call phase to be tracked so that we know what we might
find in rxtx_buffer[].
(3) Add a parameter to rxrpc_{new,see,get,free}_skb() to indicate the
event type.
(4) A pair of 'rotate' events are added to indicate packets that are about
to be rotated out of the Rx and Tx windows.
(5) A pair of 'lost' events are added, along with rxrpc_lose_skb() for
packet loss injection recording.
Signed-off-by: David Howells <dhowells@redhat.com>
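Points (1) and (2) of the commit above can be illustrated with a small sketch: an enum names each event, a table indexed by that enum supplies the trace string, and Rx and Tx packets are counted separately. The event names and counters here are hypothetical and much smaller than the real AF_RXRPC set.

```c
/* Hypothetical event types, replacing bare integers at the call sites. */
enum skb_trace {
	SKB_RX_SEEN,
	SKB_RX_FREED,
	SKB_TX_NEW,
	SKB_TX_FREED,
	SKB_TRACE__NR
};

/* Name table indexed by event, replacing a long chain of ?: operators. */
static const char skb_trace_names[SKB_TRACE__NR][8] = {
	[SKB_RX_SEEN]  = "Rx SEE",
	[SKB_RX_FREED] = "Rx FREE",
	[SKB_TX_NEW]   = "Tx NEW",
	[SKB_TX_FREED] = "Tx FREE",
};

/* Rx and Tx packets are accounted separately. */
static int rx_skbs, tx_skbs;

/* Update the matching counter and return the event name for the trace line. */
static const char *trace_skb(enum skb_trace ev)
{
	if ((unsigned int)ev >= SKB_TRACE__NR)
		return "?";

	switch (ev) {
	case SKB_RX_SEEN:  rx_skbs++; break;
	case SKB_RX_FREED: rx_skbs--; break;
	case SKB_TX_NEW:   tx_skbs++; break;
	case SKB_TX_FREED: tx_skbs--; break;
	default:           break;
	}
	return skb_trace_names[ev];
}
```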
Add a tracepoint to follow the insertion of a packet into the transmit
buffer, its transmission and its rotation out of the buffer.
Signed-off-by: David Howells <dhowells@redhat.com>
Add a pair of tracepoints, one to track rxrpc_connection struct ref
counting and the other to track the client connection cache state.
Signed-off-by: David Howells <dhowells@redhat.com>
Print a symbolic packet type name for each valid received packet in the
trace output, not just a number.
Signed-off-by: David Howells <dhowells@redhat.com>
Add io_boost percent to current pstate_sample tracepoint.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The MCA_IPID register uniquely identifies a bank's type and instance
on Scalable MCA systems. We should save the value of this register
in struct mce along with the other relevant error information. This
ensures that we can decode errors without relying on system software to
correlate the bank to the type.
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1472680624-34221-1-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Syndrome information is no longer contained in MCA_STATUS for SMCA
systems but in a new register - MCA_SYND.
Add a synd field to struct mce to hold MCA_SYND register value. Add it
to the end of struct mce to maintain compatibility with old versions of
mcelog. Also, add it to the respective tracepoint.
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1467633035-32080-1-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
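The reason for appending the new field at the end of the struct can be shown with a small sketch: an old reader that only knows the original layout still finds every existing field at the same offset. The two record layouts below are hypothetical stand-ins, not the real struct mce.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout known to an old user space reader. */
struct record_v1 {
	uint64_t status;
	uint64_t addr;
	uint64_t misc;
};

/* Newer layout: the extra field is appended, nothing else moves. */
struct record_v2 {
	uint64_t status;
	uint64_t addr;
	uint64_t misc;
	uint64_t synd; /* new field, placed last */
};

_Static_assert(offsetof(struct record_v2, misc) ==
	       offsetof(struct record_v1, misc),
	       "existing fields must keep their offsets");
_Static_assert(offsetof(struct record_v2, synd) == sizeof(struct record_v1),
	       "new field must start where the old layout ends");

/* An old reader copies only the prefix it knows about. */
static uint64_t old_reader_status(const void *buf)
{
	struct record_v1 r;

	memcpy(&r, buf, sizeof(r));
	return r.status;
}
```

Had the field been inserted in the middle, the first assertion would fail and an old reader would silently pick up the wrong values.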
Add two tracepoints:
(1) Record the RxRPC protocol header of packets retrieved from the UDP
socket by the data_ready handler.
(2) Record the outcome of the data_ready handler.
Signed-off-by: David Howells <dhowells@redhat.com>
Remove the sk_buff count from the rxrpc_call struct as it's less useful
once we stop queueing sk_buffs.
Signed-off-by: David Howells <dhowells@redhat.com>
Add a tracepoint for working out where local aborts happen. Each
tracepoint call is labelled with a 3-letter code so that they can be
distinguished - and the DATA sequence number is added too where available.
rxrpc_kernel_abort_call() also takes a 3-letter code so that AFS can
indicate the circumstances when it aborts a call.
Signed-off-by: David Howells <dhowells@redhat.com>
Improve the call tracking tracepoint by showing more differentiation
between some of the put and get events, including:
(1) Getting and putting refs for the socket call user ID tree.
(2) Getting and putting refs for queueing and failing to queue the call
processor work item.
Note that these aren't necessarily used in this patch, but will be taken
advantage of in future patches.
An enum is added for the event subtype numbers rather than coding them
directly as decimal numbers and a table of 3-letter strings is provided
rather than a sequence of ?: operators.
Signed-off-by: David Howells <dhowells@redhat.com>
This patch adds the ability for a given state to have multiple
instances. Until now all states have a single instance and the startup /
teardown callbacks use global variables.
A few drivers need to perform the same callbacks on multiple
"instances". Currently we have three drivers in tree which all have a
global list which they iterate over. With multi-instance support they
don't need their private list and the functionality has been moved into
core code. Plus we hold the hotplug lock in core so no CPUs come or go
while instances are registered and we do rollback in the error case :)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/1471024183-12666-3-git-send-email-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
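The idea of moving the per-driver instance list into core code, with rollback on error, can be sketched in user space as below. Everything here is a hypothetical model (`multi_state`, `state_add_instance`, `state_cpu_up` and the `demo_*` callbacks are invented for illustration); it does not use or reproduce the kernel's hotplug API, and locking is left out entirely.

```c
#include <stddef.h>

#define MAX_INSTANCES 8

typedef int (*instance_cb)(unsigned int cpu, void *instance);

/* Hypothetical core-side state: one callback pair, many instances. */
struct multi_state {
	instance_cb startup;
	instance_cb teardown;
	void *instances[MAX_INSTANCES];
	size_t nr;
};

/* Register one more instance with the state; returns 0 on success. */
static int state_add_instance(struct multi_state *st, void *instance)
{
	if (st->nr == MAX_INSTANCES)
		return -1;
	st->instances[st->nr++] = instance;
	return 0;
}

/* Run startup for every instance on a CPU, undoing the work on failure. */
static int state_cpu_up(struct multi_state *st, unsigned int cpu)
{
	size_t i;

	for (i = 0; i < st->nr; i++) {
		int ret = st->startup(cpu, st->instances[i]);

		if (ret) {
			/* Roll back the instances that already started. */
			while (i-- > 0)
				st->teardown(cpu, st->instances[i]);
			return ret;
		}
	}
	return 0;
}

/* Hypothetical driver instance and callbacks used to exercise the sketch. */
struct demo_dev {
	int active;
	int fail;
};

static int demo_startup(unsigned int cpu, void *instance)
{
	struct demo_dev *dev = instance;

	(void)cpu;
	if (dev->fail)
		return -1;
	dev->active++;
	return 0;
}

static int demo_teardown(unsigned int cpu, void *instance)
{
	struct demo_dev *dev = instance;

	(void)cpu;
	dev->active--;
	return 0;
}
```

The driver supplies only the callbacks and its instances; iteration and rollback live in one shared place instead of being repeated in each driver.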
This layer is responsible for:
- Enumerating over the PCI bus
- Informing FW about host readiness
- Providing a HW interface to the transport layer for control and messages
- Interrupt handling and routing
Original-author: Daniel Drubin <daniel.drubin@intel.com>
Reviewed-and-tested-by: Ooi, Joyce <joyce.ooi@intel.com>
Tested-by: Grant Likely <grant.likely@secretlab.ca>
Tested-by: Rann Bar-On <rb6@duke.edu>
Tested-by: Atri Bhattacharya <badshah400@aim.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Debugging what goes wrong with cgroup setup can get hairy. Add
tracepoints for cgroup hierarchy mount, cgroup creation/destruction
and task migration operations for better visibility.
Signed-off-by: Tejun Heo <tj@kernel.org>
Merge tag 'trace-v4.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fix from Steven Rostedt:
"Fix tick_stop tracepoint symbols for user export.
Luiz Capitulino noticed that the tick_stop tracepoint wasn't being
parsed properly by the tracing user space tools.
This was due to the TRACE_DEFINE_ENUM() being set to a define, when it
should have been set to the enum itself. The define was of the MASK
that used the BIT to shift. The BIT was the enum and by adding that,
everything gets converted nicely. The MASK is still kept just in case
it gets converted to an enum in the future"
* tag 'trace-v4.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix tick_stop tracepoint symbols for user export
The symbols used in the tick_stop tracepoint were not being converted
properly into integers in the tick_stop format file. Instead we had this:
print fmt: "success=%d dependency=%s", REC->success,
__print_symbolic(REC->dependency, { 0, "NONE" },
{ (1 << TICK_DEP_BIT_POSIX_TIMER), "POSIX_TIMER" },
{ (1 << TICK_DEP_BIT_PERF_EVENTS), "PERF_EVENTS" },
{ (1 << TICK_DEP_BIT_SCHED), "SCHED" },
{ (1 << TICK_DEP_BIT_CLOCK_UNSTABLE), "CLOCK_UNSTABLE" })
User space tools have no idea how to parse "TICK_DEP_BIT_SCHED" or the other
symbols used to do the bit shifting. The reason is that the conversion was
done using the TICK_DEP_MASK_* symbols, which are just macros that
convert to the BIT shift itself (with the exception of NONE, which was
converted properly, because it doesn't use bits, and is defined as zero).
The TICK_DEP_BIT_* needs to be denoted by TRACE_DEFINE_ENUM() in order to
have this properly converted for user space tools to parse this event.
Cc: stable@vger.kernel.org
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Fixes: e6e6cc22e0 ("nohz: Use enum code for tick stop failure tracing message")
Reported-by: Luiz Capitulino <lcapitulino@redhat.com>
Tested-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
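To see why the format file needs plain integers, consider what a user space parser has to do with the dependency field: compare a number against a table of number and name pairs. The sketch below models that lookup; the bit numbers assigned to the TICK_DEP_BIT_* names are assumed for illustration, and `tick_dep_name` is a hypothetical helper, not part of any tracing tool.

```c
#include <stddef.h>

/* Bit numbers assumed for illustration only. */
enum tick_dep_bits {
	TICK_DEP_BIT_POSIX_TIMER    = 0,
	TICK_DEP_BIT_PERF_EVENTS    = 1,
	TICK_DEP_BIT_SCHED          = 2,
	TICK_DEP_BIT_CLOCK_UNSTABLE = 3
};

struct sym {
	unsigned long mask;
	const char *name;
};

/*
 * Inside the kernel the compiler resolves the shifts below. A parser
 * reading the format file only sees text, so it needs these as numbers.
 */
static const struct sym tick_dep_syms[] = {
	{ 0, "NONE" },
	{ 1UL << TICK_DEP_BIT_POSIX_TIMER,    "POSIX_TIMER" },
	{ 1UL << TICK_DEP_BIT_PERF_EVENTS,    "PERF_EVENTS" },
	{ 1UL << TICK_DEP_BIT_SCHED,          "SCHED" },
	{ 1UL << TICK_DEP_BIT_CLOCK_UNSTABLE, "CLOCK_UNSTABLE" },
};

/* Map a recorded dependency value to its symbolic name. */
static const char *tick_dep_name(unsigned long dependency)
{
	size_t i;

	for (i = 0; i < sizeof(tick_dep_syms) / sizeof(tick_dep_syms[0]); i++)
		if (tick_dep_syms[i].mask == dependency)
			return tick_dep_syms[i].name;
	return "UNKNOWN";
}
```

A parser handed the literal text "(1 << TICK_DEP_BIT_SCHED)" has no enum definition to evaluate it against, which is the failure the commit describes.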
Since commit 63a4cc2486, bio->bi_rw contains flags in the lower
portion and the op code in the higher portions. This means that
old code that relies on manually setting bi_rw is most likely
going to be broken. Instead of letting that brokenness linger,
rename the member, to force old and out-of-tree code to break
at compile time instead of at runtime.
No intended functional changes in this commit.
Signed-off-by: Jens Axboe <axboe@fb.com>
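The hazard described above comes from packing two things into one integer. A rough sketch follows, assuming a 32-bit field with the op code in the top bits and the flags below it; the bit widths, constants and `demo_*` names are hypothetical and do not reproduce the block layer's definitions.

```c
#include <stdint.h>

#define DEMO_OP_BITS   3
#define DEMO_OP_SHIFT  (32 - DEMO_OP_BITS)
#define DEMO_FLAG_MASK ((1u << DEMO_OP_SHIFT) - 1)

enum demo_op {
	DEMO_OP_READ    = 0,
	DEMO_OP_WRITE   = 1,
	DEMO_OP_DISCARD = 2
};

#define DEMO_FLAG_SYNC (1u << 0)
#define DEMO_FLAG_META (1u << 1)

/*
 * The field carries a new name, so code that still assigns to the old
 * member name fails to compile instead of misbehaving at runtime.
 */
struct demo_bio {
	uint32_t opf; /* op code in the high bits, flags in the low bits */
};

/* Set op and flags together through one helper. */
static void demo_set_op_attrs(struct demo_bio *b, enum demo_op op,
			      uint32_t flags)
{
	b->opf = ((uint32_t)op << DEMO_OP_SHIFT) | (flags & DEMO_FLAG_MASK);
}

static enum demo_op demo_bio_op(const struct demo_bio *b)
{
	return (enum demo_op)(b->opf >> DEMO_OP_SHIFT);
}

static uint32_t demo_bio_flags(const struct demo_bio *b)
{
	return b->opf & DEMO_FLAG_MASK;
}
```

Under this layout, legacy code that stores a bare `1` intending "write" would be decoded as a read with the sync flag set, which is the kind of silent runtime breakage the rename turns into a compile error.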
- New vsock device support in host and guest
- Platform IOMMU support in host and guest,
including compatibility quirks for legacy systems.
- Misc fixes and cleanups.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio/vhost updates from Michael Tsirkin:
- new vsock device support in host and guest
- platform IOMMU support in host and guest, including compatibility
quirks for legacy systems.
- misc fixes and cleanups.
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
VSOCK: Use kvfree()
vhost: split out vringh Kconfig
vhost: detect 32 bit integer wrap around
vhost: new device IOTLB API
vhost: drop vringh dependency
vhost: convert pre sorted vhost memory array to interval tree
vhost: introduce vhost memory accessors
VSOCK: Add Makefile and Kconfig
VSOCK: Introduce vhost_vsock.ko
VSOCK: Introduce virtio_transport.ko
VSOCK: Introduce virtio_vsock_common.ko
VSOCK: defer sock removal to transports
VSOCK: transport-specific vsock_transport functions
vhost: drop vringh dependency
vop: pull in vhost Kconfig
virtio: new feature to detect IOMMU device quirk
balloon: check the number of available pages in leak balloon
vhost: lockless enqueuing
vhost: simplify work flushing
Trond made a change to the server's tcp logic that allows a fast
client to better take advantage of high bandwidth networks, but
may increase the risk that a single client could starve other
clients; a new sunrpc.svc_rpc_per_connection_limit parameter
should help mitigate this in the (hopefully unlikely) event this
becomes a problem in practice.
Tom Haynes added a minimal flex-layout pnfs server, which is of
no use in production for now--don't build it unless you're doing
client testing or further server development.
Merge tag 'nfsd-4.8' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
"Highlights:
- Trond made a change to the server's tcp logic that allows a fast
client to better take advantage of high bandwidth networks, but may
increase the risk that a single client could starve other clients;
a new sunrpc.svc_rpc_per_connection_limit parameter should help
mitigate this in the (hopefully unlikely) event this becomes a
problem in practice.
- Tom Haynes added a minimal flex-layout pnfs server, which is of no
use in production for now--don't build it unless you're doing
client testing or further server development"
* tag 'nfsd-4.8' of git://linux-nfs.org/~bfields/linux: (32 commits)
nfsd: remove some dead code in nfsd_create_locked()
nfsd: drop unnecessary MAY_EXEC check from create
nfsd: clean up bad-type check in nfsd_create_locked
nfsd: remove unnecessary positive-dentry check
nfsd: reorganize nfsd_create
nfsd: check d_can_lookup in fh_verify of directories
nfsd: remove redundant zero-length check from create
nfsd: Make creates return EEXIST instead of EACCES
SUNRPC: Detect immediate closure of accepted sockets
SUNRPC: accept() may return sockets that are still in SYN_RECV
nfsd: allow nfsd to advertise multiple layout types
nfsd: Close race between nfsd4_release_lockowner and nfsd4_lock
nfsd/blocklayout: Make sure calculate signature/designator length aligned
xfs: abstract block export operations from nfsd layouts
SUNRPC: Remove unused callback xpo_adjust_wspace()
SUNRPC: Change TCP socket space reservation
SUNRPC: Add a server side per-connection limit
SUNRPC: Micro optimisation for svc_data_ready
SUNRPC: Call the default socket callbacks instead of open coding
SUNRPC: lock the socket while detaching it
...
Pull more btrfs updates from Chris Mason:
"This is part two of my btrfs pull, which is some cleanups and a batch
of fixes.
Most of the code here is from Jeff Mahoney, making the pointers we
pass around internally more consistent and less confusing overall. I
noticed a small problem right before I sent this out yesterday, so I
fixed it up and re-tested overnight"
* 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (40 commits)
Btrfs: fix __MAX_CSUM_ITEMS
btrfs: btrfs_abort_transaction, drop root parameter
btrfs: add btrfs_trans_handle->fs_info pointer
btrfs: btrfs_relocate_chunk pass extent_root to btrfs_end_transaction
btrfs: convert nodesize macros to static inlines
btrfs: introduce BTRFS_MAX_ITEM_SIZE
btrfs: cleanup, remove prototype for btrfs_find_root_ref
btrfs: copy_to_sk drop unused root parameter
btrfs: simpilify btrfs_subvol_inherit_props
btrfs: tests, use BTRFS_FS_STATE_DUMMY_FS_INFO instead of dummy root
btrfs: tests, require fs_info for root
btrfs: tests, move initialization into tests/
btrfs: btrfs_test_opt and friends should take a btrfs_fs_info
btrfs: prefix fsid to all trace events
btrfs: plumb fs_info into btrfs_work
btrfs: remove obsolete part of comment in statfs
btrfs: hide test-only member under ifdef
btrfs: Ratelimit "no csum found" info message
btrfs: Add ratelimit to btrfs printing
Btrfs: fix unexpected balance crash due to BUG_ON
...
- ARM: GICv3 ITS emulation and various fixes. Removal of the old
VGIC implementation.
- s390: support for trapping software breakpoints, nested virtualization
(vSIE), the STHYI opcode, initial extensions for CPU model support.
- MIPS: support for MIPS64 hosts (32-bit guests only) and lots of cleanups,
preliminary to this and the upcoming support for hardware virtualization
extensions.
- x86: support for execute-only mappings in nested EPT; reduced vmexit
latency for TSC deadline timer (by about 30%) on Intel hosts; support for
more than 255 vCPUs.
- PPC: bugfixes.
The ugly bit is the conflicts. A couple of them are simple conflicts due
to 4.7 fixes, but most of them are with other trees. There was definitely
too much reliance on Acked-by here. Some conflicts are for KVM patches
where _I_ gave my Acked-by, but the worst are for this pull request's
patches that touch files outside arch/*/kvm. KVM submaintainers should
probably learn to synchronize better with arch maintainers, with the
latter providing topic branches whenever possible instead of Acked-by.
This is what we do with arch/x86. And I should learn to refuse pull
requests when linux-next sends scary signals, even if that means that
submaintainers have to rebase their branches.
Anyhow, here's the list:
- arch/x86/kvm/vmx.c: handle_pcommit and EXIT_REASON_PCOMMIT were removed
by the nvdimm tree. This tree adds handle_preemption_timer and
EXIT_REASON_PREEMPTION_TIMER at the same place. In general all mentions
of pcommit have to go.
There is also a conflict between a stable fix and this patch, where the
stable fix removed the vmx_create_pml_buffer function and its call.
- virt/kvm/kvm_main.c: kvm_cpu_notifier was removed by the hotplug tree.
This tree adds kvm_io_bus_get_dev at the same place.
- virt/kvm/arm/vgic.c: a few final bugfixes went into 4.7 before the
file was completely removed for 4.8.
- include/linux/irqchip/arm-gic-v3.h: this one is entirely our fault;
this is a change that should have gone in through the irqchip tree and
pulled by kvm-arm. I think I would have rejected this kvm-arm pull
request. The KVM version is the right one, except that it lacks
GITS_BASER_PAGES_SHIFT.
- arch/powerpc: what a mess. For the idle_book3s.S conflict, the KVM
tree is the right one; everything else is trivial. In this case I am
not quite sure what went wrong. The commit that is causing the mess
(fd7bacbca4, "KVM: PPC: Book3S HV: Fix TB corruption in guest exit
path on HMI interrupt", 2016-05-15) touches both arch/powerpc/kernel/
and arch/powerpc/kvm/. It's large, but at 396 insertions/5 deletions
I guessed that it wasn't really possible to split it and that the 5
deletions wouldn't conflict. That wasn't the case.
- arch/s390: also messy. First is hypfs_diag.c where the KVM tree
moved some code and the s390 tree patched it. You have to reapply the
relevant part of commits 6c22c98637, plus all of e030c1125e, to
arch/s390/kernel/diag.c. Or pick the linux-next conflict
resolution from http://marc.info/?l=kvm&m=146717549531603&w=2.
Second, there is a conflict in gmap.c between a stable fix and 4.8.
The KVM version here is the correct one.
I have pushed my resolution at refs/heads/merge-20160802 (commit
3d1f53419842) at git://git.kernel.org/pub/scm/virt/kvm/kvm.git.
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
- ARM: GICv3 ITS emulation and various fixes. Removal of the
old VGIC implementation.
- s390: support for trapping software breakpoints, nested
virtualization (vSIE), the STHYI opcode, initial extensions
for CPU model support.
- MIPS: support for MIPS64 hosts (32-bit guests only) and lots
of cleanups, preliminary to this and the upcoming support for
hardware virtualization extensions.
- x86: support for execute-only mappings in nested EPT; reduced
vmexit latency for TSC deadline timer (by about 30%) on Intel
hosts; support for more than 255 vCPUs.
- PPC: bugfixes.
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (302 commits)
KVM: PPC: Introduce KVM_CAP_PPC_HTM
MIPS: Select HAVE_KVM for MIPS64_R{2,6}
MIPS: KVM: Reset CP0_PageMask during host TLB flush
MIPS: KVM: Fix ptr->int cast via KVM_GUEST_KSEGX()
MIPS: KVM: Sign extend MFC0/RDHWR results
MIPS: KVM: Fix 64-bit big endian dynamic translation
MIPS: KVM: Fail if ebase doesn't fit in CP0_EBase
MIPS: KVM: Use 64-bit CP0_EBase when appropriate
MIPS: KVM: Set CP0_Status.KX on MIPS64
MIPS: KVM: Make entry code MIPS64 friendly
MIPS: KVM: Use kmap instead of CKSEG0ADDR()
MIPS: KVM: Use virt_to_phys() to get commpage PFN
MIPS: Fix definition of KSEGX() for 64-bit
KVM: VMX: Add VMCS to CPU's loaded VMCSs before VMPTRLD
kvm: x86: nVMX: maintain internal copy of current VMCS
KVM: PPC: Book3S HV: Save/restore TM state in H_CEDE
KVM: PPC: Book3S HV: Pull out TM state save/restore into separate procedures
KVM: arm64: vgic-its: Simplify MAPI error handling
KVM: arm64: vgic-its: Make vgic_its_cmd_handle_mapi similar to other handlers
KVM: arm64: vgic-its: Turn device_id validation into generic ID validation
...
This module contains the common code and header files for the
virtio_transport and vhost_vsock kernel modules.
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Pull btrfs updates from Chris Mason:
"This pull is dedicated to Josef's enospc rework, which we've been
testing for a few releases now. It fixes some early enospc problems
and is dramatically faster.
This also includes an updated fix for the delalloc accounting that
happens after a fault in copy_from_user. My patch in v4.7 was almost
but not quite enough"
* 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: fix delalloc accounting after copy_from_user faults
Btrfs: avoid deadlocks during reservations in btrfs_truncate_block
Btrfs: use FLUSH_LIMIT for relocation in reserve_metadata_bytes
Btrfs: fill relocation block rsv after allocation
Btrfs: always use trans->block_rsv for orphans
Btrfs: change how we calculate the global block rsv
Btrfs: use root when checking need_async_flush
Btrfs: don't bother kicking async if there's nothing to reclaim
Btrfs: fix release reserved extents trace points
Btrfs: add fsid to some tracepoints
Btrfs: add tracepoints for flush events
Btrfs: fix delalloc reservation amount tracepoint
Btrfs: trace pinned extents
Btrfs: introduce ticketed enospc infrastructure
Btrfs: add tracepoint for adding block groups
Btrfs: warn_on for unaccounted spaces
Btrfs: change delayed reservation fallback behavior
Btrfs: always reserve metadata for delalloc extents
Btrfs: fix callers of btrfs_block_rsv_migrate
Btrfs: add bytes_readonly to the spaceinfo at once
Merge tag 'trace-v4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"This is mostly clean ups and small fixes. Some of the more visible
changes are:
- The function pid code uses the event pid filtering logic
- [ku]probe events have access to current->comm
- trace_printk now has sample code
- PCI devices now trace physical addresses
- stack tracing has fewer unnecessary functions traced"
* tag 'trace-v4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
printk, tracing: Avoiding unneeded blank lines
tracing: Use __get_str() when manipulating strings
tracing, RAS: Cleanup on __get_str() usage
tracing: Use outer () on __get_str() definition
ftrace: Reduce size of function graph entries
tracing: Have HIST_TRIGGERS select TRACING
tracing: Using for_each_set_bit() to simplify trace_pid_write()
ftrace: Move toplevel init out of ftrace_init_tracefs()
tracing/function_graph: Fix filters for function_graph threshold
tracing: Skip more functions when doing stack tracing of events
tracing: Expose CPU physical addresses (resource values) for PCI devices
tracing: Show the preempt count of when the event was called
tracing: Add trace_printk sample code
tracing: Choose static tp_printk buffer by explicit nesting count
tracing: expose current->comm to [ku]probe events
ftrace: Have set_ftrace_pid use the bitmap like events do
tracing: Move pid_list write processing into its own function
tracing: Move the pid_list seq_file functions to be global
tracing: Move filtered_pid helper functions into trace.c
tracing: Make the pid filtering helper functions global
In the context of direct compaction, for some types of allocations we
would like the compaction to either succeed or definitely fail while
trying as hard as possible. Current async/sync_light migration mode is
insufficient, as there are heuristics such as caching scanner positions,
marking pageblocks as unsuitable or deferring compaction for a zone. At
least the final compaction attempt should be able to override these
heuristics.
To communicate how hard compaction should try, we replace migration mode
with a new enum compact_priority and change the relevant function
signatures. In compact_zone_order() where struct compact_control is
constructed, the priority is mapped to suitable control flags. This
patch itself has no functional change, as the current priority levels
are mapped back to the same migration modes as before. Expanding them
will be done next.
Note that !CONFIG_COMPACTION variant of try_to_compact_pages() is
removed, as the only caller exists under CONFIG_COMPACTION.
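The mapping from priority to control flags can be sketched roughly as follows. This is an illustrative Python model rather than the kernel code: the enum members, the `CompactControl` fields and the function body are assumptions chosen to mirror the description, in which each priority level maps back to the migration mode used before.

```python
from dataclasses import dataclass
from enum import Enum, auto


class MigrateMode(Enum):
    ASYNC = auto()
    SYNC_LIGHT = auto()


class CompactPriority(Enum):
    # Hypothetical priority levels; a lower value means "try harder".
    SYNC_LIGHT = 1
    ASYNC = 2


@dataclass
class CompactControl:
    order: int
    mode: MigrateMode


def compact_zone_order(order: int, prio: CompactPriority) -> CompactControl:
    """Build the control structure, translating the priority into a mode."""
    mode = (MigrateMode.ASYNC if prio is CompactPriority.ASYNC
            else MigrateMode.SYNC_LIGHT)
    return CompactControl(order=order, mode=mode)
```

Callers pass only a priority; the translation into migration mode lives in one place, which is what lets later patches add harder-trying levels without touching every call site.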
Link: http://lkml.kernel.org/r/20160721073614.24395-8-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After the previous patch, we can distinguish costly allocations that
should be really lightweight, such as THP page faults, with
__GFP_NORETRY. This means we don't need to recognize khugepaged
allocations via PF_KTHREAD anymore. We can also change THP page faults
in areas where madvise(MADV_HUGEPAGE) was used to try as hard as
khugepaged, as the process has indicated that it benefits from THP's and
is willing to pay some initial latency costs.
We can also make the flags handling less cryptic by distinguishing
GFP_TRANSHUGE_LIGHT (no reclaim at all, default mode in page fault) from
GFP_TRANSHUGE (only direct reclaim, khugepaged default). Adding
__GFP_NORETRY or __GFP_KSWAPD_RECLAIM is done where needed.
The patch effectively changes the current GFP_TRANSHUGE users as
follows:
* get_huge_zero_page() - the zero page lifetime should be relatively
long and it's shared by multiple users, so it's worth spending some
effort on it. We use GFP_TRANSHUGE, and __GFP_NORETRY is not added.
This also restores direct reclaim to this allocation, which was
unintentionally removed by commit e4a49efe4e7e ("mm: thp: set THP defrag
by default to madvise and add a stall-free defrag option")
* alloc_hugepage_khugepaged_gfpmask() - this is khugepaged, so latency
is not an issue. So if khugepaged "defrag" is enabled (the default), do
reclaim via GFP_TRANSHUGE without __GFP_NORETRY. We can remove the
PF_KTHREAD check from page alloc.
As a side-effect, khugepaged will no longer check if the initial
compaction was deferred or contended. This is OK, as khugepaged sleep
times between collapse attempts are long enough to prevent noticeable
disruption, so we should allow it to spend some effort.
* migrate_misplaced_transhuge_page() - already was masking out
__GFP_RECLAIM, so just convert to GFP_TRANSHUGE_LIGHT which is
equivalent.
* alloc_hugepage_direct_gfpmask() - vma's with VM_HUGEPAGE (via madvise)
are now allocating without __GFP_NORETRY. Other vma's keep using
__GFP_NORETRY if direct reclaim/compaction is at all allowed (by default
it's allowed only for madvised vma's). The rest is conversion to
GFP_TRANSHUGE(_LIGHT).
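The page fault case in the last bullet can be modelled as a small decision function. This is a sketch under stated assumptions: the bit values are placeholders, the `defrag` argument is a hypothetical stand-in for the sysfs setting limited to three modes, and the deferred (kswapd-only) mode is left out.

```python
from enum import IntFlag


class Gfp(IntFlag):
    # Placeholder bit values for illustration only; not the kernel's.
    DIRECT_RECLAIM = 1
    KSWAPD_RECLAIM = 2
    NORETRY = 4
    BASE = 8  # stands in for all the non-reclaim bits


GFP_TRANSHUGE_LIGHT = Gfp.BASE                           # no reclaim at all
GFP_TRANSHUGE = GFP_TRANSHUGE_LIGHT | Gfp.DIRECT_RECLAIM  # direct reclaim only


def direct_gfpmask(vma_madvised: bool, defrag: str) -> Gfp:
    """Pick the mask for a THP page fault.

    `defrag` is a hypothetical setting: "always", "madvise" or "never".
    """
    if vma_madvised and defrag in ("always", "madvise"):
        # The process asked for THP: try hard, no __GFP_NORETRY.
        return GFP_TRANSHUGE
    if defrag == "always":
        # Direct reclaim is allowed, but stay lightweight for other vmas.
        return GFP_TRANSHUGE | Gfp.NORETRY
    return GFP_TRANSHUGE_LIGHT
```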
[mhocko@suse.com: suggested GFP_TRANSHUGE_LIGHT]
Link: http://lkml.kernel.org/r/20160721073614.24395-7-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is convenient when tracking down why the skip count is high because
it'll show what classzone kswapd woke up at and what zones are being
isolated.
Link: http://lkml.kernel.org/r/1467970510-21195-29-git-send-email-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As reclaim is now node-based, it follows that page write activity due to
page reclaim should also be accounted for on the node. For consistency,
also account page writes and page dirtying on a per-node basis.
After this patch, there are a few remaining zone counters that may appear
strange but are fine. NUMA stats are still per-zone as this is a
user-space interface that tools consume. NR_MLOCK, NR_SLAB_*,
NR_PAGETABLE, NR_KERNEL_STACK and NR_BOUNCE are all allocations that
potentially pin low memory and cannot trivially be reclaimed on demand.
This information is still useful for debugging a page allocation failure
warning.
Link: http://lkml.kernel.org/r/1467970510-21195-21-git-send-email-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are now a number of accounting oddities such as mapped file pages
being accounted for on the node while the total number of file pages is
accounted on the zone. This can be coped with to some extent but it's
confusing, so this patch moves the relevant file-based accounting to the
node. Due to throttling logic in the page allocator for reliable OOM
detection, it is still necessary to track dirty and writeback pages on a
per-zone basis.
[mgorman@techsingularity.net: fix NR_ZONE_WRITE_PENDING accounting]
Link: http://lkml.kernel.org/r/1468404004-5085-5-git-send-email-mgorman@techsingularity.net
Link: http://lkml.kernel.org/r/1467970510-21195-20-git-send-email-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This moves the LRU lists from the zone to the node and related data such
as counters, tracing, congestion tracking and writeback tracking.
Unfortunately, due to reclaim and compaction retry logic, it is
necessary to account for the number of LRU pages on both zone and node
logic. Most reclaim logic is based on the node counters but the retry
logic uses the zone counters which do not distinguish inactive and
active sizes. It would be possible to leave the LRU counters on a
per-zone basis but it's a heavier calculation across multiple cache
lines that is much more frequent than the retry checks.
Other than the LRU counters, this is mostly a mechanical patch but note
that it introduces a number of anomalies. For example, the scans are
per-zone but using per-node counters. We also mark a node as congested
when a zone is congested. This causes weird problems that are fixed
later but is easier to review.
In the event that there is excessive overhead on 32-bit systems due to
the nodes being on LRU then there are two potential solutions:
1. Long-term isolation of highmem pages when reclaim is lowmem
When pages are skipped, they are immediately added back onto the LRU
list. If lowmem reclaim persisted for long periods of time, the same
highmem pages get continually scanned. The idea would be that lowmem
keeps those pages on a separate list until a reclaim for highmem pages
arrives that splices the highmem pages back onto the LRU. It potentially
could be implemented similar to the UNEVICTABLE list.
That would reduce the skip rate, with the potential corner case that
highmem pages have to be scanned and reclaimed to free lowmem slab pages.
2. Linear scan lowmem pages if the initial LRU shrink fails
This will break LRU ordering but may be preferable and faster during
memory pressure than skipping LRU pages.
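The first option can be illustrated with a toy model. Everything here is hypothetical (the page representation, the function name, the separate `held_back` list); it only shows the idea of parking skipped highmem pages instead of putting them straight back on the LRU, and splicing them back when a highmem-capable reclaim arrives.

```python
from collections import deque


def shrink_lru(lru, held_back, reclaim_highmem, nr_to_scan):
    """Scan up to nr_to_scan pages from the LRU tail; return the pages picked.

    Pages are (name, is_highmem) tuples. `held_back` is the separate list
    that keeps skipped highmem pages out of later lowmem scans.
    """
    if reclaim_highmem and held_back:
        # A reclaim that can use highmem splices the parked pages back.
        lru.extend(held_back)
        held_back.clear()
    picked = []
    for _ in range(min(nr_to_scan, len(lru))):
        page = lru.pop()
        _, is_highmem = page
        if is_highmem and not reclaim_highmem:
            held_back.append(page)  # park it rather than re-adding to the LRU
        else:
            picked.append(page)
    return picked
```

In this model a second lowmem-only scan does not revisit the parked highmem pages at all, which is the reduction in skip rate the text describes.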
Link: http://lkml.kernel.org/r/1467970510-21195-4-git-send-email-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull networking updates from David Miller:
1) Unified UDP encapsulation offload methods for drivers, from
Alexander Duyck.
2) Make DSA binding more sane, from Andrew Lunn.
3) Support QCA9888 chips in ath10k, from Anilkumar Kolli.
4) Several workqueue usage cleanups, from Bhaktipriya Shridhar.
5) Add XDP (eXpress Data Path), essentially running BPF programs on RX
packets as soon as the device sees them, with the option to mirror
the packet on TX via the same interface. From Brenden Blanco and
others.
6) Allow qdisc/class stats dumps to run lockless, from Eric Dumazet.
7) Add VLAN support to b53 and bcm_sf2, from Florian Fainelli.
8) Simplify netlink conntrack entry layout, from Florian Westphal.
9) Add ipv4 forwarding support to mlxsw spectrum driver, from Ido
Schimmel, Yotam Gigi, and Jiri Pirko.
10) Add SKB array infrastructure and convert tun and macvtap over to it.
From Michael S Tsirkin and Jason Wang.
11) Support qdisc packet injection in pktgen, from John Fastabend.
12) Add neighbour monitoring framework to TIPC, from Jon Paul Maloy.
13) Add NV congestion control support to TCP, from Lawrence Brakmo.
14) Add GSO support to SCTP, from Marcelo Ricardo Leitner.
15) Allow GRO and RPS to function on macsec devices, from Paolo Abeni.
16) Support MPLS over IPV4, from Simon Horman.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1622 commits)
xgene: Fix build warning with ACPI disabled.
be2net: perform temperature query in adapter regardless of its interface state
l2tp: Correctly return -EBADF from pppol2tp_getname.
net/mlx5_core/health: Remove deprecated create_singlethread_workqueue
net: ipmr/ip6mr: update lastuse on entry change
macsec: ensure rx_sa is set when validation is disabled
tipc: dump monitor attributes
tipc: add a function to get the bearer name
tipc: get monitor threshold for the cluster
tipc: make cluster size threshold for monitoring configurable
tipc: introduce constants for tipc address validation
net: neigh: disallow transition to NUD_STALE if lladdr is unchanged in neigh_update()
MAINTAINERS: xgene: Add driver and documentation path
Documentation: dtb: xgene: Add MDIO node
dtb: xgene: Add MDIO node
drivers: net: xgene: ethtool: Use phy_ethtool_gset and sset
drivers: net: xgene: Use exported functions
drivers: net: xgene: Enable MDIO driver
drivers: net: xgene: Add backward compatibility
drivers: net: phy: xgene: Add MDIO driver
...
Merge updates from Andrew Morton:
- a few misc bits
- ocfs2
- most(?) of MM
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (125 commits)
thp: fix comments of __pmd_trans_huge_lock()
cgroup: remove unnecessary 0 check from css_from_id()
cgroup: fix idr leak for the first cgroup root
mm: memcontrol: fix documentation for compound parameter
mm: memcontrol: remove BUG_ON in uncharge_list
mm: fix build warnings in <linux/compaction.h>
mm, thp: convert from optimistic swapin collapsing to conservative
mm, thp: fix comment inconsistency for swapin readahead functions
thp: update Documentation/{vm/transhuge,filesystems/proc}.txt
shmem: split huge pages beyond i_size under memory pressure
thp: introduce CONFIG_TRANSPARENT_HUGE_PAGECACHE
khugepaged: add support of collapse for tmpfs/shmem pages
shmem: make shmem_inode_info::lock irq-safe
khugepaged: move up_read(mmap_sem) out of khugepaged_alloc_page()
thp: extract khugepaged from mm/huge_memory.c
shmem, thp: respect MADV_{NO,}HUGEPAGE for file mappings
shmem: add huge pages support
shmem: get_unmapped_area align huge page
shmem: prepare huge= mount option and sysfs knob
mm, rmap: account shmem thp pages
...
To detect whether khugepaged swapin is worthwhile, this patch checks the
number of young pages. At least half of HPAGE_PMD_NR pages should be young
for swapin to go ahead.
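As a minimal sketch of that threshold, assuming 4kB base pages and 2MB huge pages so that HPAGE_PMD_NR is 512 (the function name is made up for illustration):

```python
HPAGE_PMD_NR = 512  # assumed: 2MB huge page / 4kB base pages


def swapin_worthwhile(young_pages: int) -> bool:
    """Swap in only if at least half of the range was recently referenced."""
    return young_pages >= HPAGE_PMD_NR // 2
```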
Link: http://lkml.kernel.org/r/1468109451-1615-1-git-send-email-ebru.akagunduz@gmail.com
Signed-off-by: Ebru Akagunduz <ebru.akagunduz@gmail.com>
Suggested-by: Minchan Kim <minchan@kernel.org>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch extends khugepaged to support collapse of tmpfs/shmem pages.
We share a fair amount of infrastructure with anon-THP collapse.
A few design points:
- First we look for a VMA which is suitable for mapping a huge
page;
- If the VMA maps a shmem file, the remaining scan/collapse operations
operate on the page cache, not on page tables as in the anon VMA case.
- khugepaged_scan_shmem() finds a range which is suitable for a huge
page. The scan is lockless and shouldn't disturb the system too much.
- Once a candidate for collapse is found, collapse_shmem() attempts
to create a huge page:
+ scan over the radix tree, making the range point to the new huge page;
+ the new huge page is not-uptodate, locked and frozen (refcount
is 0), so nobody can touch it until we say so.
+ we swap in pages during the scan. khugepaged_scan_shmem()
filters out ranges with more than khugepaged_max_ptes_swap
swapped out pages. It's HPAGE_PMD_NR/8 by default.
+ old pages are isolated, unmapped and put on a local list so they
can be restored if the collapse fails.
- If the collapse succeeds, we retract pte page tables from VMAs where
huge page mapping is possible. The huge page will be mapped as a PMD on
the next minor fault into the range.
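The swap filter applied during the scan can be sketched like this. It is a simplified model, not the kernel function: the range is a plain list of markers, HPAGE_PMD_NR is assumed to be 512, and checks other than the swap count are omitted.

```python
HPAGE_PMD_NR = 512                 # assumed: 2MB huge page, 4kB base pages
MAX_PTES_SWAP = HPAGE_PMD_NR // 8  # the default quoted above


def scan_range(entries, max_ptes_swap=MAX_PTES_SWAP):
    """Return True if the range may be collapsed, False if too much is swapped.

    `entries` holds one marker per base page: "page" or "swap".
    """
    swapped = 0
    for entry in entries:
        if entry == "swap":
            swapped += 1
            if swapped > max_ptes_swap:
                return False  # abort the scan early
    return True
```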
Link: http://lkml.kernel.org/r/1466021202-61880-35-git-send-email-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch adds swapin readahead to improve the THP collapse rate. When
khugepaged scans pages, a few of them may be in the swap area.
With the patch, khugepaged can collapse 4kB pages into a THP when there
are up to max_ptes_swap swap ptes in a 2MB range.
The patch was tested with a test program that allocates 400B of memory,
writes to it, and then sleeps. I force the system to swap out all of it.
Afterwards, the test program touches the area by writing, skipping one
page in every 20 pages of the area.
Without the patch, the system did not do swapin readahead. The THP rate
was 65% of the program's memory and did not change over time.
With this patch, after 10 minutes of waiting khugepaged had collapsed
99% of the program's memory.
[kirill.shutemov@linux.intel.com: trivial cleanup of exit path of the function]
[kirill.shutemov@linux.intel.com: __collapse_huge_page_swapin(): drop unused 'pte' parameter]
[kirill.shutemov@linux.intel.com: do not hold anon_vma lock during swap in]
Signed-off-by: Ebru Akagunduz <ebru.akagunduz@gmail.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Xie XiuQi <xiexiuqi@huawei.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Introduce a new sysfs integer knob
/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_swap which enables
an optimistic check for swapin readahead to increase the THP collapse rate.
Before bringing swapped-out pages back into memory, khugepaged counts them
and allows up to the configured number. It also reports the number of
unmapped ptes via tracepoints.
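From user space the knob is an ordinary sysfs file. A small reader might look like the following; the helper name and the choice to return None on kernels without the file are assumptions made for this sketch.

```python
from pathlib import Path

KNOB = Path("/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_swap")


def read_max_ptes_swap(path=KNOB):
    """Return the knob value as an int, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (FileNotFoundError, PermissionError):
        return None
```

Taking the path as a parameter keeps the helper testable against a temporary file on systems that lack the knob.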
[vdavydov@parallels.com: fix scan not aborted on SCAN_EXCEED_SWAP_PTE]
[sfr@canb.auug.org.au: build fix]
Link: http://lkml.kernel.org/r/20160616154503.65806e12@canb.auug.org.au
Signed-off-by: Ebru Akagunduz <ebru.akagunduz@gmail.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Xie XiuQi <xiexiuqi@huawei.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The per-sb inode writeback list tracks inodes currently under writeback
to facilitate efficient sync processing. In particular, it ensures that
sync only needs to walk through a list of inodes that were cleaned by
the sync.
Add a couple tracepoints to help identify when inodes are added/removed
to and from the writeback lists. Piggyback off of the writeback
lazytime tracepoint template as it already tracks the relevant inode
information.
Link: http://lkml.kernel.org/r/1466594593-6757-3-git-send-email-bfoster@redhat.com
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Josef Bacik <jbacik@fb.com>
Cc: Holger Hoffstätte <holger.hoffstaette@applied-asynchrony.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull block driver updates from Jens Axboe:
"This branch also contains core changes. I've come to the conclusion
that from 4.9 and forward, I'll be doing just a single branch. We
often have dependencies between core and drivers, and it's hard to
always split them up appropriately without pulling core into drivers
when that happens.
That said, this contains:
- separate secure erase type for the core block layer, from
Christoph.
- set of discard fixes, from Christoph.
- bio shrinking fixes from Christoph, as a follow-up to the
op/flags change in the core branch.
- map and append request fixes from Christoph.
- NVMeF (NVMe over Fabrics) code from Christoph. This is pretty
exciting!
- nvme-loop fixes from Arnd.
- removal of ->driverfs_dev from Dan, after providing a
device_add_disk() helper.
- bcache fixes from Bhaktipriya and Yijing.
- cdrom subchannel read fix from Vchannaiah.
- set of lightnvm updates from Wenwei, Matias, Johannes, and Javier.
- set of drbd updates and fixes from Fabian, Lars, and Philipp.
- mg_disk error path fix from Bart.
- user notification for failed device add for loop, from Minfei.
- NVMe in general:
+ NVMe delay quirk from Guilherme.
+ SR-IOV support and command retry limits from Keith.
+ fix for memory-less NUMA node from Masayoshi.
+ use UINT_MAX for discard sectors, from Minfei.
+ cancel IO fixes from Ming.
+ don't allocate unused major, from Neil.
+ error code fixup from Dan.
+ use constants for PSDT/FUSE from James.
+ variable init fix from Jay.
+ fabrics fixes from Ming, Sagi, and Wei.
+ various fixes"
* 'for-4.8/drivers' of git://git.kernel.dk/linux-block: (115 commits)
nvme/pci: Provide SR-IOV support
nvme: initialize variable before logical OR'ing it
block: unexport various bio mapping helpers
scsi/osd: open code blk_make_request
target: stop using blk_make_request
block: simplify and export blk_rq_append_bio
block: ensure bios return from blk_get_request are properly initialized
virtio_blk: use blk_rq_map_kern
memstick: don't allow REQ_TYPE_BLOCK_PC requests
block: shrink bio size again
block: simplify and cleanup bvec pool handling
block: get rid of bio_rw and READA
block: don't ignore -EOPNOTSUPP blkdev_issue_write_same
block: introduce BLKDEV_DISCARD_ZERO to fix zeroout
NVMe: don't allocate unused nvme_major
nvme: avoid crashes when node 0 is memoryless node.
nvme: Limit command retries
loop: Make user notify for adding loop device failed
nvme-loop: fix nvme-loop Kconfig dependencies
nvmet: fix return value check in nvmet_subsys_alloc()
...
Pull core block updates from Jens Axboe:
- the big change is the cleanup from Mike Christie, cleaning up our
uses of command types and modified flags. This is what will throw
some merge conflicts
- regression fix for the above for btrfs, from Vincent
- following up to the above, better packing of struct request from
Christoph
- a 2038 fix for blktrace from Arnd
- a few trivial/spelling fixes from Bart Van Assche
- a front merge check fix from Damien, which could cause issues on
SMR drives
- Atari partition fix from Gabriel
- convert cfq to highres timers, since jiffies isn't granular enough
for some devices these days. From Jan and Jeff
- CFQ priority boost fix for idle classes, from me
- cleanup series from Ming, improving our bio/bvec iteration
- a direct issue fix for blk-mq from Omar
- fix for plug merging not involving the IO scheduler, like we do for
other types of merges. From Tahsin
- expose DAX type internally and through sysfs. From Toshi and Yigal
* 'for-4.8/core' of git://git.kernel.dk/linux-block: (76 commits)
block: Fix front merge check
block: do not merge requests without consulting with io scheduler
block: Fix spelling in a source code comment
block: expose QUEUE_FLAG_DAX in sysfs
block: add QUEUE_FLAG_DAX for devices to advertise their DAX support
Btrfs: fix comparison in __btrfs_map_block()
block: atari: Return early for unsupported sector size
Doc: block: Fix a typo in queue-sysfs.txt
cfq-iosched: Charge at least 1 jiffie instead of 1 ns
cfq-iosched: Fix regression in bonnie++ rewrite performance
cfq-iosched: Convert slice_resid from u64 to s64
block: Convert fifo_time from ulong to u64
blktrace: avoid using timespec
block/blk-cgroup.c: Declare local symbols static
block/bio-integrity.c: Add #include "blk.h"
block/partition-generic.c: Remove a set-but-not-used variable
block: bio: kill BIO_MAX_SIZE
cfq-iosched: temporarily boost queue priority for idle classes
block: drbd: avoid to use BIO_MAX_SIZE
block: bio: remove BIO_MAX_SECTORS
...
Pull libata updates from Tejun Heo:
"libata saw quite a bit of activity in this cycle:
- SMR drive support still being worked on
- bug fixes and improvements to misc SCSI command emulation
- some low level driver updates"
* 'for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: (39 commits)
libata-scsi: better style in ata_msense_*()
AHCI: Clear GHC.IS to prevent unexpectly asserting INTx
ata: sata_dwc_460ex: remove redundant dev_err call
ata: define ATA_PROT_* in terms of ATA_PROT_FLAG_*
libata: remove ATA_PROT_FLAG_DATA
libata: remove ata_is_nodata
ata: make lba_{28,48}_ok() use ATA_MAX_SECTORS{,_LBA48}
libata-scsi: minor cleanup for ata_scsi_zbc_out_xlat
libata-scsi: Fix ZBC management out command translation
libata-scsi: Fix translation of REPORT ZONES command
ata: Handle ATA NCQ NO-DATA commands correctly
libata-eh: decode all taskfile protocols
ata: fixup ATA_PROT_NODATA
libsas: use ata_is_ncq() and ata_has_dma() accessors
libata: use ata_is_ncq() accessors
libata: return boolean values from ata_is_*
libata-scsi: avoid repeated calculation of number of TRIM ranges
libata-scsi: reject WRITE SAME (16) with n_block that exceeds limit
libata-scsi: rename ata_msense_ctl_mode() to ata_msense_control()
libata-scsi: fix D_SENSE bit relection in control mode page
...
When using trace events to debug a problem, it's impossible to determine
which file system generated a particular event. This patch adds a
macro to prefix standard information to the head of a trace event.
The extent_state alloc/free events are all that's left without an
fs_info available.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
These two are confusing leftovers of the old world order, combining
values of the REQ_OP_ and REQ_ namespaces. For callers that don't
special case we mostly just replace bi_rw with bio_data_dir or
op_is_write, except for the few cases where a switch over the REQ_OP_
values makes more sense. Any check for READA is replaced with an
explicit check for REQ_RAHEAD. Also remove the READA alias for
REQ_RAHEAD.
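The separation of the two namespaces can be pictured with a toy model. The class names, members and numeric values below are placeholders chosen for illustration, not the kernel's definitions; the point is only that the direction test looks at the operation while the readahead test looks at the flags.

```python
from enum import IntEnum, IntFlag


class ReqOp(IntEnum):
    # Placeholder numbering, not the kernel's values.
    READ = 0
    WRITE = 1
    DISCARD = 2


class ReqFlag(IntFlag):
    # Placeholder bits, not the kernel's values.
    SYNC = 1
    RAHEAD = 2


def op_is_write(op: ReqOp) -> bool:
    """Treat every operation other than a read as write-direction."""
    return op != ReqOp.READ


def is_readahead(flags: ReqFlag) -> bool:
    """Explicit flag test that stands in for the old READA comparison."""
    return bool(flags & ReqFlag.RAHEAD)
```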
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
The recent change to tracepoint napi:napi_poll changed the order of
the parameters that perf scripts see; the printk output was correct. The
problem was that the new parameters (work and budget) were pushed
in front of dev_name.
The new parameters obviously need to be appended to remain backward
compatible.
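Why ordering matters to a positional consumer can be shown with a toy record. The field values and the assumption that napi stays in the first slot are hypothetical; this is not how perf passes arguments in detail, only a model of a script that reads dev_name by position.

```python
def dev_name_seen_by_old_script(fields):
    """An older consumer written against (napi, dev_name): reads slot 1."""
    return fields[1]


# Hypothetical field values for one napi:napi_poll event.
napi, dev_name, work, budget = "0xffff8800", "eth0", 3, 64

prepended = (napi, work, budget, dev_name)  # the ordering being fixed
appended = (napi, dev_name, work, budget)   # the ordering after the fix
```

With the appended layout the old consumer still finds "eth0" in the slot it expects; with the prepended layout it silently reads the work count instead.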
Fixes: 1db19db7f5 ("net: tracepoint napi:napi_poll add work and budget")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Printk messages often finish with '\n' to cause a new line.
But as each tracepoint is already printed on a new line,
printk messages that finish with '\n' end up adding a blank
line to the trace output. For example:
kworker/0:1-86 [000] d... 46.006949: console: [ 46.006946] usb 1-3: USB disconnect, device number 3

kworker/2:2-374 [002] d... 48.699342: console: [ 48.699339] usb 1-3: new high-speed USB device number 4 using ehci-pci

kworker/2:2-374 [002] d... 49.041450: console: [ 49.041448] usb 1-3: New USB device found, idVendor=5986, idProduct=0

To avoid unneeded blank lines, this patch checks whether the printk
message finishes with '\n' and, if so, cuts off the '\n'.
In a patched kernel, the same messages are printed without
extra blank lines. For example:
kworker/0:4-185 [000] d... 23.641738: console: [ 23.641736] usb 1-3: USB disconnect, device number 3
kworker/0:4-185 [000] d... 24.918703: console: [ 24.918700] usb 1-3: new high-speed USB device number 4 using ehci-pci
kworker/0:4-185 [000] d... 25.228308: console: [ 25.228306] usb 1-3: New USB device found, idVendor=5986, idProduct=02d5
Link: http://lkml.kernel.org/r/c350fb2521baaf681a1b4d67981ca0e900108e8e.1467407618.git.bristot@redhat.com
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Anna Schumaker <anna.schumaker@netapp.com>
Cc: Ingo Molnar <mingo@redhat.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
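The trailing-newline handling described in the console tracepoint commit above can be sketched in a few lines of Python. This is only an illustration of the idea (drop one trailing '\n' before the message is recorded); the function name is hypothetical and the kernel implementation operates on a buffer and a length rather than on Python strings.

```python
def trim_trailing_newline(message: str) -> str:
    """Drop a single trailing newline so the trace line is not followed by a blank line."""
    if message.endswith("\n"):
        return message[:-1]
    return message


lines = [
    "usb 1-3: USB disconnect, device number 3\n",
    "usb 1-3: new high-speed USB device number 4 using ehci-pci\n",
    "message without a newline",
]
trimmed = [trim_trailing_newline(line) for line in lines]
```

Only one newline is removed, so a message that deliberately ends in an empty line still shows it.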
Use __get_str(str) rather than __get_dynamic_array(str) when
dealing with strings.
It is just a code cleanup, with no changes to the tracepoint ABI.
Link: http://lkml.kernel.org/r/ea260df91817411cca2a1f3db2abd88860094788.1467407618.git.bristot@redhat.com
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Anna Schumaker <anna.schumaker@netapp.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-nfs@vger.kernel.org
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
__get_str(str)'s definition includes a (char *) cast
that is not protected by outer parentheses.
This patch adds parentheses around __get_str()'s definition, enabling
some code cleanup.
Link: http://lkml.kernel.org/r/20ac1a10c2ec4ccd23e4a8ef34101fb6e4157d37.1467407618.git.bristot@redhat.com
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Anna Schumaker <anna.schumaker@netapp.com>
Cc: Ingo Molnar <mingo@redhat.com>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Add a new taskfile protocol ATA_PROT_NCQ_NODATA to handle
ATA NCQ NO-DATA commands correctly.
Also fix up ata_scsi_zbc_out_xlat() to use it.
Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
Including devlink.h on ARM and probably other 32-bit architectures results in
a harmless warning:
In file included from ../include/trace/define_trace.h:95:0,
from ../include/trace/events/devlink.h:51,
from ../net/core/devlink.c:30:
include/trace/events/devlink.h: In function 'trace_raw_output_devlink_hwmsg':
include/trace/events/devlink.h:42:12: error: format '%lu' expects argument of type 'long unsigned int', but argument 10 has type 'size_t {aka unsigned int}' [-Werror=format=]
The correct format string for 'size_t' is %zu, not %lu; %zu works on all
architectures.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: e5224f0fe2 ("devlink: add hardware messages tracing facility")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
It turned out that driver->owner, which is a struct module, is not available
when modules are disabled. It is better to depend on the driver name, which is
always available.
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Fixes: e5224f0fe2 ("devlink: add hardware messages tracing facility")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
KVM_CAP_X2APIC_API is a capability for features related to x2APIC
enablement. KVM_X2APIC_API_32BIT_FORMAT feature can be enabled to
extend APIC ID in get/set ioctl and MSI addresses to 32 bits.
Both are needed to support x2APIC.
The feature has to be enableable and disabled by default, because
get/set ioctl shifted and truncated APIC ID to 8 bits by using a
non-standard protocol inspired by xAPIC and the change is not
backward-compatible.
Changes to MSI addresses follow the format used by the interrupt remapping
unit. The upper address word, which used to be 0, contains the upper 24 bits
of the LAPIC address in its upper 24 bits. The lower 8 bits are reserved as
0. Using the upper address word is not backward-compatible either, as we
didn't check that userspace zeroed the word. Reserved bits are still
not explicitly checked, but non-zero data will affect LAPIC addresses,
which will cause a bug.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
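The 32-bit MSI destination format described in the KVM_CAP_X2APIC_API commit above can be sketched as follows. This is an illustrative Python model, not KVM code. It assumes the standard x86 MSI layout, in which the low 8 destination bits sit in bits 12..19 of the lower address word and the address base is 0xFEE00000. The upper 24 bits are taken from the upper 24 bits of the upper address word, as the commit text states. All names are hypothetical.

```python
# Assumed layout: low 8 destination bits in address_lo bits 12..19 (standard
# x86 MSI); upper 24 destination bits in address_hi bits 8..31 (per the text).
MSI_ADDR_BASE = 0xFEE00000
MSI_ADDR_DEST_ID_SHIFT = 12
MSI_ADDR_DEST_ID_MASK = 0xFF << MSI_ADDR_DEST_ID_SHIFT
MSI_ADDR_EXT_DEST_ID_MASK = 0xFFFFFF00


def encode_msi_address(dest_id):
    """Split a 32-bit destination ID into (address_lo, address_hi)."""
    address_lo = MSI_ADDR_BASE | ((dest_id & 0xFF) << MSI_ADDR_DEST_ID_SHIFT)
    address_hi = dest_id & MSI_ADDR_EXT_DEST_ID_MASK  # low 8 bits stay 0
    return address_lo, address_hi


def msi_dest_id(address_lo, address_hi, x2apic_format):
    """Recover the destination ID; the upper word only counts in the 32-bit format."""
    dest = (address_lo & MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT
    if x2apic_format:
        dest |= address_hi & MSI_ADDR_EXT_DEST_ID_MASK
    return dest


lo, hi = encode_msi_address(0x12345)
```

Decoding the same address pair with the feature disabled yields only the low 8 bits. This is why the format must be opt-in: userspace that never zeroed the upper word would otherwise change the destination.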
Dropping and/or deferring requests has an impact on performance. Let's
make sure we can trace those events.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Add a tracepoint to track when the processing of incoming RPC data gets
deferred due to out-of-space issues on the outgoing transport.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Define a tracepoint and allow user to trace messages going to and from
hardware associated with devlink instance.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
An important piece of information for the napi_poll tracepoint is
the work done (packets processed) by the napi_poll() call. Add
both the work done and the budget, as they are related.
Handle the trace_napi_poll() parameter change in dropwatch/drop_monitor
and in the python perf script netdev-times.py in a backward-compatible way,
as python fortunately supports optional parameter handling.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When tracing enospc problems on a box with multiple file systems mounted I need
to be able to differentiate between the two file systems. Most of the important
trace points I'm looking at already have an fsid, but the reserved extent trace
points do not, so add that to make it possible to figure out which trace point
belongs to which file system. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We want to track when we're triggering flushing from our reservation code and
what flushing is being done when we start flushing. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
I'm writing a tool to visualize the enospc system inside btrfs, and I need this
tracepoint in order to keep track of the block groups in the system. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We're always tracing IPv4 or IPv6 addresses, so we can save a lot
of space on the ringbuffer by allocating the correct sockaddr size.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org
Fixes: 83a712e0af "sunrpc: add some tracepoints around ..."
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
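The space saving mentioned in the sunrpc tracepoint commit above comes from sizing the recorded address by its family. The sketch below models the two address structures with ctypes to show their sizes. The class names and the helper are hypothetical. The 128-byte figure is the size of struct sockaddr_storage on Linux and serves only as a reference point, since the commit text does not say what size was recorded before.

```python
import ctypes
import socket


class SockaddrIn(ctypes.Structure):
    """Layout of an IPv4 socket address (16 bytes)."""
    _fields_ = [
        ("sin_family", ctypes.c_uint16),
        ("sin_port", ctypes.c_uint16),
        ("sin_addr", ctypes.c_uint8 * 4),
        ("sin_zero", ctypes.c_uint8 * 8),
    ]


class SockaddrIn6(ctypes.Structure):
    """Layout of an IPv6 socket address (28 bytes)."""
    _fields_ = [
        ("sin6_family", ctypes.c_uint16),
        ("sin6_port", ctypes.c_uint16),
        ("sin6_flowinfo", ctypes.c_uint32),
        ("sin6_addr", ctypes.c_uint8 * 16),
        ("sin6_scope_id", ctypes.c_uint32),
    ]


GENERIC_STORAGE_SIZE = 128  # reference point only: sizeof(struct sockaddr_storage)


def sockaddr_size(family):
    """Return the number of bytes needed to record an address of this family."""
    if family == socket.AF_INET:
        return ctypes.sizeof(SockaddrIn)
    if family == socket.AF_INET6:
        return ctypes.sizeof(SockaddrIn6)
    raise ValueError("unsupported address family: %r" % (family,))
```

Under these assumptions an IPv4 event needs 16 bytes and an IPv6 event 28 bytes, instead of a worst-case buffer for every event.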
To avoid confusion between REQ_OP_FLUSH, which is handled by
request_fn drivers, and upper layers requesting the block layer
perform a flush sequence along with possibly a WRITE, this patch
renames REQ_FLUSH to REQ_PREFLUSH.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
This patch drops the compat definition of req_op where it matches
the rq_flag_bits definitions, and drops the related old and compat
code that allowed users to set either the op or flags for the operation.
We also then store the operation in the bi_rw/cmd_flags field similar
to how we used to store the bio ioprio where it sat in the upper bits
of the field.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
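Storing the operation in the upper bits of the same field that carries the flags, as described in the commit above, can be sketched as below. The field width, the number of operation bits, the operation numbers and the helper names are assumptions made for illustration; they are not taken from the kernel headers.

```python
# Hypothetical layout: a 32-bit field with the operation in the top 3 bits
# and the flags in the remaining low bits.
FIELD_BITS = 32
OP_BITS = 3
OP_SHIFT = FIELD_BITS - OP_BITS
FLAGS_MASK = (1 << OP_SHIFT) - 1

# Illustrative operation numbers only.
OP_READ, OP_WRITE, OP_DISCARD = 0, 1, 2


def set_op_attrs(op, flags):
    """Pack an operation and its flags into one field value."""
    if not 0 <= op < (1 << OP_BITS):
        raise ValueError("operation does not fit in the op bits")
    if flags & ~FLAGS_MASK:
        raise ValueError("flags overlap the op bits")
    return (op << OP_SHIFT) | flags


def get_op(field):
    """Extract the operation from the upper bits."""
    return field >> OP_SHIFT


def get_flags(field):
    """Extract the flags from the lower bits."""
    return field & FLAGS_MASK


field = set_op_attrs(OP_WRITE, 0x10)
```

Keeping the two values in disjoint bit ranges means a caller can no longer express the operation through a flag bit, which is the ambiguity the compat code used to allow.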
Have blktrace use the req/bio op accessor to get the REQ_OP.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Separate the op from the rq_flag_bits and have f2fs
set/get the bio using bio_set_op_attrs/bio_op.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull second batch of KVM updates from Radim Krčmář:
"General:
- move kvm_stat tool from QEMU repo into tools/kvm/kvm_stat (kvm_stat
had nothing to do with QEMU in the first place -- the tool only
interprets debugfs)
- expose per-vm statistics in debugfs and support them in kvm_stat
(KVM always collected per-vm statistics, but they were summarised
into global statistics)
x86:
- fix dynamic APICv (VMX was improperly configured and a guest could
access host's APIC MSRs, CVE-2016-4440)
- minor fixes
ARM changes from Christoffer Dall:
- new vgic reimplementation of our horribly broken legacy vgic
implementation. The two implementations will live side-by-side
(with the new being the configured default) for one kernel release
and then we'll remove the legacy one.
- fix for a non-critical issue with virtual abort injection to guests"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (70 commits)
tools: kvm_stat: Add comments
tools: kvm_stat: Introduce pid monitoring
KVM: Create debugfs dir and stat files for each VM
MAINTAINERS: Add kvm tools
tools: kvm_stat: Powerpc related fixes
tools: Add kvm_stat man page
tools: Add kvm_stat vm monitor script
kvm:vmx: more complete state update on APICv on/off
KVM: SVM: Add more SVM_EXIT_REASONS
KVM: Unify traced vector format
svm: bitwise vs logical op typo
KVM: arm/arm64: vgic-new: Synchronize changes to active state
KVM: arm/arm64: vgic-new: enable build
KVM: arm/arm64: vgic-new: implement mapped IRQ handling
KVM: arm/arm64: vgic-new: Wire up irqfd injection
KVM: arm/arm64: vgic-new: Add vgic_v2/v3_enable
KVM: arm/arm64: vgic-new: vgic_init: implement map_resources
KVM: arm/arm64: vgic-new: vgic_init: implement vgic_init
KVM: arm/arm64: vgic-new: vgic_init: implement vgic_create
KVM: arm/arm64: vgic-new: vgic_init: implement kvm_vgic_hyp_init
...
Specifically, the change from hex to decimal helps to correlate events.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Pull libata ZAC support from Tejun Heo:
"This contains Zone ATA Command support for Shingled Magnetic Recording
devices.
In addition to sending the new commands down to the device, as ZAC
commands depend on getting a lot of responses from the device, piping
up responses is beefed up too. However, it doesn't involve changes to
libata core mechanism or its interaction with upper layers, so I'm not
expecting too many fallouts.
Kudos to Hannes for driving SMR support"
* 'for-4.7-zac' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: (28 commits)
libata: support host-aware and host-managed ZAC devices
libata: support device-managed ZAC devices
libata: NCQ encapsulation for ZAC MANAGEMENT OUT
libata: Implement ZBC OUT translation
libata: implement ZBC IN translation
libata: fixup ZAC device disabling
libata-scsi: Generate sense code for disabled devices
libata-trace: decode subcommands
libata: Check log page directory before accessing pages
libata: Add command definitions for NCQ Encapsulation for READ LOG DMA EXT
libata: Separate out ata_dev_config_ncq_send_recv()
libata/libsas: Define ATA_CMD_NCQ_NON_DATA
libsas: enable FPDMA SEND/RECEIVE
libata: do not attempt to retrieve sense code twice
libata-scsi: Set information sense field for invalid parameter
libata-scsi: set bit pointer for sense code information
libata-scsi: Set field pointer in sense code
scsi: add scsi_set_sense_field_pointer()
libata: Implement control mode page to select sense format
libata-scsi: generate correct ATA pass-through sense
...
Merge tag 'for-f2fs-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this round, as Ted pointed out, fscrypto allows one more key prefix
given by filesystem to resolve backward compatibility issues. Other
than that, we've fixed several error handling cases by introducing
a fault injection facility. We've also achieved performance
improvement in some workloads as well as a bunch of bug fixes.
Summary:
Enhancements:
- fs-specific prefix for fscrypto
- fault injection facility
- expose validity bitmaps for user to be aware of fragmentation
- fallocate/rm/preallocation speed up
- use percpu counters
Bug fixes:
- some inline_dentry/inline_data bugs
- error handling for atomic/volatile/orphan inodes
- recover broken superblock"
* tag 'for-f2fs-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (73 commits)
f2fs: fix to update dirty page count correctly
f2fs: flush pending bios right away when error occurs
f2fs: avoid ENOSPC fault in the recovery process
f2fs: make exit_f2fs_fs more clear
f2fs: use percpu_counter for total_valid_inode_count
f2fs: use percpu_counter for alloc_valid_block_count
f2fs: use percpu_counter for # of dirty pages in inode
f2fs: use percpu_counter for page counters
f2fs: use bio count instead of F2FS_WRITEBACK page count
f2fs: manipulate dirty file inodes when DATA_FLUSH is set
f2fs: add fault injection to sysfs
f2fs: no need inc dirty pages under inode lock
f2fs: fix incorrect error path handling in f2fs_move_rehashed_dirents
f2fs: fix i_current_depth during inline dentry conversion
f2fs: correct return value type of f2fs_fill_super
f2fs: fix deadlock when flush inline data
f2fs: avoid f2fs_bug_on during recovery
f2fs: show # of orphan inodes
f2fs: support in batch fzero in dnode page
f2fs: support in batch multi blocks preallocation
...
COMPACT_COMPLETE now means that the compaction and free scanners met. This
is not very useful information if somebody just wants to use this
feedback and make any decisions based on that. The current caller might
be a poor guy who just happened to scan a tiny portion of the zone, and
that could be the reason no suitable pages were compacted. Make sure we
distinguish the full and partial zone walks.
Consumers should treat COMPACT_PARTIAL_SKIPPED as a potential success
and be optimistic in retrying.
The existing users of COMPACT_COMPLETE are conservatively changed to use
COMPACT_PARTIAL_SKIPPED as well, but some of them should probably be
reconsidered and defer the compaction only for COMPACT_COMPLETE
with the new semantic.
This patch shouldn't introduce any functional changes.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
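The distinction between a full and a partial zone walk in the COMPACT_PARTIAL_SKIPPED commit above can be sketched as follows. This is a hypothetical Python model: the enum, the function names and the retry policy are illustrative and only mirror what the commit text says.

```python
from enum import Enum


class CompactResult(Enum):
    CONTINUE = "continue"                # scanners have not met yet
    PARTIAL_SKIPPED = "partial_skipped"  # scanners met after a partial walk
    COMPLETE = "complete"                # scanners met after a whole-zone walk


def finish_result(scanners_met, whole_zone):
    """Report how a compaction run ended."""
    if not scanners_met:
        return CompactResult.CONTINUE
    if whole_zone:
        return CompactResult.COMPLETE
    return CompactResult.PARTIAL_SKIPPED


def worth_retrying(result):
    """A partial walk says little about the zone, so be optimistic and retry."""
    return result is CompactResult.PARTIAL_SKIPPED
```

A caller that sees COMPLETE knows the whole zone was examined, while PARTIAL_SKIPPED only says that this particular run covered part of it.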
try_to_compact_pages() can currently return COMPACT_SKIPPED even when
the compaction is deferred for some zone just because zone DMA is skipped
in 99% of cases due to watermark checks. This makes COMPACT_DEFERRED
basically unusable for the page allocator as a feedback mechanism.
Make sure we distinguish those two states properly and switch their
ordering in the enum. This would mean that the COMPACT_SKIPPED will be
returned only when all eligible zones are skipped.
As a result COMPACT_DEFERRED handling for THP in __alloc_pages_slowpath
will be more precise and we would bail out rather than reclaim.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
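The sketch below shows why the enum ordering matters in the COMPACT_DEFERRED commit above. It assumes that per-zone results are combined by taking the highest value, which the commit text implies but does not spell out. The enum classes and their numeric values are illustrative only.

```python
from enum import IntEnum


class OldOrder(IntEnum):
    DEFERRED = 0
    SKIPPED = 1


class NewOrder(IntEnum):
    SKIPPED = 0
    DEFERRED = 1


def combine(zone_results):
    """Combine per-zone results by taking the highest value (assumed policy)."""
    return max(zone_results)


# One zone (for example DMA) is skipped, the others are deferred.
old = combine([OldOrder.SKIPPED, OldOrder.DEFERRED, OldOrder.DEFERRED])
new = combine([NewOrder.SKIPPED, NewOrder.DEFERRED, NewOrder.DEFERRED])

# SKIPPED is now reported only when every eligible zone was skipped.
all_skipped = combine([NewOrder.SKIPPED, NewOrder.SKIPPED])
```

With the old ordering a single skipped zone hides the deferred state of all other zones. With the new ordering the deferred state wins, so the allocator can act on it.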
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"Small release overall.
x86:
- miscellaneous fixes
- AVIC support (local APIC virtualization, AMD version)
s390:
- polling for interrupts after a VCPU goes to halted state is now
enabled for s390
- use hardware provided information about facility bits that do not
need any hypervisor activity, and other fixes for cpu models and
facilities
- improve perf output
- floating interrupt controller improvements.
MIPS:
- miscellaneous fixes
PPC:
- bugfixes only
ARM:
- 16K page size support
- generic firmware probing layer for timer and GIC
Christoffer Dall (KVM-ARM maintainer) says:
"There are a few changes in this pull request touching things
outside KVM, but they should all carry the necessary acks and it
made the merge process much easier to do it this way."
though actually the irqchip maintainers' acks didn't make it into the
patches. Marc Zyngier, who is both irqchip and KVM-ARM maintainer,
later acked at http://mid.gmane.org/573351D1.4060303@arm.com ('more
formally and for documentation purposes')"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (82 commits)
KVM: MTRR: remove MSR 0x2f8
KVM: x86: make hwapic_isr_update and hwapic_irr_update look the same
svm: Manage vcpu load/unload when enable AVIC
svm: Do not intercept CR8 when enable AVIC
svm: Do not expose x2APIC when enable AVIC
KVM: x86: Introducing kvm_x86_ops.apicv_post_state_restore
svm: Add VMEXIT handlers for AVIC
svm: Add interrupt injection via AVIC
KVM: x86: Detect and Initialize AVIC support
svm: Introduce new AVIC VMCB registers
KVM: split kvm_vcpu_wake_up from kvm_vcpu_kick
KVM: x86: Introducing kvm_x86_ops VCPU blocking/unblocking hooks
KVM: x86: Introducing kvm_x86_ops VM init/destroy hooks
KVM: x86: Rename kvm_apic_get_reg to kvm_lapic_get_reg
KVM: x86: Misc LAPIC changes to expose helper functions
KVM: shrink halt polling even more for invalid wakeups
KVM: s390: set halt polling to 80 microseconds
KVM: halt_polling: provide a way to qualify wakeups during poll
KVM: PPC: Book3S HV: Re-enable XICS fast path for irqfd-generated interrupts
kvm: Conditionally register IRQ bypass consumer
...
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"First round of SCSI updates for the 4.6+ merge window.
This batch includes the usual quota of driver updates (bnx2fc, mpt3sas,
hpsa, ncr5380, lpfc, hisi_sas, snic, aacraid, megaraid_sas). There's
also a multiqueue update for scsi_debug, assorted bug fixes and a few
other minor updates (refactor of scsi_sg_pools into generic code, alua
and VPD updates, and struct timeval conversions)"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (138 commits)
mpt3sas: Used "synchronize_irq()"API to synchronize timed-out IO & TMs
mpt3sas: Set maximum transfer length per IO to 4MB for VDs
mpt3sas: Updating mpt3sas driver version to 13.100.00.00
mpt3sas: Fix initial Reference tag field for 4K PI drives.
mpt3sas: Handle active cable exception event
mpt3sas: Update MPI header to 2.00.42
Revert "lpfc: Delete unnecessary checks before the function call mempool_destroy"
eata_pio: missing break statement
hpsa: Fix type ZBC conditional checks
scsi_lib: Decode T10 vendor IDs
scsi_dh_alua: do not fail for unknown VPD identification
scsi_debug: use locally assigned naa
scsi_debug: uuid for lu name
scsi_debug: vpd and mode page work
scsi_debug: add multiple queue support
bfa: fix bfa_fcb_itnim_alloc() error handling
megaraid_sas: Downgrade two success messages to info
cxlflash: Fix to resolve dead-lock during EEH recovery
scsi_debug: rework resp_report_luns
scsi_debug: use pdt constants
...
Pull networking updates from David Miller:
"Highlights:
1) Support SPI based w5100 devices, from Akinobu Mita.
2) Partial Segmentation Offload, from Alexander Duyck.
3) Add GMAC4 support to stmmac driver, from Alexandre TORGUE.
4) Allow cls_flower stats offload, from Amir Vadai.
5) Implement bpf blinding, from Daniel Borkmann.
6) Optimize _ASYNC_ bit twiddling on sockets, unless the socket is
actually using FASYNC these atomics are superfluous. From Eric
Dumazet.
7) Run TCP more preemptibly, also from Eric Dumazet.
8) Support LED blinking, EEPROM dumps, and rxvlan offloading in mlx5e
driver, from Gal Pressman.
9) Allow creating ppp devices via rtnetlink, from Guillaume Nault.
10) Improve BPF usage documentation, from Jesper Dangaard Brouer.
11) Support tunneling offloads in qed, from Manish Chopra.
12) aRFS offloading in mlx5e, from Maor Gottlieb.
13) Add RFS and RPS support to SCTP protocol, from Marcelo Ricardo
Leitner.
14) Add MSG_EOR support to TCP, this allows controlling packet
coalescing on application record boundaries for more accurate
socket timestamp sampling. From Martin KaFai Lau.
15) Fix alignment of 64-bit netlink attributes across the board, from
Nicolas Dichtel.
16) Per-vlan stats in bridging, from Nikolay Aleksandrov.
17) Several conversions of drivers to ethtool ksettings, from Philippe
Reynes.
18) Checksum neutral ILA in ipv6, from Tom Herbert.
19) Factorize all of the various marvell dsa drivers into one, from
Vivien Didelot
20) Add VF support to qed driver, from Yuval Mintz"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1649 commits)
Revert "phy dp83867: Fix compilation with CONFIG_OF_MDIO=m"
Revert "phy dp83867: Make rgmii parameters optional"
r8169: default to 64-bit DMA on recent PCIe chips
phy dp83867: Make rgmii parameters optional
phy dp83867: Fix compilation with CONFIG_OF_MDIO=m
bpf: arm64: remove callee-save registers use for tmp registers
asix: Fix offset calculation in asix_rx_fixup() causing slow transmissions
switchdev: pass pointer to fib_info instead of copy
net_sched: close another race condition in tcf_mirred_release()
tipc: fix nametable publication field in nl compat
drivers: net: Don't print unpopulated net_device name
qed: add support for dcbx.
ravb: Add missing free_irq() calls to ravb_close()
qed: Remove a stray tab
net: ethernet: fec-mpc52xx: use phy_ethtool_{get|set}_link_ksettings
net: ethernet: fec-mpc52xx: use phydev from struct net_device
bpf, doc: fix typo on bpf_asm descriptions
stmmac: hardware TX COE doesn't work when force_thresh_dma_mode is set
net: ethernet: fs-enet: use phy_ethtool_{get|set}_link_ksettings
net: ethernet: fs-enet: use phydev from struct net_device
...
Pull parallel filesystem directory handling update from Al Viro.
This is the main parallel directory work by Al that makes the vfs layer
able to do lookup and readdir in parallel within a single directory.
That's a big change, since this used to be all protected by the
directory inode mutex.
The inode mutex is replaced by an rwsem, and serialization of lookups of
a single name is done by an "in-progress" dentry marker.
The series begins with xattr cleanups, and then ends with switching
filesystems over to actually doing the readdir in parallel (switching to
the "iterate_shared()" that only takes the read lock).
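The "in-progress" marker idea can be sketched in user-space Python: lookups of different names proceed independently, while a second lookup of the same name waits for the first one to finish instead of repeating the work. This models only the per-name serialization, not the rwsem or the dcache itself. The class and method names are hypothetical, and a failing slow lookup is not handled.

```python
import threading


class ParallelLookupCache:
    """Toy model of per-name serialization using an in-progress marker."""

    def __init__(self, slow_lookup):
        self._slow_lookup = slow_lookup
        self._lock = threading.Lock()  # protects the two dicts only
        self._done = {}                # name -> finished result
        self._in_progress = {}         # name -> Event set when the lookup ends

    def lookup(self, name):
        with self._lock:
            if name in self._done:
                return self._done[name]
            marker = self._in_progress.get(name)
            owner = marker is None
            if owner:
                marker = threading.Event()
                self._in_progress[name] = marker
        if owner:
            # The slow part runs without the lock, so other names are not blocked.
            result = self._slow_lookup(name)
            with self._lock:
                self._done[name] = result
                del self._in_progress[name]
            marker.set()
            return result
        # Another thread is already looking this name up: wait for it.
        marker.wait()
        with self._lock:
            return self._done[name]
```

The design choice mirrored here is that the lock covers only the bookkeeping, so the expensive lookup itself never excludes lookups of other names.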
A more detailed explanation of the process from Al Viro:
"The xattr work starts with some acl fixes, then switches ->getxattr to
passing inode and dentry separately. This is the point where the
things start to get tricky - that got merged into the very beginning
of the -rc3-based #work.lookups, to allow untangling the
security_d_instantiate() mess. The xattr work itself proceeds to
switch a lot of filesystems to generic_...xattr(); no complications
there.
After that initial xattr work, the series then does the following:
- untangle security_d_instantiate()
- convert a bunch of open-coded lookup_one_len_unlocked() to calls of
that thing; one such place (in overlayfs) actually yields a trivial
conflict with overlayfs fixes later in the cycle - overlayfs ended
up switching to a variant of lookup_one_len_unlocked() sans the
permission checks. I would've dropped that commit (it gets
overridden on merge from #ovl-fixes in #for-next; proper resolution
is to use the variant in mainline fs/overlayfs/super.c), but I
didn't want to rebase the damn thing - it was fairly late in the
cycle...
- some filesystems had managed to depend on lookup/lookup exclusion
for *fs-internal* data structures in a way that would break if we
relaxed the VFS exclusion. Fixing hadn't been hard, fortunately.
- core of that series - parallel lookup machinery, replacing
->i_mutex with rwsem, making lookup_slow() take it only shared. At
that point lookups happen in parallel; lookups on the same name
wait for the in-progress one to be done with that dentry.
Surprisingly little code, at that - almost all of it is in
fs/dcache.c, with fs/namei.c changes limited to lookup_slow() -
making it use the new primitive and actually switching to locking
shared.
- parallel readdir stuff - first of all, we provide the exclusion on
per-struct file basis, same as we do for read() vs lseek() for
regular files. That takes care of most of the needed exclusion in
readdir/readdir; however, these guys are trickier than lookups, so
I went for switching them one-by-one. To do that, a new method
'->iterate_shared()' is added and filesystems are switched to it
as they are either confirmed to be OK with shared lock on directory
or fixed to be OK with that. I hope to kill the original method
come next cycle (almost all in-tree filesystems are switched
already), but it's still not quite finished.
- several filesystems get switched to parallel readdir. The
interesting part here is dealing with dcache preseeding by readdir;
that needs minor adjustment to be safe with directory locked only
shared.
Most of the filesystems doing that got switched to in those
commits. Important exception: NFS. Turns out that NFS folks, with
their, er, insistence on VFS getting the fuck out of the way of the
Smart Filesystem Code That Knows How And What To Lock(tm) have
grown the locking of their own. They had their own homegrown
rwsem, with lookup/readdir/atomic_open being *writers* (sillyunlink
is the reader there). Of course, with VFS getting the fuck out of
the way, as requested, the actual smarts of the smart filesystem
code etc. had become exposed...
- do_last/lookup_open/atomic_open cleanups. As the result, open()
without O_CREAT locks the directory only shared. Including the
->atomic_open() case. Backmerge from #for-linus in the middle of
that - atomic_open() fix got brought in.
- then comes NFS switch to saner (VFS-based ;-) locking, killing the
homegrown "lookup and readdir are writers" kinda-sorta rwsem. All
exclusion for sillyunlink/lookup is done by the parallel lookups
mechanism. Exclusion between sillyunlink and rmdir is a real rwsem
now - rmdir being the writer.
Result: NFS lookups/readdirs/O_CREAT-less opens happen in parallel
now.
- the rest of the series consists of switching a lot of filesystems
to parallel readdir; in a lot of cases ->llseek() gets simplified
as well. One backmerge in there (again, #for-linus - rockridge
fix)"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (74 commits)
ext4: switch to ->iterate_shared()
hfs: switch to ->iterate_shared()
hfsplus: switch to ->iterate_shared()
hostfs: switch to ->iterate_shared()
hpfs: switch to ->iterate_shared()
hpfs: handle allocation failures in hpfs_add_pos()
gfs2: switch to ->iterate_shared()
f2fs: switch to ->iterate_shared()
afs: switch to ->iterate_shared()
befs: switch to ->iterate_shared()
befs: constify stuff a bit
isofs: switch to ->iterate_shared()
get_acorn_filename(): deobfuscate a bit
btrfs: switch to ->iterate_shared()
logfs: no need to lock directory in lseek
switch ecryptfs to ->iterate_shared
9p: switch to ->iterate_shared()
fat: switch to ->iterate_shared()
romfs, squashfs: switch to ->iterate_shared()
more trivial ->iterate_shared conversions
...
Merge tag 'mmc-v4.7' of git://git.linaro.org/people/ulf.hansson/mmc
Pull MMC updates from Ulf Hansson:
"MMC core:
- Add TRACE support to be able to debug request flow
- Extend/improve reset support for (e)MMC
- Convert MMC pwrseq to platform device drivers
- Use IDA for indexes
- Some additional minor improvements
MMC host:
- sdhci: Re-factoring, clean-ups and improvements
- sdhci-acpi|pci: Use MMC_CAP_AGGRESSIVE_PM for Broxton
- omap/omap_hsmmc: Convert to use dma_request_chan()
- usdhi6rol0: Add support for UHS modes
- sh_mmcif: Update runtime PM support
- tmio: Wolfram Sang steps in as maintainer
- tmio: Add UHS-I mode support
- sh_mobile_sdhi: Add UHS-I mode support
- tmio/sdhi: Re-factoring, clean-ups and improvements
- dw_mmc: Re-factoring and clean-ups
- davinci: Convert to use dma_request_chan()"
* tag 'mmc-v4.7' of git://git.linaro.org/people/ulf.hansson/mmc: (99 commits)
mmc: mmc: Fix partition switch timeout for some eMMCs
mmc: sh_mobile_sdhi: enable SDIO IRQs for RCar Gen3
mmc: sdio: fall back to SDIO 1.0 for broken 1.1 cards
mmc: sdhci-st: correct name of sd-uhs-sdr50 property
MAINTAINERS: update entry for TMIO MMC driver
mmc: block: improve logging of handling emmc timeouts
mmc: sdhci: removed unneeded function wrappers
mmc: core: remove the invalid message in mmc_select_timing
mmc: core: fix using wrong io voltage if mmc_select_hs200 fails
mmc: sdhci-of-arasan: fix set_clock when a phy is supported
mmc: omap: Use dma_request_chan() for requesting DMA channel
mmc: mmc: Attempt to flush cache before reset
mmc: sh_mobile_sdhi: check return value when changing clk
mmc: sh_mobile_sdhi: only change the clock on RCar Gen2+
mmc: tmio/sdhi: introduce flag for RCar 2+ specific features
mmc: sh_mobile_sdhi: make clk_update function more compact
mmc: omap_hsmmc: Use dma_request_chan() for requesting DMA channel
mmc: sdhci-of-at91: add presets setup
mmc: usdhi6rol0: add pinctrl to set pin drive strength
mmc: usdhi6rol0: add support for UHS modes
...
Some wakeups should not be considered a successful poll. For example on
s390 I/O interrupts are usually floating, which means that _ALL_ CPUs
would be considered runnable - letting all vCPUs poll all the time for
transaction-like workloads, even if one vCPU would be enough.
This can result in huge CPU usage for large guests.
This patch lets architectures provide a way to qualify wakeups as
good or bad with regard to polls.
For s390 the implementation will fence off halt polling for anything but
known good, single vCPU events. The s390 implementation for floating
interrupts does a wakeup for one vCPU, but the interrupt will be delivered
by whatever CPU checks first for a pending interrupt. We prefer the
woken up CPU by marking the poll of this CPU as "good" poll.
This code will also mark several other wakeup reasons like IPI or
expired timers as "good". This will of course also mark some events as
not successful. As KVM on z always runs as a 2nd level hypervisor,
we prefer to not poll, unless we are really sure, though.
This patch successfully limits the CPU usage for cases like uperf 1byte
transactional ping pong workload or wakeup heavy workload like OLTP
while still providing a proper speedup.
This also introduced a new vcpu stat "halt_poll_no_tuning" that marks
wakeups that are considered not good for polling.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Radim Krčmář <rkrcmar@redhat.com> (for an earlier version)
Cc: David Matlack <dmatlack@google.com>
Cc: Wanpeng Li <kernellwp@gmail.com>
[Rename config symbol. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
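As an illustration of the wakeup qualification described in the
halt-polling change above, the following Python sketch models how a poll
window might be adjusted once a wakeup has been classified. The function
name, the default constants and the reset-to-zero shrink policy are
assumptions made for this example; they are not taken from the patch.

```python
def next_poll_ns(current_ns, block_ns, valid_wakeup,
                 max_ns=500_000, start_ns=10_000):
    """Toy model: return the poll window to use for the next halt.

    current_ns   -- poll window used for this halt
    block_ns     -- how long the vCPU was actually blocked
    valid_wakeup -- True if the architecture considers the wakeup "good"
    max_ns, start_ns -- hypothetical tunables for this sketch
    """
    if not valid_wakeup:
        return 0                      # bad wakeup: stop polling
    if block_ns <= current_ns:
        return current_ns             # the window was long enough, keep it
    if block_ns > max_ns:
        return 0                      # long block: polling would be wasted
    # short block that the window missed: grow, within the limit
    return min(max(current_ns * 2, start_ns), max_ns)
```

In this model a floating interrupt that is classified as a bad wakeup
always drops the window to zero, so only the vCPUs that see good wakeups
keep polling.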
This patch introduces reserve_new_blocks to preallocate multiple blocks
in one batch operation, which avoids a lot of redundant work and
results in better performance.
In a virtual machine, with a rotational device:
time fallocate -l 32G /mnt/f2fs/file
Before:
real 0m4.584s
user 0m0.000s
sys 0m4.580s
After:
real 0m0.292s
user 0m0.000s
sys 0m0.272s
On x86, with an SSD:
time fallocate -l 500G $MNT/testfile
Before : 24.758 s
After : 1.604 s
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix bugs and add performance numbers measured in x86.]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
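The fallocate timings quoted above can be turned into speedup factors
with a few lines of Python. This is only worked arithmetic on the
reported numbers; the helper names are made up for the example.

```python
import re

def parse_real(value):
    """Convert a 'time' style duration such as '0m4.584s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", value)
    return int(match.group(1)) * 60 + float(match.group(2))

def speedup(before_s, after_s):
    """How many times faster the 'after' run is."""
    return before_s / after_s

# Virtual machine, rotational device, 32G file (real time)
vm_speedup = speedup(parse_real("0m4.584s"), parse_real("0m0.292s"))

# x86 with SSD, 500G file
ssd_speedup = speedup(24.758, 1.604)
```

Both cases come out at roughly a 15x improvement (about 15.7x in the
virtual machine and about 15.4x on the SSD).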
ZAC drives implement a 'ZAC Management Out' command template,
which maps onto the ZBC OUT command.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
ZAC drives implement a 'ZAC Management In' command template,
which maps onto the ZBC IN command.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
Some commands like FPDMA RECEIVE or NCQ NON DATA can encapsulate
other commands to NCQ transport. So decode the subcmds, too.
Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Define the NCQ NON DATA command and update libsas to handle it
correctly.
Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
This patch provides some tracepoints for the lifecycle of an mmc request
from start to completion to help with performance analysis of the MMC
subsystem.
Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Pull RCU updates from Paul E. McKenney:
* Documentation updates, including fixes to the design-level
requirements documentation and a fixed version of the design-level
data-structure documentation. These fixes include removing
cartoons and getting rid of the html/htmlx duplication.
* Further improvements to the new-age expedited grace periods.
* Miscellaneous fixes.
* Torture-test changes, including a new rcuperf module for measuring
RCU grace-period performance and scalability, which is useful for
the expedited-grace-period changes.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
move trace_call_bpf() into helper function to minimize the size
of perf_trace_*() tracepoint handlers.
    text    data     bss      dec     hex filename
10541679 5526646 2945024 19013349 1221ee5 vmlinux_before
10509422 5526646 2945024 18981092 121a0e4 vmlinux_after
It may seem that perf_fetch_caller_regs() can also be moved,
but that is incorrect, since ip/sp will be wrong.
bpf+tracepoint performance is not affected, since
perf_swevent_put_recursion_context() is now inlined.
export_symbol_gpl can also be dropped.
No measurable change in normal perf tracepoints.
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
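The size figures in the trace_call_bpf() change above can be checked
mechanically. The Python sketch below parses the two rows as printed by
size(1) and computes the saving; the variable and function names are
illustrative only.

```python
SIZE_OUTPUT = """\
10541679 5526646 2945024 19013349 1221ee5 vmlinux_before
10509422 5526646 2945024 18981092 121a0e4 vmlinux_after
"""

def parse_size_row(row):
    """Split one row of size(1) output into named integer columns."""
    text, data, bss, dec, hex_total, filename = row.split()
    return {"text": int(text), "data": int(data), "bss": int(bss),
            "dec": int(dec), "hex": int(hex_total, 16),
            "filename": filename}

before, after = (parse_size_row(r) for r in SIZE_OUTPUT.splitlines())

saved_text = before["text"] - after["text"]    # bytes of text saved
saved_total = before["dec"] - after["dec"]     # bytes saved overall
```

Both differences are 32257 bytes, so the whole saving is in the text
section, which matches the goal of shrinking the perf_trace_*() handlers.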
Add new trace functions for ZBC_IN and ZBC_OUT.
Reviewed-by: Doug Gilbert <dgilbert@interlog.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
scsi_opcode_name() is displaying the opcode, not the service
action.
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pull btrfs fixes from Chris Mason:
"These are bug fixes, including a really old fsync bug, and a few trace
points to help us track down problems in the quota code"
* 'for-linus-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: fix file/data loss caused by fsync after rename and new inode
btrfs: Reset IO error counters before start of device replacing
btrfs: Add qgroup tracing
Btrfs: don't use src fd for printk
btrfs: fallback to vmalloc in btrfs_compare_tree
btrfs: handle non-fatal errors in btrfs_qgroup_inherit()
btrfs: Output more info for enospc_debug mount option
Btrfs: fix invalid reference in replace_path
Btrfs: Improve FL_KEEP_SIZE handling in fallocate
introduce BPF_PROG_TYPE_TRACEPOINT program type and allow it to be attached
to the perf tracepoint handler, which will copy the arguments into
the per-cpu buffer and pass it to the bpf program as its first argument.
The layout of the fields can be discovered by doing
'cat /sys/kernel/debug/tracing/events/sched/sched_switch/format'
prior to the compilation of the program with exception that first 8 bytes
are reserved and not accessible to the program. This area is used to store
the pointer to 'struct pt_regs' which some of the bpf helpers will use:
+---------+
| 8 bytes | hidden 'struct pt_regs *' (inaccessible to bpf program)
+---------+
| N bytes | static tracepoint fields defined in tracepoint/format (bpf readonly)
+---------+
| dynamic | __dynamic_array bytes of tracepoint (inaccessible to bpf yet)
+---------+
Note that all of the fields are already dumped to user space via perf ring buffer
and broken applications access them directly without consulting tracepoint/format.
Same rule applies here: static tracepoint fields should only be accessed
in a format defined in tracepoint/format. The order of fields and
field sizes are not an ABI.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
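To illustrate the layout rule from the BPF_PROG_TYPE_TRACEPOINT change
above, here is a Python sketch that parses a tracepoint format
description and lists the fields that fall outside the reserved first 8
bytes. The embedded text is an abbreviated, illustrative excerpt in the
style of the sched_switch format file; real output differs between
kernels and must be read from tracepoint/format. The helper names are
hypothetical.

```python
import re

# Abbreviated sample only; consult the real format file on the target kernel.
SAMPLE_FORMAT = """\
name: sched_switch
format:
    field:unsigned short common_type;   offset:0;   size:2; signed:0;
    field:unsigned char common_flags;   offset:2;   size:1; signed:0;
    field:unsigned char common_preempt_count;   offset:3;   size:1; signed:0;
    field:int common_pid;   offset:4;   size:4; signed:1;

    field:char prev_comm[16];   offset:8;   size:16;    signed:1;
    field:pid_t prev_pid;   offset:24;  size:4; signed:1;
"""

FIELD_RE = re.compile(
    r"field:(?P<decl>[^;]+);\s*offset:(?P<offset>\d+);\s*size:(?P<size>\d+);")

RESERVED_BYTES = 8  # first 8 bytes are hidden from the bpf program

def parse_fields(format_text):
    """Return (name, offset, size) for every field line in the format text."""
    fields = []
    for m in FIELD_RE.finditer(format_text):
        name = m.group("decl").split()[-1].split("[")[0]
        fields.append((name, int(m.group("offset")), int(m.group("size"))))
    return fields

def readable_fields(format_text):
    """Fields a program may read: those starting at or after the reserved area."""
    return [f for f in parse_fields(format_text) if f[1] >= RESERVED_BYTES]

accessible = [name for name, _, _ in readable_fields(SAMPLE_FORMAT)]
```

With the sample above only prev_comm and prev_pid are reported as
readable; the common_* fields sit inside the reserved area.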
The split allows moving the expensive update of 'struct trace_entry' to a later phase.
Repurpose unused 1st argument of perf_tp_event() to indicate event type.
While splitting use temp variable 'rctx' instead of '*rctx' to avoid
unnecessary loads done by the compiler due to -fno-strict-aliasing.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now all calls to perf_trace_buf_submit() pass 0 as the 4th argument,
which will be repurposed in the next patch; that patch changes the
meaning of the 1st argument of perf_tp_event() to event_type.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds tracepoints to the qgroup code on both the reporting side
(insert_dirty_extents) and the accounting side. Taken together it allows us
to see what qgroup operations have happened, and what their result was.
Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Page isolation has not failed if the fin pfn extends beyond the end pfn
and test_pages_isolated checks this correctly. Fix the tracepoint to
report the same result as the actual check function.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The current mutex-based funnel-locking approach used by expedited grace
periods is subject to severe unfairness. The problem arises when a
few tasks, making a path from leaves to root, all wake up before other
tasks do. A new task can then follow this path all the way to the root,
which needlessly delays tasks whose grace period is done, but who do
not happen to acquire the lock quickly enough.
This commit avoids this problem by maintaining per-rcu_node wait queues,
along with a per-rcu_node counter that tracks the latest grace period
sought by an earlier task to visit this node. If that grace period
would satisfy the current task, instead of proceeding up the tree,
it waits on the current rcu_node structure using a pair of wait queues
provided for that purpose. This decouples awakening of old tasks from
the arrival of new tasks.
If the wakeups prove to be a bottleneck, additional kthreads can be
brought to bear for that purpose.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
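The per-rcu_node counter idea from the expedited grace period change
above can be modeled in a few lines of Python. This is a toy,
single-threaded sketch under stated assumptions: the class, its field
names and the task labels are invented for illustration and do not
mirror the kernel data structures, and real wait queues and locking are
left out.

```python
class RcuNode:
    """Toy stand-in for one node of the combining tree."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.seq_requested = 0   # latest grace period sought by an earlier task
        self.waiters = []        # tasks parked on this node

def funnel(leaf, wanted_seq, task):
    """Walk from a leaf toward the root.

    Returns the name of the node the task parks on, or None if the task
    reached past the root and has to drive the grace period itself.
    """
    node = leaf
    while node is not None:
        if node.seq_requested >= wanted_seq:
            node.waiters.append(task)   # an earlier request already covers us
            return node.name
        node.seq_requested = wanted_seq
        node = node.parent
    return None

root = RcuNode("root")
left = RcuNode("left", root)
right = RcuNode("right", root)

first = funnel(left, 4, "A")    # nobody asked yet: A goes all the way up
second = funnel(right, 4, "B")  # root already covers 4: B parks at the root
third = funnel(right, 4, "C")   # right leaf already covers 4: C parks there
```

Later arrivals stop at the lowest node that already covers their grace
period, so they never compete with the tasks that are being woken up
higher in the tree.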
Merge tag 'trace-v4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"Nothing major this round. Mostly small clean ups and fixes.
Some visible changes:
- A new flag was added to distinguish traces done in NMI context.
- Preempt tracer now shows functions where preemption is disabled but
interrupts are still enabled.
Other notes:
- Updates were done to function tracing to allow better performance
with perf.
- Infrastructure code has been added to allow for a new histogram
feature for recording live trace event histograms that can be
configured by simple user commands. The feature itself was just
finished, but needs a round in linux-next before being pulled.
This only includes some infrastructure changes that will be needed"
* tag 'trace-v4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (22 commits)
tracing: Record and show NMI state
tracing: Fix trace_printk() to print when not using bprintk()
tracing: Remove redundant reset per-CPU buff in irqsoff tracer
x86: ftrace: Fix the misleading comment for arch/x86/kernel/ftrace.c
tracing: Fix crash from reading trace_pipe with sendfile
tracing: Have preempt(irqs)off trace preempt disabled functions
tracing: Fix return while holding a lock in register_tracer()
ftrace: Use kasprintf() in ftrace_profile_tracefs()
ftrace: Update dynamic ftrace calls only if necessary
ftrace: Make ftrace_hash_rec_enable return update bool
tracing: Fix typoes in code comment and printk in trace_nop.c
tracing, writeback: Replace cgroup path to cgroup ino
tracing: Use flags instead of bool in trigger structure
tracing: Add an unreg_all() callback to trigger commands
tracing: Add needs_rec flag to event triggers
tracing: Add a per-event-trigger 'paused' field
tracing: Add get_syscall_name()
tracing: Add event record param to trigger_ops.func()
tracing: Make event trigger functions available
tracing: Make ftrace_event_field checking functions available
...
Pull thermal updates from Zhang Rui:
- Fix a regression where bogus trip points on some Lenovo laptops start
to screw up thermal control after commit 81ad4276b5 ("Thermal:
initialize thermal zone device correctly").
On these Lenovo laptops, a bogus passive trip point is reported,
which is 0 degree Celsius. Without commit 81ad4276b5, thermal zone
fails to set cooling devices to proper cooling state, which is a bug.
But with commit 81ad4276b5 applied, the processors are always
throttled on these Lenovo laptops because the current temperature is
always higher than the passive trip point.
Fix things to ignore such bogus trip points. (Zhang Rui)
- Introduce Mediatek thermal driver. (Sascha Hauer)
- Introduce devm_ versions of OF thermal sensor register API. (Laxman
Dewangan)
- Changes in Kconfigs to allow compile test on UM arch. (Krzysztof
Kozlowski)
- Introduce Skylake support in intel_pch_thermal driver. (Srinivas
Pandruvada)
- Several small fixes on Rockchip, TI-SoC, Tegra, RCar, and Exynos
thermal drivers.
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: (26 commits)
Thermal: Ignore invalid trip points
thermal: trace: migrating thermal traces to use TRACE_DEFINE_ENUM() macros
thermal: intel_pch_thermal: Enable Skylake PCH thermal
thermal: doc: Add details of devm_thermal_zone_of_sensor_{register,unregister}
thermal: of-thermal: Add devm version of thermal_zone_of_sensor_register
thermal: doc: Add details of thermal_zone_of_sensor_{register,unregister}
thermal: exynos: Defer probe if vtmu is present but not registered
thermal: exynos: Use devm_regulator_get_optional() for vtmu
thermal: exynos: List vtmu-supply as optional property in DT binding
thermal: exynos: Print a message about exceeded number of supported trip-points
thermal: exynos: Document number of supported trip-points
thermal: exynos: Document compatible for Exynos5433 TMU
thermal: mtk: allow compile testing on UM
thermal: tegra_soctherm: fix sign bit of temperature
thermal: Fix build error of missing devm_ioremap_resource on UM
thermal: ti-soc-thermal: clean up the error handling a bit
thermal: rcar: Use ARCH_RENESAS
thermal: rcar_thermal: don't open code of_device_get_match_data()
thermal: db8500_cpufreq_cooling: Compile with COMPILE_TEST
thermal: rockchip: fix the tsadc sequence output on rk3228/rk3399
...
Pull networking bugfixes from David Miller:
"Several bug fixes rolling in, some for changes introduced in this
merge window, and some for problems that have existed for some time:
1) Fix prepare_to_wait() handling in AF_VSOCK, from Claudio Imbrenda.
2) The new DST_CACHE should be a silent config option, from Dave
Jones.
3) inet_current_timestamp() unintentionally truncates timestamps to
16-bit, from Deepa Dinamani.
4) Missing reference to netns in ppp, from Guillaume Nault.
5) Free memory reference in hv_netvsc driver, from Haiyang Zhang.
6) Missing kernel doc documentation for function arguments in various
spots around the networking, from Luis de Bethencourt.
7) UDP stopped receiving broadcast packets properly, due to
overzealous multicast checks, fix from Paolo Abeni"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (59 commits)
net: ping: make ping_v6_sendmsg static
hv_netvsc: Fix the order of num_sc_offered decrement
net: Fix typos and whitespace.
hv_netvsc: Fix the array sizes to be max supported channels
hv_netvsc: Fix accessing freed memory in netvsc_change_mtu()
ppp: take reference on channels netns
net: Reset encap_level to avoid resetting features on inner IP headers
net: mediatek: fix checking for NULL instead of IS_ERR() in .probe
net: phy: at803x: Request 'reset' GPIO only for AT8030 PHY
at803x: fix reset handling
AF_VSOCK: Shrink the area influenced by prepare_to_wait
Revert "vsock: Fix blocking ops call in prepare_to_wait"
macb: fix PHY reset
ipv4: initialize flowi4_flags before calling fib_lookup()
fsl/fman: Workaround for Errata A-007273
ipv4: fix broadcast packets reception
net: hns: bug fix about the overflow of mss
net: hns: adds limitation for debug port mtu
net: hns: fix the bug about mtu setting
net: hns: fixes a bug of RSS
...
Pull f2fs updates from Jaegeuk Kim:
"New Features:
- uplift filesystem encryption into fs/crypto/
- give sysfs entries to control memory consumption
Enhancements:
- aio performance by preallocating blocks in ->write_iter
- use writepages lock for only WB_SYNC_ALL
- avoid redundant inline_data conversion
- enhance foreground GC
- use wait_for_stable_page as possible
- speed up SEEK_DATA and fiemap
Bug Fixes:
- corner case in terms of -ENOSPC for inline_data
- hung task caused by long latency in shrinker
- corruption between atomic write and f2fs_trace_pid
- avoid garbage lengths in dentries
- revoke atomically written pages if an error occurs
In addition, there are various minor bug fixes and clean-ups"
* tag 'for-f2fs-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (81 commits)
f2fs: submit node page write bios when really required
f2fs: add missing argument to f2fs_setxattr stub
f2fs: fix to avoid unneeded unlock_new_inode
f2fs: clean up opened code with f2fs_update_dentry
f2fs: declare static functions
f2fs: use cryptoapi crc32 functions
f2fs: modify the readahead method in ra_node_page()
f2fs crypto: sync ext4_lookup and ext4_file_open
fs crypto: move per-file encryption from f2fs tree to fs/crypto
f2fs: mutex can't be used by down_write_nest_lock()
f2fs: recovery missing dot dentries in root directory
f2fs: fix to avoid deadlock when merging inline data
f2fs: introduce f2fs_flush_merged_bios for cleanup
f2fs: introduce f2fs_update_data_blkaddr for cleanup
f2fs crypto: fix incorrect positioning for GCing encrypted data page
f2fs: fix incorrect upper bound when iterating inode mapping tree
f2fs: avoid hungtask problem caused by losing wake_up
f2fs: trace old block address for CoWed page
f2fs: try to flush inode after merging inline data
f2fs: show more info about superblock recovery
...
flowi6_tos of struct flowi6 is unused in IPv6, therefore dumping tos on
that tracepoint will also give incorrect information wrt traffic class.
If we want to fix it, we need to extract it via ip6_tclass(flp->flowlabel).
While for the same test case I get a count of 0 non-zero tos values before
the change, they now start to show up after the change:
# ./perf record -e fib6:fib6_table_lookup -a sleep 10
# ./perf script | grep -v "tos 0" | wc -l
60
Since there's no user in the kernel tree anymore of flowi6_tos, remove the
define to avoid any future confusion on this.
Fixes: b811580d91 ("net: IPv6 fib lookup tracepoint")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
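As a sketch of what extracting the traffic class via ip6_tclass() amounts
to, the Python below pulls the 8-bit traffic class out of the first
32-bit word of an IPv6 header (4 bits version, 8 bits traffic class, 20
bits flow label). It works on a host-order integer for simplicity,
whereas the kernel keeps the value in network byte order; the function
and constant names here are chosen for the example.

```python
TCLASS_MASK = 0x0FF00000    # bits 20..27 of the first IPv6 header word
TCLASS_SHIFT = 20

def traffic_class(flowinfo):
    """Extract the traffic class from a host-order version/class/label word."""
    return (flowinfo & TCLASS_MASK) >> TCLASS_SHIFT

def make_flowinfo(tclass, flowlabel):
    """Build the first IPv6 header word (version 6) for testing."""
    return (6 << 28) | ((tclass & 0xFF) << 20) | (flowlabel & 0xFFFFF)

word = make_flowinfo(0xB8, 0x12345)
tclass = traffic_class(word)
```

The version bits and the flow label do not leak into the result, which
is the property the tracepoint needs to report a correct tos value.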
Pull networking updates from David Miller:
"Highlights:
1) Support more Realtek wireless chips, from Jes Sorensen.
2) New BPF types for per-cpu hash and array maps, from Alexei
Starovoitov.
3) Make several TCP sysctls per-namespace, from Nikolay Borisov.
4) Allow the use of SO_REUSEPORT in order to do per-thread processing
of incoming TCP/UDP connections. The muxing can be done using a
BPF program which hashes the incoming packet. From Craig Gallek.
5) Add a multiplexer for TCP streams, to provide a message-based
interface. BPF programs can be used to determine the message
boundaries. From Tom Herbert.
6) Add 802.1AE MACSEC support, from Sabrina Dubroca.
7) Avoid factorial complexity when taking down an inetdev interface
with lots of configured addresses. We were doing things like
traversing the entire address list for each address removed, and
flushing the entire netfilter conntrack table for every address as
well.
8) Add and use SKB bulk free infrastructure, from Jesper Brouer.
9) Allow offloading u32 classifiers to hardware, and implement for
ixgbe, from John Fastabend.
10) Allow configuring IRQ coalescing parameters on a per-queue basis,
from Kan Liang.
11) Extend ethtool so that larger link mode masks can be supported.
From David Decotigny.
12) Introduce devlink, which can be used to configure port link types
(ethernet vs Infiniband, etc.), port splitting, and switch device
level attributes as a whole. From Jiri Pirko.
13) Hardware offload support for flower classifiers, from Amir Vadai.
14) Add "Local Checksum Offload". Basically, for a tunneled packet
the checksum of the outer header is 'constant' (because with the
checksum field filled into the inner protocol header, the payload
of the outer frame checksums to 'zero'), and we can take advantage
of that in various ways. From Edward Cree"
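The "Local Checksum Offload" observation in item 14 above can be
demonstrated with plain ones-complement arithmetic. The Python sketch
below is a simplified illustration: it ignores pseudo-headers, uses
made-up packet bytes, and assumes the inner packet starts at an even
offset. None of the helper names come from the kernel.

```python
def ones_complement_sum(data):
    """16-bit ones-complement sum of data (zero-padded if the length is odd)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                       # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return total

def checksum(data):
    """Internet-style checksum: complement of the ones-complement sum."""
    return ~ones_complement_sum(data) & 0xFFFF

def fill_checksum(packet, offset):
    """Return packet with its zeroed 16-bit checksum field at offset filled in."""
    csum = checksum(packet)
    return packet[:offset] + csum.to_bytes(2, "big") + packet[offset + 2:]

# Hypothetical inner packet with a zeroed checksum field at offset 2.
inner = fill_checksum(bytes.fromhex("1234" "0000" "deadbeef" "0102"), 2)
# Hypothetical outer header bytes.
outer_header = bytes.fromhex("4500" "abcd")

outer_over_everything = checksum(outer_header + inner)
outer_over_header_only = checksum(outer_header)
```

Once the inner checksum is filled in, the inner packet sums to 0xFFFF
(ones-complement zero), so the outer checksum computed over the whole
frame equals the one computed over the outer header alone.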
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1548 commits)
bonding: fix bond_get_stats()
net: bcmgenet: fix dma api length mismatch
net/mlx4_core: Fix backward compatibility on VFs
phy: mdio-thunder: Fix some Kconfig typos
lan78xx: add ndo_get_stats64
lan78xx: handle statistics counter rollover
RDS: TCP: Remove unused constant
RDS: TCP: Add sysctl tunables for sndbuf/rcvbuf on rds-tcp socket
net: smc911x: convert pxa dma to dmaengine
team: remove duplicate set of flag IFF_MULTICAST
bonding: remove duplicate set of flag IFF_MULTICAST
net: fix a comment typo
ethernet: micrel: fix some error codes
ip_tunnels, bpf: define IP_TUNNEL_OPTS_MAX and use it
bpf, dst: add and use dst_tclassid helper
bpf: make skb->tc_classid also readable
net: mvneta: bm: clarify dependencies
cls_bpf: reset class and reuse major in da
ldmvsw: Checkpatch sunvnet.c and sunvnet_common.c
ldmvsw: Add ldmvsw.c driver code
...
CMA allocation should be guaranteed to succeed by definition, but,
unfortunately, it sometimes fails. It is hard to track down the
problem, because it is related to page reference manipulation and we
don't have any facility to analyze it.
This patch adds tracepoints to track down page reference manipulation.
With it, we can find the exact reason for a failure and fix the problem.
Following is an example of tracepoint output. (note: this example is
from a stale version that prints flags as a number. Recent versions
print them as a human-readable string.)
<...>-9018 [004] 92.678375: page_ref_set: pfn=0x17ac9 flags=0x0 count=1 mapcount=0 mapping=(nil) mt=4 val=1
<...>-9018 [004] 92.678378: kernel_stack:
=> get_page_from_freelist (ffffffff81176659)
=> __alloc_pages_nodemask (ffffffff81176d22)
=> alloc_pages_vma (ffffffff811bf675)
=> handle_mm_fault (ffffffff8119e693)
=> __do_page_fault (ffffffff810631ea)
=> trace_do_page_fault (ffffffff81063543)
=> do_async_page_fault (ffffffff8105c40a)
=> async_page_fault (ffffffff817581d8)
[snip]
<...>-9018 [004] 92.678379: page_ref_mod: pfn=0x17ac9 flags=0x40048 count=2 mapcount=1 mapping=0xffff880015a78dc1 mt=4 val=1
[snip]
...
...
<...>-9131 [001] 93.174468: test_pages_isolated: start_pfn=0x17800 end_pfn=0x17c00 fin_pfn=0x17ac9 ret=fail
[snip]
<...>-9018 [004] 93.174843: page_ref_mod_and_test: pfn=0x17ac9 flags=0x40068 count=0 mapcount=0 mapping=0xffff880015a78dc1 mt=4 val=-1 ret=1
=> release_pages (ffffffff8117c9e4)
=> free_pages_and_swap_cache (ffffffff811b0697)
=> tlb_flush_mmu_free (ffffffff81199616)
=> tlb_finish_mmu (ffffffff8119a62c)
=> exit_mmap (ffffffff811a53f7)
=> mmput (ffffffff81073f47)
=> do_exit (ffffffff810794e9)
=> do_group_exit (ffffffff81079def)
=> SyS_exit_group (ffffffff81079e74)
=> entry_SYSCALL_64_fastpath (ffffffff817560b6)
This output shows that the problem comes from the exit path. In the exit
path, to improve performance, pages are not freed immediately. They are
gathered and processed in batches. During this process, migration is not
possible and CMA allocation fails. This problem is hard to find
without this page reference tracepoint facility.
Enabling this feature bloats kernel text by 30 KB in my configuration.
    text    data     bss      dec    hex filename
12127327 2243616 1507328 15878271 f2487f vmlinux_disabled
12157208 2258880 1507328 15923416 f2f8d8 vmlinux_enabled
Note that, due to a header file dependency problem between mm.h and
tracepoint.h, this feature has to open code the static key functions for
tracepoints. This was proposed by Steven Rostedt at the following link:
https://lkml.org/lkml/2015/12/9/699
[arnd@arndb.de: crypto/async_pq: use __free_page() instead of put_page()]
[iamjoonsoo.kim@lge.com: fix build failure for xtensa]
[akpm@linux-foundation.org: tweak Kconfig text, per Vlastimil]
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
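The trace excerpt in the page reference tracepoint change above lends
itself to simple post-processing. The Python sketch below parses event
lines of that shape and collects the reference count history of one pfn.
The regular expression and helper names are assumptions for this
example, and the sample lines are copied from the excerpt (stack trace
lines are skipped).

```python
import re

SAMPLE = """\
<...>-9018 [004] 92.678375: page_ref_set: pfn=0x17ac9 flags=0x0 count=1 mapcount=0 mapping=(nil) mt=4 val=1
<...>-9018 [004] 92.678378: kernel_stack:
=> get_page_from_freelist (ffffffff81176659)
<...>-9018 [004] 92.678379: page_ref_mod: pfn=0x17ac9 flags=0x40048 count=2 mapcount=1 mapping=0xffff880015a78dc1 mt=4 val=1
<...>-9131 [001] 93.174468: test_pages_isolated: start_pfn=0x17800 end_pfn=0x17c00 fin_pfn=0x17ac9 ret=fail
<...>-9018 [004] 93.174843: page_ref_mod_and_test: pfn=0x17ac9 flags=0x40068 count=0 mapcount=0 mapping=0xffff880015a78dc1 mt=4 val=-1 ret=1
"""

LINE_RE = re.compile(
    r"\s*(?P<task>\S+)\s+\[(?P<cpu>\d+)\]\s+(?P<ts>[\d.]+):\s+"
    r"(?P<event>\w+):\s*(?P<fields>.*)")

def parse_event(line):
    """Parse one event line into a dict, or return None for other lines."""
    m = LINE_RE.match(line)
    if not m:
        return None
    fields = dict(kv.split("=", 1) for kv in m.group("fields").split())
    return {"event": m.group("event"), "ts": float(m.group("ts")), **fields}

def refcount_history(trace_text, pfn):
    """List (event, count) pairs for every page_ref event touching pfn."""
    history = []
    for line in trace_text.splitlines():
        ev = parse_event(line)
        if ev and ev.get("pfn") == pfn and "count" in ev:
            history.append((ev["event"], int(ev["count"])))
    return history

history = refcount_history(SAMPLE, "0x17ac9")
```

For pfn 0x17ac9 the history shows the count going 1, 2 and finally 0,
with the isolation test failing before the last reference is dropped.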
Get list of VMA flags up-to-date and sort it to match VM_* definition
order.
[vbabka@suse.cz: add a note above vmaflag definitions to update the names when changing]
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Memory compaction can be currently performed in several contexts:
- kswapd balancing a zone after a high-order allocation failure
- direct compaction to satisfy a high-order allocation, including THP
page fault attempts
- khugepaged trying to collapse a hugepage
- manually from /proc
The purpose of compaction is two-fold. The obvious purpose is to
satisfy a (pending or future) high-order allocation, and is easy to
evaluate. The other purpose is to keep overall memory fragmentation low
and help the anti-fragmentation mechanism. The success wrt the latter
purpose is more complex to evaluate.
The current situation wrt the purposes has a few drawbacks:
- compaction is invoked only when a high-order page or hugepage is not
available (or manually). This might be too late for the purposes of
keeping memory fragmentation low.
- direct compaction increases latency of allocations. Again, it would
be better if compaction was performed asynchronously to keep
fragmentation low, before the allocation itself comes.
- (a special case of the previous) the cost of compaction during THP
page faults can easily offset the benefits of THP.
- kswapd compaction appears to be complex, fragile and not working in
some scenarios. It could also end up compacting for a high-order
allocation request when it should be reclaiming memory for a later
order-0 request.
To improve the situation, we should be able to benefit from an
equivalent of kswapd, but for compaction - i.e. a background thread
which responds to fragmentation and the need for high-order allocations
(including hugepages) somewhat proactively.
One possibility is to extend the responsibilities of kswapd, which could
however complicate its design too much. It should be better to let
kswapd handle reclaim, as order-0 allocations are often more critical
than high-order ones.
Another possibility is to extend khugepaged, but this kthread is a
single instance and tied to THP configs.
This patch goes with the option of a new set of per-node kthreads called
kcompactd, and lays the foundations, without introducing any new
tunables. The lifecycle mimics kswapd kthreads, including the memory
hotplug hooks.
For compaction, kcompactd uses the standard compaction_suitable() and
compact_finished() criteria and the deferred compaction functionality.
Unlike direct compaction, it uses only sync compaction, as there's no
allocation latency to minimize.
This patch doesn't yet add a call to wakeup_kcompactd. The kswapd
compact/reclaim loop for high-order pages will be replaced by waking up
kcompactd in the next patch with the description of what's wrong with
the old approach.
Waking up of the kcompactd threads is also tied to kswapd activity and
follows these rules:
- we don't want to affect any fastpaths, so wake up kcompactd only from
the slowpath, as it's done for kswapd
- if kswapd is doing reclaim, it's more important than compaction, so
don't invoke kcompactd until kswapd goes to sleep
- the target order used for kswapd is passed to kcompactd
Possible future uses for kcompactd include the ability to wake up
kcompactd on demand in special situations, such as when hugepages are
not available (currently not done due to __GFP_NO_KSWAPD) or when a
fragmentation event (i.e. __rmqueue_fallback()) occurs. It's also
possible to perform periodic compaction with kcompactd.
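The wakeup rules above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct node_model`, `model_wake_kswapd()` and `model_kswapd_sleeps()` are hypothetical names invented for illustration and are not the kernel's interfaces.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-node state; this is not the kernel's pg_data_t. */
struct node_model {
    bool kswapd_reclaiming;  /* kswapd is currently busy with reclaim */
    int kswapd_order;        /* target order kswapd was woken for */
    int kcompactd_order;     /* order handed over to kcompactd */
    int kcompactd_wakeups;   /* number of kcompactd wakeups so far */
};

/* Allocation slowpath: wake kswapd and remember the highest order seen. */
void model_wake_kswapd(struct node_model *node, int order)
{
    assert(order >= 0);
    node->kswapd_reclaiming = true;
    if (order > node->kswapd_order)
        node->kswapd_order = order;
}

/* kswapd is done: only now is kcompactd woken, with kswapd's order. */
void model_kswapd_sleeps(struct node_model *node)
{
    node->kswapd_reclaiming = false;
    if (node->kswapd_order > 0) {   /* order-0 needs no compaction */
        node->kcompactd_order = node->kswapd_order;
        node->kcompactd_wakeups++;
    }
    node->kswapd_order = 0;
}
```

In this model nothing on the fast path touches kcompactd, and a high-order request reaches it only after reclaim has finished, which mirrors the three rules listed above.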
[arnd@arndb.de: fix build errors with kcompactd]
[paul.gortmaker@windriver.com: don't use modular references for non modular code]
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge tag 'pm+acpi-4.6-rc1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management and ACPI updates from Rafael Wysocki:
"This time the majority of changes go into cpufreq and they are
significant.
First off, the way CPU frequency updates are triggered is different
now. Instead of having to set up and manage a deferrable timer for
each CPU in the system to evaluate and possibly change its frequency
periodically, cpufreq governors set up callbacks to be invoked by the
scheduler on a regular basis (basically on utilization updates). The
"old" governors, "ondemand" and "conservative", still do all of their
work in process context (although that is triggered by the scheduler
now), but intel_pstate does it all in the callback invoked by the
scheduler with no need for any additional asynchronous processing.
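As a rough illustration of the callback model described above, the following userspace sketch shows a hook that is invoked on every utilization update and decides for itself whether enough time has passed to re-evaluate the frequency. The names `struct gov_cpu` and `gov_update_hook()` and the minimum-interval check are assumptions made for this example, not the cpufreq API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical governor state for one CPU; names are illustrative only. */
struct gov_cpu {
    uint64_t last_eval_ns;     /* time of the last frequency evaluation */
    uint64_t min_interval_ns;  /* minimum spacing between evaluations */
    unsigned int evaluations;  /* how many evaluations were performed */
};

/*
 * Stand-in for the hook the scheduler invokes on utilization updates.
 * Returns 1 if a frequency evaluation was performed, 0 if it was skipped.
 */
int gov_update_hook(struct gov_cpu *cpu, uint64_t now_ns)
{
    assert(now_ns >= cpu->last_eval_ns);
    if (now_ns - cpu->last_eval_ns < cpu->min_interval_ns)
        return 0;   /* too soon: nothing to do, and no timer to re-arm */
    cpu->last_eval_ns = now_ns;
    cpu->evaluations++;
    return 1;
}
```

The point of the design is visible in what is missing: there is no per-CPU timer to set up, cancel or migrate, only a timestamp comparison in a function that is called anyway.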
Of course, this eliminates the overhead related to the management of
all those timers, but also it allows the cpufreq governor code to be
simplified quite a bit. On top of that, the common code and data
structures used by the "ondemand" and "conservative" governors are
cleaned up and made more straightforward and some long-standing and
quite annoying problems are addressed. In particular, the handling of
governor sysfs attributes is modified and the related locking becomes
more fine grained which allows some concurrency problems to be avoided
(particularly deadlocks with the core cpufreq code).
In principle, the new mechanism for triggering frequency updates
allows utilization information to be passed from the scheduler to
cpufreq. Although the current code doesn't make use of it, in the
works is a new cpufreq governor that will make decisions based on the
scheduler's utilization data. That should allow the scheduler and
cpufreq to work more closely together in the long run.
In addition to the core and governor changes, cpufreq drivers are
updated too. Fixes and optimizations go into intel_pstate, the
cpufreq-dt driver is updated on top of some modification in the
Operating Performance Points (OPP) framework and there are fixes and
other updates in the powernv cpufreq driver.
Apart from the cpufreq updates there is some new ACPICA material,
including a fix for a problem introduced by previous ACPICA updates,
and some less significant changes in the ACPI code, like CPPC code
optimizations, ACPI processor driver cleanups and support for loading
ACPI tables from initrd.
Also updated are the generic power domains framework, the Intel RAPL
power capping driver and the turbostat utility and we have a bunch of
traditional assorted fixes and cleanups.
Specifics:
- Redesign of cpufreq governors and the intel_pstate driver to make
them use callbacks invoked by the scheduler to trigger CPU
frequency evaluation instead of using per-CPU deferrable timers for
that purpose (Rafael Wysocki).
- Reorganization and cleanup of cpufreq governor code to make it more
straightforward and fix some concurrency problems in it (Rafael
Wysocki, Viresh Kumar).
- Cleanup and improvements of locking in the cpufreq core (Viresh
Kumar).
- Assorted cleanups in the cpufreq core (Rafael Wysocki, Viresh
Kumar, Eric Biggers).
- intel_pstate driver updates including fixes, optimizations and a
modification to make it enable hardware-coordinated P-state
selection (HWP) by default if supported by the processor (Philippe
Longepe, Srinivas Pandruvada, Rafael Wysocki, Viresh Kumar, Felipe
Franciosi).
- Operating Performance Points (OPP) framework updates to improve its
handling of voltage regulators and device clocks and updates of the
cpufreq-dt driver on top of that (Viresh Kumar, Jon Hunter).
- Updates of the powernv cpufreq driver to fix initialization and
cleanup problems in it and correct its worker thread handling with
respect to CPU offline, new powernv_throttle tracepoint (Shilpasri
Bhat).
- ACPI cpufreq driver optimization and cleanup (Rafael Wysocki).
- ACPICA updates including one fix for a regression introduced by
previous changes in the ACPICA code (Bob Moore, Lv Zheng, David Box,
Colin Ian King).
- Support for installing ACPI tables from initrd (Lv Zheng).
- Optimizations of the ACPI CPPC code (Prashanth Prakash, Ashwin
Chaugule).
- Support for _HID(ACPI0010) devices (ACPI processor containers) and
ACPI processor driver cleanups (Sudeep Holla).
- Support for ACPI-based enumeration of the AMBA bus (Graeme Gregory,
Aleksey Makarov).
- Modification of the ACPI PCI IRQ management code to make it treat
255 in the Interrupt Line register as "not connected" on x86 (as
per the specification) and avoid attempts to use that value as a
valid interrupt vector (Chen Fan).
- ACPI APEI fixes related to resource leaks (Josh Hunt).
- Removal of modularity from a few ACPI drivers (BGRT, GHES,
intel_pmic_crc) that cannot be built as modules in practice (Paul
Gortmaker).
- PNP framework update to make it treat ACPI_RESOURCE_TYPE_SERIAL_BUS
as a valid resource type (Harb Abdulhamid).
- New device ID (future AMD I2C controller) in the ACPI driver for
AMD SoCs (APD) and in the designware I2C driver (Xiangliang Yu).
- Assorted ACPI cleanups (Colin Ian King, Kaiyen Chang, Oleg Drokin).
- cpuidle menu governor optimization to avoid a square root
computation in it (Rasmus Villemoes).
- Fix for potential use-after-free in the generic device properties
framework (Heikki Krogerus).
- Updates of the generic power domains (genpd) framework including
support for multiple power states of a domain, fixes and debugfs
output improvements (Axel Haslam, Jon Hunter, Laurent Pinchart,
Geert Uytterhoeven).
- Intel RAPL power capping driver updates to reduce IPI overhead in
it (Jacob Pan).
- System suspend/hibernation code cleanups (Eric Biggers, Saurabh
Sengar).
- Year 2038 fix for the process freezer (Abhilash Jindal).
- turbostat utility updates including new features (decoding of more
registers and CPUID fields, sub-second intervals support, GFX MHz
and RC6 printout, --out command line option), fixes (syscall jitter
detection and workaround, reduction of the number of syscalls
made, fixes related to Xeon x200 processors, compiler warning
fixes) and cleanups (Len Brown, Hubert Chrzaniuk, Chen Yu)"
* tag 'pm+acpi-4.6-rc1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (182 commits)
tools/power turbostat: bugfix: TDP MSRs print bits fixing
tools/power turbostat: correct output for MSR_NHM_SNB_PKG_CST_CFG_CTL dump
tools/power turbostat: call __cpuid() instead of __get_cpuid()
tools/power turbostat: indicate SMX and SGX support
tools/power turbostat: detect and work around syscall jitter
tools/power turbostat: show GFX%rc6
tools/power turbostat: show GFXMHz
tools/power turbostat: show IRQs per CPU
tools/power turbostat: make fewer systems calls
tools/power turbostat: fix compiler warnings
tools/power turbostat: add --out option for saving output in a file
tools/power turbostat: re-name "%Busy" field to "Busy%"
tools/power turbostat: Intel Xeon x200: fix turbo-ratio decoding
tools/power turbostat: Intel Xeon x200: fix erroneous bclk value
tools/power turbostat: allow sub-sec intervals
ACPI / APEI: ERST: Fixed leaked resources in erst_init
ACPI / APEI: Fix leaked resources
intel_pstate: Do not skip samples partially
intel_pstate: Remove freq calculation from intel_pstate_calc_busy()
intel_pstate: Move intel_pstate_calc_busy() into get_target_pstate_use_performance()
...
Merge first patch-bomb from Andrew Morton:
- some misc things
- ocfs2 updates
- about half of MM
- checkpatch updates
- autofs4 update
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (120 commits)
autofs4: fix string.h include in auto_dev-ioctl.h
autofs4: use pr_xxx() macros directly for logging
autofs4: change log print macros to not insert newline
autofs4: make autofs log prints consistent
autofs4: fix some white space errors
autofs4: fix invalid ioctl return in autofs4_root_ioctl_unlocked()
autofs4: fix coding style line length in autofs4_wait()
autofs4: fix coding style problem in autofs4_get_set_timeout()
autofs4: coding style fixes
autofs: show pipe inode in mount options
kallsyms: add support for relative offsets in kallsyms address table
kallsyms: don't overload absolute symbol type for percpu symbols
x86: kallsyms: disable absolute percpu symbols on !SMP
checkpatch: fix another left brace warning
checkpatch: improve UNSPECIFIED_INT test for bare signed/unsigned uses
checkpatch: warn on bare unsigned or signed declarations without int
checkpatch: exclude asm volatile from complex macro check
mm: memcontrol: drop unnecessary lru locking from mem_cgroup_migrate()
mm: migrate: consolidate mem_cgroup_migrate() calls
mm/compaction: speed up pageblock_pfn_to_page() when zone is contiguous
...
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"One of the largest releases for KVM... Hardly any generic
changes, but lots of architecture-specific updates.
ARM:
- VHE support so that we can run the kernel at EL2 on ARMv8.1 systems
- PMU support for guests
- 32bit world switch rewritten in C
- various optimizations to the vgic save/restore code.
PPC:
- enabled KVM-VFIO integration ("VFIO device")
- optimizations to speed up IPIs between vcpus
- in-kernel handling of IOMMU hypercalls
- support for dynamic DMA windows (DDW).
s390:
- provide the floating point registers via sync regs;
- separated instruction vs. data accesses
- dirty log improvements for huge guests
- bugfixes and documentation improvements.
x86:
- Hyper-V VMBus hypercall userspace exit
- alternative implementation of lowest-priority interrupts using
vector hashing (for better VT-d posted interrupt support)
- fixed guest debugging with nested virtualizations
- improved interrupt tracking in the in-kernel IOAPIC
- generic infrastructure for tracking writes to guest
memory - currently its only use is to speedup the legacy shadow
paging (pre-EPT) case, but in the future it will be used for
virtual GPUs as well
- much cleanup (LAPIC, kvmclock, MMU, PIT), including ubsan fixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (217 commits)
KVM: x86: remove eager_fpu field of struct kvm_vcpu_arch
KVM: x86: disable MPX if host did not enable MPX XSAVE features
arm64: KVM: vgic-v3: Only wipe LRs on vcpu exit
arm64: KVM: vgic-v3: Reset LRs at boot time
arm64: KVM: vgic-v3: Do not save an LR known to be empty
arm64: KVM: vgic-v3: Save maintenance interrupt state only if required
arm64: KVM: vgic-v3: Avoid accessing ICH registers
KVM: arm/arm64: vgic-v2: Make GICD_SGIR quicker to hit
KVM: arm/arm64: vgic-v2: Only wipe LRs on vcpu exit
KVM: arm/arm64: vgic-v2: Reset LRs at boot time
KVM: arm/arm64: vgic-v2: Do not save an LR known to be empty
KVM: arm/arm64: vgic-v2: Move GICH_ELRSR saving to its own function
KVM: arm/arm64: vgic-v2: Save maintenance interrupt state only if required
KVM: arm/arm64: vgic-v2: Avoid accessing GICH registers
KVM: s390: allocate only one DMA page per VM
KVM: s390: enable STFLE interpretation only if enabled for the guest
KVM: s390: wake up when the VCPU cpu timer expires
KVM: s390: step the VCPU timer while in enabled wait
KVM: s390: protect VCPU cpu timer with a seqcount
KVM: s390: step VCPU cpu timer during kvm_run ioctl
...
In tracepoints, it's possible to print gfp flags in a human-friendly
format through a macro show_gfp_flags(), which defines a translation
array and passes it to __print_flags(). Since the following patch will
introduce support for gfp flags printing in printk(), it would be nice
to reuse the array. This is not straightforward, since __print_flags()
can't simply reference an array defined in a .c file such as mm/debug.c
- it has to be a macro to allow the macro magic to communicate the
format to userspace tools such as trace-cmd.
The solution is to create a macro __def_gfpflag_names which is used both
in show_gfp_flags(), and to define the gfpflag_names[] array in
mm/debug.c.
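The idea of one macro feeding both a macro-based user and a plain array can be sketched in userspace C as follows. All `DEMO_` and `demo_` names and the flag values are hypothetical; the sketch only illustrates the shared-list technique and is not the code from this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag bits, for illustration only. */
#define DEMO_FLAG_WAIT  0x1u
#define DEMO_FLAG_IO    0x2u
#define DEMO_FLAG_FS    0x4u

struct flag_name {
    unsigned long mask;
    const char *name;
};

/* Single definition of the translation list, usable in any initializer. */
#define DEMO_DEF_FLAG_NAMES                 \
    { DEMO_FLAG_WAIT, "DEMO_FLAG_WAIT" },   \
    { DEMO_FLAG_IO,   "DEMO_FLAG_IO" },     \
    { DEMO_FLAG_FS,   "DEMO_FLAG_FS" }

/* Look up the name of one flag in a NULL-terminated table. */
const char *demo_flag_lookup(const struct flag_name *tbl, unsigned long mask)
{
    assert(tbl != NULL);
    for (; tbl->name; tbl++)
        if (tbl->mask == mask)
            return tbl->name;
    return "unknown";
}

/* User 1: a plain array, as a .c file would define it. */
const struct flag_name demo_flag_names[] = {
    DEMO_DEF_FLAG_NAMES,
    { 0, NULL }
};

/* User 2: a macro that expands the same list at its point of use. */
#define demo_show_flag(mask)                                \
    demo_flag_lookup((const struct flag_name[]){            \
        DEMO_DEF_FLAG_NAMES, { 0, NULL } }, (mask))

/* Both users resolve a flag through the same list, so they always agree. */
int demo_users_agree(unsigned long mask)
{
    return strcmp(demo_flag_lookup(demo_flag_names, mask),
                  demo_show_flag(mask)) == 0;
}
```

Adding a flag means touching `DEMO_DEF_FLAG_NAMES` once; neither user can fall out of sync with the other.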
On the other hand, mm/debug.c also defines translation tables for page
flags and vma flags, and a desire was expressed (but not implemented in
this series) to use these also from tracepoints. Thus, this patch also
renames the events/gfpflags.h file to events/mmflags.h and moves the
table definitions there, using the same macro approach as for gfpflags.
This allows translating all three kinds of mm-specific flags both in
tracepoints and printk.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Michal Hocko <mhocko@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The show_gfp_flags() macro provides human-friendly printing of gfp flags
in tracepoints. However, it is somewhat out of date and missing several
flags. This patch fills in the missing flags, and distinguishes
properly between GFP_ATOMIC and __GFP_ATOMIC which were both translated
to "GFP_ATOMIC". More generally, all __GFP_X flags which were
previously printed as GFP_X, are now printed as __GFP_X, since omitting
the underscores results in output that doesn't actually match the source
code, and can only lead to confusion. Where both variants are defined
equal (e.g. _DMA and _DMA32), the variant without underscores is
preferred.
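Why a composite mask and one of its member bits need distinct names can be seen in a small flag formatter. The sketch below is a userspace approximation: the `DEMO_` flags, their values and `demo_format_flags()` are hypothetical, and the match-all-bits-then-clear loop is an assumption about how such a translation table is typically walked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical single-bit flags and one composite built from them. */
#define DEMO_BIT_HIGH    0x1u
#define DEMO_BIT_ATOMIC  0x2u
#define DEMO_BIT_KSWAPD  0x4u
#define DEMO_ATOMIC      (DEMO_BIT_HIGH | DEMO_BIT_ATOMIC | DEMO_BIT_KSWAPD)

struct flag_name {
    unsigned long mask;
    const char *name;
};

/* Composite masks come first so they win over their member bits. */
static const struct flag_name demo_names[] = {
    { DEMO_ATOMIC,     "DEMO_ATOMIC" },
    { DEMO_BIT_HIGH,   "DEMO_BIT_HIGH" },
    { DEMO_BIT_ATOMIC, "DEMO_BIT_ATOMIC" },
    { DEMO_BIT_KSWAPD, "DEMO_BIT_KSWAPD" },
    { 0, NULL }
};

/* Write "A|B|C" for the set flags; returns the number of characters. */
size_t demo_format_flags(char *buf, size_t len, unsigned long flags)
{
    size_t used = 0;
    const struct flag_name *f;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (f = demo_names; f->name; f++) {
        int n;

        if ((flags & f->mask) != f->mask)
            continue;           /* not every bit of this entry is set */
        n = snprintf(buf + used, len - used, "%s%s",
                     used ? "|" : "", f->name);
        if (n < 0 || (size_t)n >= len - used) {
            buf[used] = '\0';   /* drop the partial name */
            break;
        }
        used += (size_t)n;
        flags &= ~f->mask;      /* do not report these bits again */
    }
    return used;
}
```

With distinct names, a value carrying only `DEMO_BIT_ATOMIC` can no longer be mistaken in the output for the full `DEMO_ATOMIC` combination.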
Also add a note in gfp.h so hopefully future changes will be synced
better.
__GFP_MOVABLE is defined twice in include/linux/gfp.h with different
comments. Leave just the newer one, which was intended to replace the
old one.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Michal Hocko <mhocko@suse.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull cpu hotplug updates from Thomas Gleixner:
"This is the first part of the ongoing cpu hotplug rework:
- Initial implementation of the state machine
- Runs all online and prepare down callbacks on the plugged cpu and
not on some random processor
- Replaces busy loop waiting with completions
- Adds tracepoints so the states can be followed"
More detailed commentary on this work from an earlier email:
"What's wrong with the current cpu hotplug infrastructure?
- Asymmetry
The hotplug notifier mechanism is asymmetric versus the bringup and
teardown. This is mostly caused by the notifier mechanism.
- Largely undocumented dependencies
While some notifiers use explicitly defined notifier priorities,
we have quite a few notifiers which use numerical priorities to
express dependencies without any documentation why.
- Control processor driven
Most of the bringup/teardown of a cpu is driven by a control
processor. While it is understandable that preparatory steps,
like idle thread creation, memory allocation for and initialization
of essential facilities, need to be done before a cpu can boot,
there is no reason why everything else must run on a control
processor. Before this patch series, bringup looks like this:
Control CPU                 Booting CPU
do preparatory steps
kick cpu into life
                            do low level init
sync with booting cpu       sync with control cpu
bring the rest up
- All or nothing approach
There is no way to do partial bringups. That's something which is
really desired because we waste, e.g. at boot, a substantial amount of
time just busy waiting for the cpu to come to life. That's stupid
as we could very well do preparatory steps and the initial IPI for
other cpus and then go back and do the necessary low level
synchronization with the freshly booted cpu.
- Minimal debuggability
Due to the notifier based design, it's impossible to switch between
two stages of the bringup/teardown back and forth in order to test
the correctness. So in many hotplug notifiers the cancel
mechanisms are either not existent or completely untested.
- Notifier [un]registering is tedious
To [un]register notifiers we need to protect against hotplug at
every callsite. There is no mechanism to issue bringup/teardown
callbacks on the online cpus, so every caller needs to
do it itself. That also includes error rollback.
What's the new design?
The base of the new design is a symmetric state machine, where both
the control processor and the booting/dying cpu execute a well
defined set of states. Each state is symmetric in the end, except
for some well defined exceptions, and the bringup/teardown can be
stopped and reversed at almost all states.
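A symmetric state machine of this kind can be sketched in a few lines of userspace C. Everything here is hypothetical (`struct demo_step`, `demo_set_state()` and the logging callbacks); it is a minimal model of walking states up and down and of undoing a partial bringup when a callback fails, the rollback behaviour this message describes further down, and not the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>

/* One step of a hypothetical state machine: how to enter and leave it. */
struct demo_step {
    int (*startup)(void);    /* returns 0 on success */
    void (*teardown)(void);
};

/*
 * Walk from state *cur towards target; steps[i] is the transition between
 * state i and state i + 1. If a startup callback fails, the states entered
 * during this call are torn down again and the error is returned.
 */
int demo_set_state(const struct demo_step *steps, size_t nsteps,
                   size_t *cur, size_t target)
{
    size_t start = *cur;

    assert(target <= nsteps && *cur <= nsteps);
    while (*cur < target) {             /* bringup direction */
        int err = steps[*cur].startup ? steps[*cur].startup() : 0;

        if (err) {
            while (*cur > start) {      /* roll back */
                (*cur)--;
                if (steps[*cur].teardown)
                    steps[*cur].teardown();
            }
            return err;
        }
        (*cur)++;
    }
    while (*cur > target) {             /* teardown direction */
        (*cur)--;
        if (steps[*cur].teardown)
            steps[*cur].teardown();
    }
    return 0;
}

/* Demo callbacks that record what ran, so the walk can be observed. */
char demo_log[32];
size_t demo_log_len;

static void log_ch(char c)
{
    if (demo_log_len < sizeof(demo_log) - 1)
        demo_log[demo_log_len++] = c;
}

static int up_a(void) { log_ch('A'); return 0; }
static void down_a(void) { log_ch('a'); }
static int up_b(void) { log_ch('B'); return 0; }
static void down_b(void) { log_ch('b'); }
static int up_fail(void) { log_ch('X'); return -1; }

const struct demo_step demo_steps[] = {
    { up_a, down_a },
    { up_b, down_b },
    { up_fail, NULL },
};
```

Because every step carries its own teardown, stopping at an intermediate target and reversing from there needs no special cases, which is what makes such a machine testable one state at a time.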
So the bringup of a cpu will look like this in the future:
Control CPU                 Booting CPU
do preparatory steps
kick cpu into life
                            do low level init
sync with booting cpu       sync with control cpu
                            bring itself up
The synchronization step does not require the control cpu to wait.
That mechanism can be done asynchronously via a worker or some
other mechanism.
The teardown can be made very similar, so that the dying cpu cleans
up and brings itself down. Cleanups which need to be done after
the cpu is gone, can be scheduled asynchronously as well.
There is a long way to this, as we need to refactor the notion when a
cpu is available. Today we set the cpu online right after it comes
out of the low level bringup, which is not really correct.
The proper mechanism is to set it to available, i.e. cpu local
threads, like softirqd, hotplug thread etc. can be scheduled on that
cpu, and once it finished all booting steps, it's set to online, so
general workloads can be scheduled on it. The reverse happens on
teardown. First thing to do is to forbid scheduling of general
workloads, then teardown all the per cpu resources and finally shut it
off completely.
This patch series implements the basic infrastructure for this at the
core level. This includes the following:
- Basic state machine implementation with well defined states, so
ordering and prioritization can be expressed.
- Interfaces to [un]register state callbacks
This invokes the bringup/teardown callback on all online cpus with
the proper protection in place and [un]installs the callbacks in
the state machine array.
For callbacks which have no particular ordering requirement we have
a dynamic state space, so that drivers don't have to register an
explicit hotplug state.
If a callback fails, the code automatically does a rollback to the
previous state.
- Sysfs interface to drive the state machine to a particular step.
This is only partially functional today. Full functionality and
therefore testability will be achieved once we have converted all
existing hotplug notifiers over to the new scheme.
- Run all CPU_ONLINE/DOWN_PREPARE notifiers on the booting/dying
processor:
Control CPU                 Booting CPU
do preparatory steps
kick cpu into life
                            do low level init
sync with booting cpu       sync with control cpu
wait for boot
                            bring itself up
                            Signal completion to control cpu
In a previous step of this work we've done a full tree mechanical
conversion of all hotplug notifiers to the new scheme. The balance
is a net removal of about 4000 lines of code.
This is not included in this series, as we decided to take a
different approach. Instead of mechanically converting everything
over, we will do a proper overhaul of the usage sites one by one so
they nicely fit into the symmetric callback scheme.
I decided to do that after I looked at the ugliness of some of the
converted sites and figured out that their hotplug mechanism is
completely buggered anyway. So there is no point to do a
mechanical conversion first as we need to go through the usage
sites one by one again in order to achieve fully symmetric and
testable behaviour"
* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
cpu/hotplug: Document states better
cpu/hotplug: Fix smpboot thread ordering
cpu/hotplug: Remove redundant state check
cpu/hotplug: Plug death reporting race
rcu: Make CPU_DYING_IDLE an explicit call
cpu/hotplug: Make wait for dead cpu completion based
cpu/hotplug: Let upcoming cpu bring itself fully up
arch/hotplug: Call into idle with a proper state
cpu/hotplug: Move online calls to hotplugged cpu
cpu/hotplug: Create hotplug threads
cpu/hotplug: Split out the state walk into functions
cpu/hotplug: Unpark smpboot threads from the state machine
cpu/hotplug: Move scheduler cpu_online notifier to hotplug core
cpu/hotplug: Implement setup/removal interface
cpu/hotplug: Make target state writeable
cpu/hotplug: Add sysfs state interface
cpu/hotplug: Hand in target state to _cpu_up/down
cpu/hotplug: Convert the hotplugged cpu work to a state machine
cpu/hotplug: Convert to a state machine for the control processor
cpu/hotplug: Add tracepoints
...
Pull NOHZ updates from Ingo Molnar:
"NOHZ enhancements, by Frederic Weisbecker, which reorganizes/refactors
the NOHZ 'can the tick be stopped?' infrastructure and related code to
be data driven, and harmonizes the naming and handling of all the
various properties"
[ This makes the ugly "fetch_or()" macro that the scheduler used
internally a new generic helper, and does a bad job at it.
I'm pulling it, but I've asked Ingo and Frederic to get this
fixed up ]
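For readers unfamiliar with the pattern, a data-driven "can the tick be stopped?" check built on a fetch-or primitive might look roughly like the userspace sketch below. It uses C11 `atomic_fetch_or()` rather than the kernel helper, and the `demo_` names, the dependency bits and the kick-on-first-dependency rule are all assumptions for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical dependency bits; the names are illustrative only. */
enum demo_tick_dep {
    DEMO_DEP_POSIX_TIMER = 1u << 0,
    DEMO_DEP_PERF        = 1u << 1,
    DEMO_DEP_SCHED       = 1u << 2,
};

atomic_uint demo_tick_deps;   /* zero-initialised: nothing needs the tick */
unsigned int demo_kicks;      /* demo counter, single-threaded use only */

/* Set a dependency; kick only on the transition from "no dependency". */
void demo_tick_dep_set(unsigned int bit)
{
    unsigned int prev;

    assert(bit != 0);
    prev = atomic_fetch_or(&demo_tick_deps, bit);
    if (prev == 0)
        demo_kicks++;   /* stand-in for restarting the tick */
}

/* Drop a dependency once its owner no longer needs the tick. */
void demo_tick_dep_clear(unsigned int bit)
{
    atomic_fetch_and(&demo_tick_deps, ~bit);
}

/* The tick may stop only when no subsystem has a bit set. */
bool demo_can_stop_tick(void)
{
    return atomic_load(&demo_tick_deps) == 0;
}
```

Returning the previous value from the fetch-or is what lets the setter tell a first dependency apart from an additional one without taking a lock.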
* 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched-clock: Migrate to use new tick dependency mask model
posix-cpu-timers: Migrate to use new tick dependency mask model
sched: Migrate sched to use new tick dependency mask model
sched: Account rr tasks
perf: Migrate perf to use new tick dependency mask model
nohz: Use enum code for tick stop failure tracing message
nohz: New tick dependency mask
nohz: Implement wide kick on top of irq work
atomic: Export fetch_or()
Userspace tools are not aware of how to convert the enums provided by
the tracepoints to their corresponding strings.
Adding TRACE_DEFINE_ENUM() macros makes the enums available to
userspace, so that the tools know what those enum values represent.
In particular, for thermal zone trip types what we obtained before was
something like:
kworker/1:1-460 [001] 320.372732: thermal_zone_trip: thermal_zone=soc
id=0 trip=1 trip_type=1
Unfortunately, userspace tools do not know how to convert enum values to
strings and as a consequence they can only forward the enum value to the
output. By using TRACE_DEFINE_ENUM() macros for thermal traces we get the
following trace line:
kworker/1:1-460 [001] 320.372732: thermal_zone_trip: thermal_zone=soc
id=0 trip=1 trip_type=PASSIVE
Userspace tools are now able to better understand the meaning of the trip_type
and provide the user with more readable information.
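To illustrate what a tool gains from the exported mapping, here is a small userspace sketch of a value-to-name table. It is not the TRACE_DEFINE_ENUM() implementation; the enum, table, and function names are hypothetical, and only "trip_type 1 prints as PASSIVE" is taken from the trace lines above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical enum for this sketch; only 1 == PASSIVE comes from the text. */
enum trip_type { TRIP_ACTIVE = 0, TRIP_PASSIVE, TRIP_HOT, TRIP_CRITICAL };

struct enum_name {
	int value;
	const char *name;
};

/* The value-to-string table a tool needs before it can decode trip_type. */
const struct enum_name trip_type_names[] = {
	{ TRIP_ACTIVE,   "ACTIVE"   },
	{ TRIP_PASSIVE,  "PASSIVE"  },
	{ TRIP_HOT,      "HOT"      },
	{ TRIP_CRITICAL, "CRITICAL" },
};

#define N_TRIP_NAMES (sizeof(trip_type_names) / sizeof(trip_type_names[0]))

/* Return the symbolic name for `value`, or NULL if the table has no entry. */
const char *trip_type_name(int value)
{
	size_t i;

	for (i = 0; i < N_TRIP_NAMES; i++)
		if (trip_type_names[i].value == value)
			return trip_type_names[i].name;
	return NULL;
}

/* Reverse lookup: return the numeric value for `name`, or -1 if unknown. */
int trip_type_value(const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < N_TRIP_NAMES; i++)
		if (strcmp(trip_type_names[i].name, name) == 0)
			return trip_type_names[i].value;
	return -1;
}
```

Without such a table the tool can only print the raw number, which is the trip_type=1 case shown earlier.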
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Eduardo Valentin <edubezval@gmail.com>
Signed-off-by: Michele Di Giorgio <michele.digiorgio@arm.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Javi Merino <javi.merino@arm.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
* pm-cpufreq: (94 commits)
intel_pstate: Do not skip samples partially
intel_pstate: Remove freq calculation from intel_pstate_calc_busy()
intel_pstate: Move intel_pstate_calc_busy() into get_target_pstate_use_performance()
intel_pstate: Optimize calculation for max/min_perf_adj
intel_pstate: Remove extra conversions in pid calculation
cpufreq: Move scheduler-related code to the sched directory
Revert "cpufreq: postfix policy directory with the first CPU in related_cpus"
cpufreq: Reduce cpufreq_update_util() overhead a bit
cpufreq: Select IRQ_WORK if CPU_FREQ_GOV_COMMON is set
cpufreq: Remove 'policy->governor_enabled'
cpufreq: Rename __cpufreq_governor() to cpufreq_governor()
cpufreq: Relocate handle_update() to kill its declaration
cpufreq: governor: Drop unnecessary checks from show() and store()
cpufreq: governor: Fix race in dbs_update_util_handler()
cpufreq: governor: Make gov_set_update_util() static
cpufreq: governor: Narrow down the dbs_data_mutex coverage
cpufreq: governor: Make dbs_data_mutex static
cpufreq: governor: Relocate definitions of tuners structures
cpufreq: governor: Move per-CPU data to the common code
cpufreq: governor: Make governor private data per-policy
...
commit 5634cc2aa9 ("writeback: update writeback
tracepoints to report cgroup") made writeback tracepoints print out the cgroup
path when CGROUP_WRITEBACK is enabled, but it may trigger the bug below on -rt
kernels, since the tracepoints call kernfs_path and kernfs_path_len, which
acquire a spin lock that is sleepable on -rt kernels.
BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:930
in_atomic(): 1, irqs_disabled(): 0, pid: 625, name: kworker/u16:3
INFO: lockdep is turned off.
Preemption disabled at:[<ffffffc000374a5c>] wb_writeback+0xec/0x830
CPU: 7 PID: 625 Comm: kworker/u16:3 Not tainted 4.4.1-rt5 #20
Hardware name: Freescale Layerscape 2085a RDB Board (DT)
Workqueue: writeback wb_workfn (flush-7:0)
Call trace:
[<ffffffc00008d708>] dump_backtrace+0x0/0x200
[<ffffffc00008d92c>] show_stack+0x24/0x30
[<ffffffc0007b0f40>] dump_stack+0x88/0xa8
[<ffffffc000127d74>] ___might_sleep+0x2ec/0x300
[<ffffffc000d5d550>] rt_spin_lock+0x38/0xb8
[<ffffffc0003e0548>] kernfs_path_len+0x30/0x90
[<ffffffc00036b360>] trace_event_raw_event_writeback_work_class+0xe8/0x2e8
[<ffffffc000374f90>] wb_writeback+0x620/0x830
[<ffffffc000376224>] wb_workfn+0x61c/0x950
[<ffffffc000110adc>] process_one_work+0x3ac/0xb30
[<ffffffc0001112fc>] worker_thread+0x9c/0x7a8
[<ffffffc00011a9e8>] kthread+0x190/0x1b0
[<ffffffc000086ca0>] ret_from_fork+0x10/0x30
With unlocked kernfs_* functions, synchronize_sched() has to be called in
kernfs_rename, which could be called in the syscall path, and that is problematic.
So, print out the cgroup ino instead of the path name; userland can convert
it to a path name.
Without CGROUP_WRITEBACK enabled, it just prints out the root dir. But the
root dir ino varies between filesystems, so print out -1U to indicate
an invalid cgroup ino.
Link: http://lkml.kernel.org/r/1456996137-8354-1-git-send-email-yang.shi@linaro.org
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Yang Shi <yang.shi@linaro.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Some trace events have conditions that check if the current CPU is online or
not before recording the tracepoint. That's because certain trace events are
in locations that can be called as the CPU is going offline and when RCU no
longer monitors it (like kfree and friends). The check was added because
trace events require RCU to be active.
This is a trace event infrastructure issue and not something that individual
trace events should worry about. The tracepoint.h code now has a check
to see if the current CPU is considered online, and it only does the
tracepoint if it is. There's no more need for individual trace events to
also include this check. It is now redundant.
Cc: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
We want to trace the hotplug machinery. Add tracepoints to track the
invocation of callbacks and their result.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: Rik van Riel <riel@redhat.com>
Cc: Rafael Wysocki <rafael.j.wysocki@intel.com>
Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Link: http://lkml.kernel.org/r/20160226182340.593563875@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
After a change to the snd_jack structure, the 'name' member
is no longer available in all configurations, which results in a
build failure in the tracing code:
include/trace/events/asoc.h: In function 'trace_event_raw_event_snd_soc_jack_report':
include/trace/events/asoc.h:240:32: error: 'struct snd_jack' has no member named 'name'
The name field is normally initialized from the card shortname and
the jack "id" field:
snprintf(jack->name, sizeof(jack->name), "%s %s",
card->shortname, jack->id);
This changes the tracing output to just contain the 'id' by
itself, which slightly changes the output format but avoids the
link error and is hopefully still enough to see what is going on.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: fe0d128c57 ("ALSA: jack: Allow building the jack layer without input device")
Signed-off-by: Mark Brown <broonie@kernel.org>
f2fs supports atomic writes with the following semantics:
1. open db file
2. ioctl start atomic write
3. (write db file) * n
4. ioctl commit atomic write
5. close db file
With this flow we can avoid corrupting the file on an abnormal power cut,
because we hold the transaction's data in referenced pages linked in the
inode's inmem_pages list without setting them dirty, so the data won't be
persisted unless we commit it in step 4.
But we should still hold the journal db file in memory by using volatile
writes, because our 'atomic write support' semantics are incomplete: in
step 4, we could fail to submit all of the transaction's dirty data. Once
part of the dirty data has been committed to storage, a checkpoint followed
by an abnormal power cut will leave the db file corrupted forever.
So this patch tries to improve the atomic write flow by adding a revoking
flow: once an internal error occurs during commit, this gives us another
chance to revoke the partially submitted data of the current transaction,
which makes the commit operation behave more like an atomic one.
If we're unlucky and the revoking operation also fails, EAGAIN is reported
to the user, suggesting either recovery using the held journal file or a
retry of the current transaction.
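The commit-then-revoke flow described above can be sketched in userspace as follows. This is a toy model under stated assumptions, not the f2fs code: struct toy_disk, toy_write() and toy_commit() are hypothetical names, and write failures are injected through a bitmask.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NBLOCKS 4

/* Toy "storage": block contents plus injectable write failures. */
struct toy_disk {
	int blocks[NBLOCKS];
	unsigned int fail_mask;   /* bit i set: the i-th write call fails */
	unsigned int writes_done; /* number of write calls attempted so far */
};

int toy_write(struct toy_disk *d, size_t idx, int val)
{
	assert(idx < NBLOCKS);
	if (d->fail_mask & (1u << d->writes_done++))
		return -EIO;
	d->blocks[idx] = val;
	return 0;
}

/*
 * Commit `n` new values for blocks 0..n-1. If a write fails partway, try
 * to write the saved old values back over the blocks already updated.
 * Returns 0 on success, the original error if the revoke worked, or
 * -EAGAIN if the revoke itself failed and the caller must recover or retry.
 */
int toy_commit(struct toy_disk *d, const int *newvals, size_t n)
{
	int old[NBLOCKS];
	size_t i, done = 0;
	int err = 0;

	assert(n <= NBLOCKS);
	for (i = 0; i < n; i++) {
		old[i] = d->blocks[i];
		err = toy_write(d, i, newvals[i]);
		if (err)
			break;
		done++;
	}
	if (!err)
		return 0;

	/* Revoke: restore every block that was already updated. */
	for (i = 0; i < done; i++)
		if (toy_write(d, i, old[i]))
			return -EAGAIN;
	return err;
}
```

A caller that sees -EAGAIN knows the storage may hold a mix of old and new data, which is why the journal file is still kept around for recovery.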
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Right now halt_poll_ns can be changed at runtime, but the
grow and shrink factors can only be set during module load.
Let's fix several aspects of grow/shrink:
- make grow/shrink changeable by root
- make all variables unsigned int
- read the variables once to prevent races
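The "read the variables once" point can be illustrated with a short userspace sketch. The tunables and helper names below are hypothetical and C11 atomics stand in for the kernel's own primitives; the base value of 10000 ns is an assumption made for the example.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical tunables that another thread may change at any time. */
_Atomic unsigned int poll_grow = 2;
_Atomic unsigned int poll_shrink = 0;

#define POLL_BASE_NS 10000u /* assumed starting value for this sketch */

unsigned int grow_poll_ns(unsigned int val)
{
	/* Snapshot the tunable once; later uses all see the same value. */
	unsigned int grow = atomic_load(&poll_grow);

	if (val == 0 && grow)
		return POLL_BASE_NS;
	return val * grow;
}

unsigned int shrink_poll_ns(unsigned int val)
{
	/*
	 * Snapshot once: testing the shared variable for zero and then
	 * dividing by it in a second read could divide by zero if it
	 * changed in between.
	 */
	unsigned int shrink = atomic_load(&poll_shrink);

	if (shrink == 0)
		return 0;
	return val / shrink;
}
```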
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add perf event macros to support tracing and instrumentation
of the LDC state machine.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the powernv_throttle tracepoint to trace the CPU
frequency throttling event, which is used by the powernv-cpufreq
driver in POWER8.
Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Merge tag 'trace-v4.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull minor tracing fixes from Steven Rostedt:
"This includes three minor fixes, mostly due to cut-and-paste issues.
The first is a cut and paste issue that changed the amount of stack to
skip when tracing a stack dump from 0 to 6, which basically made the
stack disappear for small stack traces.
The second fix is just removing an unused field in a struct that is no
longer used, and currently just wastes space.
The third is another cut-and-paste fix that had a tracepoint recording
the wrong field (it was recording the previous field a second time)"
* tag 'trace-v4.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing/dma-buf/fence: Fix timeline str value on fence_annotate_wait_on
ftrace: Remove unused nr_trampolines var
tracing: Fix stacktrace skip depth in trace_buffer_unlock_commit_regs()
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull rdma updates from Doug Ledford:
"Initial roundup of 4.5 merge window patches
- Remove usage of ib_query_device and instead store attributes in
ib_device struct
- Move iopoll out of block and into lib, rename to irqpoll, and use
in several places in the rdma stack as our new completion queue
polling library mechanism. Update the other block drivers that
already used iopoll to use the new mechanism too.
- Replace the per-entry GID table locks with a single GID table lock
- IPoIB multicast cleanup
- Cleanups to the IB MR facility
- Add support for 64bit extended IB counters
- Fix for netlink oops while parsing RDMA nl messages
- RoCEv2 support for the core IB code
- mlx4 RoCEv2 support
- mlx5 RoCEv2 support
- Cross Channel support for mlx5
- Timestamp support for mlx5
- Atomic support for mlx5
- Raw QP support for mlx5
- MAINTAINERS update for mlx4/mlx5
- Misc ocrdma, qib, nes, usNIC, cxgb3, cxgb4, mlx4, mlx5 updates
- Add support for remote invalidate to the iSER driver (pushed
through the RDMA tree due to dependencies, acknowledged by nab)
- Update to NFSoRDMA (pushed through the RDMA tree due to
dependencies, acknowledged by Bruce)"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (169 commits)
IB/mlx5: Unify CQ create flags check
IB/mlx5: Expose Raw Packet QP to user space consumers
{IB, net}/mlx5: Move the modify QP operation table to mlx5_ib
IB/mlx5: Support setting Ethernet priority for Raw Packet QPs
IB/mlx5: Add Raw Packet QP query functionality
IB/mlx5: Add create and destroy functionality for Raw Packet QP
IB/mlx5: Refactor mlx5_ib_qp to accommodate other QP types
IB/mlx5: Allocate a Transport Domain for each ucontext
net/mlx5_core: Warn on unsupported events of QP/RQ/SQ
net/mlx5_core: Add RQ and SQ event handling
net/mlx5_core: Export transport objects
IB/mlx5: Expose CQE version to user-space
IB/mlx5: Add CQE version 1 support to user QPs and SRQs
IB/mlx5: Fix data validation in mlx5_ib_alloc_ucontext
IB/sa: Fix netlink local service GFP crash
IB/srpt: Remove redundant wc array
IB/qib: Improve ipoib UD performance
IB/mlx4: Advertise RoCE v2 support
IB/mlx4: Create and use another QP1 for RoCEv2
IB/mlx4: Enable send of RoCE QP1 packets with IP/UDP headers
...
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"Some locking and page fault bug fixes from Jan Kara, some ext4
encryption fixes from me, and Li Xi's Project Quota commits"
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
fs: clean up the flags definition in uapi/linux/fs.h
ext4: add FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR interface support
ext4: add project quota support
ext4: adds project ID support
ext4 crypto: simplify interfaces to directory entry insert functions
ext4 crypto: add missing locking for keyring_key access
ext4: use pre-zeroed blocks for DAX page faults
ext4: implement allocation of pre-zeroed blocks
ext4: provide ext4_issue_zeroout()
ext4: get rid of EXT4_GET_BLOCKS_NO_LOCK flag
ext4: document lock ordering
ext4: fix races of writeback with punch hole and zero range
ext4: fix races between buffered IO and collapse / insert range
ext4: move unlocked dio protection from ext4_alloc_file_blocks()
ext4: fix races between page faults and hole punching
Pull btrfs updates from Chris Mason:
"This has our usual assortment of fixes and cleanups, but the biggest
change included is Omar Sandoval's free space tree. It's not the
default yet, mounting -o space_cache=v2 enables it and sets a readonly
compat bit. The tree can actually be deleted and regenerated if there
are any problems, but it has held up really well in testing so far.
For very large filesystems (30T+) our existing free space caching code
can end up taking a huge amount of time during commits. The new tree
based code is faster and less work overall to update as the commit
progresses.
Omar worked on this during the summer and we'll hammer on it in
production here at FB over the next few months"
* 'for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (73 commits)
Btrfs: fix fitrim discarding device area reserved for boot loader's use
Btrfs: Check metadata redundancy on balance
btrfs: statfs: report zero available if metadata are exhausted
btrfs: preallocate path for snapshot creation at ioctl time
btrfs: allocate root item at snapshot ioctl time
btrfs: do an allocation earlier during snapshot creation
btrfs: use smaller type for btrfs_path locks
btrfs: use smaller type for btrfs_path lowest_level
btrfs: use smaller type for btrfs_path reada
btrfs: cleanup, use enum values for btrfs_path reada
btrfs: constify static arrays
btrfs: constify remaining structs with function pointers
btrfs tests: replace whole ops structure for free space tests
btrfs: use list_for_each_entry* in backref.c
btrfs: use list_for_each_entry_safe in free-space-cache.c
btrfs: use list_for_each_entry* in check-integrity.c
Btrfs: use linux/sizes.h to represent constants
btrfs: cleanup, remove stray return statements
btrfs: zero out delayed node upon allocation
btrfs: pass proper enum type to start_transaction()
...
Prepare khugepaged to see compound pages mapped with pte. For now we
won't collapse the pmd table with such pte.
khugepaged is subject to future rework with respect to the new refcounting.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: Sasha Levin <sasha.levin@oracle.com>
Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Steve Capper <steve.capper@linaro.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch series performs swapin readahead, up to a certain number of
pages, to gain more THP performance and adds tracepoints for
khugepaged_scan_pmd, collapse_huge_page and __collapse_huge_page_isolate.
This patch series was written to deal with programs that access most,
but not all, of their memory after they get swapped out. Currently
these programs do not get their memory collapsed into THPs after the
system swapped their memory out, while they would get THPs before
swapping happened.
This patch series was tested with a test program that allocates 400MB of
memory, writes to it, and then sleeps. I force the system to swap out
all of it. Afterwards, the test program touches the area by writing and
leaves a piece of it unwritten. This shows how much swapin readahead
the patch performs.
Test results:

After swapped out
-------------------------------------------------------------------
              | Anonymous | AnonHugePages | Swap      | Fraction |
-------------------------------------------------------------------
With patch    | 90076 kB  | 88064 kB      | 309928 kB | 99%      |
-------------------------------------------------------------------
Without patch | 194068 kB | 192512 kB     | 205936 kB | 99%      |
-------------------------------------------------------------------

After swapped in
-------------------------------------------------------------------
              | Anonymous | AnonHugePages | Swap      | Fraction |
-------------------------------------------------------------------
With patch    | 201408 kB | 198656 kB     | 198596 kB | 98%      |
-------------------------------------------------------------------
Without patch | 292624 kB | 192512 kB     | 107380 kB | 65%      |
-------------------------------------------------------------------
This patch (of 3):
Static tracepoints record data from the functions they instrument. This
helps automate debugging without making a lot of changes to the source code.
This patch adds tracepoints for khugepaged_scan_pmd, collapse_huge_page
and __collapse_huge_page_isolate.
[dan.carpenter@oracle.com: add a missing tab]
Signed-off-by: Ebru Akagunduz <ebru.akagunduz@gmail.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Xie XiuQi <xiexiuqi@huawei.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move the node_id, zone_idx and shrink flags into the trace function, so that
we don't need to calculate these args if the trace is disabled, and so that
this function has fewer arguments.
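The idea of deriving values inside the trace path, so that the work is skipped while tracing is off, can be sketched in userspace like this. All names here are hypothetical stand-ins, and the masking in compute_flags() is only a placeholder for a more expensive derivation.

```c
#include <assert.h>
#include <stdbool.h>

bool trace_enabled;      /* toggled at runtime */
unsigned int flag_calls; /* counts how often the flags were derived */
unsigned int last_flags; /* last value the "trace" recorded */

/* Stand-in for a derivation we would rather not do when tracing is off. */
unsigned int compute_flags(unsigned int gfp)
{
	flag_calls++;
	return gfp & 0xffu;
}

/*
 * The caller passes only the raw input. The derived value is computed
 * inside the trace path, so nothing is computed while tracing is disabled.
 */
void trace_shrink_begin_sketch(unsigned int gfp)
{
	if (!trace_enabled)
		return;
	last_flags = compute_flags(gfp);
}
```

Passing the raw input also shortens the caller's argument list, which matches the second benefit named in the commit message.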
Signed-off-by: yalin wang <yalin.wang2010@gmail.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
CMA allocation should be guaranteed to succeed. But sometimes it can
fail in the current implementation. To track down the problem, we need
to know which page is problematic, and this new tracepoint will report
it.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move trace_reclaim_flags() into the trace function, so that we don't need
to calculate these flags if the trace is disabled.
Signed-off-by: yalin wang <yalin.wang2010@gmail.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull f2fs updates from Jaegeuk Kim:
"This series adds two ioctls to control cached data and fragmented
files. Most of the rest fixes missing error cases and bugs that we
have not covered so far. Summary:
Enhancements:
- support an ioctl to execute online file defragmentation
- support an ioctl to flush cached data
- speed up shrinking of extent_cache entries
- handle broken superblock
- refactor dirty inode management infra
- revisit f2fs_map_blocks to handle more cases
- reduce global lock coverage
- add detecting user's idle time
Major bug fixes:
- fix data race condition on cached nat entries
- fix error cases of volatile and atomic writes"
* tag 'for-f2fs-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (87 commits)
f2fs: should unset atomic flag after successful commit
f2fs: fix wrong memory condition check
f2fs: monitor the number of background checkpoint
f2fs: detect idle time depending on user behavior
f2fs: introduce time and interval facility
f2fs: skip releasing nodes in chindless extent tree
f2fs: use atomic type for node count in extent tree
f2fs: recognize encrypted data in f2fs_fiemap
f2fs: clean up f2fs_balance_fs
f2fs: remove redundant calls
f2fs: avoid unnecessary f2fs_balance_fs calls
f2fs: check the page status filled from disk
f2fs: introduce __get_node_page to reuse common code
f2fs: check node id earily when readaheading node page
f2fs: read isize while holding i_mutex in fiemap
Revert "f2fs: check the node block address of newly allocated nid"
f2fs: cover more area with nat_tree_lock
f2fs: introduce max_file_blocks in sbi
f2fs crypto: check CONFIG_F2FS_FS_XATTR for encrypted symlink
f2fs: introduce zombie list for fast shrinking extent trees
...
Merge tag 'trace-v4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"Not much new with tracing for this release. Mostly just clean ups and
minor fixes.
Here's what else is new:
- A new TRACE_EVENT_FN_COND macro, combining both _FN and _COND for
those that want both.
- New selftest to test the instance create and delete
- Better debug output when ftrace fails"
* tag 'trace-v4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (24 commits)
ftrace: Fix the race between ftrace and insmod
ftrace: Add infrastructure for delayed enabling of module functions
x86: ftrace: Fix the comments for ftrace_modify_code_direct()
tracing: Fix comment to use tracing_on over tracing_enable
metag: ftrace: Fix the comments for ftrace_modify_code
sh: ftrace: Fix the comments for ftrace_modify_code()
ia64: ftrace: Fix the comments for ftrace_modify_code()
ftrace: Clean up ftrace_module_init() code
ftrace: Join functions ftrace_module_init() and ftrace_init_module()
tracing: Introduce TRACE_EVENT_FN_COND macro
tracing: Use seq_buf_used() in seq_buf_to_user() instead of len
bpf: Constify bpf_verifier_ops structure
ftrace: Have ftrace_ops_get_func() handle RCU and PER_CPU flags too
ftrace: Remove use of control list and ops
ftrace: Fix output of enabled_functions for showing tramp
ftrace: Fix a typo in comment
ftrace: Show all tramps registered to a record on ftrace_bug()
ftrace: Add variable ftrace_expected for archs to show expected code
ftrace: Add new type to distinguish what kind of ftrace_bug()
tracing: Update cond flag when enabling or disabling a trigger
...
Pull networking updates from David Miller:
1) Support busy polling generically, for all NAPI drivers. From Eric
Dumazet.
2) Add byte/packet counter support to nft_ct, from Florian Westphal.
3) Add RSS/XPS support to mvneta driver, from Gregory Clement.
4) Implement IPV6_HDRINCL socket option for raw sockets, from Hannes
Frederic Sowa.
5) Add support for T6 adapter to cxgb4 driver, from Hariprasad Shenai.
6) Add support for VLAN device bridging to mlxsw switch driver, from
Ido Schimmel.
7) Add driver for Netronome NFP4000/NFP6000, from Jakub Kicinski.
8) Provide hwmon interface to mlxsw switch driver, from Jiri Pirko.
9) Reorganize wireless drivers into per-vendor directories just like we
do for ethernet drivers. From Kalle Valo.
10) Provide a way for administrators to "destroy" connected sockets via the
SOCK_DESTROY socket netlink diag operation. From Lorenzo Colitti.
11) Add support to add/remove multicast routes via netlink, from Nikolay
Aleksandrov.
12) Make TCP keepalive settings per-namespace, from Nikolay Borisov.
13) Add forwarding and packet duplication facilities to nf_tables, from
Pablo Neira Ayuso.
14) Dead route support in MPLS, from Roopa Prabhu.
15) TSO support for thunderx chips, from Sunil Goutham.
16) Add driver for IBM's System i/p VNIC protocol, from Thomas Falcon.
17) Rationalize, consolidate, and more completely document the checksum
offloading facilities in the networking stack. From Tom Herbert.
18) Support aborting an ongoing scan in mac80211/cfg80211, from
Vidyullatha Kanchanapally.
19) Use per-bucket spinlock for bpf hash facility, from Tom Leiming.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1375 commits)
net: bnxt: always return values from _bnxt_get_max_rings
net: bpf: reject invalid shifts
phonet: properly unshare skbs in phonet_rcv()
dwc_eth_qos: Fix dma address for multi-fragment skbs
phy: remove an unneeded condition
mdio: remove an unneed condition
mdio_bus: NULL dereference on allocation error
net: Fix typo in netdev_intersect_features
net: freescale: mac-fec: Fix build error from phy_device API change
net: freescale: ucc_geth: Fix build error from phy_device API change
bonding: Prevent IPv6 link local address on enslaved devices
IB/mlx5: Add flow steering support
net/mlx5_core: Export flow steering API
net/mlx5_core: Make ipv4/ipv6 location more clear
net/mlx5_core: Enable flow steering support for the IB driver
net/mlx5_core: Initialize namespaces only when supported by device
net/mlx5_core: Set priority attributes
net/mlx5_core: Connect flow tables
net/mlx5_core: Introduce modify flow table command
net/mlx5_core: Managing root flow table
...
Merge tag 'locks-v4.5-1' of git://git.samba.org/jlayton/linux
Pull file locking updates from Jeff Layton:
"File locking related changes for v4.5 (pile #1)
Highlights:
- new Kconfig option to allow disabling mandatory locking (which is
racy anyway)
- new tracepoints for setlk and close codepaths
- fix for a long-standing bug in code that handles races between
setting a POSIX lock and close()"
* tag 'locks-v4.5-1' of git://git.samba.org/jlayton/linux:
locks: rename __posix_lock_file to posix_lock_inode
locks: prink more detail when there are leaked locks
locks: pass inode pointer to locks_free_lock_context
locks: sprinkle some tracepoints around the file locking code
locks: don't check for race with close when setting OFD lock
locks: fix unlock when fcntl_setlk races with a close
fs: make locks.c explicitly non-modular
locks: use list_first_entry_or_null()
locks: Don't allow mounts in user namespaces to enable mandatory locking
locks: Allow disabling mandatory locking at compile time
Add some tracepoints around the POSIX locking code. These were useful
when tracking down problems when handling the race between setlk and
close.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
TRACE_EVENT_FN can't be used in some circumstances
like invoking trace functions from offlined CPU due
to RCU usage.
This patch adds the TRACE_EVENT_FN_COND macro
to make such trace points conditional.
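As a rough illustration of a conditional trace point, the sketch below defines a trace function that records only while a condition holds. It is a userspace toy, not the TRACE_EVENT_FN_COND definition; the macro, the flag, and the counter are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

bool cpu_online_flag = true;  /* hypothetical stand-in for "CPU is online" */
unsigned int events_recorded; /* counts events that passed the condition */

/* Define trace_<name>(int arg), which records only while `cond` holds. */
#define DEFINE_TRACE_COND_SKETCH(name, cond) \
	void trace_##name(int arg)           \
	{                                    \
		(void)arg;                   \
		if (!(cond))                 \
			return;              \
		events_recorded++;           \
	}

DEFINE_TRACE_COND_SKETCH(sample_event, cpu_online_flag)
```

The condition is evaluated at each call, so flipping the flag takes effect for the very next event.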
Link: http://lkml.kernel.org/r/1450124286-4822-1-git-send-email-kda@linux-powerpc.org
Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Move timestamp from struct vb2_v4l2_buffer to struct vb2_buffer
for common use, and change its type to u64 in order to handle the
y2038 problem. This patch also includes all device drivers' changes related to
this restructuring.
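As a rough illustration of why a single u64 sidesteps the y2038 limit, the
sketch below packs a seconds/microseconds pair into one 64-bit count and
splits it again. The nanosecond unit and the helper names ts_to_ns() and
ns_to_ts() are assumptions made for this example; they are not taken from
the videobuf2 sources.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; the unit (nanoseconds) is assumed. */
#define SKETCH_NSEC_PER_SEC  1000000000ULL
#define SKETCH_NSEC_PER_USEC 1000ULL

/* Pack a seconds/microseconds pair into one 64-bit nanosecond count. */
static uint64_t ts_to_ns(uint64_t sec, uint32_t usec)
{
    return sec * SKETCH_NSEC_PER_SEC + (uint64_t)usec * SKETCH_NSEC_PER_USEC;
}

/* Split a nanosecond count back into whole seconds and microseconds. */
static void ns_to_ts(uint64_t ns, uint64_t *sec, uint32_t *usec)
{
    *sec = ns / SKETCH_NSEC_PER_SEC;
    *usec = (uint32_t)((ns % SKETCH_NSEC_PER_SEC) / SKETCH_NSEC_PER_USEC);
}
```

A seconds value one past INT32_MAX (2147483648), which a signed 32-bit
seconds field cannot hold, round-trips through these helpers unchanged.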
Signed-off-by: Junghak Sung <jh1009.sung@samsung.com>
Signed-off-by: Geunyoung Kim <nenggun.kim@samsung.com>
Acked-by: Seung-Woo Kim <sw0312.kim@samsung.com>
Acked-by: Inki Dae <inki.dae@samsung.com>
Signed-off-by: Hans Verkuil <hansverk@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
The on-disk format for the free space tree is straightforward. Each
block group is represented in the free space tree by a free space info
item that stores accounting information: whether the free space for this
block group is stored as bitmaps or extents and how many extents of free
space exist for this block group (regardless of which format is being
used in the tree). Extents are (start, FREE_SPACE_EXTENT, length) keys
with no corresponding item, and bitmaps instead have the
FREE_SPACE_BITMAP type and have a bitmap item attached, which is just an
array of bytes.
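The two representations can be sketched in userspace C as follows. This is
only a model of the description above: the struct name, the numeric key type
values, the bit order within each byte and the helper bitmap_to_extents()
are assumptions for illustration, not the btrfs definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative key: (start, type, length). Type values are placeholders. */
enum { SKETCH_FREE_SPACE_EXTENT = 1, SKETCH_FREE_SPACE_BITMAP = 2 };

struct sketch_key {
    uint64_t start;   /* first byte of the free range */
    uint8_t  type;    /* extent or bitmap */
    uint64_t length;  /* size of the range in bytes */
};

/* Test one bit of a byte-array bitmap (lowest bit of each byte first). */
static int bitmap_test(const uint8_t *bitmap, size_t bit)
{
    return (bitmap[bit / 8] >> (bit % 8)) & 1;
}

/*
 * Walk a bitmap in which each set bit marks one free unit and emit the
 * equivalent extent keys. Returns the number of runs found; at most
 * max_out of them are stored in out.
 */
static size_t bitmap_to_extents(uint64_t start, uint64_t unit,
                                const uint8_t *bitmap, size_t nbits,
                                struct sketch_key *out, size_t max_out)
{
    size_t n = 0, i = 0;

    while (i < nbits) {
        size_t run;

        if (!bitmap_test(bitmap, i)) {
            i++;
            continue;
        }
        run = i;
        while (i < nbits && bitmap_test(bitmap, i))
            i++;
        if (n < max_out) {
            out[n].start = start + run * unit;
            out[n].type = SKETCH_FREE_SPACE_EXTENT;
            out[n].length = (i - run) * unit;
        }
        n++;
    }
    return n;
}
```

The extent count returned here corresponds to the "how many extents of free
space exist" figure that the free space info item tracks regardless of the
format in use.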
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
The new name is irq_poll as iopoll is already taken. Better suggestions
welcome.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
DAX page fault path needs to get blocks that are pre-zeroed to avoid
races when two concurrent page faults happen in the same block of a
file. Implement support for this in ext4_map_blocks().
Signed-off-by: Jan Kara <jack@suse.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When dioread_nolock mode is enabled, we grab i_data_sem in
ext4_ext_direct_IO() and therefore we need to instruct _ext4_get_block()
not to grab i_data_sem again using EXT4_GET_BLOCKS_NO_LOCK. However
holding i_data_sem over overwrite direct IO isn't needed these days. We
have exclusion against truncate / hole punching because we increase
i_dio_count under i_mutex in ext4_ext_direct_IO() so once
ext4_file_write_iter() verifies blocks are allocated & written, they are
guaranteed to stay so during the whole direct IO even after we drop
i_mutex.
So we can just remove this locking abuse and the no longer necessary
EXT4_GET_BLOCKS_NO_LOCK flag.
Signed-off-by: Jan Kara <jack@suse.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Add tracepoint to show fib6 table lookups and result.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull thermal updates from Zhang Rui:
- Implement generic devfreq cooling mechanism through frequency
reduction for devices using devfreq. From Ørjan Eide and Javi
Merino.
- Introduce OMAP3 support on TI SoC thermal driver. From Pavel Mack
and Eduardo Valentin.
- A bunch of small fixes on devfreq_cooling, Exynos, IMX, Armada, and
Rockchip thermal drivers.
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: (24 commits)
thermal: exynos: Directly return 0 instead of using local ret variable
thermal: exynos: Remove unneeded semicolon
thermal: exynos: Use IS_ERR() because regulator cannot be NULL
thermal: exynos: Fix first temperature read after registering sensor
thermal: exynos: Fix unbalanced regulator disable on probe failure
devfreq_cooling: return on allocation failure
thermal: rockchip: support the sleep pinctrl state to avoid glitches in s2r
dt-bindings: rockchip-thermal: Add the pinctrl states in this document
thermal: devfreq_cooling: Make power a u64
thermal: devfreq_cooling: use a thermal_cooling_device for register and unregister
thermal: underflow bug in imx_set_trip_temp()
thermal: armada: Fix possible overflow in the Armada 380 thermal sensor formula
thermal: imx: register irq handler later in probe
thermal: rockhip: fix setting thermal shutdown polarity
thermal: rockchip: fix handling of invalid readings
devfreq_cooling: add trace information
thermal: Add devfreq cooling
PM / OPP: get the voltage for all OPPs
tools/thermal: tmon: use pkg-config also for CFLAGS
linux/thermal.h: rename KELVIN_TO_CELSIUS to DECI_KELVIN_TO_CELSIUS
...
Merge second patch-bomb from Andrew Morton:
- most of the rest of MM
- procfs
- lib/ updates
- printk updates
- bitops infrastructure tweaks
- checkpatch updates
- nilfs2 update
- signals
- various other misc bits: coredump, seqfile, kexec, pidns, zlib, ipc,
dma-debug, dma-mapping, ...
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (102 commits)
ipc,msg: drop dst nil validation in copy_msg
include/linux/zutil.h: fix usage example of zlib_adler32()
panic: release stale console lock to always get the logbuf printed out
dma-debug: check nents in dma_sync_sg*
dma-mapping: tidy up dma_parms default handling
pidns: fix set/getpriority and ioprio_set/get in PRIO_USER mode
kexec: use file name as the output message prefix
fs, seqfile: always allow oom killer
seq_file: reuse string_escape_str()
fs/seq_file: use seq_* helpers in seq_hex_dump()
coredump: change zap_threads() and zap_process() to use for_each_thread()
coredump: ensure all coredumping tasks have SIGNAL_GROUP_COREDUMP
signal: remove jffs2_garbage_collect_thread()->allow_signal(SIGCONT)
signal: introduce kernel_signal_stop() to fix jffs2_garbage_collect_thread()
signal: turn dequeue_signal_lock() into kernel_dequeue_signal()
signals: kill block_all_signals() and unblock_all_signals()
nilfs2: fix gcc uninitialized-variable warnings in powerpc build
nilfs2: fix gcc unused-but-set-variable warnings
MAINTAINERS: nilfs2: add header file for tracing
nilfs2: add tracepoints for analyzing reading and writing metadata files
...
This patch adds tracepoints which would be useful for analyzing segment
usage from a perspective of high level sufile manipulation (check, alloc,
free). sufile is an important in-place updated metadata file, so
analyzing the behavior would be useful for performance tuning.
example of usage (a case of allocation):
$ sudo bin/tpoint nilfs2:nilfs2_segment_usage_allocated
Tracing nilfs2:nilfs2_segment_usage_allocated. Ctrl-C to end.
segctord-17800 [002] ...1 10671.867294: nilfs2_segment_usage_allocated: sufile = ffff880054f908a8 segnum = 2
segctord-17800 [002] ...1 10675.073477: nilfs2_segment_usage_allocated: sufile = ffff880054f908a8 segnum = 3
Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Benixon Dhas <benixon.dhas@wdc.com>
Cc: TK Kato <TK.Kato@wdc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch adds a tracepoint for transaction events of nilfs. With the
tracepoint, these events can be tracked: begin, abort, commit, trylock,
lock, and unlock. Basically, these events have corresponding functions
e.g. the begin event corresponds to nilfs_transaction_begin(). The unlock event
is an exception. It corresponds to the iteration in
nilfs_transaction_lock().
Only one tracepoint is introduced: nilfs2_transaction_transition. The
above events are distinguished with a newly introduced enum. With this
tracepoint, we can analyse a critical section of segment construction.
Sample output by tpoint of perf-tools:
cp-4457 [000] ...1 63.266220: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800bf5ccc58 count = 1 flags = 9 state = BEGIN
cp-4457 [000] ...1 63.266221: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800bf5ccc58 count = 0 flags = 9 state = COMMIT
cp-4457 [000] ...1 63.266221: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800bf5ccc58 count = 0 flags = 9 state = COMMIT
segctord-4371 [001] ...1 68.261196: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 10 state = TRYLOCK
segctord-4371 [001] ...1 68.261280: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 10 state = LOCK
segctord-4371 [001] ...1 68.261877: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 1 flags = 10 state = BEGIN
segctord-4371 [001] ...1 68.262116: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 18 state = COMMIT
segctord-4371 [001] ...1 68.265032: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 18 state = UNLOCK
segctord-4371 [001] ...1 132.376847: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 10 state = TRYLOCK
This patch also does trivial cleaning of comma usage in collection stage
transition event for consistent coding style.
Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch adds a tracepoint for tracking stage transition of block
collection in segment construction. With the tracepoint, we can analyze
the behavior of segment construction in depth. It would be useful for
bottleneck detection and debugging, etc.
The tracepoint is created with the standard trace API of Linux (like ext3,
ext4, f2fs and btrfs), so we can analyze it easily with existing tools. Of
course, more detailed analysis will be possible if we can create nilfs
specific analysis tools.
Below is an example of event dump with Brendan Gregg's perf-tools
(https://github.com/brendangregg/perf-tools). Time consumption between
each stage can be obtained.
$ sudo bin/tpoint nilfs2:nilfs2_collection_stage_transition
Tracing nilfs2:nilfs2_collection_stage_transition. Ctrl-C to end.
segctord-14875 [003] ...1 28311.067794: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_INIT
segctord-14875 [003] ...1 28311.068139: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_GC
segctord-14875 [003] ...1 28311.068139: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_FILE
segctord-14875 [003] ...1 28311.068486: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_IFILE
segctord-14875 [003] ...1 28311.068540: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_CPFILE
segctord-14875 [003] ...1 28311.068561: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_SUFILE
segctord-14875 [003] ...1 28311.068565: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_DAT
segctord-14875 [003] ...1 28311.068573: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_SR
segctord-14875 [003] ...1 28311.068574: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_DONE
For capturing transition correctly, this patch adds wrappers for the
member scnt of nilfs_cstage. With this change, every transition of the
stage can produce trace event in a correct manner.
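The time spent between stages can be derived from the dump above with a
small parser. The following is a minimal sketch, assuming the line layout
shown in the example (task-pid, [cpu], flags, seconds.microseconds, then the
event fields); parse_stage_line() is a hypothetical helper, not part of
perf-tools or nilfs2.

```c
#include <stdio.h>
#include <string.h>

/*
 * Extract the timestamp (in microseconds) and the stage name from one
 * nilfs2_collection_stage_transition line. Returns 0 on success, -1 if
 * the line does not have the expected layout.
 */
static int parse_stage_line(const char *line, unsigned long long *usec_out,
                            char *stage, size_t stage_len)
{
    unsigned long long sec, usec;
    const char *p;

    if (stage_len == 0)
        return -1;
    /* Skip "task-pid [cpu] flags", then read "seconds.microseconds:". */
    if (sscanf(line, "%*s [%*d] %*s %llu.%6llu:", &sec, &usec) != 2)
        return -1;
    p = strstr(line, "stage = ");
    if (p == NULL)
        return -1;
    snprintf(stage, stage_len, "%s", p + strlen("stage = "));
    stage[strcspn(stage, " \r\n")] = '\0';   /* trim anything after the name */
    *usec_out = sec * 1000000ULL + usec;
    return 0;
}
```

Subtracting the timestamps of two consecutive lines gives the time spent in
the earlier stage; for the ST_INIT and ST_GC lines above that is 345
microseconds.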
Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
__GFP_WAIT has been used to identify atomic context in callers that hold
spinlocks or are in interrupts. They are expected to be high priority and
have access to one of two watermarks lower than "min" which can be referred
to as the "atomic reserve". __GFP_HIGH users get access to the first
lower watermark and can be called the "high priority reserve".
Over time, callers had a requirement to not block when fallback options
were available. Some have abused __GFP_WAIT leading to a situation where
an optimistic allocation with a fallback option can access atomic
reserves.
This patch uses __GFP_ATOMIC to identify callers that are truly atomic,
cannot sleep and have no alternative. High priority users continue to use
__GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and
are willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM identifies
callers that want to wake kswapd for background reclaim. __GFP_WAIT is
redefined as a caller that is willing to enter direct reclaim and wake
kswapd for background reclaim.
This patch then converts a number of sites
o __GFP_ATOMIC is used by callers that are high priority and have memory
pools for those requests. GFP_ATOMIC uses this flag.
o Callers that have a limited mempool to guarantee forward progress clear
__GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
into this category where kswapd will still be woken but atomic reserves
are not used as there is a one-entry mempool to guarantee progress.
o Callers that are checking if they are non-blocking should use the
helper gfpflags_allow_blocking() where possible. This is because
checking for __GFP_WAIT as was done historically now can trigger false
positives. Some exceptions like dm-crypt.c exist where the code intent
is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
flag manipulations.
o Callers that built their own GFP flags instead of starting with GFP_KERNEL
and friends now also need to specify __GFP_KSWAPD_RECLAIM.
The first key hazard to watch out for is callers that removed __GFP_WAIT
and were depending on access to atomic reserves for inconspicuous reasons.
In some cases it may be appropriate for them to use __GFP_HIGH.
The second key hazard is callers that assembled their own combination of
GFP flags instead of starting with something like GFP_KERNEL. They may
now wish to specify __GFP_KSWAPD_RECLAIM. It's almost certainly harmless
if it's missed in most cases as other activity will wake kswapd.
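The false positive mentioned above can be modelled in a few lines of
userspace C. This is a sketch under stated assumptions: the SK_ prefixed
names and their bit values are invented for the example and do not match the
kernel's definitions; only the relationships between the flags follow the
description in this message.

```c
#include <stdbool.h>

typedef unsigned int sk_gfp_t;

/* Placeholder bit values, chosen only for this sketch. */
#define SK_GFP_HIGH           0x01u
#define SK_GFP_ATOMIC         0x02u
#define SK_GFP_DIRECT_RECLAIM 0x04u
#define SK_GFP_KSWAPD_RECLAIM 0x08u
/* __GFP_WAIT as redefined: direct reclaim plus waking kswapd. */
#define SK_GFP_WAIT (SK_GFP_DIRECT_RECLAIM | SK_GFP_KSWAPD_RECLAIM)

/* Model of the helper: only direct reclaim means the caller may sleep. */
static bool sk_allow_blocking(sk_gfp_t flags)
{
    return (flags & SK_GFP_DIRECT_RECLAIM) != 0;
}

/* Historical style of check: any bit of the wait mask counts as "may block". */
static bool sk_legacy_wait_check(sk_gfp_t flags)
{
    return (flags & SK_GFP_WAIT) != 0;
}

/* Does the allocation wake kswapd for background reclaim? */
static bool sk_wakes_kswapd(sk_gfp_t flags)
{
    return (flags & SK_GFP_KSWAPD_RECLAIM) != 0;
}
```

A mempool-backed caller that clears the direct reclaim bit but keeps the
kswapd bit is reported as blocking by the legacy check and as non-blocking
by the helper, which is the case the conversion is meant to get right.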
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull btrfs updates from Chris Mason:
"We have a lot of subvolume quota improvements in here, along with big
piles of cleanups from Dave Sterba and Anand Jain and others.
Josef pitched in a batch of allocator fixes based on production use
here at FB. We found that mount -o ssd_spread greatly improved our
performance on hardware raid5/6, but it exposed some CPU bottlenecks
in the allocator. These patches make a huge difference"
* 'for-linus-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (100 commits)
Btrfs: fix hole punching when using the no-holes feature
Btrfs: find_free_extent: Do not erroneously skip LOOP_CACHING_WAIT state
btrfs: Fix a data space underflow warning
btrfs: qgroup: Fix a rebase bug which will cause qgroup double free
btrfs: qgroup: Fix a race in delayed_ref which leads to abort trans
btrfs: clear PF_NOFREEZE in cleaner_kthread()
btrfs: qgroup: Don't copy extent buffer to do qgroup rescan
btrfs: add balance filters limits, stripes and usage to supported mask
btrfs: extend balance filter usage to take minimum and maximum
btrfs: add balance filter for stripes
btrfs: extend balance filter limit to take minimum and maximum
btrfs: fix use after free iterating extrefs
btrfs: check unsupported filters in balance arguments
Btrfs: fix regression running delayed references when using qgroups
Btrfs: fix regression when running delayed references
Btrfs: don't do extra bitmap search in one bit case
Btrfs: keep track of largest extent in bitmaps
Btrfs: don't keep trying to build clusters if we are fragmented
Btrfs: cut down on loops through the allocator
Btrfs: don't continue setting up space cache when enospc
...
Merge tag 'trace-v4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"Most of the changes are clean ups and small fixes. Some of them have
stable tags to them. I searched through my INBOX just as the merge
window opened and found lots of patches to pull. I ran them through
all my tests and they were in linux-next for a few days.
Features added this release:
----------------------------
- Module globbing. You can now filter function tracing to several
modules. # echo '*:mod:*snd*' > set_ftrace_filter (Dmitry Safonov)
- Tracer specific options are now visible even when the tracer is not
active. It was rather annoying that you can only see and modify
tracer options after enabling the tracer. Now they are in the
options/ directory even when the tracer is not active. Although
they are still only visible when the tracer is active in the
trace_options file.
- Trace options are now per instance (although some of the tracer
specific options are global)
- New tracefs file: set_event_pid. If any pid is added to this file,
then all events in the instance will filter out events that are not
part of this pid. sched_switch and sched_wakeup events handle next
and the wakee pids"
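The set_event_pid behaviour described above can be read as a simple
predicate. The sketch below is one interpretation of that description, not
the kernel's implementation: struct pid_filter and the three functions are
hypothetical names, and an empty list is taken to mean "no filtering".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the pid list written to set_event_pid. */
struct pid_filter {
    const int *pids;
    size_t count;
};

static bool pid_listed(const struct pid_filter *f, int pid)
{
    size_t i;

    for (i = 0; i < f->count; i++)
        if (f->pids[i] == pid)
            return true;
    return false;
}

/* Ordinary event: recorded only if the current task is listed. */
static bool event_passes(const struct pid_filter *f, int current_pid)
{
    return f->count == 0 || pid_listed(f, current_pid);
}

/* sched_switch: also recorded when the next task is listed. */
static bool sched_switch_passes(const struct pid_filter *f, int prev_pid,
                                int next_pid)
{
    return event_passes(f, prev_pid) || pid_listed(f, next_pid);
}

/* sched_wakeup: also recorded when the wakee is listed. */
static bool sched_wakeup_passes(const struct pid_filter *f, int waker_pid,
                                int wakee_pid)
{
    return event_passes(f, waker_pid) || pid_listed(f, wakee_pid);
}
```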
* tag 'trace-v4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (68 commits)
tracefs: Fix refcount imbalance in start_creating()
tracing: Put back comma for empty fields in boot string parsing
tracing: Apply tracer specific options from kernel command line.
tracing: Add some documentation about set_event_pid
ring_buffer: Remove unneeded smp_wmb() before wakeup of reader benchmark
tracing: Allow dumping traces without tracking trace started cpus
ring_buffer: Fix more races when terminating the producer in the benchmark
ring_buffer: Do no not complete benchmark reader too early
tracing: Remove redundant TP_ARGS redefining
tracing: Rename max_stack_lock to stack_trace_max_lock
tracing: Allow arch-specific stack tracer
recordmcount: arm64: Replace the ignored mcount call into nop
recordmcount: Fix endianness handling bug for nop_mcount
tracepoints: Fix documentation of RCU lockdep checks
tracing: ftrace_event_is_function() can return boolean
tracing: is_legal_op() can return boolean
ring-buffer: rb_event_is_commit() can return boolean
ring-buffer: rb_per_cpu_empty() can return boolean
ring_buffer: ring_buffer_empty{cpu}() can return boolean
ring-buffer: rb_is_reader_page() can return boolean
...
Merge patch-bomb from Andrew Morton:
- inotify tweaks
- some ocfs2 updates (many more are awaiting review)
- various misc bits
- kernel/watchdog.c updates
- Some of mm. I have a huge number of MM patches this time and quite a
lot of it is quite difficult and much will be held over to next time.
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (162 commits)
selftests: vm: add tests for lock on fault
mm: mlock: add mlock flags to enable VM_LOCKONFAULT usage
mm: introduce VM_LOCKONFAULT
mm: mlock: add new mlock system call
mm: mlock: refactor mlock, munlock, and munlockall code
kasan: always taint kernel on report
mm, slub, kasan: enable user tracking by default with KASAN=y
kasan: use IS_ALIGNED in memory_is_poisoned_8()
kasan: Fix a type conversion error
lib: test_kasan: add some testcases
kasan: update reference to kasan prototype repo
kasan: move KASAN_SANITIZE in arch/x86/boot/Makefile
kasan: various fixes in documentation
kasan: update log messages
kasan: accurately determine the type of the bad access
kasan: update reported bug types for kernel memory accesses
kasan: update reported bug types for not user nor kernel memory accesses
mm/kasan: prevent deadlock in kasan reporting
mm/kasan: don't use kasan shadow pointer in generic functions
mm/kasan: MODULE_VADDR is not available on all archs
...
Compaction returns prematurely with COMPACT_PARTIAL when contended or has
fatal signal pending. This is ok for the callers, but might be misleading
in the traces, as the usual reason to return COMPACT_PARTIAL is that we
think the allocation should succeed. After this patch we distinguish the
premature ending condition in the mm_compaction_finished and
mm_compaction_end tracepoints.
The contended status covers the following reasons:
- lock contention or need_resched() detected in async compaction
- fatal signal pending
- too many pages isolated in the zone (only for async compaction)
Further distinguishing the exact reason seems unnecessary for now.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some compaction tracepoints use zone->name to print which zone is being
compacted. This works for in-kernel printing, but not userspace trace
printing of raw captured trace such as via trace-cmd report.
This patch uses zone_idx() instead of zone->name as the raw value, and
when printing, converts the zone_type to string using the appropriate EM()
macros and some ugly tricks to overcome the problem that half the values
depend on CONFIG_ options and one does not simply use #ifdef inside of
#define.
trace-cmd output before:
transhuge-stres-4235 [000] 453.149280: mm_compaction_finished: node=0
zone=ffffffff81815d7a order=9 ret=partial
after:
transhuge-stres-4235 [000] 453.149280: mm_compaction_finished: node=0
zone=Normal order=9 ret=partial
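The restriction that #ifdef cannot appear inside a #define body can be
worked around by choosing a wrapper macro outside the list. The following
userspace sketch shows the general pattern only; SKETCH_HAVE_DMA32 stands in
for a CONFIG_ option, and every name here is hypothetical rather than copied
from the compaction tracepoint header.

```c
#include <stddef.h>

#define SKETCH_HAVE_DMA32 1   /* stand-in for a CONFIG_ option */

enum sketch_zone {
    SZ_DMA,
#if SKETCH_HAVE_DMA32
    SZ_DMA32,
#endif
    SZ_NORMAL,
    SZ_MOVABLE,
};

/* Pick the wrapper here, because the list below cannot contain #ifdef. */
#if SKETCH_HAVE_DMA32
#define IF_DMA32(x) x
#else
#define IF_DMA32(x)
#endif

/* One list of (value, name) pairs; EM() is defined by each user. */
#define SKETCH_ZONES                    \
    EM(SZ_DMA, "DMA")                   \
    IF_DMA32(EM(SZ_DMA32, "DMA32"))     \
    EM(SZ_NORMAL, "Normal")             \
    EM(SZ_MOVABLE, "Movable")

struct sketch_sym {
    int value;
    const char *name;
};

/* Expand the list into a lookup table. */
#define EM(v, s) { v, s },
static const struct sketch_sym sketch_zone_syms[] = { SKETCH_ZONES };
#undef EM

/* Map a stored zone index back to its name at print time. */
static const char *sketch_zone_name(int idx)
{
    size_t i;

    for (i = 0; i < sizeof(sketch_zone_syms) / sizeof(sketch_zone_syms[0]); i++)
        if (sketch_zone_syms[i].value == idx)
            return sketch_zone_syms[i].name;
    return "?";
}
```

Storing the index and translating it only when printing is what lets a
userspace tool show "Normal" instead of a raw kernel pointer.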
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Valentin Rothberg <valentinrothberg@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some compaction tracepoints convert the integer return values to strings
using the compaction_status_string array. This works for in-kernel
printing, but not userspace trace printing of raw captured trace such as
via trace-cmd report.
This patch converts the private array to appropriate tracepoint macros
that result in proper userspace support.
trace-cmd output before:
transhuge-stres-4235 [000] 453.149280: mm_compaction_finished: node=0
zone=ffffffff81815d7a order=9 ret=
after:
transhuge-stres-4235 [000] 453.149280: mm_compaction_finished: node=0
zone=ffffffff81815d7a order=9 ret=partial
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge tag 'media/v4.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media updates from Mauro Carvalho Chehab:
"Media updates, including:
- Lots of improvements at the kABI documentation
- Split of Videobuf2 into a common part and a V4L2 specific one
- Split of the VB2 tracing events into a separate header file
- s5p-mfc got support for Exynos 5433
- v4l2 fixes for 64-bits alignment when running 32 bits userspace
on ARM
- Added support for SDR radio transmitter at core, vivid and hackrf
drivers
- Some y2038 fixups
- Some improvements at V4L2 colorspace support
- saa7164 converted to use the V4L2 core control framework
- several new boards additions, cleanups and fixups
PS: There are two patches for scripts/kernel-doc that are needed by
the documentation patches on Media. Jon is OK on merging those via
my tree"
* tag 'media/v4.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (146 commits)
[media] c8sectpfe: Remove select on CONFIG_FW_LOADER_USER_HELPER_FALLBACK
[media] DocBook media: update copyright/version numbers
[media] ivtv: Convert to get_user_pages_unlocked()
[media] media/v4l2-ctrls: fix setting autocluster to manual with VIDIOC_S_CTRL
[media] DocBook media: Fix a typo in encoder cmd
[media] DocBook: add SDR specific info to G_MODULATOR / S_MODULATOR
[media] DocBook: add SDR specific info to G_TUNER / S_TUNER
[media] hackrf: do not set human readable name for formats
[media] hackrf: add support for transmitter
[media] hackrf: switch to single function which configures everything
[media] hackrf: add control for RF amplifier
[media] DocBook: add modulator type field
[media] v4l: add type field to v4l2_modulator struct
[media] DocBook: document SDR transmitter
[media] v4l2: add support for SDR transmitter
[media] DocBook: document tuner RF gain control
[media] v4l2: add RF gain control
[media] v4l2: rename V4L2_TUNER_ADC to V4L2_TUNER_SDR
[media] media/vivid-osd: fix info leak in ioctl
[media] media: videobuf2: Move v4l2-specific stuff to videobuf2-v4l2
...
Pull f2fs updates from Jaegeuk Kim:
"Most of the patches enhance the stability and performance of the
in-memory extent cache feature.
In addition, it introduces several new features and configurable
points:
- F2FS_GOING_DOWN_METAFLUSH ioctl to test power failures
- F2FS_IOC_WRITE_CHECKPOINT ioctl to trigger checkpoint by users
- background_gc=sync mount option to do gc synchronously
- periodic checkpoints
- sysfs entry to control readahead blocks for free nids
And the following bug fixes have been merged.
- fix SSA corruption by collapse/insert_range
- correct a couple of gc behaviors
- fix the results of f2fs_map_blocks
- fix error case handling of volatile/atomic writes"
* tag 'for-f2fs-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (54 commits)
f2fs: fix to skip shrinking extent nodes
f2fs: fix error path of ->symlink
f2fs: fix to clear GCed flag for atomic written page
f2fs: don't need to submit bio on error case
f2fs: fix leakage of inmemory atomic pages
f2fs: refactor __find_rev_next_{zero}_bit
f2fs: support fiemap for inline_data
f2fs: flush dirty data for bmap
f2fs: relocate the tracepoint for background_gc
f2fs crypto: fix racing of accessing encrypted page among
f2fs: export ra_nid_pages to sysfs
f2fs: readahead for free nids building
f2fs: support lower priority asynchronous readahead in ra_meta_pages
f2fs: don't tag REQ_META for temporary non-meta pages
f2fs: add a tracepoint for f2fs_read_data_pages
f2fs: set GFP_NOFS for grab_cache_page
f2fs: fix SSA updates resulting in corruption
Revert "f2fs: do not skip dentry block writes"
f2fs: add F2FS_GOING_DOWN_METAFLUSH to test power-failure
f2fs: merge meta writes as many possible
...
Merge tag 'locks-v4.4-1' of git://git.samba.org/jlayton/linux
Pull file locking updates from Jeff Layton:
"The largest series of changes is from Ben who offered up a set to add
a new helper function for setting locks based on the type set in
fl_flags. Dmitry also sent in a fix for a potential race that he
found with KTSAN"
* tag 'locks-v4.4-1' of git://git.samba.org/jlayton/linux:
locks: cleanup posix_lock_inode_wait and flock_lock_inode_wait
Move locks API users to locks_lock_inode_wait()
locks: introduce locks_lock_inode_wait()
locks: Use more file_inode and fix a comment
fs: fix data races on inode->i_flctx
locks: change tracepoint for generic_add_lease
Tracing is useful for debugging and performance tuning. Add similar
traces to what's present in the cpu cooling device.
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Eduardo Valentin <edubezval@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Javi Merino <javi.merino@arm.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
Each qgroup data reservation now has its own ftrace event for better
debugging.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
Add a new option to the trace Kconfig, CONFIG_TRACING_EVENTS_GPIO, that is
used for enabling/disabling compilation of gpio function trace events.
Link: http://lkml.kernel.org/r/1438432079-11704-4-git-send-email-tal.shorer@gmail.com
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Tal Shorer <tal.shorer@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Allow a trace events header file to disable compilation of its
trace events by defining the preprocessor macro NOTRACE.
This could be done, for example, according to a Kconfig option.
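The effect of such a switch can be imitated in userspace as follows. This is
a simplified sketch, not the tracepoint header logic: SKETCH_CONFIG_EVENTS
stands in for a Kconfig option, and sketch_trace() and sketch_set_gpio() are
hypothetical names. Only the NOTRACE macro name comes from the text above.

```c
#include <stdio.h>

#define SKETCH_CONFIG_EVENTS 1   /* set to 0 to compile the events out */

/* A trace events header would make this decision before declaring events. */
#if !SKETCH_CONFIG_EVENTS
#define NOTRACE
#endif

static int sketch_events_emitted;

#ifdef NOTRACE
/* Events compiled out: the call site stays, but generates no code. */
#define sketch_trace(name, value) do { (void)(name); (void)(value); } while (0)
#else
#define sketch_trace(name, value)                        \
    do {                                                 \
        sketch_events_emitted++;                         \
        printf("%s: %d\n", (name), (value));             \
    } while (0)
#endif

/* Example call site that is identical in both configurations. */
static int sketch_set_gpio(int gpio, int value)
{
    sketch_trace("gpio_value", gpio);
    return value;
}
```

With the option enabled each call emits one event; with it disabled the same
source compiles but the event code disappears.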
Link: http://lkml.kernel.org/r/1438432079-11704-3-git-send-email-tal.shorer@gmail.com
Signed-off-by: Tal Shorer <tal.shorer@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Prepare to divide videobuf2
- Separate vb2 trace events from v4l2 trace event.
- Make wrapper functions that will move to v4l2-side.
- Make vb2_core_* functions that will remain in core-side.
- Add a callback function table for buffer operations, which allows vb2-core
to invoke v4l2-side functions.
- Rename internal functions as vb2_*.
Signed-off-by: Junghak Sung <jh1009.sung@samsung.com>
Signed-off-by: Geunyoung Kim <nenggun.kim@samsung.com>
Acked-by: Seung-Woo Kim <sw0312.kim@samsung.com>
Acked-by: Inki Dae <inki.dae@samsung.com>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
This patch adds a tracepoint for f2fs_read_data_pages to trace when
readahead of pages is issued by VFS.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch introduces a tracepoint to monitor background gc behaviors.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Rename trace_f2fs_update_extent_tree to trace_f2fs_update_extent_tree_range,
then expand it so that it can trace extent info updates in batches.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
__trace_sched_switch_state() is the last remaining PREEMPT_ACTIVE
user, move trace_sched_switch() from prepare_task_switch() to
__schedule() and propagate the @preempt argument.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Remove v4l2 stuff - v4l2_buf, v4l2_plane - from struct vb2_buffer.
Add new member variables - bytesused, length, offset, userptr, fd,
data_offset - to struct vb2_plane in order to cover all information
of v4l2_plane.
struct vb2_plane {
	<snip>
	unsigned int bytesused;
	unsigned int length;
	union {
		unsigned int offset;
		unsigned long userptr;
		int fd;
	} m;
	unsigned int data_offset;
};
Replace v4l2_buf with new member variables - index, type, memory - which
are common fields for buffer management.
struct vb2_buffer {
	<snip>
	unsigned int index;
	unsigned int type;
	unsigned int memory;
	unsigned int num_planes;
	struct vb2_plane planes[VIDEO_MAX_PLANES];
	<snip>
};
v4l2 specific fields - flags, field, timestamp, timecode,
sequence - are moved to vb2_v4l2_buffer in videobuf2-v4l2.c
struct vb2_v4l2_buffer {
	struct vb2_buffer vb2_buf;
	__u32 flags;
	__u32 field;
	struct timeval timestamp;
	struct v4l2_timecode timecode;
	__u32 sequence;
};
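Because vb2_buf is embedded in the v4l2 wrapper, v4l2-side code that is
handed a core buffer can get back to its own fields by pointer arithmetic.
The sketch below is a user-space illustration with trimmed-down copies of
the structures; sketch_to_v4l2() and the other sketch_* helpers are
hypothetical names, not functions from this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Trimmed-down stand-ins for the structures shown above. */
struct vb2_buffer {
	unsigned int index;
	unsigned int type;
	unsigned int memory;
};

struct vb2_v4l2_buffer {
	struct vb2_buffer vb2_buf;
	unsigned int flags;
	unsigned int sequence;
};

/* Hypothetical helper: recover the wrapper from the embedded core buffer. */
static struct vb2_v4l2_buffer *sketch_to_v4l2(struct vb2_buffer *vb)
{
	assert(vb != NULL);
	return (struct vb2_v4l2_buffer *)
		((char *)vb - offsetof(struct vb2_v4l2_buffer, vb2_buf));
}

/* Core-side code sees only struct vb2_buffer ... */
static unsigned int sketch_core_index(const struct vb2_buffer *vb)
{
	return vb->index;
}

/* ... while v4l2-side code reaches its own fields through the wrapper. */
static unsigned int sketch_v4l2_sequence(struct vb2_buffer *vb)
{
	return sketch_to_v4l2(vb)->sequence;
}
```

This only works for buffers that really are embedded in a
struct vb2_v4l2_buffer, which is the layout the split above sets up.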
Signed-off-by: Junghak Sung <jh1009.sung@samsung.com>
Signed-off-by: Geunyoung Kim <nenggun.kim@samsung.com>
Acked-by: Seung-Woo Kim <sw0312.kim@samsung.com>
Acked-by: Inki Dae <inki.dae@samsung.com>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Commit ee53bbd172 "tracing: Move the perf code out of trace_event.h" moved
not only the perf code out of trace_event.h, but a bit of the tracing code
as well. Move it back.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Pull thermal updates from Zhang Rui:
- use int instead of unsigned long to represent temperature to avoid
bogus overheat detection when a negative temperature is reported. From
Sascha Hauer.
- export available thermal governors information to user space via
sysfs. From Wei Ni.
- introduce new thermal driver for Wildcat Point platform controller
hub, which uses PCH thermal sensor and associated critical and hot
trip points. From Tushar Dave.
- add support for Intel Skylake and Denlow platforms in powerclamp
driver.
- some small cleanups in thermal core.
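The first item above is easy to reproduce in user space. In this sketch the
trip value and the function names are hypothetical; it only shows why a
negative reading stored in an unsigned long compares as "too hot".

```c
#include <assert.h>
#include <stdbool.h>

/* Temperatures in millidegrees Celsius; the trip value is hypothetical. */
#define SKETCH_CRIT_TRIP 90000

/* Old style: a negative reading converted to unsigned long becomes huge. */
static bool sketch_overheat_unsigned(int reading)
{
	unsigned long temp = (unsigned long)reading;

	return temp >= (unsigned long)SKETCH_CRIT_TRIP;
}

/* New style: a plain int keeps the sign, so the comparison is correct. */
static bool sketch_overheat_int(int reading)
{
	int temp = reading;

	return temp >= SKETCH_CRIT_TRIP;
}
```

A reading of -5000 (minus five degrees) trips the unsigned variant but not
the int variant; a genuine 95000 trips both.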
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
thermal: Add Intel PCH thermal driver
thermal: Add comment explaining test for critical temperature
thermal: Use IS_ENABLED instead of #ifdef
thermal: remove unnecessary call to thermal_zone_device_set_polling
thermal: trivial: fix typo in comment
thermal: consistently use int for temperatures
thermal: add available policies sysfs attribute
thermal/powerclamp: add cpu id for denlow platform
thermal/powerclamp: add cpu id for Skylake u/y
thermal/powerclamp: add cpu id for skylake h/s
Pull blk-cg updates from Jens Axboe:
"A bit later in the cycle, but this has been in the block tree for a
while. This is basically four patchsets from Tejun, that improve our
buffered cgroup writeback. It was dependent on the other cgroup
changes, but they went in earlier in this cycle.
Series 1 is set of 5 patches that has cgroup writeback updates:
- bdi_writeback iteration fix which could lead to some wb's being
skipped or repeated during e.g. sync under memory pressure.
- Simplification of wb work wait mechanism.
- Writeback tracepoints updated to report cgroup.
Series 2 is a set of updates for the CFQ cgroup writeback handling:
cfq has always charged all async IOs to the root cgroup. It didn't
have much choice as writeback didn't know about cgroups and there
was no way to tell who to blame for a given writeback IO.
writeback finally grew support for cgroups and now tags each
writeback IO with the appropriate cgroup to charge it against.
This patchset updates cfq so that it follows the blkcg each bio is
tagged with. Async cfq_queues are now shared across cfq_group,
which is per-cgroup, instead of per-request_queue cfq_data. This
makes all IOs follow the weight based IO resource distribution
implemented by cfq.
- Switched from GFP_ATOMIC to GFP_NOWAIT as suggested by Jeff.
- Other misc review points addressed, acks added and rebased.
Series 3 is the blkcg policy cleanup patches:
This patchset contains assorted cleanups for blkcg_policy methods
and blk[c]g_policy_data handling.
- alloc/free added for blkg_policy_data. exit dropped.
- alloc/free added for blkcg_policy_data.
- blk-throttle's async percpu allocation is replaced with direct
allocation.
- all methods now take blk[c]g_policy_data instead of blkcg_gq or
blkcg.
And finally, series 4 is a set of patches cleaning up the blkcg stats
handling:
blkcg's stats have always been somewhat of a mess. This patchset
tries to improve the situation a bit.
- Patches were added to consolidate the blkcg entry point and
blkg creation. This is in itself an improvement and helps
collecting common stats on bio issue.
- per-blkg stats now accounted on bio issue rather than request
completion so that bio based and request based drivers can behave
the same way. The issue was spotted by Vivek.
- cfq-iosched implements custom recursive stats and blk-throttle
implements custom per-cpu stats. This patchset makes blkcg core
support both by default.
- cfq-iosched and blk-throttle keep track of the same stats
multiple times. Unify them"
* 'for-4.3/blkcg' of git://git.kernel.dk/linux-block: (45 commits)
blkcg: use CGROUP_WEIGHT_* scale for io.weight on the unified hierarchy
blkcg: s/CFQ_WEIGHT_*/CFQ_WEIGHT_LEGACY_*/
blkcg: implement interface for the unified hierarchy
blkcg: misc preparations for unified hierarchy interface
blkcg: separate out tg_conf_updated() from tg_set_conf()
blkcg: move body parsing from blkg_conf_prep() to its callers
blkcg: mark existing cftypes as legacy
blkcg: rename subsystem name from blkio to io
blkcg: refine error codes returned during blkcg configuration
blkcg: remove unnecessary NULL checks from __cfqg_set_weight_device()
blkcg: reduce stack usage of blkg_rwstat_recursive_sum()
blkcg: remove cfqg_stats->sectors
blkcg: move io_service_bytes and io_serviced stats into blkcg_gq
blkcg: make blkg_[rw]stat_recursive_sum() to be able to index into blkcg_gq
blkcg: make blkcg_[rw]stat per-cpu
blkcg: add blkg_[rw]stat->aux_cnt and replace cfq_group->dead_stats with it
blkcg: consolidate blkg creation in blkcg_bio_issue_check()
blk-throttle: improve queue bypass handling
blkcg: move root blkg lookup optimization from throtl_lookup_tg() to __blkg_lookup()
blkcg: inline [__]blkg_lookup()
...
Merge tag 'trace-v4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing update from Steven Rostedt:
"Mostly this is just clean ups and micro optimizations.
The changes with more meat are:
- Allowing the trace event filters to filter on CPU number and
process ids
- Two new markers for trace output latency were added (10 and 100
msec latencies)
- Have tracing_thresh filter function profiling time
I also worked on modifying the ring buffer code for some future work,
and moved the adding of the timestamp around. One of my changes
caused a regression, and since other changes were built on top of it
and already tested, I had to operate a revert of that change. Instead
of rebasing, this change set has the code that caused a regression as
well as the code to revert that change without touching the other
changes that were made on top of it"
* tag 'trace-v4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ring-buffer: Revert "ring-buffer: Get timestamp after event is allocated"
tracing: Don't make assumptions about length of string on task rename
tracing: Allow triggers to filter for CPU ids and process names
ftrace: Format MCOUNT_ADDR address as type unsigned long
tracing: Introduce two additional marks for delay
ftrace: Fix function_graph duration spacing with 7-digits
ftrace: add tracing_thresh to function profile
tracing: Clean up stack tracing and fix fentry updates
ring-buffer: Reorganize function locations
ring-buffer: Make sure event has enough room for extend and padding
ring-buffer: Get timestamp after event is allocated
ring-buffer: Move the adding of the extended timestamp out of line
ring-buffer: Add event descriptor to simplify passing data
ftrace: correct the counter increment for trace_buffer data
tracing: Fix for non-continuous cpu ids
tracing: Prefer kcalloc over kzalloc with multiply
Fixes compilation with ppc64_defconfig.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Tracepoint for dynamic halt_poll_ns, fired on every potential change.
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Merge tag 'media/v4.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media updates from Mauro Carvalho Chehab:
- new DVB frontend drivers: ascot2e, cxd2841er, horus3a, lnbh25
- new HDMI capture driver: tc358743
- new driver for NetUP DVB new boards (netup_unidvb)
- IR support for DVBSky cards (smipcie-ir)
- Coda driver has gained macroblock tiling support
- Renesas R-Car gains JPEG codec driver
- new DVB platform driver for STi boards: c8sectpfe
- added documentation for the media core kABI to device-drivers DocBook
- lots of driver fixups, cleanups and improvements
* tag 'media/v4.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (297 commits)
[media] c8sectpfe: Remove select on undefined LIBELF_32
[media] i2c: fix platform_no_drv_owner.cocci warnings
[media] cx231xx: Use wake_up_interruptible() instead of wake_up_interruptible_nr()
[media] tc358743: only queue subdev notifications if devnode is set
[media] tc358743: add missing Kconfig dependency/select
[media] c8sectpfe: Use %pad to print 'dma_addr_t'
[media] DocBook media: Fix typo "the the" in xml files
[media] tc358743: make reset gpio optional
[media] tc358743: set direction of reset gpio using devm_gpiod_get
[media] dvbdev: document most of the functions/data structs
[media] dvb_frontend.h: document the struct dvb_frontend
[media] dvb-frontend.h: document struct dtv_frontend_properties
[media] dvb-frontend.h: document struct dvb_frontend_ops
[media] dvb: Use DVBFE_ALGO_HW where applicable
[media] dvb_frontend.h: document struct analog_demod_ops
[media] dvb_frontend.h: Document struct dvb_tuner_ops
[media] Docbook: Document struct analog_parameters
[media] dvb_frontend.h: get rid of dvbfe_modcod
[media] add documentation for struct dvb_tuner_info
[media] dvb_frontend: document dvb_frontend_tune_settings
...
Merge tag 'nfsd-4.3' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
"Nothing major, but:
- Add Jeff Layton as an nfsd co-maintainer: no change to existing
practice, just an acknowledgement of the status quo.
- Two patches ("nfsd: ensure that...") for a race overlooked by the
state locking rewrite, causing a crash noticed by multiple users.
- Lots of smaller bugfixes all over from Kinglong Mee.
- From Jeff, some cleanup of server rpc code in preparation for
possible shift of nfsd threads to workqueues"
* tag 'nfsd-4.3' of git://linux-nfs.org/~bfields/linux: (52 commits)
nfsd: deal with DELEGRETURN racing with CB_RECALL
nfsd: return CLID_INUSE for unexpected SETCLIENTID_CONFIRM case
nfsd: ensure that delegation stateid hash references are only put once
nfsd: ensure that the ol stateid hash reference is only put once
net: sunrpc: fix tracepoint Warning: unknown op '->'
nfsd: allow more than one laundry job to run at a time
nfsd: don't WARN/backtrace for invalid container deployment.
fs: fix fs/locks.c kernel-doc warning
nfsd: Add Jeff Layton as co-maintainer
NFSD: Return word2 bitmask if setting security label in OPEN/CREATE
NFSD: Set the attributes used to store the verifier for EXCLUSIVE4_1
nfsd: SUPPATTR_EXCLCREAT must be encoded before SECURITY_LABEL.
nfsd: Fix an FS_LAYOUT_TYPES/LAYOUT_TYPES encode bug
NFSD: Store parent's stat in a separate value
nfsd: Fix two typos in comments
lockd: NLM grace period shouldn't block NFSv4 opens
nfsd: include linux/nfs4.h in export.h
sunrpc: Switch to using hash list instead single list
sunrpc/nfsd: Remove redundant code by exports seq_operations functions
sunrpc: Store cache_detail in seq_file's private directly
...
Merge patch-bomb from Andrew Morton:
- a few misc things
- Andy's "ambient capabilities"
- fs/notify updates
- the ocfs2 queue
- kernel/watchdog.c updates and feature work.
- some of MM. Includes Andrea's userfaultfd feature.
[ Hadn't noticed that userfaultfd was 'default y' when applying the
patches, so that got fixed in this merge instead. We do _not_ mark
new features that nobody uses yet 'default y' - Linus ]
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits)
mm/hugetlb.c: make vma_has_reserves() return bool
mm/madvise.c: make madvise_behaviour_valid() return bool
mm/memory.c: make tlb_next_batch() return bool
mm/dmapool.c: change is_page_busy() return from int to bool
mm: remove struct node_active_region
mremap: simplify the "overlap" check in mremap_to()
mremap: don't do uneccesary checks if new_len == old_len
mremap: don't do mm_populate(new_addr) on failure
mm: move ->mremap() from file_operations to vm_operations_struct
mremap: don't leak new_vma if f_op->mremap() fails
mm/hugetlb.c: make vma_shareable() return bool
mm: make GUP handle pfn mapping unless FOLL_GET is requested
mm: fix status code which move_pages() returns for zero page
mm: memcontrol: bring back the VM_BUG_ON() in mem_cgroup_swapout()
genalloc: add support of multiple gen_pools per device
genalloc: add name arg to gen_pool_get() and devm_gen_pool_create()
mm/memblock: WARN_ON when nid differs from overlap region
Documentation/features/vm: add feature description and arch support status for batched TLB flush after unmap
mm: defer flush of writable TLB entries
mm: send one IPI per CPU to TLB flush all entries after unmapping pages
...
When unmapping pages it is necessary to flush the TLB. If that page was
accessed by another CPU then an IPI is used to flush the remote CPU. That
is a lot of IPIs if kswapd is scanning and unmapping >100K pages per
second.
There already is a window between when a page is unmapped and when it is
TLB flushed. This series increases the window so multiple pages can be
flushed using a single IPI. This should be safe or the kernel is hosed
already.
Patch 1 simply made the rest of the series easier to write as ftrace
could identify all the senders of TLB flush IPIs.
Patch 2 tracks what CPUs potentially map a PFN and then sends an IPI
to flush the entire TLB.
Patch 3 tracks when there potentially are writable TLB entries that
need to be batched differently.
Patch 4 increases SWAP_CLUSTER_MAX to further batch flushes
The performance impact is documented in the changelogs but in the optimistic
case on a 4-socket machine the full series reduces interrupts from 900K
interrupts/second to 60K interrupts/second.
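The effect of batching can be illustrated with a small user-space model.
This is a sketch under stated assumptions, not the kernel code: each page
carries a hypothetical bitmask of the CPUs that may hold a TLB entry for
it, and an "IPI" is just a counter increment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count set bits: each one is a CPU that needs an IPI. */
static unsigned int sketch_weight(uint32_t mask)
{
	unsigned int n = 0;

	for (; mask; mask &= mask - 1)
		n++;
	return n;
}

/* Unbatched: every unmapped page triggers IPIs to the CPUs that mapped it. */
static unsigned int sketch_ipis_per_page(const uint32_t *page_cpus,
					 unsigned int nr_pages)
{
	unsigned int ipis = 0;

	assert(page_cpus != NULL || nr_pages == 0);
	for (unsigned int i = 0; i < nr_pages; i++)
		ipis += sketch_weight(page_cpus[i]);
	return ipis;
}

/* Batched: accumulate the CPU masks, then send one IPI per CPU in the union. */
static unsigned int sketch_ipis_batched(const uint32_t *page_cpus,
					unsigned int nr_pages)
{
	uint32_t pending = 0;

	assert(page_cpus != NULL || nr_pages == 0);
	for (unsigned int i = 0; i < nr_pages; i++)
		pending |= page_cpus[i];
	return sketch_weight(pending);
}
```

In this model the batched count is bounded by the number of CPUs, however
many pages are in the batch, while the unbatched count grows with the
number of pages.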
This patch (of 4):
It is easy to trace when an IPI is received to flush a TLB but harder to
detect what event sent it. This patch makes it easy to identify the
source of IPIs being transmitted for TLB flushes on x86.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge tag 'sound-4.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound updates from Takashi Iwai:
"There are few changes in the core part, but a lot of development is
found in drivers, especially ASoC. The diffstat shows regmap-related
changes for slight API additions / changes, and that's all.
Looking at the code size statistics, the most significant addition is
for Intel Skylake. (Note that SKL support is still underway, the
codec driver is missing.) Also STI controller driver is a major
addition as well as a few new codec drivers.
On the HD-audio side, there are fewer changes than in the past. The
noticeable change is the support of ELD notification from i915
graphics driver. Thus this pull request carries a few changes in
drm/i915.
Other than that, USB-audio got a rewrite of runtime PM code. It was
initiated by lockdep warning, but resulted in a good cleanup in the
end.
Below are the highlights:
Common:
- Factoring out of AC'97 reset code from ASoC into the core helper
- A few regmap API extensions (in case it's not pulled yet)
ASoC:
- New drivers for Cirrus CS4349, GTM601, InvenSense ICS43432, Realtek
RT298 and ST STI controllers
- Machine drivers for Rockchip systems with MAX98090 and RT5645 and
RT5650
- Initial driver support for Intel Skylake devices
- Lots of rsnd cleanup and enhancements
- A few DAPM fixes and cleanups
- A large number of cleanups in various drivers (conversion and
standardization to regmap, component) mostly by Lars-Peter and Axel
HD-audio:
- Extended HD-audio core for Intel Skylake controller support
- Quirks for Dell headsets, Alienware 15
- Clean up of pin-based quirk tables for Realtek codecs
- ELD notifier implementation for Intel HDMI/DP
USB-audio:
- Refactor runtime PM code to make lockdep happier"
* tag 'sound-4.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (411 commits)
drm/i915: Add locks around audio component bind/unbind
drm/i915: Drop port_mst_index parameter from pin/eld callback
ALSA: hda - Fix missing inline for dummy snd_hdac_set_codec_wakeup()
ALSA: hda - Wake the codec up on pin/ELD notify events
ALSA: hda - allow codecs to access the i915 pin/ELD callback
drm/i915: Call audio pin/ELD notify function
drm/i915: Add audio pin sense / ELD callback
ASoC: zx296702-i2s: Fix resource leak when unload module
ASoC: sti_uniperif: Ensure component is unregistered when unload module
ASoC: au1x: psc-i2s: Convert to use devm_ioremap_resource
ASoC: sh: dma-sh7760: Convert to devm_snd_soc_register_platform
ASoC: spear_pcm: Use devm_snd_dmaengine_pcm_register to fix resource leak
ALSA: fireworks/bebob/dice/oxfw: fix substreams counting at vmalloc failure
ASoC: Clean up docbook warnings
ASoC: txx9: Convert to devm_snd_soc_register_platform
ASoC: pxa: Convert to devm_snd_soc_register_platform
ASoC: nuc900: Convert to devm_snd_soc_register_platform
ASoC: blackfin: Convert to devm_snd_soc_register_platform
ASoC: au1x: Convert to devm_snd_soc_register_platform
ASoC: qcom: Constify asoc_qcom_lpass_cpu_dai_ops
...
Pull f2fs updates from Jaegeuk Kim:
"The major work includes fixing and enhancing the existing extent_cache
feature, which has settled down well so far and accordingly now becomes
a default mount option.
Also, this version newly registers a f2fs memory shrinker to reclaim
several objects consumed by a couple of data structures in order to
avoid memory pressures.
Another new feature is to add ioctl(F2FS_GARBAGE_COLLECT) which
triggers a cleaning job explicitly by users.
Most of the other patches fix bugs that occurred in corner cases
across the whole code area"
* tag 'for-f2fs-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (85 commits)
f2fs: upset segment_info repair
f2fs: avoid accessing NULL pointer in f2fs_drop_largest_extent
f2fs: update extent tree in batches
f2fs: fix to release inode correctly
f2fs: handle f2fs_truncate error correctly
f2fs: avoid unneeded initializing when converting inline dentry
f2fs: atomically set inode->i_flags
f2fs: fix wrong pointer access during try_to_free_nids
f2fs: use __GFP_NOFAIL to avoid infinite loop
f2fs: lookup neighbor extent nodes for merging later
f2fs: split __insert_extent_tree_ret for readability
f2fs: kill dead code in __insert_extent_tree
f2fs: adjust showing of extent cache stat
f2fs: add largest/cached stat in extent cache
f2fs: fix incorrect mapping for bmap
f2fs: add annotation for space utilization of regular/inline dentry
f2fs: fix to update cached_en of extent tree properly
f2fs: fix typo
f2fs: check the node block address of newly allocated nid
f2fs: go out for insert_inode_locked failure
...
Pull ext3 removal, quota & udf fixes from Jan Kara:
"The biggest change in the pull is the removal of ext3 filesystem
driver (~28k lines removed). Ext4 driver is a full featured
replacement these days and both RH and SUSE have used it for several years
without issues. Also there are some workarounds in VM & block layer
mainly for ext3 which we could eventually get rid of.
The other larger change is the addition of proper error handling for
dquot_initialize(). The rest is small fixes and cleanups"
[ I wasn't convinced about the ext3 removal and worried about things
falling through the cracks for legacy users, but ext4 maintainers
piped up and were all unanimously in favor of removal, and maintaining
all legacy ext3 support inside ext4. - Linus ]
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
udf: Don't modify filesystem for read-only mounts
quota: remove an unneeded condition
ext4: memory leak on error in ext4_symlink()
mm/Kconfig: NEED_BOUNCE_POOL: clean-up condition
ext4: Improve ext4 Kconfig test
block: Remove forced page bouncing under IO
fs: Remove ext3 filesystem driver
doc: Update doc about journalling layer
jfs: Handle error from dquot_initialize()
reiserfs: Handle error from dquot_initialize()
ocfs2: Handle error from dquot_initialize()
ext4: Handle error from dquot_initialize()
ext2: Handle error from dquot_initalize()
quota: Propagate error from ->acquire_dquot()
Pull networking updates from David Miller:
"Another merge window, another set of networking changes. I've heard
rumblings that the lightweight tunnels infrastructure has been voted
networking change of the year. But what do I know?
1) Add conntrack support to openvswitch, from Joe Stringer.
2) Initial support for VRF (Virtual Routing and Forwarding), which
allows the segmentation of routing paths without using multiple
devices. There are some semantic kinks to work out still, but
this is a reasonably strong foundation. From David Ahern.
3) Remove spinlock from act_bpf fast path, from Alexei Starovoitov.
4) Ignore route nexthops with a link down state in ipv6, just like
ipv4. From Andy Gospodarek.
5) Remove spinlock from fast path of act_gact and act_mirred, from
Eric Dumazet.
6) Document the DSA layer, from Florian Fainelli.
7) Add netconsole support to bcmgenet, systemport, and DSA. Also
from Florian Fainelli.
8) Add Mellanox Switch Driver and core infrastructure, from Jiri
Pirko.
9) Add support for "light weight tunnels", which allow for
encapsulation and decapsulation without bearing the overhead of a
full blown netdevice. From Thomas Graf, Jiri Benc, and a cast of
others.
10) Add Identifier Locator Addressing support for ipv6, from Tom
Herbert.
11) Support fragmented SKBs in iwlwifi, from Johannes Berg.
12) Allow perf PMUs to be accessed from eBPF programs, from Kaixu Xia.
13) Add BQL support to 3c59x driver, from Loganaden Velvindron.
14) Stop using a zero TX queue length to mean that a device shouldn't
have a qdisc attached, use an explicit flag instead. From Phil
Sutter.
15) Use generic geneve netdevice infrastructure in openvswitch, from
Pravin B Shelar.
16) Add infrastructure to avoid re-forwarding a packet in software
that was already forwarded by a hardware switch. From Scott
Feldman.
17) Allow AF_PACKET fanout function to be implemented in a bpf
program, from Willem de Bruijn"
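Item 14 above replaces an overloaded value with an explicit flag. The
sketch below shows the difference in user space; struct sketch_netdev,
SKETCH_NO_QUEUE and both helper functions are hypothetical and only model
the two conventions described in the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NO_QUEUE 0x1u /* hypothetical flag bit */

struct sketch_netdev {
	unsigned int tx_queue_len;
	unsigned int priv_flags;
};

/* Old convention: a zero queue length doubles as "no qdisc wanted". */
static bool sketch_wants_qdisc_old(const struct sketch_netdev *dev)
{
	assert(dev != NULL);
	return dev->tx_queue_len != 0;
}

/* New convention: only the explicit flag decides. */
static bool sketch_wants_qdisc_new(const struct sketch_netdev *dev)
{
	assert(dev != NULL);
	return !(dev->priv_flags & SKETCH_NO_QUEUE);
}
```

With the explicit flag, the queue length is free to carry only its numeric
meaning, and a device with a zero length is no longer silently treated as
queue-less.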
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1458 commits)
netfilter: nf_conntrack: make nf_ct_zone_dflt built-in
netfilter: nf_dup{4, 6}: fix build error when nf_conntrack disabled
net: fec: clear receive interrupts before processing a packet
ipv6: fix exthdrs offload registration in out_rt path
xen-netback: add support for multicast control
bgmac: Update fixed_phy_register()
sock, diag: fix panic in sock_diag_put_filterinfo
flow_dissector: Use 'const' where possible.
flow_dissector: Fix function argument ordering dependency
ixgbe: Resolve "initialized field overwritten" warnings
ixgbe: Remove bimodal SR-IOV disabling
ixgbe: Add support for reporting 2.5G link speed
ixgbe: fix bounds checking in ixgbe_setup_tc for 82598
ixgbe: support for ethtool set_rxfh
ixgbe: Avoid needless PHY access on copper phys
ixgbe: cleanup to use cached mask value
ixgbe: Remove second instance of lan_id variable
ixgbe: use kzalloc for allocating one thing
flow: Move __get_hash_from_flowi{4,6} into flow_dissector.c
ixgbe: Remove unused PCI bus types
...
A number of VRF patches used 'int' for table id. It should be u32 to be
consistent with the rest of the stack.
Fixes:
4e3c89920c ("net: Introduce VRF related flags and helpers")
15be405eb2 ("net: Add inet_addr lookup by table")
30bbaa1950 ("net: Fix up inet_addr_type checks")
021dd3b8a1 ("net: Add routes to the table associated with the device")
dc028da54e ("inet: Move VRF table lookup to inlined function")
f6d3c19274 ("net: FIB tracepoints")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull scheduler updates from Ingo Molnar:
"The biggest change in this cycle is the rewrite of the main SMP load
balancing metric: the CPU load/utilization. The main goal was to make
the metric more precise and more representative - see the changelog of
this commit for the gory details:
9d89c257df ("sched/fair: Rewrite runnable load and utilization average tracking")
It is done in a way that significantly reduces complexity of the code:
5 files changed, 249 insertions(+), 494 deletions(-)
and the performance testing results are encouraging. Nevertheless we
need to keep an eye on potential regressions, since this potentially
affects every SMP workload in existence.
This work comes from Yuyang Du.
Other changes:
- SCHED_DL updates. (Andrea Parri)
- Simplify architecture callbacks by removing finish_arch_switch().
(Peter Zijlstra et al)
- cputime accounting: guarantee stime + utime == rtime. (Peter
Zijlstra)
- optimize idle CPU wakeups some more - inspired by Facebook server
loads. (Mike Galbraith)
- stop_machine fixes and updates. (Oleg Nesterov)
- Introduce the 'trace_sched_waking' tracepoint. (Peter Zijlstra)
- sched/numa tweaks. (Srikar Dronamraju)
- misc fixes and small cleanups"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits)
sched/deadline: Fix comment in enqueue_task_dl()
sched/deadline: Fix comment in push_dl_tasks()
sched: Change the sched_class::set_cpus_allowed() calling context
sched: Make sched_class::set_cpus_allowed() unconditional
sched: Fix a race between __kthread_bind() and sched_setaffinity()
sched: Ensure a task has a non-normalized vruntime when returning back to CFS
sched/numa: Fix NUMA_DIRECT topology identification
tile: Reorganize _switch_to()
sched, sparc32: Update scheduler comments in copy_thread()
sched: Remove finish_arch_switch()
sched, tile: Remove finish_arch_switch
sched, sh: Fold finish_arch_switch() into switch_to()
sched, score: Remove finish_arch_switch()
sched, avr32: Remove finish_arch_switch()
sched, MIPS: Get rid of finish_arch_switch()
sched, arm: Remove finish_arch_switch()
sched/fair: Clean up load average references
sched/fair: Provide runnable_load_avg back to cfs_rq
sched/fair: Remove task and group entity load when they are dead
sched/fair: Init cfs_rq's sched_entity load average
...
Pull RCU updates from Ingo Molnar:
"The main RCU changes in this cycle are:
- the combination of tree geometry-initialization simplifications and
OS-jitter-reduction changes to expedited grace periods. These two
are stacked due to the large number of conflicts that would
otherwise result.
- privatize smp_mb__after_unlock_lock().
This commit moves the definition of smp_mb__after_unlock_lock() to
kernel/rcu/tree.h, in recognition of the fact that RCU is the only
thing using this, that nothing else is likely to use it, and that
it is likely to go away completely.
- documentation updates.
- torture-test updates.
- misc fixes"
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
rcu,locking: Privatize smp_mb__after_unlock_lock()
rcu: Silence lockdep false positive for expedited grace periods
rcu: Don't disable CPU hotplug during OOM notifiers
scripts: Make checkpatch.pl warn on expedited RCU grace periods
rcu: Update MAINTAINERS entry
rcu: Clarify CONFIG_RCU_EQS_DEBUG help text
rcu: Fix backwards RCU_LOCKDEP_WARN() in synchronize_rcu_tasks()
rcu: Rename rcu_lockdep_assert() to RCU_LOCKDEP_WARN()
rcu: Make rcu_is_watching() really notrace
cpu: Wait for RCU grace periods concurrently
rcu: Create a synchronize_rcu_mult()
rcu: Fix obsolete priority-boosting comment
rcu: Use WRITE_ONCE in RCU_INIT_POINTER
rcu: Hide RCU_NOCB_CPU behind RCU_EXPERT
rcu: Add RCU-sched flavors of get-state and cond-sync
rcu: Add fastpath bypassing funnel locking
rcu: Rename RCU_GP_DONE_FQS to RCU_GP_DOING_FQS
rcu: Pull out wait_event*() condition into helper function
documentation: Describe new expedited stall warnings
rcu: Add stall warnings to synchronize_sched_expedited()
...
`perf stat -e sunrpc:svc_xprt_do_enqueue true` results in
Warning: unknown op '->'
Warning: [sunrpc:svc_xprt_do_enqueue] unknown op '->'
Similar warning for svc_handle_xprt as well.
Actually TP_printk() should never dereference an address saved in the ring
buffer that points somewhere in the kernel. There's no guarantee that that
object still exists (with the exception of static strings).
Therefore change all the arguments for TP_printk(), so that it references
values existing in the ring buffer only.
While doing that, also fix another possible bug where the xprt argument could be
NULL and TP_fast_assign() would try to access its elements.
Signed-off-by: Pratyush Anand <panand@redhat.com>
Reviewed-by: Jeff Layton <jeff.layton@primarydata.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: stable@vger.kernel.org
Fixes: 83a712e0af "sunrpc: add some tracepoints around ..."
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
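The rule described in the sunrpc fix above (copy values into the record at assign time, print only from the record) can be illustrated outside the kernel. This is a minimal user-space sketch, not the sunrpc tracepoint code: `demo_xprt`, `demo_event`, `demo_event_assign()` and `demo_event_print()` are hypothetical names, and the record layout is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for an object that may be freed after the event. */
struct demo_xprt {
    unsigned long flags;
    char addr[32];
};

/* Ring-buffer style record: holds copies, never a pointer to demo_xprt. */
struct demo_event {
    unsigned long flags;
    char addr[32];
};

/* Assign step: copy values while the object is live; tolerate a NULL xprt. */
static void demo_event_assign(struct demo_event *ev, const struct demo_xprt *xprt)
{
    assert(ev != NULL);
    memset(ev, 0, sizeof(*ev));
    if (xprt == NULL)
        return;
    ev->flags = xprt->flags;
    snprintf(ev->addr, sizeof(ev->addr), "%s", xprt->addr);
}

/* Print step: reads only the record, so the original object may be gone. */
static int demo_event_print(const struct demo_event *ev, char *out, size_t len)
{
    assert(ev != NULL && out != NULL);
    return snprintf(out, len, "addr=%s flags=0x%lx", ev->addr, ev->flags);
}
```

Because the print step never follows a pointer into the original object, it stays valid even if that object is overwritten or freed between assign and print.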
TOS is another key aspect of the lookup passed to fib_validate_source.
Add it to the tracepoint.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While the dest comm string size is assured to be at least TASK_COMM_LEN long,
doing a memcpy() also adds the assumption that the source is at least that
long as well, which isn't assured, and isn't true in cases such as:
set_task_comm(worker->task, "kworker/dying");
This leads to accessing invalid memory.
Link: http://lkml.kernel.org/r/1440760018-1557-1-git-send-email-sasha.levin@oracle.com
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
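The comm-copy problem above comes down to a fixed-size memcpy() reading past a shorter source string. A minimal user-space sketch of a bounded copy that stops at the source's terminator follows. `demo_copy_comm()` and `DEMO_COMM_LEN` are hypothetical names; the value 16 mirrors TASK_COMM_LEN, and the loop is a stand-in, not necessarily the helper the actual patch uses.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for this sketch: same size as the kernel's TASK_COMM_LEN. */
#define DEMO_COMM_LEN 16

/*
 * Copy src into dst without reading past src's NUL terminator and without
 * writing more than DEMO_COMM_LEN bytes. Returns the number of characters
 * copied, not counting the terminator.
 */
static size_t demo_copy_comm(char dst[DEMO_COMM_LEN], const char *src)
{
    size_t i = 0;

    assert(dst != NULL && src != NULL);
    while (i < DEMO_COMM_LEN - 1 && src[i] != '\0') {
        dst[i] = src[i];
        i++;
    }
    dst[i] = '\0';
    return i;
}
```

With a source such as "kworker/dying" (13 characters plus terminator), only 14 bytes of the source are read, whereas a memcpy() of DEMO_COMM_LEN bytes would read 16.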
A few useful tracepoints for developing the VRF driver.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The following tracepoints are updated to report the cgroup used during
cgroup writeback.
* writeback_write_inode[_start]
* writeback_queue
* writeback_exec
* writeback_start
* writeback_written
* writeback_wait
* writeback_nowork
* writeback_wake_background
* wbc_writepage
* writeback_queue_io
* bdi_dirty_ratelimit
* balance_dirty_pages
* writeback_sb_inodes_requeue
* writeback_single_inode[_start]
Note that writeback_bdi_register is separated out from writeback_class
as reporting cgroup doesn't make sense to it. Tracepoints which take
bdi are updated to take bdi_writeback instead.
Signed-off-by: Tejun Heo <tj@kernel.org>
Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
The snd_soc_dapm_input_path and snd_soc_dapm_output_path trace events are
identical except for the direction. Instead of having two events, have a
single one with a field that contains the direction.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
Add tracepoints to retrieve information about read, write
and non-data commands. To support performance measurement,
tracepoints are added at the beginning and at the end of
transfers. Following is a list showing the new tracepoint
events. The "cmd" parameter here represents the opcode, SID,
and full 16-bit address.
spmi_write_begin: cmd and data buffer.
spmi_write_end : cmd and return value.
spmi_read_begin : cmd.
spmi_read_end : cmd, return value and data buffer.
spmi_cmd : cmd.
The reason that cmd appears at both the beginning and at
the end event is that SPMI drivers can request commands
concurrently. cmd helps in matching the corresponding
events.
SPMI tracepoints can be enabled like:
echo 1 >/sys/kernel/debug/tracing/events/spmi/enable
and will dump messages that can be viewed in
/sys/kernel/debug/tracing/trace that look like:
... spmi_read_begin: opc=56 sid=00 addr=0x0000
... spmi_read_end: opc=56 sid=00 addr=0x0000 ret=0 len=02 buf=0x[01-40]
... spmi_write_begin: opc=48 sid=00 addr=0x0000 len=3 buf=0x[ff-ff-ff]
Suggested-by: Sagar Dharia <sdharia@codeaurora.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Gilad Avidov <gavidov@codeaurora.org>
Signed-off-by: Ankit Gupta <ankgupta@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
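Matching a begin event to its end event by "cmd", as the SPMI message above describes, can be sketched in user space by parsing the trace lines. This is only an illustration: `demo_spmi_cmd`, `demo_spmi_parse()` and `demo_spmi_same_cmd()` are hypothetical names, and the parser assumes that opc and sid are printed in decimal and addr in hexadecimal, as the sample output suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* The "cmd" triple from a trace line: opcode, slave ID, 16-bit address. */
struct demo_spmi_cmd {
    unsigned int opc;
    unsigned int sid;
    unsigned int addr;
};

/* Extract opc/sid/addr from one trace line; returns false if not found. */
static bool demo_spmi_parse(const char *line, struct demo_spmi_cmd *cmd)
{
    const char *p;

    assert(line != NULL && cmd != NULL);
    p = strstr(line, "opc=");
    if (p == NULL)
        return false;
    return sscanf(p, "opc=%u sid=%u addr=0x%x",
                  &cmd->opc, &cmd->sid, &cmd->addr) == 3;
}

/* Two events belong to the same command if all three fields agree. */
static bool demo_spmi_same_cmd(const struct demo_spmi_cmd *a,
                               const struct demo_spmi_cmd *b)
{
    return a->opc == b->opc && a->sid == b->sid && a->addr == b->addr;
}
```

Applied to the sample lines above, the spmi_read_begin and spmi_read_end events parse to the same triple, while the spmi_write_begin event differs in its opcode.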
Because of the extent shrinker or other -ENOMEM scenarios, there is no guarantee
that the largest extent is cached in the tree all the time.
Instead of relying on extent_tree, we can simply check the cached one in extent
tree accordingly.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
The thermal code uses int, long and unsigned long for temperatures
in different places.
Using an unsigned type limits the thermal framework to positive
temperatures without need. Also several drivers currently will report
temperatures near UINT_MAX for temperatures below 0°C. This will probably
immediately shut the machine down due to overtemperature if started below
0°C.
'long' is 64-bit on several architectures. This is not needed since INT_MAX m°C
is above the melting point of all known materials.
Consistently use a plain 'int' for temperatures throughout the thermal code and
the drivers. This only changes the places in the drivers where the temperature
is passed around as a pointer; where drivers internally use another type, this is
not changed.
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Acked-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Reviewed-by: Lukasz Majewski <l.majewski@samsung.com>
Reviewed-by: Darren Hart <dvhart@linux.intel.com>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Peter Feuerer <peter@piie.net>
Cc: Punit Agrawal <punit.agrawal@arm.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Eduardo Valentin <edubezval@gmail.com>
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Jean Delvare <jdelvare@suse.de>
Cc: Peter Feuerer <peter@piie.net>
Cc: Heiko Stuebner <heiko@sntech.de>
Cc: Lukasz Majewski <l.majewski@samsung.com>
Cc: Stephen Warren <swarren@wwwdotorg.org>
Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: linux-acpi@vger.kernel.org
Cc: platform-driver-x86@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-omap@vger.kernel.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Maxime Ripard <maxime.ripard@free-electrons.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: lm-sensors@lm-sensors.org
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
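The below-zero failure mode described in the thermal message above can be reproduced in a few lines of user-space C. This is a sketch under stated assumptions, not thermal framework code: `demo_read_sensor_mc()`, `demo_is_critical_unsigned()`, `demo_is_critical_int()` and the 100 °C trip value `DEMO_CRIT_MC` are all hypothetical.

```c
#include <assert.h>   /* for the checks shown in the usage note */
#include <stdbool.h>

/* Hypothetical critical trip point, in millidegrees Celsius. */
#define DEMO_CRIT_MC 100000

/* Hypothetical sensor read: -5 degrees C, reported in millidegrees. */
static int demo_read_sensor_mc(void)
{
    return -5000;
}

/* Unsigned storage: a negative reading wraps to a value near ULONG_MAX. */
static bool demo_is_critical_unsigned(unsigned long temp_mc)
{
    return temp_mc >= (unsigned long)DEMO_CRIT_MC;
}

/* Plain int storage: the negative reading stays negative. */
static bool demo_is_critical_int(int temp_mc)
{
    return temp_mc >= DEMO_CRIT_MC;
}
```

Passing the same reading to both checks shows the difference: `demo_is_critical_unsigned((unsigned long)demo_read_sensor_mc())` is true, while `demo_is_critical_int(demo_read_sensor_mc())` is false.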
Mathieu reported that since 317f394160 ("sched: Move the second half
of ttwu() to the remote cpu") trace_sched_wakeup() can happen out of
context of the waker.
This is a problem when you want to analyse wakeup paths because it is
now very hard to correlate the wakeup event to whoever issued the
wakeup.
OTOH trace_sched_wakeup() is issued at the point where we set
p->state = TASK_RUNNING, which is right where we hand the task off to
the scheduler, so this is an important point when looking at
scheduling behaviour: up to here it has been the wakeup path, and
everything hereafter is due to scheduler policy.
To bridge this gap, introduce a second tracepoint: trace_sched_waking.
It is guaranteed to be called in the waker context.
Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Francis Giraldeau <francis.giraldeau@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150609091336.GQ3644@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The functionality of ext3 is fully supported by the ext4 driver. Major
distributions (SUSE, RedHat) have been using the ext4 driver to handle ext3
filesystems for quite some time. There is some ugliness in mm resulting
from jbd cleaning buffers in a dirty page without cleaning the page dirty
bit, and support for buffer bouncing in the block layer when stable
pages are required is there only because of jbd. So let's remove the
ext3 driver. This saves us some 28k lines of duplicated code.
Acked-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
The rcu_seq operations were open-coded in _rcu_barrier(), so this commit
replaces the open-coding with the shiny new rcu_seq operations.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Add videobuf2 specific vb2_qbuf and vb2_dqbuf trace events that mirror the
v4l2_qbuf and v4l2_dqbuf trace events, only they include additional
information about queue fill state and are emitted right before the buffer
is enqueued in the driver or userspace is woken up. This makes it possible to
make sense of the timeline of trace events in combination with others that might
be triggered by __enqueue_in_driver.
Also two new trace events vb2_buf_queue and vb2_buf_done are added,
making it possible to trace the handover between videobuf2 framework and driver.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Trace events with exactly the same parameters and trace output, such as
v4l2_qbuf and v4l2_dqbuf, are supposed to use the DECLARE_EVENT_CLASS and
DEFINE_EVENT macros instead of duplicated TRACE_EVENT macro calls.
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Pull btrfs updates from Chris Mason:
"Outside of our usual batch of fixes, this integrates the subvolume
quota updates that Qu Wenruo from Fujitsu has been working on for a
few releases now. He gets an extra gold star for making btrfs smaller
this time, and fixing a number of quota corners in the process.
Dave Sterba tested and integrated Anand Jain's sysfs improvements.
Outside of exporting a symbol (ack'd by Greg) these are all internal
to btrfs and it's mostly cleanups and fixes. Anand also attached some
of our sysfs objects to our internal device management structs instead
of an object off the super block. It will make device management
easier overall and it's a better fit for how the sysfs files are used.
None of the existing sysfs files are moved around.
Thanks for all the fixes everyone"
* 'for-linus-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (87 commits)
btrfs: delayed-ref: double free in btrfs_add_delayed_tree_ref()
Btrfs: Check if kobject is initialized before put
lib: export symbol kobject_move()
Btrfs: sysfs: add support to show replacing target in the sysfs
Btrfs: free the stale device
Btrfs: use received_uuid of parent during send
Btrfs: fix use-after-free in btrfs_replay_log
btrfs: wait for delayed iputs on no space
btrfs: qgroup: Make snapshot accounting work with new extent-oriented qgroup.
btrfs: qgroup: Add the ability to skip given qgroup for old/new_roots.
btrfs: ulist: Add ulist_del() function.
btrfs: qgroup: Cleanup the old ref_node-oriented mechanism.
btrfs: qgroup: Switch self test to extent-oriented qgroup mechanism.
btrfs: qgroup: Switch to new extent-oriented qgroup mechanism.
btrfs: qgroup: Switch rescan to new mechanism.
btrfs: qgroup: Add new qgroup calculation function btrfs_qgroup_account_extents().
btrfs: backref: Add special time_seq == (u64)-1 case for btrfs_find_all_roots().
btrfs: qgroup: Add new function to record old_roots.
btrfs: qgroup: Record possible quota-related extent for qgroup.
btrfs: qgroup: Add function qgroup_update_counters().
...
Merge tag 'trace-v4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"This patch series contains several clean ups and even a new trace
clock "monotonic raw". Also some enhancements to make the ring buffer
even faster. But the biggest and most noticeable change is the
renaming of the ftrace* files, structures and variables that have to
deal with trace events.
Over the years I've had several developers tell me about their
confusion with what ftrace is compared to events. Technically,
"ftrace" is the infrastructure to do the function hooks, which includes
tracing and also helps with live kernel patching. But the trace
events are a separate entity altogether, and the files that affect the
trace events should not be named "ftrace". These include:
include/trace/ftrace.h -> include/trace/trace_events.h
include/linux/ftrace_event.h -> include/linux/trace_events.h
Also, functions that are specific for trace events have also been renamed:
ftrace_print_*() -> trace_print_*()
(un)register_ftrace_event() -> (un)register_trace_event()
ftrace_event_name() -> trace_event_name()
ftrace_trigger_soft_disabled() -> trace_trigger_soft_disabled()
ftrace_define_fields_##call() -> trace_event_define_fields_##call()
ftrace_get_offsets_##call() -> trace_event_get_offsets_##call()
Structures have been renamed:
ftrace_event_file -> trace_event_file
ftrace_event_{call,class} -> trace_event_{call,class}
ftrace_event_buffer -> trace_event_buffer
ftrace_subsystem_dir -> trace_subsystem_dir
ftrace_event_raw_##call -> trace_event_raw_##call
ftrace_event_data_offset_##call-> trace_event_data_offset_##call
ftrace_event_type_funcs_##call -> trace_event_type_funcs_##call
And a few various variables and flags have also been updated.
This has been sitting in linux-next for some time, and I have not
heard a single complaint about this rename breaking anything. Mostly
because these functions, variables and structures are mostly internal
to the tracing system and are seldom (if ever) used by anything
external to that"
* tag 'trace-v4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (33 commits)
ring_buffer: Allow to exit the ring buffer benchmark immediately
ring-buffer-benchmark: Fix the wrong type
ring-buffer-benchmark: Fix the wrong param in module_param
ring-buffer: Add enum names for the context levels
ring-buffer: Remove useless unused tracing_off_permanent()
ring-buffer: Give NMIs a chance to lock the reader_lock
ring-buffer: Add trace_recursive checks to ring_buffer_write()
ring-buffer: Allways do the trace_recursive checks
ring-buffer: Move recursive check to per_cpu descriptor
ring-buffer: Add unlikelys to make fast path the default
tracing: Rename ftrace_get_offsets_##call() to trace_event_get_offsets_##call()
tracing: Rename ftrace_define_fields_##call() to trace_event_define_fields_##call()
tracing: Rename ftrace_event_type_funcs_##call to trace_event_type_funcs_##call
tracing: Rename ftrace_data_offset_##call to trace_event_data_offset_##call
tracing: Rename ftrace_raw_##call event structures to trace_event_raw_##call
tracing: Rename ftrace_trigger_soft_disabled() to trace_trigger_soft_disabled()
tracing: Rename FTRACE_EVENT_FL_* flags to EVENT_FILE_FL_*
tracing: Rename struct ftrace_subsystem_dir to trace_subsystem_dir
tracing: Rename ftrace_event_name() to trace_event_name()
tracing: Rename FTRACE_MAX_EVENT to TRACE_EVENT_TYPE_MAX
...
Merge tag 'media/v4.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media updates from Mauro Carvalho Chehab:
- Lots of improvements to the DVB API DocBook documentation. Now, the
frontend and the network APIs are fully in sync with the Kernel and
look more like the rest of the media documentation;
- New frontend driver: cx24120
- New driver for a PCI device: cobalt. This device is actually not
sold in the market, but it is a good example of a multi-HDMI input
device;
- The dt3155 driver was promoted from staging;
- The mantis driver got remote controller support;
- New V4L2 driver for ST bdisp SoC chipsets;
- Make sparse and smatch happier: several bugs were solved by fixing
the issues reported by those static code analyzers.
- Lots of new device additions, new features, improvements and cleanups
in the existing drivers.
* tag 'media/v4.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (553 commits)
[media] lmedm04: fix the range for relative measurements
[media] lmedm04: use u32 instead of u64 for relative stats
[media] omap3isp: remove unused var
[media] saa7134: fix page size on some archs
[media] use CONFIG_PM_SLEEP for suspend/resume
[media] tuner-i2c: be consistent with I2C declaration
[media] si470x: cleanup define namespace
[media] bdisp: prevent compiling on random arch
[media] vb2: Don't WARN when v4l2_buffer.bytesused is 0 for multiplanar buffers
[media] MAINTAINERS: Add entry for the Renesas VSP1 driver
[media] videodev2.h: fix copy-and-paste error in V4L2_MAP_XFER_FUNC_DEFAULT
[media] Revert "[media] vb2: Push mmap_sem down to memops"
[media] mantis: cleanup a warning
[media] bdisp-debug: don't try to divide by s64
[media] cx88: don't declare restart_video_queue if not used
[media] au0828: move dev->boards atribuition to happen earlier
[media] lmedm04: implement dvb v5 statistics
[media] bdisp: remove unused var
[media] bdisp: remove needless check
ts2020: fix compilation on i386
...
Pull thermal management updates from Zhang Rui:
"Specifics:
- enhance Thermal Framework with several new capabilities:
* use power estimates
* compute weights with relative integers instead of percentages
* allow governors to have private data in thermal zones
* export thermal zone parameters through sysfs
Thanks to the ARM thermal team (Javi, Punit, KP).
- introduce a new thermal governor: power allocator. The first in-kernel
closed-loop PI(D) controller for thermal control. Thanks to the ARM
thermal team.
- enhance OF thermal to allow thermal zones to have sustainable power
HW specification. Thanks to Punit.
- introduce thermal driver for Intel Quark SoC x1000 platform. Thanks
to Ong, Boon Leong.
- introduce QPNP PMIC temperature alarm driver. Thanks to Ivan T. I.
- introduce thermal driver for Hisilicon hi6220. Thanks to
kongxinwei.
- enhance Exynos thermal driver to handle Exynos5433 TMU. Thanks to
Chanwoo C.
- TI thermal driver now has a better implementation for EOCZ bit.
From Pavel M.
- add id for Skylake processors in int340x processor thermal driver.
- a couple of small fixes and cleanups."
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: (36 commits)
thermal: hisilicon: add new hisilicon thermal sensor driver
dt-bindings: Document the hi6220 thermal sensor bindings
thermal: of-thermal: add support for reading coefficients property
thermal: support slope and offset coefficients
thermal: power_allocator: round the division when divvying up power
thermal: exynos: Add the support for Exynos5433 TMU
thermal: cpu_cooling: Fix power calculation when CPUs are offline
thermal: cpu_cooling: Remove cpu_dev update on policy CPU update
thermal: export thermal_zone_parameters to sysfs
thermal: cpu_cooling: Check memory allocation of power_table
ti-soc-thermal: request temperature periodically if hw can't do that itself
ti-soc-thermal: implement eocz bit to make driver useful on omap3
cleanup ti-soc-thermal
thermal: remove stale THERMAL_POWER_ACTOR select
thermal: Default OF created trip points to writable
thermal: core: Add Kconfig option to enable writable trips
thermal: x86_pkg_temp: drop const for thermal_zone_parameters
of: thermal: Introduce sustainable power for a thermal zone
thermal: add trace events to the power allocator governor
thermal: introduce the Power Allocator governor
...
Pull cgroup writeback support from Jens Axboe:
"This is the big pull request for adding cgroup writeback support.
This code has been in development for a long time, and it has been
simmering in for-next for a good chunk of this cycle too. This is one
of those problems that has been talked about for at least half a
decade; finally there's a solution and code to go with it.
Also see last week's writeup on LWN:
http://lwn.net/Articles/648292/"
* 'for-4.2/writeback' of git://git.kernel.dk/linux-block: (85 commits)
writeback, blkio: add documentation for cgroup writeback support
vfs, writeback: replace FS_CGROUP_WRITEBACK with SB_I_CGROUPWB
writeback: do foreign inode detection iff cgroup writeback is enabled
v9fs: fix error handling in v9fs_session_init()
bdi: fix wrong error return value in cgwb_create()
buffer: remove unusued 'ret' variable
writeback: disassociate inodes from dying bdi_writebacks
writeback: implement foreign cgroup inode bdi_writeback switching
writeback: add lockdep annotation to inode_to_wb()
writeback: use unlocked_inode_to_wb transaction in inode_congested()
writeback: implement unlocked_inode_to_wb transaction and use it for stat updates
writeback: implement [locked_]inode_to_wb_and_lock_list()
writeback: implement foreign cgroup inode detection
writeback: make writeback_control track the inode being written back
writeback: relocate wb[_try]_get(), wb_put(), inode_{attach|detach}_wb()
mm: vmscan: disable memcg direct reclaim stalling if cgroup writeback support is in use
writeback: implement memcg writeback domain based throttling
writeback: reset wb_domain->dirty_limit[_tstmp] when memcg domain size changes
writeback: implement memcg wb_domain
writeback: update wb_over_bg_thresh() to use wb_domain aware operations
...
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"A very large number of cleanups and bug fixes --- in particular for
the ext4 encryption patches, which is a new feature added in the last
merge window. Also fix a number of long-standing xfstest failures.
(Quota writes failing due to ENOSPC, a race between truncate and
writepage in data=journalled mode that was causing generic/068 to
fail, and other corner cases.)
Also add support for FALLOC_FL_INSERT_RANGE, and improve jbd2
performance eliminating locking when a buffer is modified more than
once during a transaction (which is very common for allocation
bitmaps, for example), in which case the state of the journalled
buffer head doesn't need to change"
[ I renamed "ext4_follow_link()" to "ext4_encrypted_follow_link()" in
the merge resolution, to make it clear that that function is _only_
used for encrypted symlinks. The function doesn't actually work for
non-encrypted symlinks at all, and they use the generic helpers
- Linus ]
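The FALLOC_FL_INSERT_RANGE support mentioned in the ext4 pull above is reached from userspace through fallocate(2). A minimal sketch follows; `demo_insert_range_args_ok()` and `demo_insert_range()` are hypothetical helpers, and the pre-check only mirrors the constraints documented in the fallocate(2) man page (offset and length must be multiples of the filesystem block size, and the offset must lie before the end of the file). Whether the call succeeds still depends on the filesystem and kernel in use.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* exposes fallocate() in <fcntl.h> */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <stdbool.h>
#include <sys/types.h>

/* Check the documented argument constraints before issuing the call. */
static bool demo_insert_range_args_ok(off_t offset, off_t len,
                                      off_t file_size, off_t blk_size)
{
    assert(blk_size > 0);
    if (offset < 0 || len <= 0)
        return false;
    if (offset % blk_size != 0 || len % blk_size != 0)
        return false;
    /* The inserted hole must start before the end of the file. */
    return offset < file_size;
}

/* Insert a hole of len bytes at offset, shifting existing data up.
 * Returns 0 on success or a negative errno value on failure. */
static int demo_insert_range(int fd, off_t offset, off_t len)
{
    if (fallocate(fd, FALLOC_FL_INSERT_RANGE, offset, len) == 0)
        return 0;
    return -errno;
}
```

A caller would typically obtain the file size and block size from fstat() (st_size and st_blksize), run the pre-check, and treat -EOPNOTSUPP from `demo_insert_range()` as "this filesystem does not implement insert range".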
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (52 commits)
ext4: set lazytime on remount if MS_LAZYTIME is set by mount
ext4: only call ext4_truncate when size <= isize
ext4: make online defrag error reporting consistent
ext4: minor cleanup of ext4_da_reserve_space()
ext4: don't retry file block mapping on bigalloc fs with non-extent file
ext4: prevent ext4_quota_write() from failing due to ENOSPC
ext4: call sync_blockdev() before invalidate_bdev() in put_super()
jbd2: speedup jbd2_journal_dirty_metadata()
jbd2: get rid of open coded allocation retry loop
ext4: improve warning directory handling messages
jbd2: fix ocfs2 corrupt when updating journal superblock fails
ext4: mballoc: avoid 20-argument function call
ext4: wait for existing dio workers in ext4_alloc_file_blocks()
ext4: recalculate journal credits as inode depth changes
jbd2: use GFP_NOFS in jbd2_cleanup_journal_tail()
ext4: use swap() in mext_page_double_lock()
ext4: use swap() in memswap()
ext4: fix race between truncate and __ext4_journalled_writepage()
ext4 crypto: fail the mount if blocksize != pagesize
ext4: Add support FALLOC_FL_INSERT_RANGE for fallocate
...
Pull f2fs updates from Jaegeuk Kim:
"New features:
- per-file encryption (like ext4)
- FALLOC_FL_ZERO_RANGE
- FALLOC_FL_COLLAPSE_RANGE
- RENAME_WHITEOUT
Major enhancement/fixes:
- recover broken superblocks
- enhance f2fs_trim_fs with a discard_map
- fix a race condition on dentry block allocation
- fix a deadlock during summary operation
- fix a missing fiemap result
.. and many minor bug fixes and clean-ups were done"
* tag 'for-f2fs-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (83 commits)
f2fs: do not trim preallocated blocks when truncating after i_size
f2fs crypto: add alloc_bounce_page
f2fs crypto: fix to handle errors likewise ext4
f2fs: drop the volatile_write flag only
f2fs: skip committing valid superblock
f2fs: setting discard option in parse_options()
f2fs: fix to return exact trimmed size
f2fs: support FALLOC_FL_INSERT_RANGE
f2fs: hide common code in f2fs_replace_block
f2fs: disable the discard option when device doesn't support
f2fs crypto: remove alloc_page for bounce_page
f2fs: fix a deadlock for summary page lock vs. sentry_lock
f2fs crypto: clean up error handling in f2fs_fname_setup_filename
f2fs crypto: avoid f2fs_inherit_context for symlink
f2fs crypto: do not set encryption policy for non-directory by ioctl
f2fs crypto: allow setting encryption policy once
f2fs crypto: check context consistent for rename2
f2fs: avoid duplicated code by reusing f2fs_read_end_io
f2fs crypto: use per-inode tfm structure
f2fs: recovering broken superblock during mount
...
This is the usual grab bag of driver updates (lpfc, hpsa,
megaraid_sas, cxgbi, be2iscsi) plus an assortment of minor updates.
There is also one new driver: the Cisco snic; the advansys driver has
been rewritten to get rid of the warning about converting it to the
DMA API, the tape statistics patch got in and finally, there's a
reshuffle of SCSI header files to separate more cleanly initiator from
target mode (and better share the common definitions).
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"This is the usual grab bag of driver updates (lpfc, hpsa,
megaraid_sas, cxgbi, be2iscsi) plus an assortment of minor updates.
There is also one new driver: the Cisco snic. The advansys driver has
been rewritten to get rid of the warning about converting it to the
DMA API, the tape statistics patch got in and finally, there's a
reshuffle of SCSI header files to separate more cleanly initiator from
target mode (and better share the common definitions)"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (156 commits)
snic: driver for Cisco SCSI HBA
qla2xxx: Fix indentation
qla2xxx: Comment out unreachable code
fusion: remove dead MTRR code
advansys: fix compilation errors and warnings when CONFIG_PCI is not set
mptsas: fix depth param in scsi_track_queue_full
megaraid: fix irq setup process regression
lpfc: Update version to 10.7.0.0 for upstream patch set.
lpfc: Fix to drop PLOGIs from fabric node till LOGO processing completes
lpfc: Fix scsi task management error message.
lpfc: Fix cq_id masking problem.
lpfc: Fix scsi prep dma buf error.
lpfc: Add support for using block multi-queue
lpfc: Devices are not discovered during takeaway/giveback testing
lpfc: Fix vport deletion failure.
lpfc: Check for active portpeerbeacon.
lpfc: Update driver version for upstream patch set 10.6.0.1.
lpfc: Change buffer pool empty message to miscellaneous category
lpfc: Fix incorrect log message reported for empty FCF record.
lpfc: Fix rport leak.
...
- ACPICA update to upstream revision 20150515 including basic
support for ACPI 6 features: new ACPI tables introduced by
ACPI 6 (STAO, XENV, WPBT, NFIT, IORT), changes related to the
other tables (DTRM, FADT, LPIT, MADT), new predefined names
(_BTH, _CR3, _DSD, _LPI, _MTL, _PRR, _RDI, _RST, _TFP, _TSN),
fixes and cleanups (Bob Moore, Lv Zheng).
- ACPI device power management core code update to follow ACPI 6
which reflects the ACPI device power management implementation
in Windows (Rafael J Wysocki).
- Rework of the backlight interface selection logic to reduce the
number of kernel command line options and improve the handling
of DMI quirks that may be involved in that and to make the
code generally more straightforward (Hans de Goede).
- Fixes for the ACPI Embedded Controller (EC) driver related to
the handling of EC transactions (Lv Zheng).
- Fix for a regression related to the ACPI resources management
and resulting from a recent change of ACPI initialization code
ordering (Rafael J Wysocki).
- Fix for a system initialization regression related to ACPI
introduced during the 3.14 cycle and caused by running the
code that switches the platform over to the ACPI mode too
early in the initialization sequence (Rafael J Wysocki).
- Support for the ACPI _CCA device configuration object related
to DMA cache coherence (Suravee Suthikulpanit).
- ACPI/APEI fixes and cleanups (Jiri Kosina, Borislav Petkov).
- ACPI battery driver cleanups (Luis Henriques, Mathias Krause).
- ACPI processor driver cleanups (Hanjun Guo).
- Cleanups and documentation update related to the ACPI device
properties interface based on _DSD (Rafael J Wysocki).
- ACPI device power management fixes (Rafael J Wysocki).
- Assorted cleanups related to ACPI (Dominik Brodowski, Fabian
Frederick, Lorenzo Pieralisi, Mathias Krause, Rafael J Wysocki).
- Fix for a long-standing issue causing General Protection Faults
to be generated occasionally on return to user space after resume
from ACPI-based suspend-to-RAM on 32-bit x86 (Ingo Molnar).
- Fix to make the suspend core code return -EBUSY consistently in
all cases when system suspend is aborted due to wakeup detection
(Ruchi Kandoi).
- Support for automated device wakeup IRQ handling allowing drivers
to make their PM support more straightforward (Tony Lindgren).
- New tracepoints for suspend-to-idle tracing and rework of the
prepare/complete callbacks tracing in the PM core (Todd E Brandt,
Rafael J Wysocki).
- Wakeup sources framework enhancements (Jin Qian).
- New macro for noirq system PM callbacks (Grygorii Strashko).
- Assorted cleanups related to system suspend (Rafael J Wysocki).
- cpuidle core cleanups to make the code more efficient (Rafael J
Wysocki).
- powernv/pseries cpuidle driver update (Shilpasri G Bhat).
- cpufreq core fixes related to CPU online/offline that should
reduce the overhead of these operations quite a bit, unless the
CPU in question is physically going away (Viresh Kumar, Saravana
Kannan).
- Serialization of cpufreq governor callbacks to avoid race
conditions in some cases (Viresh Kumar).
- intel_pstate driver fixes and cleanups (Doug Smythies, Prarit
Bhargava, Joe Konno).
- cpufreq driver (arm_big_little, cpufreq-dt, qoriq) updates (Sudeep
Holla, Felipe Balbi, Tang Yuantian).
- Assorted cleanups in cpufreq drivers and core (Shailendra Verma,
Fabian Frederick, Wang Long).
- New Device Tree bindings for representing Operating Performance
Points (Viresh Kumar).
- Updates for the common clock operations support code in the PM
core (Rajendra Nayak, Geert Uytterhoeven).
- PM domains core code update (Geert Uytterhoeven).
- Intel Knights Landing support for the RAPL (Running Average Power
Limit) power capping driver (Dasaratharaman Chandramouli).
- Fixes related to the floor frequency setting on Atom SoCs in the
RAPL power capping driver (Ajay Thomas).
- Runtime PM framework documentation update (Ben Dooks).
- cpupower tool fix (Herton R Krzesinski).
Merge tag 'pm+acpi-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management and ACPI updates from Rafael Wysocki:
"The rework of backlight interface selection API from Hans de Goede
stands out from the number of commits and the number of affected
places perspective. The cpufreq core fixes from Viresh Kumar are
quite significant too as far as the number of commits goes and because
they should reduce CPU online/offline overhead quite a bit in the
majority of cases.
From the new features point of view, the ACPICA update (to upstream
revision 20150515) adding support for new ACPI 6 material to ACPICA is
the one that matters the most as some new significant features will be
based on it going forward. Also included is an update of the ACPI
device power management core to follow ACPI 6 (which in turn reflects
the Windows' device PM implementation), a PM core extension to support
wakeup interrupts in a more generic way and support for the ACPI _CCA
device configuration object.
The rest is mostly fixes and cleanups all over and some documentation
updates, including new DT bindings for Operating Performance Points.
There is one fix for a regression introduced in the 4.1 cycle, but it
adds quite a number of lines of code, it wasn't really ready before
Thursday and you were on vacation, so I refrained from pushing it on
the last minute for 4.1.
Specifics:
- ACPICA update to upstream revision 20150515 including basic support
for ACPI 6 features: new ACPI tables introduced by ACPI 6 (STAO,
XENV, WPBT, NFIT, IORT), changes related to the other tables (DTRM,
FADT, LPIT, MADT), new predefined names (_BTH, _CR3, _DSD, _LPI,
_MTL, _PRR, _RDI, _RST, _TFP, _TSN), fixes and cleanups (Bob Moore,
Lv Zheng).
- ACPI device power management core code update to follow ACPI 6
which reflects the ACPI device power management implementation in
Windows (Rafael J Wysocki).
- rework of the backlight interface selection logic to reduce the
number of kernel command line options and improve the handling of
DMI quirks that may be involved in that and to make the code
generally more straightforward (Hans de Goede).
- fixes for the ACPI Embedded Controller (EC) driver related to the
handling of EC transactions (Lv Zheng).
- fix for a regression related to the ACPI resources management and
resulting from a recent change of ACPI initialization code ordering
(Rafael J Wysocki).
- fix for a system initialization regression related to ACPI
introduced during the 3.14 cycle and caused by running the code
that switches the platform over to the ACPI mode too early in the
initialization sequence (Rafael J Wysocki).
- support for the ACPI _CCA device configuration object related to
DMA cache coherence (Suravee Suthikulpanit).
- ACPI/APEI fixes and cleanups (Jiri Kosina, Borislav Petkov).
- ACPI battery driver cleanups (Luis Henriques, Mathias Krause).
- ACPI processor driver cleanups (Hanjun Guo).
- cleanups and documentation update related to the ACPI device
properties interface based on _DSD (Rafael J Wysocki).
- ACPI device power management fixes (Rafael J Wysocki).
- assorted cleanups related to ACPI (Dominik Brodowski, Fabian
Frederick, Lorenzo Pieralisi, Mathias Krause, Rafael J Wysocki).
- fix for a long-standing issue causing General Protection Faults to
be generated occasionally on return to user space after resume from
ACPI-based suspend-to-RAM on 32-bit x86 (Ingo Molnar).
- fix to make the suspend core code return -EBUSY consistently in all
cases when system suspend is aborted due to wakeup detection (Ruchi
Kandoi).
- support for automated device wakeup IRQ handling allowing drivers
to make their PM support more straightforward (Tony Lindgren).
- new tracepoints for suspend-to-idle tracing and rework of the
prepare/complete callbacks tracing in the PM core (Todd E Brandt,
Rafael J Wysocki).
- wakeup sources framework enhancements (Jin Qian).
- new macro for noirq system PM callbacks (Grygorii Strashko).
- assorted cleanups related to system suspend (Rafael J Wysocki).
- cpuidle core cleanups to make the code more efficient (Rafael J
Wysocki).
- powernv/pseries cpuidle driver update (Shilpasri G Bhat).
- cpufreq core fixes related to CPU online/offline that should reduce
the overhead of these operations quite a bit, unless the CPU in
question is physically going away (Viresh Kumar, Saravana Kannan).
- serialization of cpufreq governor callbacks to avoid race
conditions in some cases (Viresh Kumar).
- intel_pstate driver fixes and cleanups (Doug Smythies, Prarit
Bhargava, Joe Konno).
- cpufreq driver (arm_big_little, cpufreq-dt, qoriq) updates (Sudeep
Holla, Felipe Balbi, Tang Yuantian).
- assorted cleanups in cpufreq drivers and core (Shailendra Verma,
Fabian Frederick, Wang Long).
- new Device Tree bindings for representing Operating Performance
Points (Viresh Kumar).
- updates for the common clock operations support code in the PM core
(Rajendra Nayak, Geert Uytterhoeven).
- PM domains core code update (Geert Uytterhoeven).
- Intel Knights Landing support for the RAPL (Running Average Power
Limit) power capping driver (Dasaratharaman Chandramouli).
- fixes related to the floor frequency setting on Atom SoCs in the
RAPL power capping driver (Ajay Thomas).
- runtime PM framework documentation update (Ben Dooks).
- cpupower tool fix (Herton R Krzesinski)"
* tag 'pm+acpi-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (194 commits)
cpuidle: powernv/pseries: Auto-promotion of snooze to deeper idle state
x86: Load __USER_DS into DS/ES after resume
PM / OPP: Add binding for 'opp-suspend'
PM / OPP: Allow multiple OPP tables to be passed via DT
PM / OPP: Add new bindings to address shortcomings of existing bindings
ACPI: Constify ACPI device IDs in documentation
ACPI / enumeration: Document the rules regarding the PRP0001 device ID
ACPI / video: Make acpi_video_unregister_backlight() private
acpi-video-detect: Remove old API
toshiba-acpi: Port to new backlight interface selection API
thinkpad-acpi: Port to new backlight interface selection API
sony-laptop: Port to new backlight interface selection API
samsung-laptop: Port to new backlight interface selection API
msi-wmi: Port to new backlight interface selection API
msi-laptop: Port to new backlight interface selection API
intel-oaktrail: Port to new backlight interface selection API
ideapad-laptop: Port to new backlight interface selection API
fujitsu-laptop: Port to new backlight interface selection API
eeepc-laptop: Port to new backlight interface selection API
dell-wmi: Port to new backlight interface selection API
...
Pull timer updates from Thomas Gleixner:
"A rather largish update for everything time and timer related:
- Cache footprint optimizations for both hrtimers and timer wheel
- Lower the NOHZ impact on systems which have NOHZ or timer migration
disabled at runtime.
- Optimize run time overhead of hrtimer interrupt by making the clock
offset updates smarter
- hrtimer cleanups and removal of restrictions to tackle some
problems in sched/perf
- Some more leap second tweaks
- Another round of changes addressing the 2038 problem
- First step to change the internals of clock event devices by
introducing the necessary infrastructure
- Allow constant folding for usecs/msecs_to_jiffies()
- The usual pile of clockevent/clocksource driver updates
The hrtimer changes contain updates to sched, perf and x86 as they
depend on them plus changes all over the tree to cleanup API changes
and redundant code, which got copied all over the place. The y2038
changes touch s390 to remove the last non 2038 safe code related to
boot/persistent clock"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits)
clocksource: Increase dependencies of timer-stm32 to limit build wreckage
timer: Minimize nohz off overhead
timer: Reduce timer migration overhead if disabled
timer: Stats: Simplify the flags handling
timer: Replace timer base by a cpu index
timer: Use hlist for the timer wheel hash buckets
timer: Remove FIFO "guarantee"
timers: Sanitize catchup_timer_jiffies() usage
hrtimer: Allow hrtimer::function() to free the timer
seqcount: Introduce raw_write_seqcount_barrier()
seqcount: Rename write_seqcount_barrier()
hrtimer: Fix hrtimer_is_queued() hole
hrtimer: Remove HRTIMER_STATE_MIGRATE
selftest: Timers: Avoid signal deadlock in leap-a-day
timekeeping: Copy the shadow-timekeeper over the real timekeeper last
clockevents: Check state instead of mode in suspend/resume path
selftests: timers: Add leap-second timer edge testing to leap-a-day.c
ntp: Do leapsecond adjustment in adjtimex read path
time: Prevent early expiry of hrtimers[CLOCK_REALTIME] at the leap second edge
ntp: Introduce and use SECS_PER_DAY macro instead of 86400
...
Pull scheduler updates from Ingo Molnar:
"The main changes are:
- lockless wakeup support for futexes and IPC message queues
(Davidlohr Bueso, Peter Zijlstra)
- Replace spinlocks with atomics in thread_group_cputimer(), to
improve scalability (Jason Low)
- NUMA balancing improvements (Rik van Riel)
- SCHED_DEADLINE improvements (Wanpeng Li)
- clean up and reorganize preemption helpers (Frederic Weisbecker)
- decouple page fault disabling machinery from the preemption
counter, to improve debuggability and robustness (David
Hildenbrand)
- SCHED_DEADLINE documentation updates (Luca Abeni)
- topology CPU masks cleanups (Bartosz Golaszewski)
- /proc/sched_debug improvements (Srikar Dronamraju)"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (79 commits)
sched/deadline: Remove needless parameter in dl_runtime_exceeded()
sched: Remove superfluous resetting of the p->dl_throttled flag
sched/deadline: Drop duplicate init_sched_dl_class() declaration
sched/deadline: Reduce rq lock contention by eliminating locking of non-feasible target
sched/deadline: Make init_sched_dl_class() __init
sched/deadline: Optimize pull_dl_task()
sched/preempt: Add static_key() to preempt_notifiers
sched/preempt: Fix preempt notifiers documentation about hlist_del() within unsafe iteration
sched/stop_machine: Fix deadlock between multiple stop_two_cpus()
sched/debug: Add sum_sleep_runtime to /proc/<pid>/sched
sched/debug: Replace vruntime with wait_sum in /proc/sched_debug
sched/debug: Properly format runnable tasks in /proc/sched_debug
sched/numa: Only consider less busy nodes as numa balancing destinations
Revert 095bebf61a ("sched/numa: Do not move past the balance point if unbalanced")
sched/fair: Prevent throttling in early pick_next_task_fair()
preempt: Reorganize the notrace definitions a bit
preempt: Use preempt_schedule_context() as the official tracing preemption point
sched: Make preempt_schedule_context() function-tracing safe
x86: Remove cpu_sibling_mask() and cpu_core_mask()
x86: Replace cpu_**_mask() with topology_**_cpumask()
...
Remove outdated comments and dead code from ext4_da_reserve_space.
Clean up its trace point, and relocate it to make it more useful.
While we're at it, fix a nearby conditional used to determine if
we have a non-bigalloc file system. It doesn't match usage elsewhere
in the code, and misleadingly suggests that an s_cluster_ratio value
of 0 would be legal.
Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Instead of storing a pointer to the per cpu tvec_base we can simply
cache a CPU index in the timer_list and use that to get hold of the
correct per cpu tvec_base. This is only used in lock_timer_base() and
the slightly larger code is peanuts versus the spinlock operation and
the d-cache footprint of the timer wheel.
Aside from that, this allows us to get rid of the following nuisances:
- boot_tvec_base
That statically allocated 4k bss data is just kept around so the
timer has a home when it gets statically initialized. It serves no
other purpose.
With the CPU index we assign the timer to CPU0 at static
initialization time and therefore can avoid the whole boot_tvec_base
dance. That also simplifies the init code, which can just use the
per cpu base.
Before:
text   data   bss   dec    hex   filename
17491  9201   4160  30852  7884  ../build/kernel/time/timer.o
After:
text   data   bss   dec    hex   filename
17440  9193   0     26633  6809  ../build/kernel/time/timer.o
- Overloading the base pointer with various flags
The CPU index has enough space to hold the flags (deferrable,
irqsafe) so we can get rid of the extra masking and bit fiddling
with the base pointer.
As a benefit we reduce the size of struct timer_list on 64 bit
machines by 4 - 8 bytes, a size reduction of up to 15% per struct timer_list,
which is a real win as we have tons of them embedded in other structs.
This also changes the newly added deferrable printout of the timer
start trace point to capture and print all timer->flags, which allows
us to decode the target cpu of the timer as well.
We might have used bitfields for this, but that would change the
static initializers and the init function for no value to accommodate
big endian bitfields.
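The idea of caching a CPU index plus flag bits in one word, instead of a
tagged base pointer, can be sketched in userspace as follows. This is only
an illustration: the mask values, struct timer_model, struct
tvec_base_model and all helper names are hypothetical and are not taken
from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: low bits hold the CPU index, higher bits hold flags. */
#define TMR_CPUMASK    0x000FFFFFu
#define TMR_DEFERRABLE 0x00100000u
#define TMR_IRQSAFE    0x00200000u

#define NR_CPUS_MODEL 4

/* Stand-in for the per cpu timer wheel base. */
struct tvec_base_model {
	int cpu;
};

static struct tvec_base_model per_cpu_bases[NR_CPUS_MODEL] = {
	{ 0 }, { 1 }, { 2 }, { 3 }
};

/* The timer carries one 32-bit word instead of a pointer to its base. */
struct timer_model {
	uint32_t flags;
};

/* Static initialization: CPU index 0 is implied, so no boot-time base is needed. */
#define TIMER_MODEL_INIT(_flags) { .flags = (_flags) }

static inline unsigned int timer_cpu(const struct timer_model *t)
{
	return t->flags & TMR_CPUMASK;
}

static inline void timer_set_cpu(struct timer_model *t, unsigned int cpu)
{
	/* Replace only the CPU bits, keep the flag bits untouched. */
	t->flags = (t->flags & ~TMR_CPUMASK) | (cpu & TMR_CPUMASK);
}

/* Look up the per cpu base from the cached index. */
static inline struct tvec_base_model *timer_base(const struct timer_model *t)
{
	return &per_cpu_bases[timer_cpu(t)];
}

static inline bool timer_is_deferrable(const struct timer_model *t)
{
	return t->flags & TMR_DEFERRABLE;
}

static inline bool timer_is_irqsafe(const struct timer_model *t)
{
	return t->flags & TMR_IRQSAFE;
}
```

A timer declared as `struct timer_model t = TIMER_MODEL_INIT(TMR_DEFERRABLE);`
starts out on CPU 0 and can be moved with timer_set_cpu() without losing
its flags, because flags and index no longer share storage with a pointer.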
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Joonwoo Park <joonwoop@codeaurora.org>
Cc: Wenbo Wang <wenbo.wang@memblaze.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Badhri Jagan Sridharan <Badhri@google.com>
Link: http://lkml.kernel.org/r/20150526224511.950084301@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* pm-cpufreq: (37 commits)
cpufreq: dt: allow driver to boot automatically
intel_pstate: Fix overflow in busy_scaled due to long delay
cpufreq: qoriq: optimize the CPU frequency switching time
cpufreq: gx-suspmod: Fix two typos in two comments
cpufreq: nforce2: Fix typo in comment to function nforce2_init()
cpufreq: governor: Serialize governor callbacks
cpufreq: governor: split cpufreq_governor_dbs()
cpufreq: governor: register notifier from cs_init()
cpufreq: Remove cpufreq_update_policy()
cpufreq: Restart governor as soon as possible
cpufreq: Call cpufreq_policy_put_kobj() from cpufreq_policy_free()
cpufreq: Initialize policy->kobj while allocating policy
cpufreq: Stop migrating sysfs files on hotplug
cpufreq: Don't allow updating inactive policies from sysfs
intel_pstate: Force setting target pstate when required
intel_pstate: change some inconsistent debug information
cpufreq: Track cpu managing sysfs kobjects separately
cpufreq: Fix for typos in two comments
cpufreq: Mark policy->governor = NULL for inactive policies
cpufreq: Manage governor usage history with 'policy->last_governor'
...
This patch implements fallocate's FALLOC_FL_INSERT_RANGE for Ext4.
1) Make sure that both offset and len are block size aligned.
2) Increase the i_size of the inode by len bytes.
3) Compute the file's logical block number for offset. If the computed
block number is not the starting block of an extent, split that extent
so that the block number becomes the starting block of an extent.
4) Shift all the extents lying in [offset, last allocated extent]
to the right by len bytes. This step creates a hole of len bytes
at offset.
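The four steps above can be modelled in userspace on a plain sorted array
of extents. This is a minimal sketch under stated assumptions, not the ext4
implementation: struct extent_model, struct file_model, MAX_EXTENTS and
insert_range_model() are hypothetical names, and the rejection of an offset
at or beyond i_size follows the behaviour documented for fallocate(2)
rather than anything stated in this message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_EXTENTS 16

/* One extent: logical start block and length, both in blocks. */
struct extent_model {
	uint64_t lblk;
	uint64_t len;
};

struct file_model {
	uint64_t size;                        /* i_size in bytes */
	uint32_t blocksize;                   /* block size in bytes */
	size_t nr;                            /* number of extents in use */
	struct extent_model ext[MAX_EXTENTS]; /* sorted by lblk */
};

/* Returns 0 on success, -1 if the request is invalid or the array is full. */
static int insert_range_model(struct file_model *f, uint64_t offset, uint64_t len)
{
	/* Step 1: offset and len must be block size aligned. */
	if (offset % f->blocksize || len % f->blocksize)
		return -1;
	/* Assumption: inserting at or past end of file is rejected. */
	if (offset >= f->size)
		return -1;

	uint64_t start = offset / f->blocksize;
	uint64_t shift = len / f->blocksize;
	size_t i;

	/* Find the first extent that ends after the insertion block. */
	for (i = 0; i < f->nr; i++)
		if (f->ext[i].lblk + f->ext[i].len > start)
			break;

	/* Step 3: split the extent if start falls in its middle. */
	if (i < f->nr && f->ext[i].lblk < start) {
		if (f->nr == MAX_EXTENTS)
			return -1;
		memmove(&f->ext[i + 2], &f->ext[i + 1],
			(f->nr - i - 1) * sizeof(f->ext[0]));
		uint64_t head = start - f->ext[i].lblk;
		f->ext[i + 1].lblk = start;
		f->ext[i + 1].len = f->ext[i].len - head;
		f->ext[i].len = head;
		f->nr++;
		i++;
	}

	/* Step 4: shift every extent from the insertion point to the right. */
	for (; i < f->nr; i++)
		f->ext[i].lblk += shift;

	/* Step 2: grow i_size; done last here so failures leave f untouched. */
	f->size += len;
	return 0;
}
```

With a single 8-block extent and 4096-byte blocks, inserting 3 blocks at
block 2 leaves extents {0, 2} and {5, 6} with a 3-block hole between them.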
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com>
Pull block layer fixes from Jens Axboe:
"Sending this off now, as I'm not aware of other current bugs, nor do I
expect further fixes before 4.1 final. This contains two fixes:
- a fix for a bdi unregister warning that gets spewed on md, due to a
regression introduced earlier in this cycle. From Neil Brown.
- a fix for a compile warning for NVMe on 32-bit platforms, also a
regression introduced in this cycle. From Arnd Bergmann"
* 'for-linus' of git://git.kernel.dk/linux-block:
NVMe: fix type warning on 32-bit
block: discard bdi_unregister() in favour of bdi_destroy()
Only include SCSI initiator header files in target code that needs
these header files, namely the SCSI pass-through code and the tcm_loop
driver. Change SCSI_SENSE_BUFFERSIZE into TRANSPORT_SENSE_BUFFER in
target code because the former is intended for initiator code and the
latter for target code. With this patch the only initiator include
directives in target code that remain are as follows:
$ git grep -nHE 'include .scsi/(scsi.h|scsi_host.h|scsi_device.h|scsi_cmnd.h)' drivers/target drivers/infiniband/ulp/{isert,srpt} drivers/usb/gadget/legacy/tcm_*.[ch] drivers/{vhost,xen} include/{target,trace/events/target.h}
drivers/target/loopback/tcm_loop.c:29:#include <scsi/scsi.h>
drivers/target/loopback/tcm_loop.c:31:#include <scsi/scsi_host.h>
drivers/target/loopback/tcm_loop.c:32:#include <scsi/scsi_device.h>
drivers/target/loopback/tcm_loop.c:33:#include <scsi/scsi_cmnd.h>
drivers/target/target_core_pscsi.c:39:#include <scsi/scsi_device.h>
drivers/target/target_core_pscsi.c:40:#include <scsi/scsi_host.h>
drivers/xen/xen-scsiback.c:52:#include <scsi/scsi_host.h> /* SG_ALL */
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
This patch is a part of the series to define wb_domain which
represents a domain that wb's (bdi_writeback's) belong to and are
measured against each other in. This will enable IO backpressure
propagation for cgroup writeback.
global_dirty_limit exists to regulate the global dirty threshold which
is a property of the wb_domain. This patch moves hard_dirty_limit,
dirty_lock, and update_time into wb_domain.
This is pure reorganization and doesn't introduce any behavioral
changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jan Kara <jack@suse.cz>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Greg Thelen <gthelen@google.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Currently, a bdi (backing_dev_info) embeds a single wb (bdi_writeback)
and the role of the separation is unclear. For cgroup support for
writeback IOs, a bdi will be updated to host multiple wb's where each
wb serves writeback IOs of a different cgroup on the bdi. To achieve
that, a wb should carry all states necessary for servicing writeback
IOs for a cgroup independently.
This patch moves bandwidth related fields from backing_dev_info into
bdi_writeback.
* The moved fields are: bw_time_stamp, dirtied_stamp, written_stamp,
write_bandwidth, avg_write_bandwidth, dirty_ratelimit,
balanced_dirty_ratelimit, completions and dirty_exceeded.
* writeback_chunk_size() and over_bground_thresh() now take @wb
instead of @bdi.
* bdi_writeout_fraction(bdi, ...) -> wb_writeout_fraction(wb, ...)
bdi_dirty_limit(bdi, ...) -> wb_dirty_limit(wb, ...)
bdi_position_ratio(bdi, ...) -> wb_position_ratio(wb, ...)
bdi_update_write_bandwidth(bdi, ...) -> wb_update_write_bandwidth(wb, ...)
[__]bdi_update_bandwidth(bdi, ...) -> [__]wb_update_bandwidth(wb, ...)
bdi_{max|min}_pause(bdi, ...) -> wb_{max|min}_pause(wb, ...)
bdi_dirty_limits(bdi, ...) -> wb_dirty_limits(wb, ...)
* Init/exits of the relocated fields are moved to bdi_wb_init/exit()
respectively. Note that explicit zeroing is dropped in the process
as wb's are cleared in entirety anyway.
* As there's still only one bdi_writeback per backing_dev_info, all
uses of bdi->stat[] are mechanically replaced with bdi->wb.stat[]
introducing no behavior changes.
v2: Typo in description fixed as suggested by Jan.
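The shape of the move can be illustrated with a much reduced userspace
model. The struct and helper names below carry a _model suffix to mark them
as hypothetical; only a few of the listed fields are shown, and the real
structures hold considerably more state.

```c
#include <assert.h>
#include <string.h>

enum wb_stat_item_model {
	WB_RECLAIMABLE_M,
	WB_WRITEBACK_M,
	NR_WB_STAT_ITEMS_M
};

struct backing_dev_info_model;

/* Per-wb state: bandwidth fields and stats now live here. */
struct bdi_writeback_model {
	struct backing_dev_info_model *bdi; /* owning bdi */
	unsigned long write_bandwidth;
	unsigned long avg_write_bandwidth;
	unsigned long dirty_ratelimit;
	long stat[NR_WB_STAT_ITEMS_M];
};

/* The bdi still embeds exactly one wb at this stage. */
struct backing_dev_info_model {
	struct bdi_writeback_model wb;
	unsigned int min_ratio;
};

/* Init of the relocated fields happens in the wb init path. */
static void bdi_wb_init_model(struct bdi_writeback_model *wb,
			      struct backing_dev_info_model *bdi,
			      unsigned long init_bw)
{
	/* The whole wb is cleared, so no per-field zeroing is needed. */
	memset(wb, 0, sizeof(*wb));
	wb->bdi = bdi;
	wb->write_bandwidth = init_bw;
	wb->avg_write_bandwidth = init_bw;
	wb->dirty_ratelimit = init_bw;
}

static void bdi_init_model(struct backing_dev_info_model *bdi,
			   unsigned long init_bw)
{
	bdi->min_ratio = 0;
	bdi_wb_init_model(&bdi->wb, bdi, init_bw);
}

/* Helpers take @wb instead of @bdi after the move. */
static unsigned long wb_write_bandwidth_model(const struct bdi_writeback_model *wb)
{
	return wb->write_bandwidth;
}
```

Callers that used to touch bdi->stat[] reach the same counters through
bdi->wb.stat[] in this model, which is why the move changes no behavior
while there is still one wb per bdi.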
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Since tracepoints use RCU for protection, they must not be called on
offline cpus. trace_mm_page_pcpu_drain can be called on an offline cpu
in this scenario caught by LOCKDEP:
===============================
[ INFO: suspicious RCU usage. ]
4.1.0-rc1+ #9 Not tainted
-------------------------------
include/trace/events/kmem.h:265 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
RCU used illegally from offline CPU!
rcu_scheduler_active = 1, debug_locks = 1
1 lock held by swapper/5/0:
#0: (&(&zone->lock)->rlock){..-...}, at: [<c0000000002073b0>] .free_pcppages_bulk+0x70/0x920
stack backtrace:
CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.1.0-rc1+ #9
Call Trace:
.dump_stack+0x98/0xd4 (unreliable)
.lockdep_rcu_suspicious+0x108/0x170
.free_pcppages_bulk+0x60c/0x920
.free_hot_cold_page+0x208/0x280
.destroy_context+0x90/0xd0
.__mmdrop+0x58/0x160
.idle_task_exit+0xf0/0x100
.pnv_smp_cpu_kill_self+0x58/0x2c0
.cpu_die+0x34/0x50
.arch_cpu_idle_dead+0x20/0x40
.cpu_startup_entry+0x708/0x7a0
.start_secondary+0x36c/0x3a0
start_secondary_prolog+0x10/0x14
Fix this by converting mm_page_pcpu_drain trace point into
TRACE_EVENT_CONDITION where condition is cpu_online(smp_processor_id())
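The effect of a conditional trace point can be modelled in userspace as
below. This is a sketch only: TRACE_COND_MODEL, cpu_online_model[],
current_cpu_model and the other names are hypothetical stand-ins, and the
real TRACE_EVENT_CONDITION macro takes further arguments that are not
reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_MODEL 4

/* Stand-ins for cpu_online() and smp_processor_id(). */
static bool cpu_online_model[NR_CPUS_MODEL] = { true, true, true, true };
static int current_cpu_model;

/* Counts how many events actually reached the trace backend. */
static int events_emitted;

/* The unconditional event body; order is the only payload kept here. */
static void emit_page_drain_event(unsigned int order)
{
	(void)order;
	events_emitted++;
}

/* Evaluate the condition first and skip the event entirely when it is false. */
#define TRACE_COND_MODEL(cond, call) \
	do {                         \
		if (cond)            \
			call;        \
	} while (0)

/* Model of the converted trace point: silent on an offline CPU. */
static void trace_mm_page_pcpu_drain_model(unsigned int order)
{
	TRACE_COND_MODEL(cpu_online_model[current_cpu_model],
			 emit_page_drain_event(order));
}
```

Calling trace_mm_page_pcpu_drain_model() while the current CPU is marked
offline leaves events_emitted unchanged, so the RCU-protected event body is
never entered from an offline CPU.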
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since tracepoints use RCU for protection, they must not be called on
offline cpus. trace_mm_page_free can be called on an offline cpu in this
scenario caught by LOCKDEP:
===============================
[ INFO: suspicious RCU usage. ]
4.1.0-rc1+ #9 Not tainted
-------------------------------
include/trace/events/kmem.h:170 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
RCU used illegally from offline CPU!
rcu_scheduler_active = 1, debug_locks = 1
no locks held by swapper/1/0.
stack backtrace:
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.1.0-rc1+ #9
Call Trace:
.dump_stack+0x98/0xd4 (unreliable)
.lockdep_rcu_suspicious+0x108/0x170
.free_pages_prepare+0x494/0x680
.free_hot_cold_page+0x50/0x280
.destroy_context+0x90/0xd0
.__mmdrop+0x58/0x160
.idle_task_exit+0xf0/0x100
.pnv_smp_cpu_kill_self+0x58/0x2c0
.cpu_die+0x34/0x50
.arch_cpu_idle_dead+0x20/0x40
.cpu_startup_entry+0x708/0x7a0
.start_secondary+0x36c/0x3a0
start_secondary_prolog+0x10/0x14
Fix this by converting mm_page_free trace point into TRACE_EVENT_CONDITION
where condition is cpu_online(smp_processor_id())
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since tracepoints use RCU for protection, they must not be called on
offline cpus. trace_kmem_cache_free can be called on an offline cpu in
this scenario caught by LOCKDEP:
===============================
[ INFO: suspicious RCU usage. ]
4.1.0-rc1+ #9 Not tainted
-------------------------------
include/trace/events/kmem.h:148 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
RCU used illegally from offline CPU!
rcu_scheduler_active = 1, debug_locks = 1
no locks held by swapper/1/0.
stack backtrace:
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.1.0-rc1+ #9
Call Trace:
.dump_stack+0x98/0xd4 (unreliable)
.lockdep_rcu_suspicious+0x108/0x170
.kmem_cache_free+0x344/0x4b0
.__mmdrop+0x4c/0x160
.idle_task_exit+0xf0/0x100
.pnv_smp_cpu_kill_self+0x58/0x2c0
.cpu_die+0x34/0x50
.arch_cpu_idle_dead+0x20/0x40
.cpu_startup_entry+0x708/0x7a0
.start_secondary+0x36c/0x3a0
start_secondary_prolog+0x10/0x14
Fix this by converting kmem_cache_free trace point into
TRACE_EVENT_CONDITION where condition is cpu_online(smp_processor_id())
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
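The conditional-tracepoint idea behind the two fixes above can be sketched in plain userspace C. This is only an illustration, not the kernel's TRACE_EVENT_CONDITION implementation: the fake_* variables, cpu_is_online(), TRACE_COND() and the probe function are hypothetical stand-ins. What it shows is that the condition is evaluated first and the probe is skipped entirely when the current CPU is offline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for kernel state; not the real kernel API. */
#define NR_FAKE_CPUS 4
static bool fake_cpu_online[NR_FAKE_CPUS] = { true, false, true, true };
static int fake_current_cpu;   /* what smp_processor_id() would return */
static int probe_calls;        /* counts how often the probe body ran */

static bool cpu_is_online(int cpu)
{
	assert(cpu >= 0 && cpu < NR_FAKE_CPUS);
	return fake_cpu_online[cpu];
}

/* The probe body; in the kernel this part relies on RCU protection. */
static void probe_mm_page_free(unsigned long pfn, unsigned int order)
{
	probe_calls++;
	printf("mm_page_free: pfn=%lu order=%u\n", pfn, order);
}

/*
 * Conditional tracepoint: test the condition first and call the probe
 * only when it holds, so an offline CPU never reaches the probe.
 */
#define TRACE_COND(cond, probe, ...)		\
	do {					\
		if (cond)			\
			probe(__VA_ARGS__);	\
	} while (0)

static void trace_mm_page_free(unsigned long pfn, unsigned int order)
{
	TRACE_COND(cpu_is_online(fake_current_cpu),
		   probe_mm_page_free, pfn, order);
}
```

Calling trace_mm_page_free() with fake_current_cpu set to an offline CPU leaves probe_calls unchanged, which is the behaviour the fixes rely on.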
bdi_unregister() now contains very little functionality.
It contains a "WARN_ON" if bdi->dev is NULL. This warning is of no
real consequence as bdi->dev isn't needed by anything else in the function,
and it triggers if
blk_cleanup_queue() -> bdi_destroy()
is called before bdi_unregister, which happens since
Commit: 6cd18e711d ("block: destroy bdi before blockdev is unregistered.")
So this isn't wanted.
It also calls bdi_set_min_ratio(). This needs to be called after
writes through the bdi have all been flushed, and before the bdi is destroyed.
Calling it early is better than calling it late as it frees up a global
resource.
Calling it immediately after bdi_wb_shutdown() in bdi_destroy()
perfectly fits these requirements.
So bdi_unregister() can be discarded with the important content moved to
bdi_destroy(), as can the
writeback_bdi_unregister
event which is already not used.
Reported-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org (v4.0)
Fixes: c4db59d31e ("fs: don't reassign dirty inodes to default_backing_dev_info")
Fixes: 6cd18e711d ("block: destroy bdi before blockdev is unregistered.")
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: Nicholas Moulin <nicholas.w.moulin@linux.intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
The timer_start event now shows whether the timer is
deferrable in case of a low-res timer. The debug_activate
function now includes a deferrable flag while calling
the trace_timer_start event.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Badhri Jagan Sridharan <Badhri@google.com>
[jstultz: Fixed minor whitespace and grammar tweaks
pointed out by Ingo]
Signed-off-by: John Stultz <john.stultz@linaro.org>
Currently people use TASK_INTERRUPTIBLE to idle kthreads and wait for
'work' because TASK_UNINTERRUPTIBLE contributes to the loadavg. Having
all idle kthreads contribute to the loadavg is somewhat silly.
Now mostly this works OK, because kthreads have all their signals
masked. However there's a few sites where this is causing problems and
TASK_UNINTERRUPTIBLE should be used, except for that loadavg issue.
This patch adds TASK_NOLOAD which, when combined with
TASK_UNINTERRUPTIBLE avoids the loadavg accounting.
As most of imagined usage sites are loops where a thread wants to
idle, waiting for work, a helper TASK_IDLE is introduced.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: NeilBrown <neilb@suse.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
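A minimal sketch of how the new state bit could interact with load accounting, assuming TASK_IDLE is the combination of TASK_UNINTERRUPTIBLE and TASK_NOLOAD as described above. The bit values, struct fake_task and contributes_to_load() are illustrative only; the real definitions live in the scheduler headers and take more conditions into account.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative bit values; the real ones are defined by the kernel. */
#define TASK_INTERRUPTIBLE	0x0001
#define TASK_UNINTERRUPTIBLE	0x0002
#define TASK_NOLOAD		0x0400

/* Helper state for kthreads that idle while waiting for work. */
#define TASK_IDLE		(TASK_UNINTERRUPTIBLE | TASK_NOLOAD)

/* Hypothetical, heavily reduced task structure. */
struct fake_task {
	unsigned int state;
};

/*
 * A task counts toward the load average only if it sleeps
 * uninterruptibly and has not opted out via TASK_NOLOAD.
 */
static bool contributes_to_load(const struct fake_task *t)
{
	assert(t != NULL);
	return (t->state & TASK_UNINTERRUPTIBLE) != 0 &&
	       (t->state & TASK_NOLOAD) == 0;
}
```

With this predicate, a task in TASK_IDLE cannot be woken by signals yet stays out of the load average, which is the combination the message asks for.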
The name "ftrace" really refers to the function hook infrastructure. It
is not about the trace_events. The function ftrace_get_offsets_##call()
is used to find the offset into dynamically allocated trace event fields
for printing. It has nothing to do with function tracing. Rename it.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The name "ftrace" really refers to the function hook infrastructure. It
is not about the trace_events. The function ftrace_define_fields_##call()
is used to define how to process the trace_event fields. It has nothing to
do with function tracing. Rename it.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The name "ftrace" really refers to the function hook infrastructure. It
is not about the trace_events. The structure ftrace_event_type_funcs_##call
is used to define how the trace_events will be printed. It has nothing to
do with function tracing. Rename it.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The name "ftrace" really refers to the function hook infrastructure. It
is not about the trace_events. The structure ftrace_data_offset_##call is
used to find the offsets of dynamically allocated fields in trace_events.
It has nothing to do with function tracing. Rename it.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The name "ftrace" really refers to the function hook infrastructure. It
is not about the trace_events. The ftrace_raw_##call structures are built
by macros for trace events. They have nothing to do with function tracing.
Rename them.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The name "ftrace" really refers to the function hook infrastructure. It
is not about the trace_events. The ftrace_trigger_soft_disabled() tests if a
trace_event is soft disabled (called but not traced), and returns true if
it is. It has nothing to do with function tracing and should be renamed.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The name "ftrace" really refers to the function hook infrastructure. It
is not about the trace_events. The FTRACE_EVENT_FL_* flags are flags to
do with the trace_event files in the tracefs directory. They are not related
to function tracing. Rename them to a more descriptive name.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The name "ftrace" really refers to the function hook infrastructure. It
is not about the trace_events. The ftrace_output_*() and ftrace_raw_output_*()
functions represent the trace_event code. Rename them to just trace_output
or trace_raw_output.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The name "ftrace" really refers to the function hook infrastructure. It
is not about the trace_events. The ftrace_event_buffer functions and data
structures are for trace_events and not for function hooks. Rename them
to trace_event_buffer*.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The name "ftrace" really refers to the function hook infrastructure. It
is not about the trace_events. The structures ftrace_event_call and
ftrace_event_class have nothing to do with the function hooks, and are
really trace_event structures. Rename ftrace_event_* to trace_event_*.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The name "ftrace" really refers to the function hook infrastructure. It
is not about the trace_events. The structure ftrace_event_file is really
about trace events and not "ftrace". Rename it to trace_event_file.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The name "ftrace" really refers to the function hook infrastructure. It
is not about the trace_events. The functions (un)register_ftrace_event() are
really about trace_events, and the names should be (un)register_trace_event()
instead.
Also renamed ftrace_event_reg() to trace_event_reg() for the same reason.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The name "ftrace" really refers to the function hook infrastructure. It
is not about the trace_events. The functions ftrace_print_*() are not part of
the function infrastructure, and the names can be confusing. Rename them
to be trace_print_*().
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The term "ftrace" is really the infrastructure of the function hooks,
and not the trace events. Rename ftrace_event.h to trace_events.h to
represent the trace_event infrastructure and decouple the term ftrace
from it.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The trace_event.h file is for the generic trace event code. Move
the perf related code into its own trace header file perf.h
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The name "ftrace" really refers to the function hook infrastructure. It
is not about the TRACE_EVENT() macros. The file trace/ftrace.h was originally
written to be mostly focused toward the "ftrace" code (that is, the code in kernel/trace/)
but ended up being generic and used by perf and others.
Rename the file to be less confusing about what infrastructure it belongs to.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
This v4l2_buffer flag can be used by drivers to mark a capture buffer
as the last generated buffer, for example after a V4L2_DEC_CMD_STOP
command was issued.
Signed-off-by: Peter Seiderer <ps.report@gmx.net>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Acked-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Kamil Debski <k.debski@samsung.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Add trace events for the power allocator governor and the power actor
interface of the cpu cooling device.
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Eduardo Valentin <edubezval@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Javi Merino <javi.merino@arm.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
The intel_pstate driver is difficult to debug and investigate without tsc.
Also, it is likely that use of tsc, and some version of C0 percentage,
will be re-introduced in the future.
There have also been occasions where it is desirable to know, and
confirm, the previous target pstate.
This patch brings back tsc, adds previous target pstate,
and adds both to the trace data collection.
Signed-off-by: Doug Smythies <dsmythies@telus.net>
Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull fourth vfs update from Al Viro:
"d_inode() annotations from David Howells (sat in for-next since before
the beginning of merge window) + four assorted fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
RCU pathwalk breakage when running into a symlink overmounting something
fix I_DIO_WAKEUP definition
direct-io: only inc/dec inode->i_dio_count for file systems
fs/9p: fix readdir()
VFS: assorted d_backing_inode() annotations
VFS: fs/inode.c helpers: d_inode() annotations
VFS: fs/cachefiles: d_backing_inode() annotations
VFS: fs library helpers: d_inode() annotations
VFS: assorted weird filesystems: d_inode() annotations
VFS: normal filesystems (and lustre): d_inode() annotations
VFS: security/: d_inode() annotations
VFS: security/: d_backing_inode() annotations
VFS: net/: d_inode() annotations
VFS: net/unix: d_backing_inode() annotations
VFS: kernel/: d_inode() annotations
VFS: audit: d_backing_inode() annotations
VFS: Fix up some ->d_inode accesses in the chelsio driver
VFS: Cachefiles should perform fs modifications on the top layer only
VFS: AF_UNIX sockets should call mknod on the top layer only
Merge tag 'clk-for-linus-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clock framework updates from Michael Turquette:
"The changes to the common clock framework for 4.0 are mostly new clock
drivers and updates to existing ones for feature enhancements and bug
fixes.
There is more churn than usual in the framework core due to the change
to introduce per-user unique struct clk pointers in 4.0. This caused
several regressions to surface, some of which were sent as fixes to
4.0. New generic clock drivers were added for GPIO- and PWM-based
clock controllers.
Additionally the common clk-divider code received several fixes to the
way it rounds rates"
* tag 'clk-for-linus-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (91 commits)
clk: check ->determine/round_rate() return value in clk_calc_new_rates
clk: at91: usb: propagate rate modification to the parent clk
clk: samsung: exynos4: Disable ARMCLK down feature on Exynos4210 SoC
clk: don't use __initconst for non-const arrays
clk: at91: change to using endian agnositc IO
clk: clk-gpio-gate: Fix active low
clk: Add PWM clock driver
clk: Add clock driver for mb86s7x
clk: pxa: pxa3xx: add missing os timer clock
clk: tegra: Use the proper parent for plld_dsi
clk: tegra: Use generic tegra_osc_clk_init() on Tegra114
clk: tegra: Model oscillator as clock
clk: tegra: Add peripheral registers for bank Y
clk: tegra: Register the proper number of resets
clk: tegra: Remove needless initializations
clk: tegra: Use consistent indentation
clk: tegra: Various whitespace cleanups
clk: tegra: Enable HDA to HDMI clocks on Tegra124
clk: tegra: Fix a bunch of sparse warnings
clk: tegra: Fix typo tabel -> table
...
Pull perf updates from Ingo Molnar:
"This update has mostly fixes, but also other bits:
- perf tooling fixes
- PMU driver fixes
- Intel Broadwell PMU driver HW-enablement for LBR callstacks
- a late coming 'perf kmem' tool update that enables it to also
analyze page allocation data. Note, this comes with MM tracepoint
changes that we believe to not break anything: because it changes
the formerly opaque 'struct page *' field that uniquely identifies
pages to 'pfn' which identifies pages uniquely too, but isn't as
opaque and can be used for other purposes as well"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel/pt: Fix and clean up error handling in pt_event_add()
perf/x86/intel: Add Broadwell support for the LBR callstack
perf/x86/intel/rapl: Fix energy counter measurements but supporing per domain energy units
perf/x86/intel: Fix Core2,Atom,NHM,WSM cycles:pp events
perf/x86: Fix hw_perf_event::flags collision
perf probe: Fix segfault when probe with lazy_line to file
perf probe: Find compilation directory path for lazy matching
perf probe: Set retprobe flag when probe in address-based alternative mode
perf kmem: Analyze page allocator events also
tracing, mm: Record pfn instead of pointer to struct page
Pull f2fs updates from Jaegeuk Kim:
"New features:
- in-memory extent_cache
- fs_shutdown to test power-off-recovery
- use inline_data to store symlink path
- show f2fs as a non-misc filesystem
Major fixes:
- avoid CPU stalls on sync_dirty_dir_inodes
- fix some power-off-recovery procedure
- fix handling of broken symlink correctly
- fix missing dot and dotdot made by sudden power cuts
- handle wrong data index during roll-forward recovery
- preallocate data blocks for direct_io
... and a bunch of minor bug fixes and cleanups"
* tag 'for-f2fs-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (71 commits)
f2fs: pass checkpoint reason on roll-forward recovery
f2fs: avoid abnormal behavior on broken symlink
f2fs: flush symlink path to avoid broken symlink after POR
f2fs: change 0 to false for bool type
f2fs: do not recover wrong data index
f2fs: do not increase link count during recovery
f2fs: assign parent's i_mode for empty dir
f2fs: add F2FS_INLINE_DOTS to recover missing dot dentries
f2fs: fix mismatching lock and unlock pages for roll-forward recovery
f2fs: fix sparse warnings
f2fs: limit b_size of mapped bh in f2fs_map_bh
f2fs: persist system.advise into on-disk inode
f2fs: avoid NULL pointer dereference in f2fs_xattr_advise_get
f2fs: preallocate fallocated blocks for direct IO
f2fs: enable inline data by default
f2fs: preserve extent info for extent cache
f2fs: initialize extent tree with on-disk extent info of inode
f2fs: introduce __{find,grab}_extent_tree
f2fs: split set_data_blkaddr from f2fs_update_extent_cache
f2fs: enable fast symlink by utilizing inline data
...
This patch adds CP_RECOVERY to retain recovery information for the checkpoint.
It also makes sure that a checkpoint is written in this case.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Add trace events for cma_alloc() and cma_release().
The cma_alloc tracepoint is used both for successful and failed allocations,
in case of allocation failure pfn=-1UL is stored and printed.
Signed-off-by: Stefan Strogin <stefan.strogin@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Nazarewicz <mpn@google.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Cc: Thierry Reding <treding@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
that's the bulk of filesystem drivers dealing with inodes of their own
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Merge first patchbomb from Andrew Morton:
- arch/sh updates
- ocfs2 updates
- kernel/watchdog feature
- about half of mm/
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (122 commits)
Documentation: update arch list in the 'memtest' entry
Kconfig: memtest: update number of test patterns up to 17
arm: add support for memtest
arm64: add support for memtest
memtest: use phys_addr_t for physical addresses
mm: move memtest under mm
mm, hugetlb: abort __get_user_pages if current has been oom killed
mm, mempool: do not allow atomic resizing
memcg: print cgroup information when system panics due to panic_on_oom
mm: numa: remove migrate_ratelimited
mm: fold arch_randomize_brk into ARCH_HAS_ELF_RANDOMIZE
mm: split ET_DYN ASLR from mmap ASLR
s390: redefine randomize_et_dyn for ELF_ET_DYN_BASE
mm: expose arch_mmap_rnd when available
s390: standardize mmap_rnd() usage
powerpc: standardize mmap_rnd() usage
mips: extract logic for mmap_rnd()
arm64: standardize mmap_rnd() usage
x86: standardize mmap_rnd() usage
arm: factor out mmap ASLR into mmap_rnd
...
We want to use the number of page table levels to define mm_struct.
Let's expose it as CONFIG_PGTABLE_LEVELS.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge tag 'trace-v4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"Some clean ups and small fixes, but the biggest change is the addition
of the TRACE_DEFINE_ENUM() macro that can be used by tracepoints.
Tracepoints have helper functions for the TP_printk() called
__print_symbolic() and __print_flags() that let a number be
displayed as human comprehensible text. What is placed in the
TP_printk() is also shown in the tracepoint format file such that user
space tools like perf and trace-cmd can parse the binary data and
express the values too. Unfortunately, the way the TRACE_EVENT()
macro works, anything placed in the TP_printk() will be shown pretty
much exactly as is. The problem arises when enums are used. That's
because unlike macros, enums will not be changed into their values by
the C pre-processor. Thus, the enum string is exported to the format
file, and this makes it useless for user space tools.
The TRACE_DEFINE_ENUM() solves this by converting the enum strings in
the TP_printk() format into their number, and that is what is shown to
user space. For example, the tracepoint tlb_flush currently has this
in its format file:
__print_symbolic(REC->reason,
{ TLB_FLUSH_ON_TASK_SWITCH, "flush on task switch" },
{ TLB_REMOTE_SHOOTDOWN, "remote shootdown" },
{ TLB_LOCAL_SHOOTDOWN, "local shootdown" },
{ TLB_LOCAL_MM_SHOOTDOWN, "local mm shootdown" })
After adding:
TRACE_DEFINE_ENUM(TLB_FLUSH_ON_TASK_SWITCH);
TRACE_DEFINE_ENUM(TLB_REMOTE_SHOOTDOWN);
TRACE_DEFINE_ENUM(TLB_LOCAL_SHOOTDOWN);
TRACE_DEFINE_ENUM(TLB_LOCAL_MM_SHOOTDOWN);
Its format file will contain this:
__print_symbolic(REC->reason,
{ 0, "flush on task switch" },
{ 1, "remote shootdown" },
{ 2, "local shootdown" },
{ 3, "local mm shootdown" })"
* tag 'trace-v4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (27 commits)
tracing: Add enum_map file to show enums that have been mapped
writeback: Export enums used by tracepoint to user space
v4l: Export enums used by tracepoints to user space
SUNRPC: Export enums in tracepoints to user space
mm: tracing: Export enums in tracepoints to user space
irq/tracing: Export enums in tracepoints to user space
f2fs: Export the enums in the tracepoints to userspace
net/9p/tracing: Export enums in tracepoints to userspace
x86/tlb/trace: Export enums in used by tlb_flush tracepoint
tracing/samples: Update the trace-event-sample.h with TRACE_DEFINE_ENUM()
tracing: Allow for modules to convert their enums to values
tracing: Add TRACE_DEFINE_ENUM() macro to map enums to their values
tracing: Update trace-event-sample with TRACE_SYSTEM_VAR documentation
tracing: Give system name a pointer
brcmsmac: Move each system tracepoints to their own header
iwlwifi: Move each system tracepoints to their own header
mac80211: Move message tracepoints to their own header
tracing: Add TRACE_SYSTEM_VAR to xhci-hcd
tracing: Add TRACE_SYSTEM_VAR to kvm-s390
tracing: Add TRACE_SYSTEM_VAR to intel-sst
...
Pull libata updates from Tejun Heo:
- Hannes's patchset implements support for better error reporting
introduced by the new ATA command spec.
- the deprecated pci_ dma API usages have been replaced by dma_ ones.
- a bunch of hardware specific updates and some cleanups.
* 'for-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
ata: remove deprecated use of pci api
ahci: st: st_configure_oob must be called after IP is clocked.
ahci: st: Update the ahci_st DT documentation
ahci: st: Update the DT example for how to obtain the PHY.
sata_dwc_460ex: indent an if statement
libata: Add tracepoints
libata-eh: Set 'information' field for autosense
libata: Implement support for sense data reporting
libata: Implement NCQ autosense
libata: use status bit definitions in ata_dump_status()
ide,ata: Rename ATA_IDX to ATA_SENSE
libata: whitespace fixes in ata_to_sense_error()
libata: whitespace cleanup in ata_get_cmd_descript()
libata: use READ_LOG_DMA_EXT
libata: remove ATA_FLAG_LOWTAG
sata_dwc_460ex: re-use hsdev->dev instead of dwc_dev
sata_dwc_460ex: move to generic DMA driver
sata_dwc_460ex: join messages back
sata: xgene: add ACPI support for APM X-Gene SATA ports
ata: sata_mv: add proper definitions for LP_PHY_CTL register values
The struct page is opaque for userspace tools, so it'd be better to save
pfn in order to identify page frames.
The textual output of $debugfs/tracing/trace file remains unchanged and
only raw (binary) data format is changed - but thanks to libtraceevent,
userspace tools which deal with the raw data (like perf and trace-cmd)
can parse the format easily. So impact on the userspace will also be
minimal.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Based-on-patch-by: Joonsoo Kim <js1304@gmail.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/1428298576-9785-3-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
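As a rough illustration of recording a pfn instead of a struct page pointer, the sketch below assumes a flat memory model in which all page descriptors sit in one array, so the pfn is the array index plus a base offset. struct fake_page, fake_mem_map, FAKE_PFN_OFFSET and both helpers are hypothetical; the -1UL value for a missing page follows the convention mentioned for the cma_alloc tracepoint.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page descriptor; opaque to anything outside the kernel. */
struct fake_page {
	unsigned long flags;
};

#define NR_FAKE_PAGES	8
#define FAKE_PFN_OFFSET	0x1000UL	/* pfn of the first descriptor */
#define NO_PFN		(-1UL)		/* recorded when there is no page */

static struct fake_page fake_mem_map[NR_FAKE_PAGES];

/* Flat-model conversion: array index plus the base pfn. */
static unsigned long fake_page_to_pfn(const struct fake_page *page)
{
	assert(page != NULL);
	return (unsigned long)(page - fake_mem_map) + FAKE_PFN_OFFSET;
}

/* Value a tracepoint would store: a stable number, not a pointer. */
static unsigned long trace_pfn(const struct fake_page *page)
{
	return page ? fake_page_to_pfn(page) : NO_PFN;
}
```

Unlike the pointer, the resulting number still identifies the page frame uniquely and means something to a tool that only sees the raw trace data.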
The enums used in tracepoints for __print_symbolic() do not have their
values shown in the tracepoint format files and this makes it difficult
for user space tools to convert the binary values to the strings they
are to represent.
Add TRACE_DEFINE_ENUM(x) macros to export the enum names to their values
to make the tracing output from user space tools more robust.
Link: http://lkml.kernel.org/r/20150403013802.220157513@goodmis.org
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Enums used by tracepoints for __print_symbolic() are shown in the
tracepoint format files with just their names and not their values.
This makes it difficult for user space tools to know how to convert the
binary data into their string representations.
By adding the use of TRACE_DEFINE_ENUM(), the enum names will be mapped
to their values and shown in the tracing file system to let tools
convert the data as necessary.
Link: http://lkml.kernel.org/r/20150403013802.220157513@goodmis.org
Acked-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The enums used in the tracepoints for __print_symbolic() have their
names shown in the tracepoint format files. User space tools do not know
how to convert those names into their values to be able to convert the
binary data.
Use TRACE_DEFINE_ENUM() to export the enum names to their values for
userspace to do the parsing correctly.
Link: http://lkml.kernel.org/r/20150403013802.220157513@goodmis.org
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The enums used in tracepoints with __print_symbolic() have their
names shown in the tracepoint format files and not their values.
This makes it difficult for user space tools to convert the binary
data to the strings as user space does not know what those enums
are about.
By having them use TRACE_DEFINE_ENUM(), the names of the enums will
be mapped to the values and shown to user space.
Link: http://lkml.kernel.org/r/20150403013802.220157513@goodmis.org
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The format file shows the names of the enums used by the softirq mapping
in __print_symbolic(), not their values, which are needed
to map them to their strings. Export them to userspace with the
TRACE_DEFINE_ENUM() macro so that user space tools can map the enums
to their values.
Link: http://lkml.kernel.org/r/20150403013802.220157513@goodmis.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The tracepoints that use __print_symbolic() use enums as the value
to convert to strings. Unfortunately, the format files for these
tracepoints show the enum name and not their value. This causes some
userspace tools not to know how to convert __print_symbolic() to
their strings.
Add TRACE_DEFINE_ENUM() macros to export the enums used to userspace
to let those tools know what those enum values are.
Link: http://lkml.kernel.org/r/20150403013802.220157513@goodmis.org
Cc: Namjae Jeon <namjae.jeon@samsung.com>
Cc: Pankaj Kumar <pankaj.km@samsung.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The tracepoints in the 9p code use a lot of enums for the __print_symbolic()
function. These enums are shown in the tracepoint format files, and user
space tools such as trace-cmd do not have the information to parse them.
Add helper macros to export the enums with TRACE_DEFINE_ENUM().
Link: http://lkml.kernel.org/r/20150403013802.220157513@goodmis.org
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Have the enums used in __print_symbolic() by the trace_tlb_flush()
tracepoint exported to userspace such that they can be parsed by
userspace tools.
Link: http://lkml.kernel.org/r/20150403013802.220157513@goodmis.org
Cc: Dave Hansen <dave@sr71.net>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Several tracepoints use the helper functions __print_symbolic() or
__print_flags() and pass in enums that do the mapping between the
binary data stored and the value to print. This works well for reading
the ASCII trace files, but when the data is read via userspace tools
such as perf and trace-cmd, the conversion of the binary value to a
human string format is lost if an enum is used, as userspace does not
have access to what the ENUM is.
For example, the tracepoint trace_tlb_flush() has:
  __print_symbolic(REC->reason,
     { TLB_FLUSH_ON_TASK_SWITCH, "flush on task switch" },
     { TLB_REMOTE_SHOOTDOWN, "remote shootdown" },
     { TLB_LOCAL_SHOOTDOWN, "local shootdown" },
     { TLB_LOCAL_MM_SHOOTDOWN, "local mm shootdown" })
Which maps the enum values to the strings they represent. But perf and
trace-cmd do not know what value TLB_LOCAL_MM_SHOOTDOWN is, and would
not be able to map it.
With TRACE_DEFINE_ENUM(), developers can place these in the event header
files and ftrace will convert the enums to their values.
By adding:
  TRACE_DEFINE_ENUM(TLB_FLUSH_ON_TASK_SWITCH);
  TRACE_DEFINE_ENUM(TLB_REMOTE_SHOOTDOWN);
  TRACE_DEFINE_ENUM(TLB_LOCAL_SHOOTDOWN);
  TRACE_DEFINE_ENUM(TLB_LOCAL_MM_SHOOTDOWN);
the format file shows the values instead of the enum names:
  $ cat /sys/kernel/debug/tracing/events/tlb/tlb_flush/format
  [...]
  __print_symbolic(REC->reason,
     { 0, "flush on task switch" },
     { 1, "remote shootdown" },
     { 2, "local shootdown" },
     { 3, "local mm shootdown" })
The above is what userspace expects to see, and tools do not need to
be modified to parse them.
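The substitution described above can be pictured with a small userspace
sketch. This is only an illustration of the idea (replace enum names in a
print format string with their numeric values); it is not how ftrace does it
inside the kernel. The function name resolve_enums and the Python dictionary
are assumptions made for this example; the numeric values are the ones shown
in the format file output above.

```python
import re

# Enum values as they appear in the converted format file quoted above.
TLB_FLUSH_REASONS = {
    "TLB_FLUSH_ON_TASK_SWITCH": 0,
    "TLB_REMOTE_SHOOTDOWN": 1,
    "TLB_LOCAL_SHOOTDOWN": 2,
    "TLB_LOCAL_MM_SHOOTDOWN": 3,
}

# Match either a double-quoted string literal or a C identifier.
TOKEN = re.compile(r'"(?:[^"\\]|\\.)*"|[A-Za-z_][A-Za-z0-9_]*')


def resolve_enums(print_fmt, enum_map):
    """Hypothetical helper: replace known enum names with their values.

    String literals are matched first and returned unchanged, so an enum
    name that happens to appear inside quotes is not rewritten.
    """
    def substitute(match):
        token = match.group(0)
        if token.startswith('"'):
            return token
        return str(enum_map.get(token, token))

    return TOKEN.sub(substitute, print_fmt)


if __name__ == "__main__":
    fmt = '__print_symbolic(REC->reason, { TLB_LOCAL_MM_SHOOTDOWN, "local mm shootdown" })'
    print(resolve_enums(fmt, TLB_FLUSH_REASONS))
```

Identifiers that are not in the map (such as REC or __print_symbolic) pass
through untouched, which is why a tool that only understands numbers can parse
the converted string without changes.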
Link: http://lkml.kernel.org/r/20150403013802.220157513@goodmis.org
Cc: Guilherme Cox <cox@computer.org>
Cc: Tony Luck <tony.luck@gmail.com>
Cc: Xie XiuQi <xiexiuqi@huawei.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Normally the compiler will use the same pointer for a string throughout
the file. But there's no guarantee of that happening. Later changes will
require that all events have the same pointer to the system string.
Name the system string and have all events point to it.
Testing this, it did not increase the size of the text, except for the
notes section, which should not harm the real size.
Link: http://lkml.kernel.org/r/20150403013802.220157513@goodmis.org
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
New code will require TRACE_SYSTEM to be a valid C variable name,
but some tracepoints have TRACE_SYSTEM with '-' and not '_', so
it can not be used. Instead, add a TRACE_SYSTEM_VAR that can
give the tracing infrastructure a unique name for the trace system.
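To make the constraint concrete: a name containing '-' cannot be used as a C
variable name, while the same name with '_' can. The sketch below checks that
rule and derives an identifier-safe variant. Both helper names and the sample
system name "my-subsys" are hypothetical; the commit itself only says that a
separate TRACE_SYSTEM_VAR is provided, not how its value is chosen.

```python
import re

# A C identifier: a letter or underscore, then letters, digits or underscores.
C_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_c_identifier(name):
    """Return True if name could be used as a C variable name."""
    return C_IDENTIFIER.fullmatch(name) is not None


def suggest_system_var(trace_system):
    """Hypothetical helper: map '-' to '_' to get an identifier-safe name."""
    candidate = trace_system.replace("-", "_")
    if not is_valid_c_identifier(candidate):
        raise ValueError(f"cannot derive a C identifier from {trace_system!r}")
    return candidate


if __name__ == "__main__":
    for system in ("sched", "my-subsys"):
        print(system, is_valid_c_identifier(system), suggest_system_var(system))
```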
Link: http://lkml.kernel.org/r/20150402142831.GT6023@sirena.org.uk
Acked-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Add some tracepoints for ata_qc_issue, ata_qc_complete, and
ata_eh_link_autopsy.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
The tracing events for regmap are confined to the regmap subsystem. It
also requires accessing an internal header. Instead of including the
internal header from a generic file location, move the tracing file
into the regmap directory.
Also rename the regmap tracing header to trace.h, as it is redundant to
keep the regmap.h name when it is in the regmap directory.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
It's useful to have tracepoints around operations that change the
hardware state so that we can debug clock hardware performance
and operations. Four basic types of events are supported: on/off
events for enable, disable, prepare, unprepare that only record
an event and a clock name, rate changing events for
clk_set_{min_,max_}rate{_range}(), phase changing events for
clk_set_phase() and parent changing events for clk_set_parent().
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Michael Turquette <mturquette@linaro.org>
This patch adds trace for lookup/update/shrink/destroy ops in rb-tree extent cache.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Pull lazytime mount option support from Al Viro:
"Lazytime stuff from tytso"
* 'lazytime' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
ext4: add optimization for the lazytime mount option
vfs: add find_inode_nowait() function
vfs: add support for a lazytime mount option
Common: Optional support for adding a small amount of polling on each HLT
instruction executed in the guest (or equivalent for other architectures).
This can improve latency up to 50% on some scenarios (e.g. O_DSYNC writes
or TCP_RR netperf tests). This also has to be enabled manually for now,
but the plan is to auto-tune this in the future.
ARM/ARM64: the highlights are support for GICv3 emulation and dirty page
tracking
s390: several optimizations and bugfixes. Also a first: a feature
exposed by KVM (UUID and long guest name in /proc/sysinfo) before
it is available in IBM's hypervisor! :)
MIPS: Bugfixes.
x86: Support for PML (page modification logging, a new feature in
Broadwell Xeons that speeds up dirty page tracking), nested virtualization
improvements (nested APICv---a nice optimization), usual round of emulation
fixes. There is also a new option to reduce latency of the TSC deadline
timer in the guest; this needs to be tuned manually.
Some commits are common between this pull and Catalin's; I see you
have already included his tree.
ARM has other conflicts where functions are added in the same place
by 3.19-rc and 3.20 patches. These are not large though, and entirely
within KVM.
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM update from Paolo Bonzini:
"Fairly small update, but there are some interesting new features.
Common:
Optional support for adding a small amount of polling on each HLT
instruction executed in the guest (or equivalent for other
architectures). This can improve latency up to 50% on some
scenarios (e.g. O_DSYNC writes or TCP_RR netperf tests). This
also has to be enabled manually for now, but the plan is to
auto-tune this in the future.
ARM/ARM64:
The highlights are support for GICv3 emulation and dirty page
tracking
s390:
Several optimizations and bugfixes. Also a first: a feature
exposed by KVM (UUID and long guest name in /proc/sysinfo) before
it is available in IBM's hypervisor! :)
MIPS:
Bugfixes.
x86:
Support for PML (page modification logging, a new feature in
Broadwell Xeons that speeds up dirty page tracking), nested
virtualization improvements (nested APICv---a nice optimization),
usual round of emulation fixes.
There is also a new option to reduce latency of the TSC deadline
timer in the guest; this needs to be tuned manually.
Some commits are common between this pull and Catalin's; I see you
have already included his tree.
Powerpc:
Nothing yet.
The KVM/PPC changes will come in through the PPC maintainers,
because I haven't received them yet and I might end up being
offline for some part of next week"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (130 commits)
KVM: ia64: drop kvm.h from installed user headers
KVM: x86: fix build with !CONFIG_SMP
KVM: x86: emulate: correct page fault error code for NoWrite instructions
KVM: Disable compat ioctl for s390
KVM: s390: add cpu model support
KVM: s390: use facilities and cpu_id per KVM
KVM: s390/CPACF: Choose crypto control block format
s390/kernel: Update /proc/sysinfo file with Extended Name and UUID
KVM: s390: reenable LPP facility
KVM: s390: floating irqs: fix user triggerable endless loop
kvm: add halt_poll_ns module parameter
kvm: remove KVM_MMIO_SIZE
KVM: MIPS: Don't leak FPU/DSP to guest
KVM: MIPS: Disable HTW while in guest
KVM: nVMX: Enable nested posted interrupt processing
KVM: nVMX: Enable nested virtual interrupt delivery
KVM: nVMX: Enable nested apic register virtualization
KVM: nVMX: Make nested control MSRs per-cpu
KVM: nVMX: Enable nested virtualize x2apic mode
KVM: nVMX: Prepare for using hardware MSR bitmap
...
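The halt polling mentioned in the KVM pull message above can be pictured with
a toy model: before blocking a halted vCPU, poll for a short, bounded time in
case the wakeup event arrives almost immediately. The sketch below is only
that picture, not KVM's implementation. The parameter name halt_poll_ns comes
from the commit list above; the function name, the wakeup_delay_ns input and
the example numbers are assumptions.

```python
def halt(wakeup_delay_ns, halt_poll_ns):
    """Toy model of a single guest halt.

    wakeup_delay_ns: time until the next event for the vCPU arrives
                     (hypothetical input for this sketch).
    halt_poll_ns:    polling budget in nanoseconds; 0 disables polling.
    """
    if halt_poll_ns > 0 and wakeup_delay_ns <= halt_poll_ns:
        # The event arrived within the polling window, so the vCPU never
        # went to sleep and no wakeup round trip was needed.
        return "handled while polling"
    # Otherwise the vCPU blocks and is woken up later.
    return "blocked"


if __name__ == "__main__":
    print(halt(100_000, 500_000))    # short wait, polling enabled
    print(halt(2_000_000, 500_000))  # long wait, polling budget exceeded
    print(halt(100_000, 0))          # polling disabled
```

The trade-off in this model is that polling burns CPU time for up to
halt_poll_ns on every halt, which fits the pull message's note that the
feature has to be enabled manually for now.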
Pull f2fs updates from Jaegeuk Kim:
"Major changes are to:
- add f2fs_io_tracer and F2FS_IOC_GETVERSION
- fix wrong acl assignment from parent
- fix accessing wrong data blocks
- fix wrong condition check for f2fs_sync_fs
- align start block address for direct_io
- add and refactor the readahead flows of FS metadata
- refactor atomic and volatile write policies
But most of patches are for clean-ups and minor bug fixes. Some of
them refactor old code too"
* tag 'for-f2fs-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (64 commits)
f2fs: use spinlock for segmap_lock instead of rwlock
f2fs: fix accessing wrong indexed data blocks
f2fs: avoid variable length array
f2fs: fix sparse warnings
f2fs: allocate data blocks in advance for f2fs_direct_IO
f2fs: introduce macros to convert bytes and blocks in f2fs
f2fs: call set_buffer_new for get_block
f2fs: check node page contents all the time
f2fs: avoid data offset overflow when lseeking huge file
f2fs: fix to use highmem for pages of newly created directory
f2fs: introduce a batched trim
f2fs: merge {invalidate,release}page for meta/node/data pages
f2fs: show the number of writeback pages in stat
f2fs: keep PagePrivate during releasepage
f2fs: should fail mount when trying to recover data on read-only dev
f2fs: split UMOUNT and FASTBOOT flags
f2fs: avoid write_checkpoint if f2fs is mounted readonly
f2fs: support norecovery mount option
f2fs: fix not to drop mount options when retrying fill_super
f2fs: merge flags in struct f2fs_sb_info
...
Pull backing device changes from Jens Axboe:
"This contains a cleanup of how the backing device is handled, in
preparation for a rework of the life time rules. In this part, the
most important change is to split the unrelated nommu mmap flags from
it, but also removing a backing_dev_info pointer from the
address_space (and inode), and a cleanup of other various minor bits.
Christoph did all the work here, I just fixed an oops with pages that
have a swap backing. Arnd fixed a missing export, and Oleg killed the
lustre backing_dev_info from staging. Last patch was from Al,
unexporting parts that are now no longer needed outside"
* 'for-3.20/bdi' of git://git.kernel.dk/linux-block:
Make super_blocks and sb_lock static
mtd: export new mtd_mmap_capabilities
fs: make inode_to_bdi() handle NULL inode
staging/lustre/llite: get rid of backing_dev_info
fs: remove default_backing_dev_info
fs: don't reassign dirty inodes to default_backing_dev_info
nfs: don't call bdi_unregister
ceph: remove call to bdi_unregister
fs: remove mapping->backing_dev_info
fs: export inode_to_bdi and use it in favor of mapping->backing_dev_info
nilfs2: set up s_bdi like the generic mount_bdev code
block_dev: get bdev inode bdi directly from the block device
block_dev: only write bdev inode on close
fs: introduce f_op->mmap_capabilities for nommu mmap support
fs: kill BDI_CAP_SWAP_BACKED
fs: deduplicate noop_backing_dev_info
Merge tag 'iommu-updates-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU updates from Joerg Roedel:
"This time with:
- Generic page-table framework for ARM IOMMUs using the LPAE
page-table format, ARM-SMMU and Renesas IPMMU make use of it
already.
- Break out the IO virtual address allocator from the Intel IOMMU so
that it can be used by other DMA-API implementations too. The
first user will be the ARM64 common DMA-API implementation for
IOMMUs
- Device tree support for Renesas IPMMU
- Various fixes and cleanups all over the place"
* tag 'iommu-updates-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (36 commits)
iommu/amd: Convert non-returned local variable to boolean when relevant
iommu: Update my email address
iommu/amd: Use wait_event in put_pasid_state_wait
iommu/amd: Fix amd_iommu_free_device()
iommu/arm-smmu: Avoid build warning
iommu/fsl: Various cleanups
iommu/fsl: Use %pa to print phys_addr_t
iommu/omap: Print phys_addr_t using %pa
iommu: Make more drivers depend on COMPILE_TEST
iommu/ipmmu-vmsa: Fix IOMMU lookup when multiple IOMMUs are registered
iommu: Disable on !MMU builds
iommu/fsl: Remove unused fsl_of_pamu_ids[]
iommu/fsl: Fix section mismatch
iommu/ipmmu-vmsa: Use the ARM LPAE page table allocator
iommu: Fix trace_map() to report original iova and original size
iommu/arm-smmu: add support for iova_to_phys through ATS1PR
iopoll: Introduce memory-mapped IO polling macros
iommu/arm-smmu: don't touch the secure STLBIALL register
iommu/arm-smmu: make use of generic LPAE allocator
iommu: io-pgtable-arm: add non-secure quirk
...
Merge tag 'trace-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"The updates included in this pull request for ftrace are:
o Several clean ups to the code
One such clean up was to convert to 64 bit time keeping, in the
ring buffer benchmark code.
o Adding of __print_array() helper macro for TRACE_EVENT()
o Updating the sample/trace_events/ to add samples of different ways
to make trace events. Lots of features have been added since the
sample code was made, and these features are mostly unknown.
Developers have been making their own hacks to do things that are
already available.
o Performance improvements. Most notably, I found a performance bug
where a waiter that is waiting for a full page from the ring buffer
will see that a full page is not available, and go to sleep. The
sched event caused by it going to sleep would cause it to wake up
again. It would see that there was still not a full page, and go
back to sleep again, and that would wake it up again, until finally
it would see a full page. This change has been marked for stable.
Other improvements include removing global locks from fast paths"
* tag 'trace-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ring-buffer: Do not wake up a splice waiter when page is not full
tracing: Fix unmapping loop in tracing_mark_write
tracing: Add samples of DECLARE_EVENT_CLASS() and DEFINE_EVENT()
tracing: Add TRACE_EVENT_FN example
tracing: Add TRACE_EVENT_CONDITION sample
tracing: Update the TRACE_EVENT fields available in the sample code
tracing: Separate out initializing top level dir from instances
tracing: Make tracing_init_dentry_tr() static
trace: Use 64-bit timekeeping
tracing: Add array printing helper
tracing: Remove newline from trace_printk warning banner
tracing: Use IS_ERR() check for return value of tracing_init_dentry()
tracing: Remove unneeded includes of debugfs.h and fs.h
tracing: Remove taking of trace_types_lock in pipe files
tracing: Add ref count to tracer for when they are being read by pipe
When studying page stealing, I noticed some weird looking decisions in
try_to_steal_freepages(). The first I assume is a bug (Patch 1), the
following two patches were driven by evaluation.
Testing was done with stress-highalloc of mmtests, using the
mm_page_alloc_extfrag tracepoint and postprocessing to get counts of how
often page stealing occurs for individual migratetypes, and what
migratetypes are used for fallbacks. Arguably, the worst case of page
stealing is when UNMOVABLE allocation steals from MOVABLE pageblock.
RECLAIMABLE allocation stealing from MOVABLE allocation is also not ideal,
so the goal is to minimize these two cases.
The evaluation of v2 wasn't always a clear win and Joonsoo questioned the
results. Here I used a different baseline which includes RFC compaction
improvements from [1]. I found that the compaction improvements reduce
variability of stress-highalloc, so there's less noise in the data.
First, let's look at stress-highalloc configured to do sync compaction,
and how these patches reduce page stealing events during the test. The first
column is after a fresh reboot, the other two are reiterations of the test
without reboot. That was all accumulated over 5 re-iterations (so the
benchmark was run 5x3 times with 5 fresh restarts).
Baseline:
                                                      3.19-rc4    3.19-rc4    3.19-rc4
                                                     5-nothp-1   5-nothp-2   5-nothp-3
Page alloc extfrag event                              10264225     8702233    10244125
Extfrag fragmenting                                   10263271     8701552    10243473
Extfrag fragmenting for unmovable                        13595       17616       15960
Extfrag fragmenting unmovable placed with movable         7989       12193        8447
Extfrag fragmenting for reclaimable                        658        1840        1817
Extfrag fragmenting reclaimable placed with movable        558        1677        1679
Extfrag fragmenting for movable                       10249018     8682096    10225696
With Patch 1:
                                                      3.19-rc4    3.19-rc4    3.19-rc4
                                                     6-nothp-1   6-nothp-2   6-nothp-3
Page alloc extfrag event                              11834954     9877523     9774860
Extfrag fragmenting                                   11833993     9876880     9774245
Extfrag fragmenting for unmovable                         7342       16129       11712
Extfrag fragmenting unmovable placed with movable         4191       10547        6270
Extfrag fragmenting for reclaimable                        373        1130         923
Extfrag fragmenting reclaimable placed with movable        302         906         738
Extfrag fragmenting for movable                       11826278     9859621     9761610
With Patch 2:
                                                      3.19-rc4    3.19-rc4    3.19-rc4
                                                     7-nothp-1   7-nothp-2   7-nothp-3
Page alloc extfrag event                               4725990     3668793     3807436
Extfrag fragmenting                                    4725104     3668252     3806898
Extfrag fragmenting for unmovable                         6678        7974        7281
Extfrag fragmenting unmovable placed with movable         2051        3829        4017
Extfrag fragmenting for reclaimable                        429        1208        1278
Extfrag fragmenting reclaimable placed with movable        369         976        1034
Extfrag fragmenting for movable                        4717997     3659070     3798339
With Patch 3:
                                                      3.19-rc4    3.19-rc4    3.19-rc4
                                                     8-nothp-1   8-nothp-2   8-nothp-3
Page alloc extfrag event                               5016183     4700142     3850633
Extfrag fragmenting                                    5015325     4699613     3850072
Extfrag fragmenting for unmovable                         1312        3154        3088
Extfrag fragmenting unmovable placed with movable         1115        2777        2714
Extfrag fragmenting for reclaimable                        437        1193        1097
Extfrag fragmenting reclaimable placed with movable        330         969         879
Extfrag fragmenting for movable                        5013576     4695266     3845887
In v2 we've seen an apparent regression with Patch 1 for unmovable events;
this is now gone, suggesting it was indeed noise. Here, each patch
improves the situation for unmovable events. Reclaimable is improved by
patch 1 and then either the same modulo noise, or perhaps slightly worse -
a small price for unmovable improvements, IMHO. The number of movable
allocations falling back to other migratetypes is the noisiest, but it's
reduced to half at Patch 2 nevertheless. These are the least critical as
compaction can move them around.
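As a quick cross-check of the claim about unmovable events, the sketch below
sums the three iterations of the "Extfrag fragmenting unmovable placed with
movable" row for each kernel in the sync-compaction tables above. The counts
are copied from those tables; the variable names and the choice to sum the
iterations are assumptions of this example, not part of the original
evaluation.

```python
# "Extfrag fragmenting unmovable placed with movable", sync compaction,
# iterations 1-3, copied from the tables above.
unmovable_into_movable = {
    "baseline": (7989, 12193, 8447),
    "patch 1": (4191, 10547, 6270),
    "patch 2": (2051, 3829, 4017),
    "patch 3": (1115, 2777, 2714),
}

# Sum the three iterations for each kernel.
totals = {name: sum(counts) for name, counts in unmovable_into_movable.items()}
baseline_total = totals["baseline"]

if __name__ == "__main__":
    for name, total in totals.items():
        share = 100 * total / baseline_total
        print(f"{name:9s} {total:6d} events  {share:5.1f}% of baseline")
```

Summed this way, the totals fall with every patch, consistent with the
statement that each patch improves the situation for unmovable events.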
If we look at success rates, the patches don't affect them.
Baseline:
                         3.19-rc4           3.19-rc4           3.19-rc4
                        5-nothp-1          5-nothp-2          5-nothp-3
Success 1 Min     49.00 (  0.00%)    42.00 ( 14.29%)    41.00 ( 16.33%)
Success 1 Mean    51.00 (  0.00%)    45.00 ( 11.76%)    42.60 ( 16.47%)
Success 1 Max     55.00 (  0.00%)    51.00 (  7.27%)    46.00 ( 16.36%)
Success 2 Min     53.00 (  0.00%)    47.00 ( 11.32%)    44.00 ( 16.98%)
Success 2 Mean    59.60 (  0.00%)    50.80 ( 14.77%)    48.20 ( 19.13%)
Success 2 Max     64.00 (  0.00%)    56.00 ( 12.50%)    52.00 ( 18.75%)
Success 3 Min     84.00 (  0.00%)    82.00 (  2.38%)    78.00 (  7.14%)
Success 3 Mean    85.60 (  0.00%)    82.80 (  3.27%)    79.40 (  7.24%)
Success 3 Max     86.00 (  0.00%)    83.00 (  3.49%)    80.00 (  6.98%)
Patch 1:
                         3.19-rc4           3.19-rc4           3.19-rc4
                        6-nothp-1          6-nothp-2          6-nothp-3
Success 1 Min     49.00 (  0.00%)    44.00 ( 10.20%)    44.00 ( 10.20%)
Success 1 Mean    51.80 (  0.00%)    46.00 ( 11.20%)    45.80 ( 11.58%)
Success 1 Max     54.00 (  0.00%)    49.00 (  9.26%)    49.00 (  9.26%)
Success 2 Min     58.00 (  0.00%)    49.00 ( 15.52%)    48.00 ( 17.24%)
Success 2 Mean    60.40 (  0.00%)    51.80 ( 14.24%)    50.80 ( 15.89%)
Success 2 Max     63.00 (  0.00%)    54.00 ( 14.29%)    55.00 ( 12.70%)
Success 3 Min     84.00 (  0.00%)    81.00 (  3.57%)    79.00 (  5.95%)
Success 3 Mean    85.00 (  0.00%)    81.60 (  4.00%)    79.80 (  6.12%)
Success 3 Max     86.00 (  0.00%)    82.00 (  4.65%)    82.00 (  4.65%)
Patch 2:
                         3.19-rc4           3.19-rc4           3.19-rc4
                        7-nothp-1          7-nothp-2          7-nothp-3
Success 1 Min     50.00 (  0.00%)    44.00 ( 12.00%)    39.00 ( 22.00%)
Success 1 Mean    52.80 (  0.00%)    45.60 ( 13.64%)    42.40 ( 19.70%)
Success 1 Max     55.00 (  0.00%)    46.00 ( 16.36%)    47.00 ( 14.55%)
Success 2 Min     52.00 (  0.00%)    48.00 (  7.69%)    45.00 ( 13.46%)
Success 2 Mean    53.40 (  0.00%)    49.80 (  6.74%)    48.80 (  8.61%)
Success 2 Max     57.00 (  0.00%)    52.00 (  8.77%)    52.00 (  8.77%)
Success 3 Min     84.00 (  0.00%)    81.00 (  3.57%)    79.00 (  5.95%)
Success 3 Mean    85.00 (  0.00%)    82.40 (  3.06%)    79.60 (  6.35%)
Success 3 Max     86.00 (  0.00%)    83.00 (  3.49%)    80.00 (  6.98%)
Patch 3:
                         3.19-rc4           3.19-rc4           3.19-rc4
                        8-nothp-1          8-nothp-2          8-nothp-3
Success 1 Min     46.00 (  0.00%)    44.00 (  4.35%)    42.00 (  8.70%)
Success 1 Mean    50.20 (  0.00%)    45.60 (  9.16%)    44.00 ( 12.35%)
Success 1 Max     52.00 (  0.00%)    47.00 (  9.62%)    47.00 (  9.62%)
Success 2 Min     53.00 (  0.00%)    49.00 (  7.55%)    48.00 (  9.43%)
Success 2 Mean    55.80 (  0.00%)    50.60 (  9.32%)    49.00 ( 12.19%)
Success 2 Max     59.00 (  0.00%)    52.00 ( 11.86%)    51.00 ( 13.56%)
Success 3 Min     84.00 (  0.00%)    80.00 (  4.76%)    79.00 (  5.95%)
Success 3 Mean    85.40 (  0.00%)    81.60 (  4.45%)    80.40 (  5.85%)
Success 3 Max     87.00 (  0.00%)    83.00 (  4.60%)    82.00 (  5.75%)
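The percentages in parentheses in the success-rate tables above are not
explained in the text. They appear to be the drop relative to the first
(fresh reboot) column, which is why that column always shows 0.00%. The
sketch below reproduces a few of them under that assumption; the function
name relative_drop is hypothetical.

```python
def relative_drop(first_iteration, later_iteration):
    """Percentage by which a later iteration is below the first one."""
    return round(100 * (first_iteration - later_iteration) / first_iteration, 2)


if __name__ == "__main__":
    # Baseline, "Success 1 Min": 49.00 -> 42.00 and 49.00 -> 41.00
    print(relative_drop(49.00, 42.00), relative_drop(49.00, 41.00))
    # Patch 3, "Success 1 Min": 46.00 -> 44.00 and 46.00 -> 42.00
    print(relative_drop(46.00, 44.00), relative_drop(46.00, 42.00))
```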
While there's no improvement here, I consider the reduced fragmentation events
to be worthwhile on their own. Patch 2 also seems to reduce scanning for free
pages, and migrations in compaction, suggesting it has somewhat less work
to do:
Patch 1:
Compaction stalls                 4153        3959        3978
Compaction success                1523        1441        1446
Compaction failures               2630        2517        2531
Page migrate success           4600827     4943120     5104348
Page migrate failure             19763       16656       17806
Compaction pages isolated      9597640    10305617    10653541
Compaction migrate scanned    77828948    86533283    87137064
Compaction free scanned      517758295   521312840   521462251
Compaction cost                   5503        5932        6110
Patch 2:
Compaction stalls                 3800        3450        3518
Compaction success                1421        1316        1317
Compaction failures               2379        2134        2201
Page migrate success           4160421     4502708     4752148
Page migrate failure             19705       14340       14911
Compaction pages isolated      8731983     9382374     9910043
Compaction migrate scanned    98362797    96349194    98609686
Compaction free scanned      496512560   469502017   480442545
Compaction cost                   5173        5526        5811
As with v2, /proc/pagetypeinfo appears unaffected with respect to numbers
of unmovable and reclaimable pageblocks.
Configuring the benchmark to allocate like THP page fault (i.e. no sync
compaction) gives much noisier results for iterations 2 and 3 after
reboot. This is not so surprising given how [1] offers lower improvements
in this scenario due to fewer restarts after deferred compaction which
would change compaction pivot.
Baseline:
                                                      3.19-rc4    3.19-rc4    3.19-rc4
                                                       5-thp-1     5-thp-2     5-thp-3
Page alloc extfrag event                               8148965     6227815     6646741
Extfrag fragmenting                                    8147872     6227130     6646117
Extfrag fragmenting for unmovable                        10324       12942       15975
Extfrag fragmenting unmovable placed with movable         5972        8495       10907
Extfrag fragmenting for reclaimable                        601        1707        2210
Extfrag fragmenting reclaimable placed with movable        520        1570        2000
Extfrag fragmenting for movable                        8136947     6212481     6627932
Patch 1:
                                                      3.19-rc4    3.19-rc4    3.19-rc4
                                                       6-thp-1     6-thp-2     6-thp-3
Page alloc extfrag event                               8345457     7574471     7020419
Extfrag fragmenting                                    8343546     7573777     7019718
Extfrag fragmenting for unmovable                        10256       18535       30716
Extfrag fragmenting unmovable placed with movable         6893       11726       22181
Extfrag fragmenting for reclaimable                        465        1208        1023
Extfrag fragmenting reclaimable placed with movable        353         996         843
Extfrag fragmenting for movable                        8332825     7554034     6987979
Patch 2:
                                                      3.19-rc4    3.19-rc4    3.19-rc4
                                                       7-thp-1     7-thp-2     7-thp-3
Page alloc extfrag event                               3512847     3020756     2891625
Extfrag fragmenting                                    3511940     3020185     2891059
Extfrag fragmenting for unmovable                         9017        6892        6191
Extfrag fragmenting unmovable placed with movable         1524        3053        2435
Extfrag fragmenting for reclaimable                        445        1081        1160
Extfrag fragmenting reclaimable placed with movable        375         918         986
Extfrag fragmenting for movable                        3502478     3012212     2883708
Patch 3:
                                                      3.19-rc4    3.19-rc4    3.19-rc4
                                                       8-thp-1     8-thp-2     8-thp-3
Page alloc extfrag event                               3181699     3082881     2674164
Extfrag fragmenting                                    3180812     3082303     2673611
Extfrag fragmenting for unmovable                         1201        4031        4040
Extfrag fragmenting unmovable placed with movable          974        3611        3645
Extfrag fragmenting for reclaimable                        478        1165        1294
Extfrag fragmenting reclaimable placed with movable        387         985        1030
Extfrag fragmenting for movable                        3179133     3077107     2668277
The improvements for the first iteration are clear; the rest is much noisier
and can appear like a regression for Patch 1. Anyway, patch 2 rectifies it.
Allocation success rates are again unaffected so there's no point in
making this e-mail any longer.
[1] http://marc.info/?l=linux-mm&m=142166196321125&w=2
This patch (of 3):
When __rmqueue_fallback() is called to allocate a page of order X, it will
find a page of order Y >= X of a fallback migratetype, which is different
from the desired migratetype. With the help of try_to_steal_freepages(),
it may change the migratetype (to the desired one) also of:
1) all currently free pages in the pageblock containing the fallback page
2) the fallback pageblock itself
3) buddy pages created by splitting the fallback page (when Y > X)
These decisions take the order Y into account, as well as the desired
migratetype, with the goal of preventing multiple fallback allocations
that could e.g. distribute UNMOVABLE allocations among multiple
pageblocks.
Originally, the decision for 1) implied the decision for 3). Commit
47118af076 ("mm: mmzone: MIGRATE_CMA migration type added") changed that
(probably unintentionally) so that the buddy pages in case 3) are always
changed to the desired migratetype, except for CMA pageblocks.
Commit fef903efcf ("mm/page_alloc.c: restructure free-page stealing code
and fix a bug") did some refactoring and added a comment that the case of
3) is intended. Commit 0cbef29a78 ("mm: __rmqueue_fallback() should
respect pageblock type") removed the comment and tried to restore the
original behavior where 1) implies 3), but due to the previous
refactoring, the result is instead that only 2) implies 3) - and the
conditions for 2) are less frequently met than conditions for 1). This
may increase fragmentation in situations where the code decides to steal
all free pages from the pageblock (case 1)), but then gives back the buddy
pages produced by splitting.
This patch restores the original intended logic where 1) implies 3).
During testing with stress-highalloc from mmtests, this has been shown to
decrease the number of events where UNMOVABLE and RECLAIMABLE allocations
steal from MOVABLE pageblocks, which can lead to permanent fragmentation.
In some cases it has increased the number of events when MOVABLE
allocations steal from UNMOVABLE or RECLAIMABLE pageblocks, but these are
fixable by sync compaction and thus less harmful.
Note that evaluation has shown that the behavior introduced by
47118af076 for buddy pages in case 3) is actually even better than the
original logic, so the following patch will introduce it properly once
again. For stable backports of this patch it thus makes sense to only fix
versions containing 0cbef29a78.
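The history of case 3) described above can be summarized as a small decision
model. This is a reading of the commit message only, not the kernel code:
the class, the function and the variant labels are hypothetical, and
"1) implies 3)" is modeled as "buddy pages take the desired migratetype
exactly when the free pages of the pageblock are stolen".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FallbackDecision:
    steal_free_pages: bool   # case 1): take all free pages in the pageblock
    change_pageblock: bool   # case 2): change the pageblock's migratetype
    is_cma: bool = False     # the fallback pageblock is a CMA pageblock


def buddies_take_desired_type(decision, variant):
    """Case 3): do buddy pages from the split get the desired migratetype?"""
    if variant in ("original", "this patch"):
        return decision.steal_free_pages      # 1) implies 3)
    if variant == "47118af076":
        return not decision.is_cma            # always, except for CMA
    if variant == "0cbef29a78":
        return decision.change_pageblock      # only 2) implies 3)
    raise ValueError(f"unknown variant: {variant}")


if __name__ == "__main__":
    # Free pages are stolen, but the pageblock itself is not converted.
    decision = FallbackDecision(steal_free_pages=True, change_pageblock=False)
    for variant in ("original", "47118af076", "0cbef29a78", "this patch"):
        print(variant, buddies_take_desired_type(decision, variant))
```

The input used in the example is the problematic situation from the text:
with the 0cbef29a78 behavior the free pages are stolen but the buddy pages
from the split are given back, while the original logic and this patch keep
them with the desired migratetype.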
[iamjoonsoo.kim@lge.com: tracepoint fix]
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: <stable@vger.kernel.org> [3.13+ containing 0cbef29a78]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Compaction deferring logic is a heavy hammer that blocks the way to
compaction. It doesn't consider overall system state, so it could falsely
prevent the user from doing compaction. In other words, even if the system
has a large enough range of memory to compact, compaction would be skipped
due to the compaction deferring logic. This patch adds a new tracepoint to
understand the work of the deferring logic. This will also help to check
compaction success and failure.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It is not well analyzed when or why compaction starts or finishes.
With these new tracepoints, we can know much more about the start/finish
reason of compaction. I found the following bug with these tracepoints.
http://www.spinics.net/lists/linux-mm/msg81582.html
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It would be useful to know the current range where compaction is working
for detailed analysis. With it, we can know which pageblock we actually
scan and isolate and how many pages we try in that pageblock, and we can
roughly guess why it doesn't become a free page of pageblock order.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We now have a tracepoint for the begin event of compaction that prints the
start position of both scanners, but the tracepoint for the end event of
compaction doesn't print the finish position of both scanners. It would
also be useful to know the finish position of both scanners, so this patch
adds it. It will help to find odd behavior or problems in compaction's
internal logic.
The mode is also added to both begin/end tracepoint output, since
compaction behavior differs considerably depending on the mode.
Lastly, the status format is changed from a status number to a string
for readability.
[akpm@linux-foundation.org: fix sparse warning]
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
To check the range that compaction is working on, the tracepoints print
the start/end pfn of the zone and the start pfn of both scanners in decimal
format. Since we manage all pages in power-of-2 orders, which are well
represented in hexadecimal, this patch changes the tracepoint format from
decimal to hexadecimal. This improves readability. For example, it makes
it easy to notice whether the current scanner is trying to compact a
previously attempted pageblock or not.
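A small sketch of the readability argument: with power-of-2 pageblocks,
alignment shows up as trailing zero bits in hexadecimal but is hard to
spot in decimal. The pageblock order of 9 (512 pages) and the helper
names are assumptions for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Assumed pageblock order: 9 means 512 pages (0x200) per pageblock. */
#define MODEL_PAGEBLOCK_ORDER 9
#define MODEL_PAGEBLOCK_NR_PAGES (1UL << MODEL_PAGEBLOCK_ORDER)

/* A pfn starts a pageblock when its low MODEL_PAGEBLOCK_ORDER bits are zero. */
static bool pfn_is_pageblock_aligned(unsigned long pfn)
{
	return (pfn & (MODEL_PAGEBLOCK_NR_PAGES - 1)) == 0;
}

/* Format the same pfn both ways so the two notations can be compared. */
static int format_pfn(char *buf, size_t len, unsigned long pfn)
{
	assert(buf && len > 0);
	return snprintf(buf, len, "dec=%lu hex=0x%lx", pfn, pfn);
}
```

For pfn 1049088 the hexadecimal form 0x100200 makes it obvious that the
scanner sits exactly one pageblock (0x200) past 0x100000.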
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch resolves the following warnings.
include/trace/events/f2fs.h:150:1: warning: expression using sizeof bool
include/trace/events/f2fs.h:180:1: warning: expression using sizeof bool
include/trace/events/f2fs.h:990:1: warning: expression using sizeof bool
include/trace/events/f2fs.h:990:1: warning: expression using sizeof bool
include/trace/events/f2fs.h:150:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1)
include/trace/events/f2fs.h:180:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1)
include/trace/events/f2fs.h:990:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1)
include/trace/events/f2fs.h:990:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1)
fs/f2fs/checkpoint.c:27:19: warning: symbol 'inode_entry_slab' was not declared. Should it be static?
fs/f2fs/checkpoint.c:577:15: warning: cast to restricted __le32
fs/f2fs/checkpoint.c:592:15: warning: cast to restricted __le32
fs/f2fs/trace.c:19:1: warning: symbol 'pids' was not declared. Should it be static?
fs/f2fs/trace.c:21:21: warning: symbol 'last_io' was not declared. Should it be static?
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch adds a FASTBOOT flag to the checkpoint as follows.
- CP_UMOUNT_FLAG is set when the system is umounted.
- CP_FASTBOOT_FLAG is set when an intermediate checkpoint having node
summaries was done.
So, if you get CP_UMOUNT_FLAG from the checkpoint, the system was umounted
cleanly. If instead there was a sudden power-off, you can get
CP_FASTBOOT_FLAG or nothing.
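The rule above can be sketched as a small classifier. The bit values and
the enum names below are placeholders chosen for this example; the real
on-disk values are whatever f2fs defines.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit values only; not taken from the f2fs on-disk format. */
#define MODEL_CP_UMOUNT_FLAG   0x1u
#define MODEL_CP_FASTBOOT_FLAG 0x2u

enum shutdown_kind {
	SHUTDOWN_CLEAN_UMOUNT,    /* CP_UMOUNT_FLAG present */
	SHUTDOWN_SUDDEN_FASTBOOT, /* only CP_FASTBOOT_FLAG present */
	SHUTDOWN_SUDDEN_PLAIN,    /* neither flag present */
};

/* Decide how the last session ended from the checkpoint flags. */
static enum shutdown_kind classify_checkpoint(uint32_t ckpt_flags)
{
	if (ckpt_flags & MODEL_CP_UMOUNT_FLAG)
		return SHUTDOWN_CLEAN_UMOUNT;
	if (ckpt_flags & MODEL_CP_FASTBOOT_FLAG)
		return SHUTDOWN_SUDDEN_FASTBOOT;
	return SHUTDOWN_SUDDEN_PLAIN;
}
```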
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Currently, there are several variables with Boolean type as below:
struct f2fs_sb_info {
	...
	int s_dirty;
	bool need_fsck;
	bool s_closing;
	...
	bool por_doing;
	...
}
This causes some issues:
1. some space in f2fs_sb_info is wasted due to the alignment padding the
compiler adds after Boolean type variables.
2. if we keep adding new flags to f2fs_sb_info, the structure will become
messy.
So in this patch, we try to:
1. switch s_dirty to a Boolean type variable since it has only two states, 0/1.
2. merge the s_dirty/need_fsck/s_closing/por_doing variables into s_flag.
3. introduce an enum type which can indicate different states of sbi.
4. use the newly introduced universal interfaces
is_sbi_flag_set/{set,clear}_sbi_flag to operate on flags for sbi.
After that, the above issues will be fixed.
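A minimal userspace sketch of the resulting pattern, assuming one bit per
state in s_flag. The helper names follow the commit text; the enumerator
names and the sbi_model struct are stand-ins invented for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative state names: one enumerator per former Boolean field. */
enum sbi_state {
	MODEL_SBI_IS_DIRTY,
	MODEL_SBI_NEED_FSCK,
	MODEL_SBI_IS_CLOSING,
	MODEL_SBI_POR_DOING,
};

struct sbi_model {
	unsigned long s_flag; /* one bit per enum sbi_state value */
};

static bool is_sbi_flag_set(const struct sbi_model *sbi, enum sbi_state type)
{
	assert(sbi);
	return (sbi->s_flag & (1UL << type)) != 0;
}

static void set_sbi_flag(struct sbi_model *sbi, enum sbi_state type)
{
	assert(sbi);
	sbi->s_flag |= 1UL << type;
}

static void clear_sbi_flag(struct sbi_model *sbi, enum sbi_state type)
{
	assert(sbi);
	sbi->s_flag &= ~(1UL << type);
}
```

Adding a new state then costs one enumerator rather than a new struct
member, and all four former fields share a single word. These plain
read-modify-write helpers are not atomic, so callers would need their
own serialization.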
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Pull networking updates from David Miller:
1) More iov_iter conversion work from Al Viro.
[ The "crypto: switch af_alg_make_sg() to iov_iter" commit was
wrong, and this pull actually adds an extra commit on top of the
branch I'm pulling to fix that up, so that the pre-merge state is
ok. - Linus ]
2) Various optimizations to the ipv4 forwarding information base trie
lookup implementation. From Alexander Duyck.
3) Remove sock_iocb altogether, from Christoph Hellwig.
4) Allow congestion control algorithm selection via routing metrics.
From Daniel Borkmann.
5) Make ipv4 uncached route list per-cpu, from Eric Dumazet.
6) Handle rfs hash collisions more gracefully, also from Eric Dumazet.
7) Add xmit_more support to r8169, e1000, and e1000e drivers. From
Florian Westphal.
8) Transparent Ethernet Bridging support for GRO, from Jesse Gross.
9) Add BPF packet actions to packet scheduler, from Jiri Pirko.
10) Add support for unique flow IDs to openvswitch, from Joe Stringer.
11) New NetCP ethernet driver, from Muralidharan Karicheri and Wingman
Kwok.
12) More sanely handle out-of-window dupacks, which can result in
serious ACK storms. From Neal Cardwell.
13) Various rhashtable bug fixes and enhancements, from Herbert Xu,
Patrick McHardy, and Thomas Graf.
14) Support xmit_more in be2net, from Sathya Perla.
15) Group Policy extensions for vxlan, from Thomas Graf.
16) Remove Checksum Offload support for vxlan, from Tom Herbert.
17) Like ipv4, support lockless transmit over ipv6 UDP sockets. From
Vlad Yasevich.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1494+1 commits)
crypto: fix af_alg_make_sg() conversion to iov_iter
ipv4: Namespecify TCP PMTU mechanism
i40e: Fix for stats init function call in Rx setup
tcp: don't include Fast Open option in SYN-ACK on pure SYN-data
openvswitch: Only set TUNNEL_VXLAN_OPT if VXLAN-GBP metadata is set
ipv6: Make __ipv6_select_ident static
ipv6: Fix fragment id assignment on LE arches.
bridge: Fix inability to add non-vlan fdb entry
net: Mellanox: Delete unnecessary checks before the function call "vunmap"
cxgb4: Add support in cxgb4 to get expansion rom version via ethtool
ethtool: rename reserved1 memeber in ethtool_drvinfo for expansion ROM version
net: dsa: Remove redundant phy_attach()
IB/mlx4: Reset flow support for IB kernel ULPs
IB/mlx4: Always use the correct port for mirrored multicast attachments
net/bonding: Fix potential bad memory access during bonding events
tipc: remove tipc_snprintf
tipc: nl compat add noop and remove legacy nl framework
tipc: convert legacy nl stats show to nl compat
tipc: convert legacy nl net id get to nl compat
tipc: convert legacy nl net id set to nl compat
...
Pull perf updates from Ingo Molnar:
"Kernel side changes:
- AMD range breakpoints support:
Extend breakpoint tools and core to support address range through
perf event with initial backend support for AMD extended
breakpoints.
The syntax is:
perf record -e mem:addr/len:type
For example set write breakpoint from 0x1000 to 0x1200 (0x1000 + 512)
perf record -e mem:0x1000/512:w
- event throttling/rotating fixes
- various event group handling fixes, cleanups and general paranoia
code to be more robust against bugs in the future.
- kernel stack overhead fixes
User-visible tooling side changes:
- Show precise number of samples at the end of a 'record' session,
if processing build ids, since we will then traverse the whole
perf.data file and see all the PERF_RECORD_SAMPLE records,
otherwise stop showing the previous off-base heuristically counted
number of "samples" (Namhyung Kim).
- Support to read compressed module from build-id cache (Namhyung
Kim)
- Enable sampling loads and stores simultaneously in 'perf mem'
(Stephane Eranian)
- 'perf diff' output improvements (Namhyung Kim)
- Fix error reporting for evsel pgfault constructor (Arnaldo Carvalho
de Melo)
Tooling side infrastructure changes:
- Cache eh/debug frame offset for dwarf unwind (Namhyung Kim)
- Support parsing parameterized events (Cody P Schafer)
- Add support for IP address formats in libtraceevent (David Ahern)
Plus other misc fixes"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (48 commits)
perf: Decouple unthrottling and rotating
perf: Drop module reference on event init failure
perf: Use POLLIN instead of POLL_IN for perf poll data in flag
perf: Fix put_event() ctx lock
perf: Fix move_group() order
perf: Fix event->ctx locking
perf: Add a bit of paranoia
perf symbols: Convert lseek + read to pread
perf tools: Use perf_data_file__fd() consistently
perf symbols: Support to read compressed module from build-id cache
perf evsel: Set attr.task bit for a tracking event
perf header: Set header version correctly
perf record: Show precise number of samples
perf tools: Do not use __perf_session__process_events() directly
perf callchain: Cache eh/debug frame offset for dwarf unwind
perf tools: Provide stub for missing pthread_attr_setaffinity_np
perf evsel: Don't rely on malloc working for sz 0
tools lib traceevent: Add support for IP address formats
perf ui/tui: Show fatal error message only if exists
perf tests: Fix typo in sample-parsing.c
...
Merge tag 'trace-fixes-v3.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull ftrace fixes from Steven Rostedt:
"During testing Sedat Dilek hit a "suspicious RCU usage" splat that
pointed out a real bug. During suspend and resume the tlb_flush
tracepoint is called when the CPU is going offline. As the CPU has
been noted as offline, RCU is ignoring that CPU, which means that it
can not use RCU protected locks. When tracepoints are activated, they
require RCU locking, and if RCU is ignoring a CPU that runs a
tracepoint, there is a chance that the tracepoint could cause
corruption.
The solution was to change the tracepoint into a
TRACE_EVENT_CONDITION() which allows us to check a condition to
determine if the tracepoint should be called or not. If the condition
is not met, the rcu protected code will not be executed. By adding
the condition "cpu_online(smp_processor_id())", this will prevent the
RCU protected code from being executed if the CPU is marked offline.
After adding this, another bug was discovered. As RCU checks rcu
callers, if a rcu call is not done, there is no check (obviously). We
found that tracepoints could be added in RCU ignored locations and not
have lockdep complain until the tracepoint is activated. This missed
places where tracepoints were added in places they should not have
been. To fix this, code was added in 3.18 that if lockdep is enabled,
any tracepoint will still call the rcu checks even if the tracepoint
is not enabled. The bug here is that the check does not take the
CONDITION into account. As the condition may prevent tracepoints from
being activated in RCU ignored areas (as the one patch does), we get
false positives when we enable lockdep and hit a tracepoint that the
condition prevents from being called in an RCU ignored location.
The fix for this is to add the CONDITION to the rcu checks, even if
the tracepoint is not enabled"
* tag 'trace-fixes-v3.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
x86/tlb/trace: Do not trace on CPU that is offline
tracing: Add condition check to RCU lockdep checks
When taking a CPU down for suspend and resume, a tracepoint may be called
when the CPU has been designated offline. As tracepoints require RCU for
protection, they must not be called if the current CPU is offline.
Unfortunately, trace_tlb_flush() is called in this scenario as was noted
by LOCKDEP:
...
Disabling non-boot CPUs ...
intel_pstate CPU 1 exiting
===============================
smpboot: CPU 1 didn't die...
[ INFO: suspicious RCU usage. ]
3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted
-------------------------------
include/trace/events/tlb.h:35 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
RCU used illegally from offline CPU!
rcu_scheduler_active = 1, debug_locks = 0
no locks held by swapper/1/0.
stack backtrace:
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.19.0-rc7-next-20150204.1-iniza-small #1
Hardware name: SAMSUNG ELECTRONICS CO., LTD. 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
0000000000000001 ffff88011a44fe18 ffffffff817e370d 0000000000000011
ffff88011a448290 ffff88011a44fe48 ffffffff810d6847 ffff8800c66b9600
0000000000000001 ffff88011a44c000 ffffffff81cb3900 ffff88011a44fe78
Call Trace:
[<ffffffff817e370d>] dump_stack+0x4c/0x65
[<ffffffff810d6847>] lockdep_rcu_suspicious+0xe7/0x120
[<ffffffff810b71a5>] idle_task_exit+0x205/0x2c0
[<ffffffff81054c4e>] play_dead_common+0xe/0x50
[<ffffffff81054ca5>] native_play_dead+0x15/0x140
[<ffffffff8102963f>] arch_cpu_idle_dead+0xf/0x20
[<ffffffff810cd89e>] cpu_startup_entry+0x37e/0x580
[<ffffffff81053e20>] start_secondary+0x140/0x150
intel_pstate CPU 2 exiting
...
By converting the tlb_flush tracepoint to a TRACE_EVENT_CONDITION where the
condition is cpu_online(smp_processor_id()), we can avoid calling RCU protected
code when the CPU is offline.
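The idea of a conditional tracepoint can be modelled in plain C as below.
This is only a userspace sketch: MODEL_TRACE_CONDITION, model_probe and
the counters are invented for illustration and are not the kernel's
TRACE_EVENT_CONDITION() implementation. The sketch also folds in the
companion change from this series, where the debug-only RCU check honours
the condition even when the tracepoint is disabled.

```c
#include <assert.h>
#include <stdbool.h>

static bool model_cpu_online = true; /* stand-in for cpu_online(smp_processor_id()) */
static int model_events_recorded;    /* events that reached the probe */
static int model_rcu_checks;         /* lockdep-style RCU checks performed */

/* Stand-in for the RCU-protected probe call. */
static void model_probe(int pages)
{
	(void)pages;
	model_events_recorded++;
}

/*
 * Conditional tracepoint: when cond is false, both the probe and the
 * debug-only RCU check are skipped.
 */
#define MODEL_TRACE_CONDITION(cond, enabled, pages)	\
	do {						\
		if (!(cond))				\
			break;				\
		model_rcu_checks++;			\
		if (enabled)				\
			model_probe(pages);		\
	} while (0)

static void model_trace_tlb_flush(bool enabled, int pages)
{
	assert(pages >= 0);
	MODEL_TRACE_CONDITION(model_cpu_online, enabled, pages);
}
```

With model_cpu_online set to false, a call to model_trace_tlb_flush()
touches neither counter, which mirrors the intent of keeping RCU
protected code off an offline CPU.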
Link: http://lkml.kernel.org/r/CA+icZUUGiGDoL5NU8RuxKzFjoLjEKRtUWx=JB8B9a0EQv-eGzQ@mail.gmail.com
Cc: stable@vger.kernel.org # 3.17+
Fixes: d17d8f9ded "x86/mm: Add tracepoints for TLB flushes"
Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Suggested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Dave Hansen <dave@sr71.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
This patch introduces a new module parameter for the KVM module; when it
is present, KVM attempts a bit of polling on every HLT before scheduling
itself out via kvm_vcpu_block.
This parameter helps a lot for latency-bound workloads---in particular
I tested it with O_DSYNC writes with a battery-backed disk in the host.
In this case, writes are fast (because the data doesn't have to go all
the way to the platters) but they cannot be merged by either the host or
the guest. KVM's performance here is usually around 30% of bare metal,
or 50% if you use cache=directsync or cache=writethrough (these
parameters keep the guest from sending pointless flush requests, and
at the same time they are not slow because of the battery-backed cache).
The bad performance happens because on every halt the host CPU decides
to halt itself too. When the interrupt comes, the vCPU thread is then
migrated to a new physical CPU, and in general the latency is horrible
because the vCPU thread has to be scheduled back in.
With this patch performance reaches 60-65% of bare metal and, more
important, 99% of what you get if you use idle=poll in the guest. This
means that the tunable gets rid of this particular bottleneck, and more
work can be done to improve performance in the kernel or QEMU.
Of course there is some price to pay; every time an otherwise idle vCPU
is interrupted by an interrupt, it will poll unnecessarily and thus
impose a little load on the host. The above results were obtained with
a mostly random value of the parameter (500000), and the load was around
1.5-2.5% CPU usage on one of the host's cores for each idle guest vCPU.
The patch also adds a new stat, /sys/kernel/debug/kvm/halt_successful_poll,
that can be used to tune the parameter. It counts how many HLT
instructions received an interrupt during the polling period; each
successful poll saves Linux from scheduling the VCPU thread out and back
in, and may also avoid a likely trip to C1 and back for the physical CPU.
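The trade-off can be modelled with a toy decision function. Everything
here is an assumption made for illustration: the struct vcpu_model, the
"blocked" counter, and the wakeup_after_ns input (real code cannot know
in advance when the interrupt will arrive; it polls until the budget
runs out).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct vcpu_model {
	uint64_t halt_successful_poll; /* halts that ended during polling */
	uint64_t blocked;              /* halts that fell back to blocking */
};

/*
 * halt_poll_ns: polling budget in nanoseconds (0 disables polling).
 * wakeup_after_ns: hypothetical time until the next interrupt arrives.
 * Returns true if the vCPU stayed on the CPU instead of being scheduled out.
 */
static bool model_handle_hlt(struct vcpu_model *v, uint64_t halt_poll_ns,
			     uint64_t wakeup_after_ns)
{
	assert(v);
	if (halt_poll_ns && wakeup_after_ns < halt_poll_ns) {
		v->halt_successful_poll++;
		return true;
	}
	v->blocked++; /* the vCPU thread would be scheduled out here */
	return false;
}
```

In this model a latency-bound workload whose interrupts arrive within the
budget never blocks, while an idle vCPU pays the full budget on every
halt before blocking anyway.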
While the VM is idle, a Linux 4 VCPU VM halts around 10 times per second.
Of these halts, almost all are failed polls. During the benchmark,
instead, basically all halts end within the polling period, except a more
or less constant stream of 50 per second coming from vCPUs that are not
running the benchmark. The wasted time is thus very low. Things may
be slightly different for Windows VMs, which have a ~10 ms timer tick.
The effect is also visible on Marcelo's recently-introduced latency
test for the TSC deadline timer. Though of course a non-RT kernel has
awful latency bounds, the latency of the timer is around 8000-10000 clock
cycles compared to 20000-120000 without setting halt_poll_ns. For the TSC
deadline timer, thus, the effect is both a smaller average latency and
a smaller variance.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add an optimization for the MS_LAZYTIME mount option so that we will
opportunistically write out any inodes with the I_DIRTY_TIME flag set
in a particular inode table block when we need to update some inode in
that inode table block anyway.
Also add some temporary code so that we can set the lazytime mount
option without needing a modified /sbin/mount program which can set
MS_LAZYTIME. We can eventually make this go away once util-linux has
added support.
Google-Bug-Id: 18297052
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Add a new mount option which enables a new "lazytime" mode. This mode
causes atime, mtime, and ctime updates to only be made to the
in-memory version of the inode. The on-disk times will only get
updated (a) when the inode needs to be updated for some non-time
related change, (b) when userspace calls fsync(), syncfs() or sync(), or
(c) just before an undeleted inode is evicted from memory.
This is OK according to POSIX because there are no guarantees after a
crash unless userspace explicitly requests them via an fsync(2) call.
For workloads which feature a large number of random writes to a
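The three conditions can be written down as a predicate. This is a
simplified model with invented names (inode_model, model_wb_reason,
must_write_times); it covers only the cases listed above and ignores any
other trigger an actual filesystem may have.

```c
#include <assert.h>
#include <stdbool.h>

struct inode_model {
	bool dirty_time;  /* timestamps changed only in memory */
	bool dirty_other; /* some non-timestamp field changed */
};

enum model_wb_reason {
	MODEL_WB_PERIODIC,        /* ordinary background writeback */
	MODEL_WB_SYNC_REQUEST,    /* fsync(), syncfs() or sync() */
	MODEL_WB_EVICT_UNDELETED, /* undeleted inode leaving memory */
};

/* Return true if the in-memory timestamps must reach the disk now. */
static bool must_write_times(const struct inode_model *inode,
			     enum model_wb_reason why)
{
	assert(inode);
	if (!inode->dirty_time)
		return false; /* nothing pending */
	if (inode->dirty_other)
		return true;  /* case (a): inode is being written anyway */
	/* cases (b) and (c) */
	return why == MODEL_WB_SYNC_REQUEST || why == MODEL_WB_EVICT_UNDELETED;
}
```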
preallocated file, the lazytime mount option significantly reduces
writes to the inode table. The repeated 4k writes to a single block
will result in undesirable stress on flash devices and SMR disk
drives. Even on conventional HDD's, the repeated writes to the inode
table block will trigger Adjacent Track Interference (ATI) remediation
latencies, which very negatively impact long tail latencies --- which
is a very big deal for web serving tiers (for example).
Google-Bug-Id: 18297052
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
If a trace event contains an array, there is currently no standard
way to format this for text output. Drivers are currently hacking
around this by a) local hacks that use the trace_seq functionality
directly, or b) just not printing that information. For fixed size
arrays, formatting of the elements can be open-coded, but this gets
cumbersome for arrays of non-trivial size.
These approaches result in non-standard content of the event format
description delivered to userspace, so userland tools need to be
taught to understand and parse each array printing method
individually.
This patch implements a __print_array() helper that tracepoint
implementations can use instead of reinventing it. A simple C-style
syntax is used to delimit the array and its elements {like,this}.
So that the helpers can be used with large static arrays as well as
dynamic arrays, they take a pointer and element count: they can be
used with __get_dynamic_array() for use with dynamic arrays.
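To show the {like,this} output shape, here is a userspace formatter that
takes a pointer and an element count. It is an illustrative stand-in
named model_print_array, limited to int elements and decimal output; it
is not the kernel's __print_array() and makes no claim about that
helper's exact number format.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format count ints as "{a,b,c}" into buf.
 * Returns the number of characters written, or -1 if buf is too small.
 */
static int model_print_array(char *buf, size_t len, const int *arr, size_t count)
{
	size_t used;
	size_t i;
	int n;

	assert(buf && (arr || count == 0));
	n = snprintf(buf, len, "{");
	if (n < 0 || (size_t)n >= len)
		return -1;
	used = (size_t)n;
	for (i = 0; i < count; i++) {
		/* A comma goes before every element except the first. */
		n = snprintf(buf + used, len - used, "%s%d", i ? "," : "", arr[i]);
		if (n < 0 || (size_t)n >= len - used)
			return -1;
		used += (size_t)n;
	}
	n = snprintf(buf + used, len - used, "}");
	if (n < 0 || (size_t)n >= len - used)
		return -1;
	return (int)(used + (size_t)n);
}
```

Because the helper only needs a pointer and a count, the same call works
for a static array and for a buffer whose length is known only at run time.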
Link: http://lkml.kernel.org/r/1422449335-8289-2-git-send-email-javi.merino@arm.com
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Javi Merino <javi.merino@arm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Conflicts:
arch/arm/boot/dts/imx6sx-sdb.dts
net/sched/cls_bpf.c
Two simple sets of overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that default_backing_dev_info is not used for writeback purposes we can
get rid of it easily:
- instead of using its name for tracing unregistered bdi we just use
"unknown"
- btrfs and ceph can just assign the default read ahead window themselves
like several other filesystems already do.
- we can assign noop_backing_dev_info as the default one in alloc_super.
All filesystems already either assigned their own or
noop_backing_dev_info.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
Now that we got rid of the bdi abuse on character devices we can always use
sb->s_bdi to get at the backing_dev_info for a file, except for the block
device special case. Export inode_to_bdi and replace uses of
mapping->backing_dev_info with it to prepare for the removal of
mapping->backing_dev_info.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
Currently map and unmap are implemented as events under a
common trace class declaration. The common class forces
trace_unmap() to require a bogus physical address argument
that it doesn't use. Changing unmap to report unmapped size
will provide useful information for debugging. Remove common
map_unmap trace class and change map and unmap into separate
events as opposed to events under the same class to allow for
differences in the reporting information. In addition, map and
unmap are changed to handle size value as size_t instead of int
to match the passed size value and avoid overflow.
Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
sparse complains about
include/trace/events/kvm.h:163:1: error: directive in argument list
include/trace/events/kvm.h:167:1: error: directive in argument list
include/trace/events/kvm.h:169:1: error: directive in argument list
and sparse is right. Preprocessing directives in an argument of a
macro are undefined behaviour as of C99 6.10.3p11.
Let's use an indirection to fix this.
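One common shape of such an indirection, sketched in standalone C: the
conditional is resolved into a helper macro first, and only that macro
appears inside the argument list. All MODEL_* names are hypothetical and
are not the identifiers used in kvm.h.

```c
#include <assert.h>
#include <string.h>

#define MODEL_STRINGIFY_(...) #__VA_ARGS__
#define MODEL_STRINGIFY(...) MODEL_STRINGIFY_(__VA_ARGS__)

/* Hypothetical config switch; define it to add the optional entry. */
/* #define MODEL_HAVE_EXTRA 1 */

/*
 * Undefined behaviour (C99 6.10.3p11) would be:
 *   MODEL_STRINGIFY(a, b,
 *   #ifdef MODEL_HAVE_EXTRA
 *           c,
 *   #endif
 *           d)
 * Instead, resolve the conditional outside the invocation.
 */
#ifdef MODEL_HAVE_EXTRA
#define MODEL_EXTRA_ENTRY c,
#else
#define MODEL_EXTRA_ENTRY
#endif

/* The argument list now contains only ordinary tokens and macros. */
static const char *model_entries(void)
{
	const char *s = MODEL_STRINGIFY(a, b, MODEL_EXTRA_ENTRY d);

	assert(s);
	return s;
}
```

With MODEL_HAVE_EXTRA left undefined, the optional entry simply expands
to nothing and the resulting list contains a, b and d only.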
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Both Linus (most recent) and Steve (a while ago) reported that perf
related callbacks have massive stack bloat.
The problem is that software events need a pt_regs in order to
properly report the event location and unwind stack. And because we
could not assume one was present we allocated one on stack and filled
it with minimal bits required for operation.
Now, pt_regs is quite large, so this is undesirable. Furthermore it
turns out that most sites actually have a pt_regs pointer available,
making this even more onerous, as the stack space is pointless waste.
This patch addresses the problem by observing that software events
have well defined nesting semantics, therefore we can use static
per-cpu storage instead of on-stack.
Linus made the further observation that all but the scheduler callers
of perf_sw_event() have a pt_regs available, so we change the regular
perf_sw_event() to require a valid pt_regs (where it used to be
optional) and add perf_sw_event_sched() for the scheduler.
We have a scheduler specific call instead of a more generic _noregs()
like construct because we can assume non-recursion from the scheduler
and thereby simplify the code further (_noregs would have to put the
recursion context call inline in order to ascertain which __perf_regs
element to use).
One last note on the implementation of perf_trace_buf_prepare(); we
allow .regs = NULL for those cases where we already have a pt_regs
pointer available and do not need another.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Javi Merino <javi.merino@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Petr Mladek <pmladek@suse.cz>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Cc: Vaibhav Nagarnaik <vnagarnaik@google.com>
Link: http://lkml.kernel.org/r/20141216115041.GW3337@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The same macros are used for rx as well. So rename it.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Clean up the parameters of trace_f2fs_submit_{read_,write_,page_,page_m}bio
by passing fio as a single parameter.
Suggested-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch adds the missing _type_ parameter to trace_f2fs_submit_page_bio,
then uses the DECLARE_EVENT_CLASS/DEFINE_EVENT_CONDITION pair to clean up
some trace event code related to f2fs_submit_page_{m,}bio.
Additionally, after we remove the redundant code, the code size is reduced:
   text    data     bss     dec     hex filename
 176787    8712      56  185555   2d4d3 f2fs.ko.org
 174408    8648      56  183112   2cb48 f2fs.ko
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This is a much shorter set of patches that were on the go but didn't make it
in to the early pull request for the merge window. It's really a set of bug
fixes plus some final cleanup work on the new tag queue API.
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Merge tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI update from James Bottomley:
"This is a much shorter set of patches that were on the go but didn't
make it in to the early pull request for the merge window. It's
really a set of bug fixes plus some final cleanup work on the new tag
queue API"
* tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
storvsc: ring buffer failures may result in I/O freeze
ipr: set scsi_level correctly for disk arrays
ipr: add support for async scanning to speed up boot
scsi_debug: fix missing "break;" in SDEBUG_UA_CAPACITY_CHANGED case
scsi_debug: take sdebug_host_list_lock when changing capacity
scsi_debug: improve driver description in Kconfig
scsi_debug: fix compare and write errors
qla2xxx: fix race in handling rport deletion during recovery causes panic
scsi: blacklist RSOC for Microsoft iSCSI target devices
scsi: fix random memory corruption with scsi-mq + T10 PI
Revert "[SCSI] mpt3sas: Remove phys on topology change"
Revert "[SCSI] mpt2sas: Remove phys on topology change."
esas2r: Correct typos of "validate" in a comment
fc: FCP_PTA_SIMPLE is 0
ibmvfc: remove unused tag variable
scsi: remove MSG_*_TAG defines
scsi: remove scsi_set_tag_type
scsi: remove scsi_get_tag_type
scsi: never drop to untagged mode during queue ramp down
scsi: remove ->change_queue_type method
removal. This is possible by using a simple atomic_t for the counter,
rather than our fancy per-cpu counter: it turns out that no one is doing
a module increment per net packet, so the slowdown should be in the noise.
Also, script fixed for new git version.
Cheers,
Rusty.
Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull module updates from Rusty Russell:
"The exciting thing here is the getting rid of stop_machine on module
removal. This is possible by using a simple atomic_t for the counter,
rather than our fancy per-cpu counter: it turns out that no one is
doing a module increment per net packet, so the slowdown should be in
the noise"
* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
param: do not set store func without write perm
params: cleanup sysfs allocation
kernel:module Fix coding style errors and warnings.
module: Remove stop_machine from module unloading
module: Replace module_ref with atomic_t refcnt
lib/bug: Use RCU list ops for module_bug_list
module: Unlink module with RCU synchronizing instead of stop_machine
module: Wait for RCU synchronizing before releasing a module
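
The counter change described in the pull message can be modelled in a few lines. This is a Python sketch, not kernel code: the class name `ModuleRef`, its method names, and the use of a `threading.Lock` in place of atomic operations are all assumptions made for illustration. It shows one shared counter whose "get" fails once the count has reached zero, which is the property that lets an unload proceed without halting every CPU.

```python
import threading

class ModuleRef:
    """Toy model of a module reference count kept in one shared counter.

    Hypothetical illustration only: a lock stands in for the atomic
    operations a kernel would perform on an atomic_t.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._count = 1  # base reference held while the module is live

    def try_get(self):
        """Take a reference unless the count has already reached zero."""
        with self._lock:
            if self._count == 0:
                return False  # module is being unloaded
            self._count += 1
            return True

    def put(self):
        """Drop a reference previously taken by try_get()."""
        with self._lock:
            assert self._count > 0
            self._count -= 1

    def try_unload(self):
        """Drop the base reference; succeed only if no users remain."""
        with self._lock:
            if self._count == 0:
                return False  # unload already in progress
            self._count -= 1
            if self._count == 0:
                return True   # no users left: later try_get() calls fail
            self._count += 1  # still in use: restore the base reference
            return False
```

In this model an unload attempt while a user holds a reference simply fails and leaves the count unchanged, so no global stop is needed to make the check safe.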
Pull nfsd updates from Bruce Fields:
"A comparatively quieter cycle for nfsd this time, but still with two
larger changes:
- RPC server scalability improvements from Jeff Layton (using RCU
instead of a spinlock to find idle threads).
- server-side NFSv4.2 ALLOCATE/DEALLOCATE support from Anna
Schumaker, enabling fallocate on new clients"
* 'for-3.19' of git://linux-nfs.org/~bfields/linux: (32 commits)
nfsd4: fix xdr4 count of server in fs_location4
nfsd4: fix xdr4 inclusion of escaped char
sunrpc/cache: convert to use string_escape_str()
sunrpc: only call test_bit once in svc_xprt_received
fs: nfsd: Fix signedness bug in compare_blob
sunrpc: add some tracepoints around enqueue and dequeue of svc_xprt
sunrpc: convert to lockless lookup of queued server threads
sunrpc: fix potential races in pool_stats collection
sunrpc: add a rcu_head to svc_rqst and use kfree_rcu to free it
sunrpc: require svc_create callers to pass in meaningful shutdown routine
sunrpc: have svc_wake_up only deal with pool 0
sunrpc: convert sp_task_pending flag to use atomic bitops
sunrpc: move rq_cachetype field to better optimize space
sunrpc: move rq_splice_ok flag into rq_flags
sunrpc: move rq_dropme flag into rq_flags
sunrpc: move rq_usedeferral flag to rq_flags
sunrpc: move rq_local field to rq_flags
sunrpc: add a generic rq_flags field to svc_rqst and move rq_secure to it
nfsd: minor off by one checks in __write_versions()
sunrpc: release svc_pool_map reference when serv allocation fails
...
Pull drm updates from Dave Airlie:
"Highlights:
- AMD KFD driver merge
This is the AMD HSA interface for exposing a lowlevel interface for
GPGPU use. They have an open source userspace built on top of this
interface, and the code looks as good as it was going to get out of
tree.
- Initial atomic modesetting work
The need for an atomic modesetting interface to allow userspace to
try and send a complete set of modesetting state to the driver has
arisen, and been suffering from neglect this past year. No more,
the start of the common code and changes for msm driver to use it
are in this tree. Ongoing work to get the userspace ioctl finished
and the code clean will probably wait until next kernel.
- DisplayID 1.3 and tiled monitor exposed to userspace.
Tiled monitor property is now exposed for userspace to make use of.
- Rockchip drm driver merged.
- imx gpu driver moved out of staging
Other stuff:
- core:
panel - MIPI DSI + new panels.
expose suggested x/y properties for virtual GPUs
- i915:
Initial Skylake (SKL) support
gen3/4 reset work
start of dri1/ums removal
infoframe tracking
fixes for lots of things.
- nouveau:
tegra k1 voltage support
GM204 modesetting support
GT21x memory reclocking work
- radeon:
CI dpm fixes
GPUVM improvements
Initial DPM fan control
- rcar-du:
HDMI support added
removed some support for old boards
slave encoder driver for Analog Devices adv7511
- exynos:
Exynos4415 SoC support
- msm:
a4xx gpu support
atomic helper conversion
- tegra:
iommu support
universal plane support
ganged-mode DSI support
- sti:
HDMI i2c improvements
- vmwgfx:
some late fixes.
- qxl:
use suggested x/y properties"
* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (969 commits)
drm: sti: fix module compilation issue
drm/i915: save/restore GMBUS freq across suspend/resume on gen4
drm: sti: correctly cleanup CRTC and planes
drm: sti: add HQVDP plane
drm: sti: add cursor plane
drm: sti: enable auxiliary CRTC
drm: sti: fix delay in VTG programming
drm: sti: prepare sti_tvout to support auxiliary crtc
drm: sti: use drm_crtc_vblank_{on/off} instead of drm_vblank_{on/off}
drm: sti: fix hdmi avi infoframe
drm: sti: remove event lock while disabling vblank
drm: sti: simplify gdp code
drm: sti: clear all mixer control
drm: sti: remove gpio for HDMI hot plug detection
drm: sti: allow to change hdmi ddc i2c adapter
drm/doc: Document drm_add_modes_noedid() usage
drm/i915: Remove '& 0xffff' from the mask given to WA_REG()
drm/i915: Invert the mask and val arguments in wa_add() and WA_REG()
drm: Zero out DRM object memory upon cleanup
drm/i915/bdw: Fix the write setting up the WIZ hashing mode
...
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"Lots of bug fixes, including Zheng and Jan's extent status shrinker
fixes, which should improve CPU utilization and potential soft lockups
under heavy memory pressure, and Eric Whitney's bigalloc fixes"
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (26 commits)
ext4: ext4_da_convert_inline_data_to_extent drop locked page after error
ext4: fix suboptimal seek_{data,hole} extents traversial
ext4: ext4_inline_data_fiemap should respect callers argument
ext4: prevent fsreentrance deadlock for inline_data
ext4: forbid journal_async_commit in data=ordered mode
jbd2: remove unnecessary NULL check before iput()
ext4: Remove an unnecessary check for NULL before iput()
ext4: remove unneeded code in ext4_unlink
ext4: don't count external journal blocks as overhead
ext4: remove never taken branch from ext4_ext_shift_path_extents()
ext4: create nojournal_checksum mount option
ext4: update comments regarding ext4_delete_inode()
ext4: cleanup GFP flags inside resize path
ext4: introduce aging to extent status tree
ext4: cleanup flag definitions for extent status tree
ext4: limit number of scanned extents in status tree shrinker
ext4: move handling of list of shrinkable inodes into extent status code
ext4: change LRU to round-robin in extent status tree shrinker
ext4: cache extent hole in extent status tree for ext4_da_map_blocks()
ext4: fix block reservation for bigalloc filesystems
...
Merge tag 'sound-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound updates from Takashi Iwai:
"This became a fairly large pull request. In addition to the usual
driver updates / fixes, there has been a large number of cleanups in
the ASoC area, as well as control API helpers and kernel documentation
fixes touching the whole tree.
On the driver side, the biggest changes are the support for the new
Intel SoCs found on new x86 machines, and the updates to the FireWire
dice and oxfw drivers.
Some notable items are below:
ALSA core:
- PCM mmap code cleanup, removal of arch-dependent codes
- PCM xrun injection support
- PCM hwptr tracepoint support
- Refactoring of snd_pcm_action(), simplification of PCM locking
- Robustified sequencer auto-load functionality
- New control API helpers and lots of cleanups along with them
- Lots of kerneldoc fixes and cleanups
USB-audio:
- The mixer resume code was largely rewritten, and the devices with
quirks are resumed properly.
- New hardware support: Focusrite Scarlett, Digidesign Mbox1,
Denon/Marantz DACs, Zoom R16/24
FireWire:
- DICE driver updates with better duplex and sync support, including
MIDI support
- New OXFW driver for Oxford Semiconductor FW970/971 chipset,
including the previous LaCie Speakers device. Full duplex and MIDI
support are included, as in the DICE driver.
HD-audio:
- Refactoring the driver-caps quirk handling in snd-hda-intel
- More consistent control names representing the topology better
- Fixups: HP mute LED with ALC268 codec, Ideapad S210 built-in mic
fix, ASUS Z99He laptop EAPD
ASoC:
- Conversion of AC'97 drivers to use regmap, bringing us closer to
the removal of the ASoC level I/O code
- Clean up a lot of old drivers that were open coding things that
have subsequently been implemented in the core
- Some DAPM performance improvements
- Removal of the now seldom used CODEC mutex
- Lots of updates for the newer Intel SoC support, including support
for the DSP and some Cherrytrail and Braswell machine drivers
- Support for Samsung boards using rt5631 as the CODEC
- Removal of the obsolete AFEB9260 machine driver
- Driver support for the TI TS3A227E headset driver used in some
Chromebooks
Others:
- ASIHPI driver update and cleanups
- Lots of dev_*() printk conversions
- Lots of trivial cleanups for the codes spotted by Coccinelle"
* tag 'sound-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (594 commits)
ALSA: pcxhr: NULL dereference on probe failure
ALSA: lola: NULL dereference on probe failure
ALSA: hda - Add "eapd" model string for AD1986A codec
ALSA: hda - Add EAPD fixup for ASUS Z99He laptop
ALSA: oxfw: Add hwdep interface
ALSA: oxfw: Add support for capture/playback MIDI messages
ALSA: oxfw: add support for capturing PCM samples
ALSA: oxfw: Add support AMDTP in-stream
ALSA: oxfw: Add support for Behringer/Mackie devices
ALSA: oxfw: Change the way to start stream
ALSA: oxfw: Add proc interface for debugging purpose
ALSA: oxfw: Change the way to make PCM rules/constraints
ALSA: oxfw: Add support for AV/C stream format command to get/set supported stream formation
ALSA: oxfw: Change the way to name card
ALSA: dice: Add support for MIDI capture/playback
ALSA: dice: Add support for capturing PCM samples
ALSA: dice: Support for non SYT-Match sampling clock source mode
ALSA: dice: Add support for duplex streams with synchronization
ALSA: dice: Change the way to start stream
ALSA: jack: Add dummy snd_jack_set_key() definition
...
Merge tag 'trace-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"There were a lot of cleanups and minor fixes. One of those cleanups
was to the trace_seq code. It also removed the return values of the
trace_seq_*() functions and uses trace_seq_has_overflowed() to see if
the buffer filled up or not. This is similar to work being done to
the seq_file code as well in another tree.
Some of the other goodies include:
- Added some "!" (NOT) logic to the tracing filter.
- Fixed the frame pointer logic to the x86_64 mcount trampolines
- Added the logic for dynamic trampolines on !CONFIG_PREEMPT systems.
That is, the ftrace trampoline can be dynamically allocated and be
called directly by functions that only have a single hook to them"
* tag 'trace-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (55 commits)
tracing: Truncated output is better than nothing
tracing: Add additional marks to signal very large time deltas
Documentation: describe trace_buf_size parameter more accurately
tracing: Allow NOT to filter AND and OR clauses
tracing: Add NOT to filtering logic
ftrace/fgraph/x86: Have prepare_ftrace_return() take ip as first parameter
ftrace/x86: Get rid of ftrace_caller_setup
ftrace/x86: Have save_mcount_regs macro also save stack frames if needed
ftrace/x86: Add macro MCOUNT_REG_SIZE for amount of stack used to save mcount regs
ftrace/x86: Simplify save_mcount_regs on getting RIP
ftrace/x86: Have save_mcount_regs store RIP in %rdi for first parameter
ftrace/x86: Rename MCOUNT_SAVE_FRAME and add more detailed comments
ftrace/x86: Move MCOUNT_SAVE_FRAME out of header file
ftrace/x86: Have static tracing also use ftrace_caller_setup
ftrace/x86: Have static function tracing always test for function graph
kprobes: Add IPMODIFY flag to kprobe_ftrace_ops
ftrace, kprobes: Support IPMODIFY flag to find IP modify conflict
kprobes/ftrace: Recover original IP if pre_handler doesn't change it
tracing/trivial: Fix typos and make an int into a bool
tracing: Deletion of an unnecessary check before iput()
...
Merge tag 'nfs-for-3.19-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
"Highlights include:
Features:
- NFSv4.2 client support for hole punching and preallocation.
- Further RPC/RDMA client improvements.
- Add more RPC transport debugging tracepoints.
- Add RPC debugging tools in debugfs.
Bugfixes:
- Stable fix for layoutget error handling
- Fix a change in COMMIT behaviour resulting from the recent io code
updates"
* tag 'nfs-for-3.19-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (28 commits)
sunrpc: add a debugfs rpc_xprt directory with an info file in it
sunrpc: add debugfs file for displaying client rpc_task queue
nfs: Add DEALLOCATE support
nfs: Add ALLOCATE support
NFS: Clean up nfs4_init_callback()
NFS: SETCLIENTID XDR buffer sizes are incorrect
SUNRPC: serialize iostats updates
xprtrdma: Display async errors
xprtrdma: Enable pad optimization
xprtrdma: Re-write rpcrdma_flush_cqs()
xprtrdma: Refactor tasklet scheduling
xprtrdma: unmap all FMRs during transport disconnect
xprtrdma: Cap req_cqinit
xprtrdma: Return an errno from rpcrdma_register_external()
nfs: define nfs_inc_fscache_stats and using it as possible
nfs: replace nfs_add_stats with nfs_inc_stats when add one
NFS: Deletion of unnecessary checks before the function call "nfs_put_client"
sunrpc: eliminate RPC_TRACEPOINTS
sunrpc: eliminate RPC_DEBUG
lockd: eliminate LOCKD_DEBUG
...
Pull scheduler updates from Ingo Molnar:
"The main changes in this cycle are:
- 'Nested Sleep Debugging', activated when CONFIG_DEBUG_ATOMIC_SLEEP=y.
This instruments might_sleep() checks to catch places that nest
blocking primitives - such as mutex usage in a wait loop. Such
bugs can result in hard to debug races/hangs.
Another category of invalid nesting that this facility will detect
is the calling of blocking functions from within schedule() ->
sched_submit_work() -> blk_schedule_flush_plug().
There's some potential for false positives (if secondary blocking
primitives themselves are not ready yet for this facility), but the
kernel will warn once about such bugs per bootup, so the warning
isn't much of a nuisance.
This feature comes with a number of fixes, for problems uncovered
with it, so no messages are expected normally.
- Another round of sched/numa optimizations and refinements, for
CONFIG_NUMA_BALANCING=y.
- Another round of sched/dl fixes and refinements.
Plus various smaller fixes and cleanups"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
sched: Add missing rcu protection to wake_up_all_idle_cpus
sched/deadline: Introduce start_hrtick_dl() for !CONFIG_SCHED_HRTICK
sched/numa: Init numa balancing fields of init_task
sched/deadline: Remove unnecessary definitions in cpudeadline.h
sched/cpupri: Remove unnecessary definitions in cpupri.h
sched/deadline: Fix rq->dl.pushable_tasks bug in push_dl_task()
sched/fair: Fix stale overloaded status in the busiest group finding logic
sched: Move p->nr_cpus_allowed check to select_task_rq()
sched/completion: Document when to use wait_for_completion_io_*()
sched: Update comments about CLONE_NEWUTS and CLONE_NEWIPC
sched/fair: Kill task_struct::numa_entry and numa_group::task_list
sched: Refactor task_struct to use numa_faults instead of numa_* pointers
sched/deadline: Don't check CONFIG_SMP in switched_from_dl()
sched/deadline: Reschedule from switched_from_dl() after a successful pull
sched/deadline: Push task away if the deadline is equal to curr during wakeup
sched/deadline: Add deadline rq status print
sched/deadline: Fix artificial overrun introduced by yield_task_dl()
sched/rt: Clean up check_preempt_equal_prio()
sched/core: Use dl_bw_of() under rcu_read_lock_sched()
sched: Check if we got a shallowest_idle_cpu before searching for least_loaded_cpu
...
Pull RCU updates from Ingo Molnar:
"These are the main changes in this cycle:
- Streamline RCU's use of per-CPU variables, shifting from "cpu"
arguments to functions to "this_"-style per-CPU variable
accessors.
- signal-handling RCU updates.
- real-time updates.
- torture-test updates.
- miscellaneous fixes.
- documentation updates"
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits)
rcu: Fix FIXME in rcu_tasks_kthread()
rcu: More info about potential deadlocks with rcu_read_unlock()
rcu: Optimize cond_resched_rcu_qs()
rcu: Add sparse check for RCU_INIT_POINTER()
documentation: memory-barriers.txt: Correct example for reorderings
documentation: Add atomic_long_t to atomic_ops.txt
documentation: Additional restriction for control dependencies
documentation: Document RCU self test boot params
rcutorture: Fix rcu_torture_cbflood() memory leak
rcutorture: Remove obsolete kversion param in kvm.sh
rcutorture: Remove stale test configurations
rcutorture: Enable RCU self test in configs
rcutorture: Add early boot self tests
torture: Run Linux-kernel binary out of results directory
cpu: Avoid puts_pending overflow
rcu: Remove "cpu" argument to rcu_cleanup_after_idle()
rcu: Remove "cpu" argument to rcu_prepare_for_idle()
rcu: Remove "cpu" argument to rcu_needs_cpu()
rcu: Remove "cpu" argument to rcu_note_context_switch()
rcu: Remove "cpu" argument to rcu_preempt_check_callbacks()
...
These were useful when I was tracking down a race condition between
svc_xprt_do_enqueue and svc_get_next_xprt.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Testing has shown that the pool->sp_lock can be a bottleneck on a busy
server. Every time data is received on a socket, the server must take
that lock in order to dequeue a thread from the sp_threads list.
Address this problem by eliminating the sp_threads list (which contains
threads that are currently idle) and replacing it with a RQ_BUSY flag in
svc_rqst. This allows us to walk the sp_all_threads list under the
rcu_read_lock and find a suitable thread for the xprt by doing a
test_and_set_bit.
Note that we do still have a potential atomicity problem however with
this approach. We don't want svc_xprt_do_enqueue to set the
rqst->rq_xprt pointer unless a test_and_set_bit of RQ_BUSY returned
zero (which indicates that the thread was idle). But, by the time we
check that, the bit could be flipped by a waking thread.
To address this, we acquire a new per-rqst spinlock (rq_lock) and take
that before doing the test_and_set_bit. If that returns false, then we
can set rq_xprt and drop the spinlock. Then, when the thread wakes up,
it must set the bit under the same spinlock and can trust that if it was
already set then the rq_xprt is also properly set.
With this scheme, the case where we have an idle thread no longer needs
to take the highly contended pool->sp_lock at all, and that removes the
bottleneck.
That still leaves one issue: What of the case where we walk the whole
sp_all_threads list and don't find an idle thread? Because the search is
lockless, it's possible for the queueing to race with a thread that is
going to sleep. To address that, we queue the xprt and then search again.
If we find an idle thread at that point, we can't attach the xprt to it
directly since that might race with a different thread waking up and
finding it. All we can do is wake the idle thread back up and let it
attempt to find the now-queued xprt.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Tested-by: Chris Worley <chris.worley@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
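
The lookup scheme described in this message can be sketched as a small model. This is Python, not kernel code, and everything in it is an assumption made for illustration: the `Rqst` class, the `woken` field, and the plain list walk that stands in for an RCU-protected traversal. It shows the two paths the message describes: claiming an idle thread under its own lock, and the fallback of queueing the xprt and then searching once more.

```python
import threading
from collections import deque

class Rqst:
    """Toy stand-in for a server thread; field names are illustrative."""
    def __init__(self, name):
        self.name = name
        self.busy = False             # models the RQ_BUSY flag
        self.xprt = None              # models rq_xprt
        self.lock = threading.Lock()  # models the per-rqst rq_lock
        self.woken = False

def enqueue(all_threads, pending, xprt):
    """Hand xprt to an idle thread, or queue it and wake an idle thread."""
    for rqst in all_threads:          # a lockless walk in the real design
        with rqst.lock:
            if rqst.busy:
                continue              # flag was already set: not idle
            rqst.busy = True          # thread was idle: claim it
            rqst.xprt = xprt
        rqst.woken = True
        return rqst
    # No idle thread found: queue the xprt, then search again in case a
    # thread went idle while the list was being walked.
    pending.append(xprt)
    for rqst in all_threads:
        if not rqst.busy:
            rqst.woken = True         # only wake it; it dequeues the xprt itself
            return None
    return None
```

In the fallback path the model deliberately does not attach the xprt to the thread it wakes, mirroring the point above that doing so could race with another thread picking the same xprt off the queue.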
...also make the manipulation of sp_all_threads list use RCU-friendly
functions.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Tested-by: Chris Worley <chris.worley@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
In a later patch, we're going to need some atomic bit flags. Since that
field will need to be an unsigned long, we mitigate that space
consumption by migrating some other bitflags to the new field. Start
with the rq_secure flag.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
For SPI drivers use the message definitions from scsi.h, and for target
drivers introduce a new TCM_*_TAG namespace.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
In this commit we drop the LRU algorithm for inodes with an extent
status tree. Maintaining an LRU list in the extent status tree
shrinker takes significant effort, and the shrinker can spend a long
time scanning that list in order to reclaim some objects.
We replace the LRU ordering with a simple round-robin. After that we
never need to keep an LRU list, which means the list need not be
sorted if the shrinker cannot reclaim any objects in the first round.
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
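
The round-robin idea can be sketched in Python as follows. This is a toy model under stated assumptions, not the ext4 shrinker: the function name, the `[name, count]` entries, and the single-pass limit are all invented for illustration. Each visited inode is moved to the tail of the list, so nothing has to be kept sorted by recency.

```python
from collections import deque

def shrink_round_robin(inodes, nr_to_scan):
    """Reclaim up to nr_to_scan cached extents, visiting inodes in turn.

    inodes is a deque of [name, cached_extent_count] entries. Each visited
    inode is moved to the tail, so no ordering by recency is maintained.
    Returns the number of extents reclaimed.
    """
    reclaimed = 0
    visits = len(inodes)              # at most one pass over the list
    while visits > 0 and reclaimed < nr_to_scan:
        entry = inodes.popleft()
        take = min(entry[1], nr_to_scan - reclaimed)
        entry[1] -= take
        reclaimed += take
        inodes.append(entry)          # rotate to the tail for the next pass
        visits -= 1
    return reclaimed
```

An inode with nothing to reclaim costs one rotation and no list maintenance, which is the saving over keeping an LRU order.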
Currently the extent status tree doesn't cache an extent hole when a
write looks up the extent tree to check whether a block has been
allocated. We don't put the hole in the extent cache because the
extent might later be removed and a new delayed extent added back.
But this hurts when we do a lot of writes: without the hole in the
extent cache, each following write also has to access the extent tree
to see whether the block has been allocated, which is a cache miss
every time. This commit fixes this defect. Also if the inode doesn't
have any extent, this extent hole will be cached as well.
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
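
The effect of caching holes can be shown with a small Python model. It is only a sketch: the `ExtentLookup` class, its per-block dictionary cache, and the `tree_lookups` counter are assumptions for illustration and do not reflect how the extent status tree stores ranges.

```python
class ExtentLookup:
    """Toy model: a status cache in front of a slower extent tree.

    allocated is the set of allocated block numbers (the "extent tree").
    The cache maps a block number to True (allocated) or False (hole).
    """
    def __init__(self, allocated, cache_holes):
        self.allocated = set(allocated)
        self.cache_holes = cache_holes
        self.cache = {}
        self.tree_lookups = 0

    def is_allocated(self, block):
        if block in self.cache:
            return self.cache[block]      # cache hit, no tree access
        self.tree_lookups += 1            # cache miss: consult the tree
        found = block in self.allocated
        if found or self.cache_holes:
            self.cache[block] = found     # holes are cached only if enabled
        return found
```

With `cache_holes=False`, repeated queries for the same unallocated block go to the tree every time; with `cache_holes=True` only the first one does.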
For bigalloc filesystems we have to check whether a newly requested inode
block is already part of a cluster for which we already have a delayed
allocation reservation. This check happens in ext4_ext_map_blocks() and
that function sets EXT4_MAP_FROM_CLUSTER if that's the case. However, if
ext4_da_map_blocks() finds information about the block in the extent cache,
we don't call into ext4_ext_map_blocks() and thus we always end up
getting a new reservation even if the space for the cluster is already
reserved. This results in overreservation and premature ENOSPC reports.
Fix the problem by checking for existing cluster reservation already in
ext4_da_map_blocks(). That simplifies the logic and actually allows us
to get rid of the EXT4_MAP_FROM_CLUSTER flag completely.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
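
The check described above amounts to reserving per cluster rather than per block. The following Python sketch illustrates that arithmetic only; `reserve_block`, `cluster_bits`, and the set of reserved clusters are hypothetical names, and real reservation accounting involves more state than a set.

```python
def reserve_block(block, cluster_bits, reserved_clusters):
    """Reserve space for a delayed-allocation block on a bigalloc-style layout.

    A cluster holds 2**cluster_bits blocks. Space is reserved per cluster,
    so a new reservation is needed only if the block's cluster has none yet.
    Returns True if a new cluster reservation was taken.
    """
    cluster = block >> cluster_bits
    if cluster in reserved_clusters:
        return False              # cluster already reserved: nothing to add
    reserved_clusters.add(cluster)
    return True
```

With four blocks per cluster, writing blocks 0 through 5 takes two reservations in this model. Taking one per block would reserve six clusters' worth of space, which is the overreservation the message describes.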
SPC-3 defines SERVICE ACTION IN(12) and SERVICE ACTION IN(16).
So rename SERVICE_ACTION_IN to SERVICE_ACTION_IN_16 to be
consistent with SPC and to allow for better distinction.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Tested-by: Robert Elliott <elliott@hp.com>
Reviewed-by: Robert Elliott <elliott@hp.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Add tracepoints inside the main loop on xs_tcp_data_recv that allow
us to keep an eye on what's happening during each phase of it.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
...so we can keep track of when calls are sent and replies received.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
...just around svc_send, svc_recv and svc_process for now.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
If the trace_seq of ftrace_raw_output_prep() is full this function
returns TRACE_TYPE_PARTIAL_LINE, otherwise it returns zero.
The problem is that TRACE_TYPE_PARTIAL_LINE happens to be zero!
The thing is, the caller of ftrace_raw_output_prep() expects a
success to be zero. Change that to expect it to be
TRACE_TYPE_HANDLED.
Link: http://lkml.kernel.org/r/20141114112522.GA2988@dhcp128.suse.cz
Reminded-by: Petr Mladek <pmladek@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
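
The ambiguity can be reproduced with a short Python model. The value of TRACE_TYPE_PARTIAL_LINE being zero comes from the message; the value 1 for TRACE_TYPE_HANDLED, the `prep_old`/`prep_new` function names, and the assumption that the fixed function returns TRACE_TYPE_HANDLED on success are illustrative.

```python
from enum import IntEnum

class PrintLine(IntEnum):
    # PARTIAL_LINE being zero is stated in the message above.
    # HANDLED = 1 is an assumption made for this sketch.
    TRACE_TYPE_PARTIAL_LINE = 0
    TRACE_TYPE_HANDLED = 1

def prep_old(seq_full):
    """Old behaviour: 'full' and 'success' both come back as zero."""
    return PrintLine.TRACE_TYPE_PARTIAL_LINE if seq_full else 0

def prep_new(seq_full):
    """Sketch of the fix: success is reported as TRACE_TYPE_HANDLED."""
    if seq_full:
        return PrintLine.TRACE_TYPE_PARTIAL_LINE
    return PrintLine.TRACE_TYPE_HANDLED
```

A caller of `prep_old()` cannot tell a full buffer from success, while a caller of `prep_new()` can compare the result against TRACE_TYPE_HANDLED.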
Adding a trace_seq_has_overflowed() which returns true if the trace_seq
had too much written into it allows us to simplify the code.
Instead of checking the return value of every call to trace_seq_printf()
and friends, they can all be called normally, and at the end we can
return !trace_seq_has_overflowed() instead.
Several functions also return TRACE_TYPE_PARTIAL_LINE when the trace_seq
overflowed and TRACE_TYPE_HANDLED otherwise. Another helper function
was created called trace_handle_return() which takes a trace_seq and
returns these enums. Using this helper function also simplifies the
code.
This change also makes it possible to remove the return values of
trace_seq_printf() and friends. They should instead just be
void functions.
Link: http://lkml.kernel.org/r/20141114011410.365183157@goodmis.org
Reviewed-by: Petr Mladek <pmladek@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
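
The pattern described here, writing unconditionally and checking for overflow once at the end, can be sketched in Python. The `TraceSeq` class, its fields, and the numeric result codes are hypothetical stand-ins, not the kernel's trace_seq implementation.

```python
class TraceSeq:
    """Toy fixed-size text buffer with a sticky overflow indicator."""
    def __init__(self, size):
        self.size = size
        self.buf = ""
        self.full = False

    def printf(self, fmt, *args):
        """Append formatted text; returns nothing, like a void function."""
        text = fmt % args
        if self.full or len(self.buf) + len(text) > self.size:
            self.full = True      # remember the overflow, drop the text
            return
        self.buf += text

    def has_overflowed(self):
        return self.full

PARTIAL_LINE, HANDLED = 0, 1      # hypothetical stand-ins for the enums

def handle_return(seq):
    """Map the overflow state to a result code in one place."""
    return PARTIAL_LINE if seq.has_overflowed() else HANDLED
```

Callers issue any number of `printf()` calls without checking each one and finish with a single `handle_return(seq)`.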
Merge tag 'drm/tegra/for-3.19-rc1' of git://people.freedesktop.org/~tagr/linux into drm-next
drm/tegra: Changes for v3.19-rc1
* tag 'drm/tegra/for-3.19-rc1' of git://people.freedesktop.org/~tagr/linux: (44 commits)
drm/tegra: gem: Check before freeing CMA memory
drm/tegra: fb: Add error codes to error messages
drm/tegra: fb: Properly release GEM objects on failure
drm/tegra: Detach panel when a connector is removed
drm/tegra: Plug memory leak
drm/tegra: gem: Use more consistent data types
drm/tegra: fb: Do not destroy framebuffer
drm/tegra: gem: dumb: pitch and size are outputs
drm/tegra: Enable the hotplug interrupt only when necessary
drm/tegra: dc: Universal plane support
drm/tegra: dc: Registers are 32 bits wide
drm/tegra: dc: Factor out DC, window and cursor commit
drm/tegra: Add IOMMU support
drm/tegra: Fix error handling cleanup
drm/tegra: gem: Use dma_mmap_writecombine()
drm/tegra: gem: Remove redundant drm_gem_free_mmap_offset()
drm/tegra: gem: Cleanup tegra_bo_create_with_handle()
drm/tegra: gem: Extract tegra_bo_alloc_object()
drm/tegra: dsi: Set up PHY_TIMING & BTA_TIMING registers earlier
drm/tegra: dsi: Replace 1000000 by USEC_PER_SEC
...
Rather than casting to a u32, use the struct host1x_bo pointers directly.
This avoids annoying warnings for 64-bit builds.
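Why such a cast is a problem on 64-bit builds can be shown with a small
sketch. It is written in Python and models integer widths only; the
address below is made up and cast_to_u32() is a hypothetical helper, not
anything from the driver.

```python
U32_MASK = 0xFFFFFFFF


def cast_to_u32(value):
    """Model a C cast to u32: keep only the low 32 bits."""
    return value & U32_MASK


# Made-up 64-bit address, used purely for illustration.
example_ptr = 0xFFFF880012345678
truncated = cast_to_u32(example_ptr)
```

Here truncated no longer identifies the original object, because the
upper 32 bits are gone. Keeping the pointer type avoids both the compiler
warning and that loss of information.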
Signed-off-by: Thierry Reding <treding@nvidia.com>
Replace module_ref per-cpu complex reference counter with
an atomic_t simple refcnt. This is for code simplification.
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This function has no remaining non-regmap users, which means we can remove the
implementation of the function and the associated functions and structure
fields.
For convenience we keep a static inline version of the function that
forwards calls to regcache_sync() unconditionally.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
Pull two RCU fixes from Paul E. McKenney:
" - Complete the work of commit dd56af42bd (rcu: Eliminate deadlock
between CPU hotplug and expedited grace periods), which was
intended to allow synchronize_sched_expedited() to be safely
used when holding locks acquired by CPU-hotplug notifiers.
This commit makes the put_online_cpus() avoid the deadlock
instead of just handling the get_online_cpus().
- Complete the work of commit 35ce7f29a4 (rcu: Create rcuo
kthreads only for onlined CPUs), which was intended to allow
RCU to avoid allocating unneeded kthreads on systems where the
firmware says that there are more CPUs than are really present.
This commit makes rcu_barrier() aware of the mismatch, so that
it doesn't hang waiting for non-existent CPUs. "
Signed-off-by: Ingo Molnar <mingo@kernel.org>
PREEMPT_RCU and TREE_PREEMPT_RCU serve the same function after
TINY_PREEMPT_RCU has been removed. This patch removes TREE_PREEMPT_RCU
and uses PREEMPT_RCU config option in its place.
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Commit 35ce7f29a4 (rcu: Create rcuo kthreads only for onlined CPUs)
avoids creating rcuo kthreads for CPUs that never come online. This
fixes a bug in many instances of firmware: Instead of lying about their
age, these systems instead lie about the number of CPUs that they have.
Before commit 35ce7f29a4, this could result in huge numbers of useless
rcuo kthreads being created.
It appears that experience indicates that I should have told the
people suffering from this problem to fix their broken firmware, but
I instead produced what turned out to be a partial fix. The missing
piece supplied by this commit makes sure that rcu_barrier() knows not to
post callbacks for no-CBs CPUs that have not yet come online, because
otherwise rcu_barrier() will hang on systems having firmware that lies
about the number of CPUs.
It is tempting to simply have rcu_barrier() refuse to post a callback on
any no-CBs CPU that does not have an rcuo kthread. This unfortunately
does not work because rcu_barrier() is required to wait for all pending
callbacks. It is therefore required to wait even for those callbacks
that cannot possibly be invoked. Even if doing so hangs the system.
Given that posting a callback to a no-CBs CPU that does not yet have an
rcuo kthread can hang rcu_barrier(), it is tempting to report an error
in this case. Unfortunately, this will result in false positives at
boot time, when it is perfectly legal to post callbacks to the boot CPU
before the scheduler has started, in other words, before it is legal
to invoke rcu_barrier().
So this commit instead has rcu_barrier() avoid posting callbacks to
CPUs having neither rcuo kthread nor pending callbacks, and has it
complain bitterly if it finds CPUs having no rcuo kthread but some
pending callbacks. And when rcu_barrier() does find CPUs having no rcuo
kthread but pending callbacks, as noted earlier, it has no choice but
to hang indefinitely.
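The resulting policy can be summarized in a short sketch. This is a
Python model of the decision described above, not the kernel
implementation: the function name, the tuple layout and the returned
lists are assumptions made for illustration.

```python
def rcu_barrier_plan(nocb_cpus):
    """Decide, per no-CBs CPU, whether rcu_barrier() posts a callback.

    nocb_cpus is a list of (cpu, has_rcuo_kthread, has_pending_callbacks).
    Returns (post, warn): CPUs that get a barrier callback, and CPUs that
    trigger a complaint because the wait on them may never finish.
    """
    post, warn = [], []
    for cpu, has_kthread, pending in nocb_cpus:
        if not has_kthread and not pending:
            # Never came online and has nothing queued: skip it.
            continue
        if not has_kthread:
            # Pending callbacks but nothing to invoke them: complain,
            # then wait anyway, since rcu_barrier() must wait for all
            # pending callbacks.
            warn.append(cpu)
        post.append(cpu)
    return post, warn
```

For example, a CPU that the firmware reports but that never came online
has neither a kthread nor callbacks and is skipped, so rcu_barrier() no
longer hangs on it.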
Reported-by: Yanko Kaneti <yaneti@declera.com>
Reported-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Reported-by: Meelis Roos <mroos@linux.ee>
Reported-by: Eric B Munson <emunson@akamai.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Eric B Munson <emunson@akamai.com>
Tested-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Tested-by: Yanko Kaneti <yaneti@declera.com>
Tested-by: Kevin Fenzi <kevin@scrye.com>
Tested-by: Meelis Roos <mroos@linux.ee>
task_preempt_count() has nothing to do with the actual preempt counter:
thread_info->saved_preempt_count is only valid right after switch_to().
__trace_sched_switch_state() can use preempt_count() instead, because prev is
still the current task when trace_sched_switch() is called.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
[ Added BUG_ON(). ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20141007195108.GB28002@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull thermal management updates from Zhang Rui:
"Sorry that I missed the merge window: a bug was found at the last
minute, and I had to fix it and wait for the code to be tested in the
linux-next tree for a few days. The buggy patch has now been
dropped entirely from my next branch, so I hope these changes can
still be merged in 3.18-rc2, as most of them are platform thermal
driver changes.
Specifics:
- introduce ACPI INT340X thermal drivers.
Newer laptops and tablets may have thermal sensors and other
devices with thermal control capabilities that are exposed for the
OS to use via the ACPI INT340x device objects. Several drivers are
introduced to expose the temperature information and cooling
ability from these objects to user-space via the normal thermal
framework.
From: Lu Aaron, Lan Tianyu, Jacob Pan and Zhang Rui.
- introduce a new thermal governor, which just uses a hysteresis to
switch abruptly on/off a cooling device. This governor can be used
to control certain fan devices that can not be throttled but just
switched on or off. From: Peter Feuerer.
- introduce support for some new thermal interrupt functions on
i.MX6SX, in IMX thermal driver. From: Anson, Huang.
- introduce tracing support on thermal framework. From: Punit
Agrawal.
- small fixes in OF thermal and thermal step_wise governor"
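The hysteresis-based governor mentioned in the list above can be
illustrated with a minimal sketch. It is a Python model under stated
assumptions: the function names, the millidegree units and the exact
thresholds (on above the trip point, off below trip minus hysteresis)
are chosen for illustration and are not taken from the driver.

```python
def bang_bang_next_state(fan_on, temp_mc, trip_mc, hyst_mc):
    """Return the next on/off state of a fan that cannot be throttled.

    All temperatures are in millidegrees Celsius.
    """
    if temp_mc > trip_mc:
        return True
    if temp_mc < trip_mc - hyst_mc:
        return False
    # Inside the hysteresis band: keep the current state to avoid
    # rapid on/off switching around the trip point.
    return fan_on


def simulate(temps_mc, trip_mc, hyst_mc):
    """Run the governor over a sequence of readings, starting with the fan off."""
    state, history = False, []
    for temp in temps_mc:
        state = bang_bang_next_state(state, temp, trip_mc, hyst_mc)
        history.append(state)
    return history
```

With a 70000 trip point and 5000 of hysteresis, a reading of 68000 leaves
the fan in whatever state it already had, which is the point of the
hysteresis band.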
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: (25 commits)
Thermal: int340x thermal: select ACPI fan driver
Thermal: int3400_thermal: use acpi_thermal_rel parsing APIs
Thermal: int340x_thermal: expose acpi thermal relationship tables
Thermal: introduce int3403 thermal driver
Thermal: introduce INT3402 thermal driver
Thermal: move the KELVIN_TO_MILLICELSIUS macro to thermal.h
ACPI / Fan: support INT3404 thermal device
ACPI / Fan: add ACPI 4.0 style fan support
ACPI / fan: convert to platform driver
ACPI / fan: use acpi_device_xxx_power instead of acpi_bus equivelant
ACPI / fan: remove no need check for device pointer
ACPI / fan: remove unused macro
Thermal: int3400 thermal: register to thermal framework
Thermal: int3400 thermal: add capability to detect supporting UUIDs
Thermal: introduce int3400 thermal driver
ACPI: add ACPI_TYPE_LOCAL_REFERENCE support to acpi_extract_package()
ACPI: make acpi_create_platform_device() an external API
thermal: step_wise: fix: Prevent from binary overflow when trend is dropping
ACPI: introduce ACPI int340x thermal scan handler
thermal: Added Bang-bang thermal governor
...
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"A large number of cleanups and bug fixes, with some (minor) journal
optimizations"
[ This got sent to me before -rc1, but was stuck in my spam folder. - Linus ]
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (67 commits)
ext4: check s_chksum_driver when looking for bg csum presence
ext4: move error report out of atomic context in ext4_init_block_bitmap()
ext4: Replace open coded mdata csum feature to helper function
ext4: delete useless comments about ext4_move_extents
ext4: fix reservation overflow in ext4_da_write_begin
ext4: add ext4_iget_normal() which is to be used for dir tree lookups
ext4: don't orphan or truncate the boot loader inode
ext4: grab missed write_count for EXT4_IOC_SWAP_BOOT
ext4: optimize block allocation on grow indepth
ext4: get rid of code duplication
ext4: fix over-defensive complaint after journal abort
ext4: fix return value of ext4_do_update_inode
ext4: fix mmap data corruption when blocksize < pagesize
vfs: fix data corruption when blocksize < pagesize for mmaped data
ext4: fold ext4_nojournal_sops into ext4_sops
ext4: support freezing ext2 (nojournal) file systems
ext4: fold ext4_sync_fs_nojournal() into ext4_sync_fs()
ext4: don't check quota format when there are no quota files
jbd2: simplify calling convention around __jbd2_journal_clean_checkpoint_list
jbd2: avoid pointless scanning of checkpoint lists
...
Pull RCU updates from Ingo Molnar:
"The main changes in this cycle were:
- changes related to No-CBs CPUs and NO_HZ_FULL
- RCU-tasks implementation
- torture-test updates
- miscellaneous fixes
- locktorture updates
- RCU documentation updates"
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (81 commits)
workqueue: Use cond_resched_rcu_qs macro
workqueue: Add quiescent state between work items
locktorture: Cleanup header usage
locktorture: Cannot hold read and write lock
locktorture: Fix __acquire annotation for spinlock irq
locktorture: Support rwlocks
rcu: Eliminate deadlock between CPU hotplug and expedited grace periods
locktorture: Document boot/module parameters
rcutorture: Rename rcutorture_runnable parameter
locktorture: Add test scenario for rwsem_lock
locktorture: Add test scenario for mutex_lock
locktorture: Make torture scripting account for new _runnable name
locktorture: Introduce torture context
locktorture: Support rwsems
locktorture: Add infrastructure for torturing read locks
torture: Address race in module cleanup
locktorture: Make statistics generic
locktorture: Teach about lock debugging
locktorture: Support mutexes
locktorture: Add documentation
...
Merge tag 'locks-v3.18-1' of git://git.samba.org/jlayton/linux
Pull file locking related changes from Jeff Layton:
"This release is a little more busy for file locking changes than the
last:
- a set of patches from Kinglong Mee to fix the lockowner handling in
knfsd
- a pile of cleanups to the internal file lease API. This should get
us a bit closer to allowing for setlease methods that can block.
There are some dependencies between mine and Bruce's trees this cycle,
and I based my tree on top of the requisite patches in Bruce's tree"
* tag 'locks-v3.18-1' of git://git.samba.org/jlayton/linux: (26 commits)
locks: fix fcntl_setlease/getlease return when !CONFIG_FILE_LOCKING
locks: flock_make_lock should return a struct file_lock (or PTR_ERR)
locks: set fl_owner for leases to filp instead of current->files
locks: give lm_break a return value
locks: __break_lease cleanup in preparation of allowing direct removal of leases
locks: remove i_have_this_lease check from __break_lease
locks: move freeing of leases outside of i_lock
locks: move i_lock acquisition into generic_*_lease handlers
locks: define a lm_setup handler for leases
locks: plumb a "priv" pointer into the setlease routines
nfsd: don't keep a pointer to the lease in nfs4_file
locks: clean up vfs_setlease kerneldoc comments
locks: generic_delete_lease doesn't need a file_lock at all
nfsd: fix potential lease memory leak in nfs4_setlease
locks: close potential race in lease_get_mtime
security: make security_file_set_fowner, f_setown and __f_setown void return
locks: consolidate "nolease" routines
locks: remove lock_may_read and lock_may_write
lockd: rip out deferred lock handling from testlock codepath
NFSD: Get reference of lockowner when coping file_lock
...
Pull btrfs updates from Chris Mason:
"The largest set of changes here come from Miao Xie. He's cleaning up
and improving read recovery/repair for raid, and has a number of
related fixes.
I've merged another set of fsync fixes from Filipe, and he's also
improved the way we handle metadata write errors to make sure we force
the FS readonly if things go wrong.
Otherwise we have a collection of fixes and cleanups. Dave Sterba
gets a cookie for removing the most lines (thanks Dave)"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (139 commits)
btrfs: Fix compile error when CONFIG_SECURITY is not set.
Btrfs: fix compiles when CONFIG_BTRFS_FS_RUN_SANITY_TESTS is off
btrfs: Make btrfs handle security mount options internally to avoid losing security label.
Btrfs: send, don't delay dir move if there's a new parent inode
btrfs: add more superblock checks
Btrfs: fix race in WAIT_SYNC ioctl
Btrfs: be aware of btree inode write errors to avoid fs corruption
Btrfs: remove redundant btrfs_verify_qgroup_counts declaration.
btrfs: fix shadow warning on cmp
Btrfs: fix compilation errors under DEBUG
Btrfs: fix crash of btrfs_release_extent_buffer_page
Btrfs: add missing end_page_writeback on submit_extent_page failure
btrfs: Fix the wrong condition judgment about subset extent map
Btrfs: fix build_backref_tree issue with multiple shared blocks
Btrfs: cleanup error handling in build_backref_tree
btrfs: move checks for DUMMY_ROOT into a helper
btrfs: new define for the inline extent data start
btrfs: kill extent_buffer_page helper
btrfs: drop constant param from btrfs_release_extent_buffer_page
btrfs: hide typecast to definition of BTRFS_SEND_TRANS_STUB
...
This time it's a relatively calm update batch, but the amount isn't
too small in the end. Here we go over some highlights:
- ALSA core
- One major change is the support of nonatomic PCM operations.
This allows the trigger and other callbacks to call schedule(),
which would be useful for mailbox type communications. Already
some drivers (Digigram ones) have been converted to use together
with threaded irqs as an example.
- Improvement / fixes of DSD PCM format support
- HD-audio
- Large volume of rewrites are found in Realtek codec driver for
converting Dell and HP quirks to generic forms.
- Inverted dmic code cleanup from David.
- Realtek COEF access has been optimized.
- Now HD-audio jack infrastructure allows multiple callbacks, which
fixes / simplifies the jack-dependent power controls on STAC/IDT
and VIA codecs.
- Many additional device-specific fixups as usual
- A few deadcode cleanups, CA0132 code cleanup, etc.
- ASoC
- More componentization work from Lars-Peter, this time mainly
cleaning up the suspend and bias level transition callbacks.
- Real system support for the Intel drivers and a bunch of fixes
and enhancements for the associated CODEC drivers, this is going
to need a lot quirks over time due to the lack of any firmware
description of the boards.
- Jack detect support for simple card from Dylan Reid.
- A bunch of small fixes and enhancements for the Freescale
drivers.
- New drivers for Analog Devices SSM4567, Cirrus Logic CS35L32,
Everest Semiconductor ES8328 and Freescale cards using the ASRC
in newer i.MX processors.
- A few simple-card fixes, mostly cleanups but also a fix for
interaction between GPIO 0 and simple-card.
- Misc
- Virtuoso / Oxygen updates by Clemens
- USB-audio: Yamaha MOTIF XF MIDI port name fixes
- Conversion of kernel messages to standard dev_*() in ctxfi
driver.
Merge tag 'sound-3.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound updates from Takashi Iwai:
"This time it's a relatively calm update batch, but the amount isn't
too small in the end. Here we go over some highlights:
ALSA core:
- One major change is the support of nonatomic PCM operations. This
allows the trigger and other callbacks to call schedule(), which
would be useful for mailbox type communications. Already some
drivers (Digigram ones) have been converted to use together with
threaded irqs as an example.
- Improvement / fixes of DSD PCM format support
HD-audio:
- Large volume of rewrites are found in Realtek codec driver for
converting Dell and HP quirks to generic forms.
- Inverted dmic code cleanup from David.
- Realtek COEF access has been optimized.
- Now HD-audio jack infrastructure allows multiple callbacks, which
fixes / simplifies the jack-dependent power controls on STAC/IDT
and VIA codecs.
- Many additional device-specific fixups as usual
- A few deadcode cleanups, CA0132 code cleanup, etc.
ASoC:
- More componentization work from Lars-Peter, this time mainly
cleaning up the suspend and bias level transition callbacks.
- Real system support for the Intel drivers and a bunch of fixes and
enhancements for the associated CODEC drivers, this is going to
need a lot of quirks over time due to the lack of any firmware
description of the boards.
- Jack detect support for simple card from Dylan Reid.
- A bunch of small fixes and enhancements for the Freescale drivers.
- New drivers for Analog Devices SSM4567, Cirrus Logic CS35L32,
Everest Semiconductor ES8328 and Freescale cards using the ASRC in
newer i.MX processors.
- A few simple-card fixes, mostly cleanups but also a fix for
interaction between GPIO 0 and simple-card.
Misc:
- Virtuoso / Oxygen updates by Clemens
- USB-audio: Yamaha MOTIF XF MIDI port name fixes
- Conversion of kernel messages to standard dev_*() in ctxfi driver"
* tag 'sound-3.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (251 commits)
ASoC: mc13783: Ensure we only try to dereference valid of_nodes
ASoC: rockchip-i2s: fix infinite loop in rockchip_snd_txctrl
ALSA: hda - Add dock port support to Thinkpad L440 (71aa:501e)
ALSA: Allow pass NULL dev for snd_pci_quirk_lookup()
ASoC: imx-es8328: Fix of_node_put() call with uninitialized object
ASoC: soc-pcm: fix sig_bits determination in soc_pcm_apply_msb()
ASoC: simple-card: Initialize headphone and mic GPIO numbers
ASoC: imx-es8328: Fix missing return code in imx_es8328_probe()
ALSA: hda - Add dock support for Thinkpad T440 (17aa:2212)
ALSA: usb: caiaq: check for cdev->n_streams > 1
ASoC: 88pm860x-codec: Fix possibly missing string termination
ASoC: core: fix use after free in snd_soc_remove_platform()
ASoC: soc-dapm: fix use after free
ALSA: hda - Make the inv dmic handling for Realtek use generic parser
ALSA: hda - Add Inverted Internal mic for Samsung Ativ book 9 (NP900X3G)
ALSA: hda - Add inverted internal mic for Asus Aspire 4830T
ASoC: Intel: byt-rt5640: fix coccinelle warnings
ASoC: fsl_esai doc: Add "fsl,vf610-esai" as compatible string
ASoC: da732x: Remove unnecessary KERN_ERR in pr_err()
ASoC: simple-card: Fix detect gpio documentation.
...
Pull f2fs updates from Jaegeuk Kim:
"This patch-set introduces a couple of new features such as large
sector size, FITRIM, and atomic/volatile writes.
Several patches enhance power-off recovery and checkpoint routines.
The fsck.f2fs starts to support fixing corrupted partitions with
recovery hints provided by this patch-set.
Summary:
- retain some recovery information for fsck.f2fs
- enhance checkpoint speed
- enhance flush command management
- bug fix for lseek
- tune in-place-update policies
- enhance roll-forward speed
- revisit all the roll-forward and fsync rules
- support large sector size
- support FITRIM
- support atomic and volatile writes
And several clean-ups and bug fixes are included"
* tag 'f2fs-for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (42 commits)
f2fs: support volatile operations for transient data
f2fs: support atomic writes
f2fs: remove unused return value
f2fs: clean up f2fs_ioctl functions
f2fs: potential shift wrapping buf in f2fs_trim_fs()
f2fs: call f2fs_unlock_op after error was handled
f2fs: check the use of macros on block counts and addresses
f2fs: refactor flush_nat_entries to remove costly reorganizing ops
f2fs: introduce FITRIM in f2fs_ioctl
f2fs: introduce cp_control structure
f2fs: use more free segments until SSR is activated
f2fs: change the ipu_policy option to enable combinations
f2fs: fix to search whole dirty segmap when get_victim
f2fs: fix to clean previous mount option when remount_fs
f2fs: skip punching hole in special condition
f2fs: support large sector size
f2fs: fix to truncate blocks past EOF in ->setattr
f2fs: update i_size when __allocate_data_block
f2fs: use MAX_BIO_BLOCKS(sbi)
f2fs: remove redundant operation during roll-forward recovery
...
Apart from the usual cleanups, here is the summary of new features:
- s390 moves closer towards host large page support
- PowerPC has improved support for debugging (both inside the guest and
via gdbstub) and support for e6500 processors
- ARM/ARM64 support read-only memory (which is necessary to put firmware
in emulated NOR flash)
- x86 has the usual emulator fixes and nested virtualization improvements
(including improved Windows support on Intel and Jailhouse hypervisor
support on AMD), adaptive PLE which helps overcommitting of huge guests.
Also included are some patches that make KVM more friendly to memory
hot-unplug, and fixes for rare caching bugs.
Two patches have trivial mm/ parts that were acked by Rik and Andrew.
Note: I will soon switch to a subkey for signing purposes. To verify
future signed pull requests from me, please update my key with
"gpg --recv-keys 9B4D86F2". You should see 3 new subkeys---the
one for signing will be a 2048-bit RSA key, 4E6B09D7.
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"Fixes and features for 3.18.
Apart from the usual cleanups, here is the summary of new features:
- s390 moves closer towards host large page support
- PowerPC has improved support for debugging (both inside the guest
and via gdbstub) and support for e6500 processors
- ARM/ARM64 support read-only memory (which is necessary to put
firmware in emulated NOR flash)
- x86 has the usual emulator fixes and nested virtualization
improvements (including improved Windows support on Intel and
Jailhouse hypervisor support on AMD), adaptive PLE which helps
overcommitting of huge guests. Also included are some patches that
make KVM more friendly to memory hot-unplug, and fixes for rare
caching bugs.
Two patches have trivial mm/ parts that were acked by Rik and Andrew.
Note: I will soon switch to a subkey for signing purposes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (157 commits)
kvm: do not handle APIC access page if in-kernel irqchip is not in use
KVM: s390: count vcpu wakeups in stat.halt_wakeup
KVM: s390/facilities: allow TOD-CLOCK steering facility bit
KVM: PPC: BOOK3S: HV: CMA: Reserve cma region only in hypervisor mode
arm/arm64: KVM: Report correct FSC for unsupported fault types
arm/arm64: KVM: Fix VTTBR_BADDR_MASK and pgd alloc
kvm: Fix kvm_get_page_retry_io __gup retval check
arm/arm64: KVM: Fix set_clear_sgi_pend_reg offset
kvm: x86: Unpin and remove kvm_arch->apic_access_page
kvm: vmx: Implement set_apic_access_page_addr
kvm: x86: Add request bit to reload APIC access page address
kvm: Add arch specific mmu notifier for page invalidation
kvm: Rename make_all_cpus_request() to kvm_make_all_cpus_request() and make it non-static
kvm: Fix page ageing bugs
kvm/x86/mmu: Pass gfn and level to rmapp callback.
x86: kvm: use alternatives for VMCALL vs. VMMCALL if kernel text is read-only
kvm: x86: use macros to compute bank MSRs
KVM: x86: Remove debug assertion of non-PAE reserved bits
kvm: don't take vcpu mutex for obviously invalid vcpu ioctls
kvm: Faults which trigger IO release the mmap_sem
...
Ensure that it's OK to pass in a NULL file_lock double pointer on
a F_UNLCK request and convert the vfs_setlease F_UNLCK callers to
do just that.
Finally, turn the BUG_ON in generic_setlease into a WARN_ON_ONCE
with an error return. That's a problem we can handle without
crashing the box if it occurs.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
This patch introduces FITRIM in f2fs_ioctl.
In this case, f2fs will issue as many small discards and prefree discards as
possible for the given area.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch adds a new data structure to control checkpoint parameters.
Currently, it carries the reason for the checkpoint, such as is_umount or a
normal sync.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
These newlines add empty lines to the trace output.
Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Callbacks don't have to do extra computation to learn what the caller
(kvm_handle_hva_range()) knows very well. Useful for
debugging/tracing/printk/future.
Signed-off-by: Andres Lagar-Cavilla <andreslc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The extent map tracepoint doesn't parse @flag correctly: we set @flag via
set_bit(), so each flag is a bit number and must be parsed as a bit position,
not as a mask.
Also add the missing flag, EXTENT_FLAG_FS_MAPPING.
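The difference between a bit number and a bit mask can be seen in a small
sketch. This is a Python model, not the tracepoint code: the flag names
are shortened and the bit numbers are illustrative values picked for the
example, not copied from the btrfs headers.

```python
# Illustrative bit numbers (assumption of this sketch).
FLAG_BITS = {"PINNED": 0, "COMPRESSED": 1, "PREALLOC": 3}


def set_bit(nr, flags):
    """Model of set_bit(): nr is a bit position, not a mask."""
    return flags | (1 << nr)


def decode_as_mask(flags):
    """Buggy decoding: treats each bit number as if it were a mask."""
    return [name for name, nr in FLAG_BITS.items() if flags & nr]


def decode_as_bit(flags):
    """Correct decoding: shifts each bit number into a mask first."""
    return [name for name, nr in FLAG_BITS.items() if flags & (1 << nr)]


flags = set_bit(FLAG_BITS["PREALLOC"], set_bit(FLAG_BITS["PINNED"], 0))
```

With PINNED and PREALLOC set, decode_as_bit(flags) reports exactly those
two, while decode_as_mask(flags) misses PINNED (bit number 0 never
matches as a mask) and wrongly reports COMPRESSED.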
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
Use %pf instead of %p, just as the kernel workqueue tracepoints do.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
The tracepoint trace_btrfs_normal_work_done has never had a user, so remove it.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
Kernel workqueue's tracepoints print the address of work_struct, while btrfs
workqueue's tracepoints print the address of btrfs_work.
We need a connection between these two; for example, when debugging, we usually
grep for an address in the trace output. So it would be better to also print the
work_struct address in the btrfs workqueue tracepoints.
Please note that we can only add this into those tracepoints whose work is still
available in memory because we need to reference the work.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
We want this to debug qgroup changes on live systems.
Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
Currently, we call ioapic_service() immediately when we find the irq is still
active during eoi broadcast. But for real hardware, there's some delay between
the EOI writing and irq delivery. If we do not emulate this behavior, and
re-inject the interrupt immediately after the guest sends an EOI and re-enables
interrupts, a guest might spend all its time in the ISR if it has a broken
handler for a level-triggered interrupt.
Such livelock actually happens with Windows guests when resuming from
hibernation.
As there's no way to distinguish a broken handler from newly raised interrupts, this patch
delays an interrupt if 10,000 consecutive EOIs found that the interrupt was
still high. The guest can then make a little forward progress, until a proper
IRQ handler is set or until some detection routine in the guest (such as
Linux's note_interrupt()) recognizes the situation.
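The counting policy described above can be sketched as a small simulation. This is a hedged Python model, not the KVM ioapic code: the class name, method name and return strings are hypothetical, and only the 10,000 threshold comes from the message.

```python
MAX_SUCCESSIVE_EOIS = 10_000  # threshold quoted in the commit message


class LevelTriggeredPin:
    """Hypothetical model of one level-triggered ioapic pin."""

    def __init__(self):
        self.line_high = False   # is the interrupt line still asserted?
        self.successive = 0      # consecutive EOIs that found it asserted

    def on_eoi(self):
        """Decide what to do when the guest writes an EOI for this pin.

        Returns "idle" if the line is low, "inject" to re-inject at once,
        or "defer" to delay the re-injection so the guest can make progress.
        """
        if not self.line_high:
            self.successive = 0
            return "idle"
        self.successive += 1
        if self.successive >= MAX_SUCCESSIVE_EOIS:
            self.successive = 0
            return "defer"
        return "inject"


pin = LevelTriggeredPin()
pin.line_high = True
results = [pin.on_eoi() for _ in range(MAX_SUCCESSIVE_EOIS)]
```

In this model a well-behaved handler never notices the counter, because the line drops before the next EOI and the count resets; only a handler that keeps acknowledging without clearing the source reaches the deferral.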
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Zhang Haoyu <zhanghy@sangfor.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commit 96d3fd0d31 (rcu: Break call_rcu() deadlock involving scheduler
and perf) covered the case where __call_rcu_nocb_enqueue() needs to wake
the rcuo kthread due to the queue being initially empty, but did not
do anything for the case where the queue was overflowing. This commit
therefore also defers wakeup for the overflow case.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This patch fixes spelling typos found in DocBook/networking.xml.
Because networking.xml is generated from comments in the source,
the typos have to be fixed in the comments within the source.
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit adds some statistics to the extent status tree shrinker. The
purpose is to collect more details when we encounter a stall caused by
the extent status tree shrinker. Here we count the following
statistics:
stats:
the number of all objects on all extent status trees
the number of reclaimable objects on lru list
cache hits/misses
the last sorted interval
the number of inodes on lru list
average:
scan time for shrinking some objects
the number of shrunk objects
maximum:
the inode that has max nr. of objects on lru list
the maximum scan time for shrinking some objects
The output looks like below:
$ cat /proc/fs/ext4/sda1/es_shrinker_info
stats:
28228 objects
6341 reclaimable objects
5281/631 cache hits/misses
586 ms last sorted interval
250 inodes on lru list
average:
153 us scan time
128 shrunk objects
maximum:
255 inode (255 objects, 198 reclaimable)
125723 us max scan time
If the lru list has never been sorted, the following line will not be
printed:
586 ms last sorted interval
If there is an empty lru list, the following lines also will not be
printed:
250 inodes on lru list
...
maximum:
255 inode (255 objects, 198 reclaimable)
0 us max scan time
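A script that consumes this file has to tolerate the optional lines. The following Python sketch parses text shaped like the sample above; the parser, its function name and the dictionary keys are assumptions made for illustration, and the sample values are copied from this message rather than read from a live system.

```python
import re

SAMPLE = """\
stats:
  28228 objects
  6341 reclaimable objects
  5281/631 cache hits/misses
  586 ms last sorted interval
  250 inodes on lru list
average:
  153 us scan time
  128 shrunk objects
maximum:
  255 inode (255 objects, 198 reclaimable)
  125723 us max scan time
"""


def parse_es_shrinker_info(text):
    """Parse es_shrinker_info-style text into {section: {key: value}}."""
    info = {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):            # "stats:", "average:", "maximum:"
            section = line[:-1]
            info[section] = {}
            continue
        if section is None:               # data before any header: skip
            continue
        m = re.match(r"(\d+)/(\d+) cache hits/misses$", line)
        if m:
            info[section]["cache hits"] = int(m.group(1))
            info[section]["cache misses"] = int(m.group(2))
            continue
        m = re.match(r"(\d+) inode \((\d+) objects, (\d+) reclaimable\)$", line)
        if m:
            info[section]["inode"] = int(m.group(1))
            info[section]["inode objects"] = int(m.group(2))
            info[section]["inode reclaimable"] = int(m.group(3))
            continue
        # Generic "<number> [ms|us] <label>" line; the unit is dropped.
        m = re.match(r"(\d+) (?:(?:ms|us) )?(.+)$", line)
        if m:
            info[section][m.group(2)] = int(m.group(1))
    return info


info = parse_es_shrinker_info(SAMPLE)
```

Lines that are absent, such as the sorted interval on a never-sorted list, simply do not appear as keys, so callers should use dict.get() rather than indexing.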
Meanwhile in this commit a new trace point is defined to print some
details in __ext4_es_shrink().
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
This commit improves the trace points of the extent status tree. We rename
trace_ext4_es_shrink_enter in ext4_es_count() because it is also used
in ext4_es_scan() and we cannot tell the two apart in the output.
Further, this commit fixes a variable name in a trace point in order to
keep it consistent with the others.
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Pull more powerpc updates from Ben Herrenschmidt:
"Here are some more powerpc bits for 3.17, essentially fixes.
The biggest series, also aimed at -stable, is from Aneesh and is the
result of weeks and weeks of debugging to find out why the heck our THP
implementation was occasionally triggering multi-hit errors in our
level 1 TLB. It ended up being a combination of issues including
subtleties as to how we should invalidate those special 'MPSS' pages
we use to allow the use of 16M pages inside 4K/64K "base page size"
segments (you really have to love our MMU !)
Another interesting one in the "OMG" category is the series from
Michael adding memory barriers to spin_is_locked(). That's also the
result of many days of debugging to figure out why the semaphore code
would occasionally crash in ways that made no sense. It ended up
being some creative lock stacking that was defeated by the fact that
our locks allow a load inside the locked section to be re-ordered with
the load of the lock value itself (I'm still of two minds about whether
to kill that once and for all by putting a heavier barrier back into
our lock implementation...). The fixes come with a long explanation
in the cset comments, feel free to read it if you feel like having a
headache today"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (25 commits)
powerpc/thp: Add tracepoints to track hugepage invalidate
powerpc/mm: Use read barrier when creating real_pte
powerpc/thp: Use ACCESS_ONCE when loading pmdp
powerpc/thp: Invalidate with vpn in loop
powerpc/thp: Handle combo pages in invalidate
powerpc/thp: Invalidate old 64K based hash page mapping before insert of 4k pte
powerpc/thp: Don't recompute vsid and ssize in loop on invalidate
powerpc/thp: Add write barrier after updating the valid bit
powerpc: reorder per-cpu NUMA information's initialization
powerpc/perf/hv-24x7: Use kmem_cache_free
powerpc/pseries/hvcserver: Fix endian issue in hvcs_get_partner_info
powerpc: Hard disable interrupts in xmon
powerpc: remove duplicate definition of TEXASR_FS
powerpc/pseries: Avoid deadlock on removing ddw
powerpc/pseries: Failure on removing device node
powerpc/boot: Use correct zlib types for comparison
powerpc/powernv: Interface to register/unregister opal dump region
printk: Add function to return log buffer address and size
powerpc: Add POWER8 features to CPU_FTRS_POSSIBLE/ALWAYS
powerpc/ppc476: Disable BTAC
...
Pull block driver changes from Jens Axboe:
"Nothing out of the ordinary here, this pull request contains:
- A big round of fixes for bcache from Kent Overstreet, Slava Pestov,
and Surbhi Palande. No new features, just a lot of fixes.
- The usual round of drbd updates from Andreas Gruenbacher, Lars
Ellenberg, and Philipp Reisner.
- virtio_blk was converted to blk-mq back in 3.13, but now Ming Lei
has taken it one step further and added support for actually using
more than one queue.
- Addition of an explicit SG_FLAG_Q_AT_HEAD for block/bsg, to
complement the default behavior of adding to the tail of the
queue. From Douglas Gilbert"
* 'for-3.17/drivers' of git://git.kernel.dk/linux-block: (86 commits)
bcache: Drop unneeded blk_sync_queue() calls
bcache: add mutex lock for bch_is_open
bcache: Correct printing of btree_gc_max_duration_ms
bcache: try to set b->parent properly
bcache: fix memory corruption in init error path
bcache: fix crash with incomplete cache set
bcache: Fix more early shutdown bugs
bcache: fix use-after-free in btree_gc_coalesce()
bcache: Fix an infinite loop in journal replay
bcache: fix crash in bcache_btree_node_alloc_fail tracepoint
bcache: bcache_write tracepoint was crashing
bcache: fix typo in bch_bkey_equal_header
bcache: Allocate bounce buffers with GFP_NOWAIT
bcache: Make sure to pass GFP_WAIT to mempool_alloc()
bcache: fix uninterruptible sleep in writeback thread
bcache: wait for buckets when allocating new btree root
bcache: fix crash on shutdown in passthrough mode
bcache: fix lockdep warnings on shutdown
bcache allocator: send discards with correct size
bcache: Fix to remove the rcu_sched stalls.
...
Add a tracepoint to track hugepage invalidate. This helps us
debug hard-to-track bugs.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Merge tag 'trace-ipi-tracepoints' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull IPI tracepoints for ARM from Steven Rostedt:
"Nicolas Pitre added generic tracepoints for tracing IPIs and updated
the arm and arm64 architectures. It required some minor updates to
the generic tracepoint system, so it had to wait for me to implement
them"
* tag 'trace-ipi-tracepoints' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ARM64: add IPI tracepoints
ARM: add IPI tracepoints
tracepoint: add generic tracepoint definitions for IPI tracing
tracing: Do not do anything special with tracepoint_string when tracing is disabled
The Inter Processor Interrupt is used to make another processor do a
specific action such as rescheduling tasks, signaling a timer event or
executing something in another CPU's context. IRQs are already traceable
but IPIs were not. Tracing them is useful for monitoring IPI latency,
or to verify when they are the source of CPU wake-ups with power
management implications.
Three trace hooks are defined: ipi_raise, ipi_entry and ipi_exit. To make
them portable, a string is used to identify them and correlate related
events. Additionally, ipi_raise records a bitmask representing targeted
CPUs.
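As an illustration of how the string and the CPU bitmask allow events to be correlated, here is a hedged Python sketch that pairs each ipi_raise with the matching ipi_entry on every targeted CPU. The tuple layout of the events is hypothetical (it is not the ftrace output format), and the pairing is approximate because several raises may be serviced by one entry.

```python
from collections import defaultdict, deque


def ipi_latencies(events):
    """Estimate raise-to-entry latency per (cpu, reason).

    events: iterable of (timestamp, cpu, name, reason, target_mask) tuples,
    a hypothetical pre-parsed form of the three trace hooks. target_mask is
    only meaningful for ipi_raise and has one bit per targeted CPU.
    """
    pending = defaultdict(deque)   # (target cpu, reason) -> raise timestamps
    latencies = []
    for ts, cpu, name, reason, target_mask in events:
        if name == "ipi_raise":
            target = 0
            mask = target_mask
            while mask:
                if mask & 1:
                    pending[(target, reason)].append(ts)
                mask >>= 1
                target += 1
        elif name == "ipi_entry":
            queue = pending[(cpu, reason)]
            if queue:
                latencies.append((cpu, reason, ts - queue.popleft()))
        # ipi_exit is not needed for raise-to-entry latency.
    return latencies


REASON = "Rescheduling interrupts"
trace = [
    (100, 0, "ipi_raise", REASON, 0b0110),   # CPU 0 targets CPUs 1 and 2
    (130, 1, "ipi_entry", REASON, 0),
    (135, 1, "ipi_exit", REASON, 0),
    (150, 2, "ipi_entry", REASON, 0),
]
result = ipi_latencies(trace)
```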
Link: http://lkml.kernel.org/p/1406318733-26754-3-git-send-email-nicolas.pitre@linaro.org
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull second round of KVM changes from Paolo Bonzini:
"Here are the PPC and ARM changes for KVM, which I separated because
they had small conflicts (respectively within KVM documentation, and
with 3.16-rc changes). Since they were all within the subsystem, I
took care of them.
Stephen Rothwell reported some snags in PPC builds, but they are all
fixed now; the latest linux-next report was clean.
New features for ARM include:
- KVM VGIC v2 emulation on GICv3 hardware
- Big-Endian support for arm/arm64 (guest and host)
- Debug Architecture support for arm64 (arm32 is on Christoffer's todo list)
And for PPC:
- Book3S: Good number of LE host fixes, enable HV on LE
- Book3S HV: Add in-guest debug support
This release drops support for KVM on the PPC440. As a result, the
PPC merge removes more lines than it adds. :)
I also included an x86 change, since Davidlohr tied it to an
independent bug report and the reporter quickly provided a Tested-by;
there was no reason to wait for -rc2"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (122 commits)
KVM: Move more code under CONFIG_HAVE_KVM_IRQFD
KVM: nVMX: fix "acknowledge interrupt on exit" when APICv is in use
KVM: nVMX: Fix nested vmexit ack intr before load vmcs01
KVM: PPC: Enable IRQFD support for the XICS interrupt controller
KVM: Give IRQFD its own separate enabling Kconfig option
KVM: Move irq notifier implementation into eventfd.c
KVM: Move all accesses to kvm::irq_routing into irqchip.c
KVM: irqchip: Provide and use accessors for irq routing table
KVM: Don't keep reference to irq routing table in irqfd struct
KVM: PPC: drop duplicate tracepoint
arm64: KVM: fix 64bit CP15 VM access for 32bit guests
KVM: arm64: GICv3: mandate page-aligned GICV region
arm64: KVM: GICv3: move system register access to msr_s/mrs_s
KVM: PPC: PR: Handle FSCR feature deselects
KVM: PPC: HV: Remove generic instruction emulation
KVM: PPC: BOOKEHV: rename e500hv_spr to bookehv_spr
KVM: PPC: Remove DCR handling
KVM: PPC: Expose helper functions for data/inst faults
KVM: PPC: Separate loadstore emulation from priv emulation
KVM: PPC: Handle magic page in kvmppc_ld/st
...
Merge incoming from Andrew Morton:
- Various misc things.
- arch/sh updates.
- Part of ocfs2. Review is slow.
- Slab updates.
- Most of -mm.
- printk updates.
- lib/ updates.
- checkpatch updates.
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (226 commits)
checkpatch: update $declaration_macros, add uninitialized_var
checkpatch: warn on missing spaces in broken up quoted
checkpatch: fix false positives for --strict "space after cast" test
checkpatch: fix false positive MISSING_BREAK warnings with --file
checkpatch: add test for native c90 types in unusual order
checkpatch: add signed generic types
checkpatch: add short int to c variable types
checkpatch: add for_each tests to indentation and brace tests
checkpatch: fix brace style misuses of else and while
checkpatch: add --fix option for a couple OPEN_BRACE misuses
checkpatch: use the correct indentation for which()
checkpatch: add fix_insert_line and fix_delete_line helpers
checkpatch: add ability to insert and delete lines to patch/file
checkpatch: add an index variable for fixed lines
checkpatch: warn on break after goto or return with same tab indentation
checkpatch: emit a warning on file add/move/delete
checkpatch: add test for commit id formatting style in commit log
checkpatch: emit fewer kmalloc_array/kcalloc conversion warnings
checkpatch: improve "no space after cast" test
checkpatch: allow multiple const * types
...
Merge tag 'sound-3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound updates from Takashi Iwai:
"There've been many updates in ASoC side at this time, especially the
framework enhancement for multiple CODECs on a single DAI and more
componentization works.
The only major change in ALSA core is the addition of timestamp type
in sw_params field. This should behave in backward compatible way.
Other than that, there are lots of small changes and new drivers in
wide range, including a large code cut in HD-audio driver for
deprecated static quirks. Some highlights are below:
ALSA Core:
- Add the new timestamp type field to sw_params to choose
MONOTONIC_RAW type
HD-audio:
- Continued conversion to standard printk macros, generic code
cleanups
- Removal of obsoleted static quirk codes for Conexant and C-Media
codecs
- Fixups for HP Envy TS, Dell XPS 15, HP and Dell mute/mic LED,
Gigabyte BXBT-2807 mobo
- Intel Braswell support
ASoC:
- Support for multiple CODECs attached to a single DAI, enabling
systems with for example multiple DAC/speaker drivers on a single
link, contributed by Benoit Cousson based on work from Misael Lopez
Cruz
- Support for byte controls larger than 256 bytes based on the use of
TLVs contributed by Omair Mohammed Abdullah
- More componentisation work from Lars-Peter Clausen
- The remainder of the conversions of CODEC drivers to params_width()
by Mark Brown
- Drivers for Cirrus Logic CS4265, Freescale i.MX ASRC blocks,
Realtek RT286 and RT5670, Rockchip RK3xxx I2S controllers and Texas
Instruments TAS2552
- Lots of updates and fixes, especially to the DaVinci, Intel,
Freescale, Realtek, and rcar drivers"
* tag 'sound-3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (402 commits)
ALSA: usb-audio: Whitespace cleanups for sound/usb/midi.*
ALSA: usb-audio: Respond to suspend and resume callbacks for MIDI input
sound/oss/pss: Remove typedefs pss_mixerdata and pss_confdata
sound/oss/opl3: Remove typedef opl_devinfo
ALSA: fireworks: fix specifiers in format strings for propper output
ASoC: imx-audmux: Use uintptr_t for port numbers
ASoC: davinci: Enable menuconfig entry for McASP
ASoC: fsl_asrc: Don't access members of config before checking it
ASoC: fsl_sarc_dma: Check pair before using it
ASoC: adau1977: Fix truncation warning on 64 bit architectures
ALSA: virtuoso: add Xonar Essence STX II support
ALSA: riptide: fix %d confusingly prefixed with 0x in format strings
ALSA: fireworks: fix %d confusingly prefixed with 0x in format strings
ALSA: hda - add codec ID for Braswell display audio codec
ALSA: hda - add PCI IDs for Intel Braswell
ALSA: usb-audio: Adjust Gamecom 780 volume level
ALSA: usb-audio: improve dmesg source grepability
ASoC: rt5670: Fix duplicate const warnings
ASoC: rt5670: Staticise non-exported symbols
ASoC: Intel: update stream only on stream IPC msgs
...
This was formerly the series "Improve sequential read throughput" which
noted some major differences in performance of tiobench since 3.0.
While there are a number of factors, two that dominated were the
introduction of the fair zone allocation policy and changes to CFQ.
The behaviour of fair zone allocation policy makes more sense than
tiobench as a benchmark and CFQ defaults were not changed due to
insufficient benchmarking.
This series is what's left. It's one functional fix to the fair zone
allocation policy when used on NUMA machines and a reduction of overhead
in general. tiobench was used for the comparison despite its flaws as
an IO benchmark as in this case we are primarily interested in the
overhead of page allocator and page reclaim activity.
On UMA, it makes little difference to overhead
            3.16.0-rc3    3.16.0-rc3
               vanilla  lowercost-v5
User            383.61        386.77
System          403.83        401.74
Elapsed        5411.50       5413.11
On a 4-socket NUMA machine it's a bit more noticeable
            3.16.0-rc3    3.16.0-rc3
               vanilla  lowercost-v5
User            746.94        802.00
System        65336.22      40852.33
Elapsed       27553.52      27368.46
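To put the NUMA numbers in relative terms, the change per row can be computed directly from the table. A minimal Python sketch, using only the figures quoted above (the helper name is made up for the example):

```python
def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


# (vanilla, lowercost-v5) pairs from the 4-socket NUMA table.
numa = {
    "User": (746.94, 802.00),
    "System": (65336.22, 40852.33),
    "Elapsed": (27553.52, 27368.46),
}

changes = {row: percent_change(*pair) for row, pair in numa.items()}
for row, delta in changes.items():
    print(f"{row:8s} {delta:+.1f}%")
```

System time drops by roughly 37% on the NUMA machine, user time rises by a few percent and elapsed time is nearly flat, which matches the description of this as an overhead reduction rather than a throughput change.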
This patch (of 6):
The LRU insertion and activate tracepoints take PFN as a parameter
forcing the overhead to the caller. Move the overhead to the tracepoint
fast-assign method to ensure the cost is only incurred when the
tracepoint is active.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The mm_migrate_pages trace event reports a reason for the migration,
typically as a symbolic string. The exception is the reason
MR_NUMA_MISPLACED for which it just displays the numeric value:
  mm_migrate_pages: nr_succeeded=1 nr_failed=0 mode=MIGRATE_ASYNC reason=0x5
This patch makes the output consistent by introducing a string value for
MR_NUMA_MISPLACED. The event is then reported as:
  mm_migrate_pages: nr_succeeded=1 nr_failed=0 mode=MIGRATE_ASYNC reason=numa_misplaced
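The behaviour can be modelled as a table lookup that falls back to the hex value when no name is registered. The Python sketch below is an illustration only: the value 5 for MR_NUMA_MISPLACED is inferred from the reason=0x5 output above, while the other table entries and the helper names are assumptions.

```python
def print_symbolic(value, table):
    """Return the symbolic name for value, or its hex form if none exists."""
    return table.get(value, hex(value))


# Table before the patch: no entry for value 5. The other entries are
# illustrative placeholders, not copied from the kernel source.
before = {0: "compaction", 3: "syscall_or_cpuset"}

# Table after the patch: MR_NUMA_MISPLACED gains a string.
after = dict(before)
after[5] = "numa_misplaced"


def format_event(reason, table):
    return ("mm_migrate_pages: nr_succeeded=1 nr_failed=0 "
            "mode=MIGRATE_ASYNC reason=" + print_symbolic(reason, table))


old_line = format_event(5, before)
new_line = format_event(5, after)
```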
Signed-off-by: Max Asbock <masbock@linux.vnet.ibm.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit e4d57e1ee1 (KVM: Move irq notifier implementation into
eventfd.c, 2014-06-30) included the irq notifier code unconditionally
in eventfd.c, while it was under CONFIG_HAVE_KVM_IRQCHIP before.
Similarly, commit 297e21053a (KVM: Give IRQFD its own separate enabling
Kconfig option, 2014-06-30) moved code from CONFIG_HAVE_IRQ_ROUTING
to CONFIG_HAVE_KVM_IRQFD but forgot to move the pieces that used to be
under CONFIG_HAVE_KVM_IRQCHIP.
Together, this broke compilation without CONFIG_KVM_XICS. Fix by adding
or changing the #ifdefs so that they point at CONFIG_HAVE_KVM_IRQFD.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Pull f2fs updates from Jaegeuk Kim:
"This series includes patches to:
- add nobarrier mount option
- support tmpfile and rename2
- enhance the fdatasync behavior
- fix the error path
- fix the recovery routine
- refactor a part of the checkpoint procedure
- reduce some lock contentions"
* tag 'for-f2fs-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (40 commits)
f2fs: use for_each_set_bit to simplify the code
f2fs: add f2fs_balance_fs for expand_inode_data
f2fs: invalidate xattr node page when evict inode
f2fs: avoid skipping recover_inline_xattr after recover_inline_data
f2fs: add tracepoint for f2fs_direct_IO
f2fs: reduce competition among node page writes
f2fs: fix coding style
f2fs: remove redundant lines in allocate_data_block
f2fs: add tracepoint for f2fs_issue_flush
f2fs: avoid retrying wrong recovery routine when error was occurred
f2fs: test before set/clear bits
f2fs: fix wrong condition for unlikely
f2fs: enable in-place-update for fdatasync
f2fs: skip unnecessary data writes during fsync
f2fs: add info of appended or updated data writes
f2fs: use radix_tree for ino management
f2fs: add infra for ino management
f2fs: punch the core function for inode management
f2fs: add nobarrier mount option
f2fs: fix to put root inode in error path of fill_super
...
Here's the big driver-core pull request for 3.17-rc1.
Largest thing in here is the dma-buf rework and fence code, that touched
many different subsystems so it was agreed it should go through this
tree to handle merge issues. There's also some firmware loading
updates, as well as tests added, and a few other tiny changes, the
changelog has the details.
All have been in linux-next for a long time.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Merge tag 'driver-core-3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here's the big driver-core pull request for 3.17-rc1.
Largest thing in here is the dma-buf rework and fence code, that
touched many different subsystems so it was agreed it should go
through this tree to handle merge issues. There's also some firmware
loading updates, as well as tests added, and a few other tiny changes,
the changelog has the details.
All have been in linux-next for a long time"
* tag 'driver-core-3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (32 commits)
ARM: imx: Remove references to platform_bus in mxc code
firmware loader: Fix _request_firmware_load() return val for fw load abort
platform: Remove most references to platform_bus device
test: add firmware_class loader test
doc: fix minor typos in firmware_class README
staging: android: Cleanup style issues
Documentation: devres: Sort managed interfaces
Documentation: devres: Add devm_kmalloc() et al
fs: debugfs: remove trailing whitespace
kernfs: kernel-doc warning fix
debugfs: Fix corrupted loop in debugfs_remove_recursive
stable_kernel_rules: Add pointer to netdev-FAQ for network patches
driver core: platform: add device binding path 'driver_override'
driver core/platform: remove unused implicit padding in platform_object
firmware loader: inform direct failure when udev loader is disabled
firmware: replace ALIGN(PAGE_SIZE) by PAGE_ALIGN
firmware: read firmware size using i_size_read()
firmware loader: allow disabling of udev as firmware loader
reservation: add suppport for read-only access using rcu
reservation: update api and add some helpers
...
Conflicts:
drivers/base/platform.c
Pull RAS updates from Ingo Molnar:
"The main changes in this cycle are:
- RAS tracing/events infrastructure, by Gong Chen.
- Various generalizations of the APEI code to make it available to
non-x86 architectures, by Tomasz Nowicki"
* 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/ras: Fix build warnings in <linux/aer.h>
acpi, apei, ghes: Factor out ioremap virtual memory for IRQ and NMI context.
acpi, apei, ghes: Make NMI error notification to be GHES architecture extension.
apei, mce: Factor out APEI architecture specific MCE calls.
RAS, extlog: Adjust init flow
trace, eMCA: Add a knob to adjust where to save event log
trace, RAS: Add eMCA trace event interface
RAS, debugfs: Add debugfs interface for RAS subsystem
CPER: Adjust code flow of some functions
x86, MCE: Robustify mcheck_init_device
trace, AER: Move trace into unified interface
trace, RAS: Add basic RAS trace event
x86, MCE: Kill CPU_POST_DEAD
Pull x86 mm changes from Ingo Molnar:
"The main change in this cycle is the rework of the TLB range flushing
code, to simplify, fix and consolidate the code. By Dave Hansen"
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Set TLB flush tunable to sane value (33)
x86/mm: New tunable for single vs full TLB flush
x86/mm: Add tracepoints for TLB flushes
x86/mm: Unify remote INVLPG code
x86/mm: Fix missed global TLB flush stat
x86/mm: Rip out complicated, out-of-date, buggy TLB flushing
x86/mm: Clean up the TLB flushing code
x86/smep: Be more informative when signalling an SMEP fault
We don't have any good way to figure out what kinds of flushes
are being attempted. Right now, we can try to use the vm
counters, but those only tell us what we actually did with the
hardware (one-by-one vs full) and don't tell us what was actually
_requested_.
This allows us to select out "interesting" TLB flushes that we
might want to optimize (like the ranged ones) and ignore the ones
that we have very little control over (the ones at context
switch).
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: http://lkml.kernel.org/r/20140731154059.4C96CBA5@viggo.jf.intel.com
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Create a new event to trace when the temperature is above a trip
point. Use the trace-point when handling non-critical and critical
trip points.
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Eduardo Valentin <edubezval@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
Introduce and use an event to trace when a cooling device's state is
updated. This is useful to follow the effect of governor decisions on
cooling devices.
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Eduardo Valentin <edubezval@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
Create a new event to trace the temperature of a thermal zone. Use
this event to trace the temperature of the thermal zone every time
it is updated.
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Eduardo Valentin <edubezval@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
A fence can be attached to a buffer which is being filled or consumed
by hw, to allow userspace to pass the buffer to another device without
waiting. For example, userspace can call page_flip ioctl to display the
next frame of graphics after kicking the GPU but while the GPU is still
rendering. The display device sharing the buffer with the GPU would
attach a callback to get notified when the GPU's rendering-complete IRQ
fires, to update the scan-out address of the display, without having to
wake up userspace.
A driver must allocate a fence context for each execution ring that can
run in parallel. The function for this takes an argument with how many
contexts to allocate:
+ fence_context_alloc()
A fence is a transient, one-shot deal. It is allocated and attached
to one or more dma-bufs. When the one that attached it is done with
the pending operation, it can signal the fence:
+ fence_signal()
To get a rough approximation of whether a fence has fired, call:
+ fence_is_signaled()
The dma-buf-mgr handles tracking, and waiting on, the fences associated
with a dma-buf.
The one pending on the fence can add an async callback:
+ fence_add_callback()
The callback can optionally be cancelled with:
+ fence_remove_callback()
To wait synchronously, optionally with a timeout:
+ fence_wait()
+ fence_wait_timeout()
When emitting a fence, call:
+ trace_fence_emit()
To annotate that a fence is blocking on another fence, call:
+ trace_fence_annotate_wait_on(fence, on_fence)
A default software-only implementation is provided, which can be used
by drivers attaching a fence to a buffer when they have no other means
for hw sync. But a memory backed fence is also envisioned, because it
is common that GPU's can write to, or poll on some memory location for
synchronization. For example:
  fence = custom_get_fence(...);
  if ((seqno_fence = to_seqno_fence(fence)) != NULL) {
    struct dma_buf *fence_buf = seqno_fence->sync_buf;
    get_dma_buf(fence_buf);
    ... tell the hw the memory location to wait ...
    custom_wait_on(fence_buf, seqno_fence->seqno_ofs, fence->seqno);
  } else {
    /* fall-back to sw sync */
    fence_add_callback(fence, my_cb);
  }
On SoC platforms, if some other hw mechanism is provided for synchronizing
between IP blocks, it could be supported as an alternate implementation
with its own fence ops in a similar way.
enable_signaling callback is used to provide sw signaling in case a cpu
waiter is requested or no compatible hardware signaling could be used.
The intention is to provide a userspace interface (presumably via eventfd)
later, to be used in conjunction with dma-buf's mmap support for sw access
to buffers (or for userspace apps that would prefer to do their own
synchronization).
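The one-shot behaviour described above can be modelled in plain userspace C.
This is only a sketch: every toy_* name is hypothetical and is not the kernel
fence API, there is no locking or reference counting, and rejecting a callback
added after the fence has fired is an assumption of this model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are the kernel API. */
struct toy_fence;
typedef void (*toy_fence_func)(struct toy_fence *f, void *data);

struct toy_cb {
	toy_fence_func func;
	void *data;
	struct toy_cb *next;
};

struct toy_fence {
	bool signaled;
	struct toy_cb *cbs;	/* callbacks waiting for the signal */
};

void toy_fence_init(struct toy_fence *f)
{
	assert(f != NULL);
	f->signaled = false;
	f->cbs = NULL;
}

bool toy_fence_is_signaled(const struct toy_fence *f)
{
	return f->signaled;
}

/* Returns 0 if the callback was queued, -1 if the fence already fired. */
int toy_fence_add_callback(struct toy_fence *f, struct toy_cb *cb,
			   toy_fence_func func, void *data)
{
	assert(f != NULL && cb != NULL && func != NULL);
	if (f->signaled)
		return -1;
	cb->func = func;
	cb->data = data;
	cb->next = f->cbs;
	f->cbs = cb;
	return 0;
}

/* Returns 0 on the first signal, -1 if the fence had already fired. */
int toy_fence_signal(struct toy_fence *f)
{
	struct toy_cb *cb;

	assert(f != NULL);
	if (f->signaled)
		return -1;
	f->signaled = true;
	cb = f->cbs;
	f->cbs = NULL;
	while (cb) {
		struct toy_cb *next = cb->next;	/* read before func runs */

		cb->func(f, cb->data);
		cb = next;
	}
	return 0;
}

/* Example callback: count how many times it ran. */
void toy_count_cb(struct toy_fence *f, void *data)
{
	(void)f;
	++*(int *)data;
}
```

A waiter queues toy_count_cb before the signal and sees it run exactly once;
signalling a second time is a no-op that reports failure.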
v1: Original
v2: After discussion w/ danvet and mlankhorst on #dri-devel, we decided
that dma-fence didn't need to care about the sw->hw signaling path
(it can be handled same as sw->sw case), and therefore the fence->ops
can be simplified and more handled in the core. So remove the signal,
add_callback, cancel_callback, and wait ops, and replace with a simple
enable_signaling() op which can be used to inform a fence supporting
hw->hw signaling that one or more devices which do not support hw
signaling are waiting (and therefore it should enable an irq or do
whatever is necessary in order that the CPU is notified when the
fence is passed).
v3: Fix locking fail in attach_fence() and get_fence()
v4: Remove tie-in w/ dma-buf.. after discussion w/ danvet and mlankhorst
we decided that we need to be able to attach one fence to N dma-buf's,
so using the list_head in dma-fence struct would be problematic.
v5: [ Maarten Lankhorst ] Updated for dma-bikeshed-fence and dma-buf-manager.
v6: [ Maarten Lankhorst ] I removed dma_fence_cancel_callback and some comments
about checking if fence fired or not. This is broken by design.
waitqueue_active during destruction is now fatal, since the signaller
should be holding a reference in enable_signalling until it signalled
the fence. Pass the original dma_fence_cb along, and call __remove_wait
in the dma_fence_callback handler, so that no cleanup needs to be
performed.
v7: [ Maarten Lankhorst ] Set cb->func and only enable sw signaling if
fence wasn't signaled yet, for example for hardware fences that may
choose to signal blindly.
v8: [ Maarten Lankhorst ] Tons of tiny fixes, moved __dma_fence_init to
header and fixed include mess. dma-fence.h now includes dma-buf.h
All members are now initialized, so kmalloc can be used for
allocating a dma-fence. More documentation added.
v9: Change compiler bitfields to flags, change return type of
enable_signaling to bool. Rework dma_fence_wait. Added
dma_fence_is_signaled and dma_fence_wait_timeout.
s/dma// and change exports to non GPL. Added fence_is_signaled and
fence_enable_sw_signaling calls, add ability to override default
wait operation.
v10: remove event_queue, use a custom list, export try_to_wake_up from
scheduler. Remove fence lock and use a global spinlock instead,
this should hopefully remove all the locking headaches I was having
on trying to implement this. enable_signaling is called with this
lock held.
v11:
Use atomic ops for flags, lifting the need for some spin_lock_irqsaves.
However I kept the guarantee that after fence_signal returns, it is
guaranteed that enable_signaling has either been called to completion,
or will not be called any more.
Add contexts and seqno to base fence implementation. This allows you
to wait for fewer fences, by testing for seqno + signaled, and then only
wait on the later fence.
Add FENCE_TRACE, FENCE_WARN, and FENCE_ERR. This makes debugging easier.
A CONFIG_DEBUG_FENCE will be added to turn off the FENCE_TRACE
spam, and another runtime option can turn it off at runtime.
v12:
Add CONFIG_FENCE_TRACE. Add missing documentation for the fence->context
and fence->seqno members.
v13:
Fixup CONFIG_FENCE_TRACE kconfig description.
Move fence_context_alloc to fence.
Simplify fence_later.
Kill priv member to fence_cb.
v14:
Remove priv argument from fence_add_callback, oops!
v15:
Remove priv from documentation.
Explicitly include linux/atomic.h.
v16:
Add trace events.
Import changes required by android syncpoints.
v17:
Use wake_up_state instead of try_to_wake_up. (Colin Cross)
Fix up commit description for seqno_fence. (Rob Clark)
v18:
Rename release_fence to fence_release.
Move to drivers/dma-buf/.
Rename __fence_is_signaled and __fence_signal to *_locked.
Rename __fence_init to fence_init.
Make fence_default_wait return a signed long, and fix wait ops too.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com> #use smp_mb__before_atomic()
Acked-by: Sumit Semwal <sumit.semwal@linaro.org>
Acked-by: Daniel Vetter <daniel@ffwll.ch>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
AER currently uses a separate trace interface. To make it
consistent, move it into the unified RAS trace interface.
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The component struct already has a name and id field which are initialized to
the same values as the same fields in the CODEC and platform structs. So remove
them from the CODEC and platform structs and use the ones from the component
struct instead.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Mark Brown <broonie@linaro.org>
Currently the __field() macro in TRACE_EVENT is only good for primitive
values, such as integers and pointers, but it fails on complex data types
such as structures or unions. This is because the __field() macro
determines if the variable is signed or not with the test of:
(((type)(-1)) < (type)1)
Unfortunately, that fails when type is a structure.
Since trace events should support structures as fields a new macro
is created for such a case called __field_struct() which acts exactly
the same as __field() does but it does not do the signed type check
and just uses a constant false for that answer.
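The signedness test can be tried outside the kernel. The sketch below reuses
the expression quoted above; FIELD_IS_SIGNED, FIELD_STRUCT_IS_SIGNED and
struct pair are hypothetical stand-ins for illustration, not the real
TRACE_EVENT macros.

```c
#include <assert.h>
#include <stdbool.h>

/* The signedness test quoted above: true when (type)-1 compares below 1. */
#define is_signed_type(type) (((type)(-1)) < (type)1)

/* A compound type; is_signed_type(struct pair) would not compile. */
struct pair {
	int a;
	int b;
};

/* Hypothetical stand-ins for what each macro would record per field. */
#define FIELD_IS_SIGNED(type)        is_signed_type(type)
#define FIELD_STRUCT_IS_SIGNED(type) false	/* no cast, any type works */

/* The test is a constant expression, so it can be checked at build time. */
static_assert(is_signed_type(int), "int is signed");
static_assert(!is_signed_type(unsigned int), "unsigned int is not signed");

bool int_field_signed(void)
{
	return FIELD_IS_SIGNED(int);
}

bool unsigned_field_signed(void)
{
	return FIELD_IS_SIGNED(unsigned int);
}

bool struct_field_signed(void)
{
	return FIELD_STRUCT_IS_SIGNED(struct pair);
}
```

Replacing FIELD_STRUCT_IS_SIGNED with FIELD_IS_SIGNED in struct_field_signed()
breaks the build, because a scalar cannot be cast to a structure type.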
Cc: Tony Luck <tony.luck@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
syscall_regfunc() and syscall_unregfunc() should set/clear
TIF_SYSCALL_TRACEPOINT system-wide, but do_each_thread() can race
with copy_process() and miss the new child which was not added to
the process/thread lists yet.
Change copy_process() to update the child's TIF_SYSCALL_TRACEPOINT
under tasklist.
Link: http://lkml.kernel.org/p/20140413185854.GB20668@redhat.com
Cc: stable@vger.kernel.org # 2.6.33
Fixes: a871bd33a6 "tracing: Add syscall tracepoints"
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Pull more scheduler updates from Ingo Molnar:
"Second round of scheduler changes:
- try-to-wakeup and IPI reduction speedups, from Andy Lutomirski
- continued power scheduling cleanups and refactorings, from Nicolas
Pitre
- misc fixes and enhancements"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/deadline: Delete extraneous extern for to_ratio()
sched/idle: Optimize try-to-wake-up IPI
sched/idle: Simplify wake_up_idle_cpu()
sched/idle: Clear polling before descheduling the idle thread
sched, trace: Add a tracepoint for IPI-less remote wakeups
cpuidle: Set polling in poll_idle
sched: Remove redundant assignment to "rt_rq" in update_curr_rt(...)
sched: Rename capacity related flags
sched: Final power vs. capacity cleanups
sched: Remove remaining dubious usage of "power"
sched: Let 'struct sched_group_power' care about CPU capacity
sched/fair: Disambiguate existing/remaining "capacity" usage
sched/fair: Change "has_capacity" to "has_free_capacity"
sched/fair: Remove "power" from 'struct numa_stats'
sched: Fix signedness bug in yield_to()
sched/fair: Use time_after() in record_wakee()
sched/balancing: Reduce the rate of needless idle load balancing
sched/fair: Fix unlocked reads of some cfs_b->quota/period
Merge tag 'pm+acpi-3.16-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more ACPI and power management updates from Rafael Wysocki:
"These are fixups on top of the previous PM+ACPI pull request,
regression fixes (ACPI hotplug, cpufreq ppc-corenet), other bug fixes
(ACPI reset, cpufreq), new PM trace points for system suspend
profiling and a copyright notice update.
Specifics:
- I didn't remember correctly that Hans de Goede's ACPI video
patches actually didn't flip the video.use_native_backlight
default, although we had discussed that and decided to do that.
Since I said we would do that in the previous PM+ACPI pull request,
make that change for real now.
- ACPI bus check notifications for PCI host bridges don't cause the
bus below the host bridge to be checked for changes as they should
because of a mistake in the ACPI-based PCI hotplug (ACPIPHP)
subsystem that forgets to add hotplug contexts to PCI host bridge
ACPI device objects. Create hotplug contexts for PCI host bridges
too as appropriate.
- Revert recent cpufreq commit related to the big.LITTLE cpufreq
driver that breaks arm64 builds.
- Fix for a regression in the ppc-corenet cpufreq driver introduced
during the 3.15 cycle and causing the driver to use the remainder
from do_div instead of the quotient. From Ed Swarthout.
- Resets triggered by panic activate a BUG_ON() in vmalloc.c on
systems where the ACPI reset register is located in memory address
space. Fix from Randy Wright.
- Fix for a problem with cpufreq governors where their decisions may
be suboptimal because they use deferrable timers for CPU load
sampling. From Srivatsa S Bhat.
- Fix for a problem with the Tegra cpufreq driver where the CPU
frequency is temporarily switched to a "stable" level that is
different from both the initial and target frequencies during
transitions which causes udelay() to expire earlier than it should
sometimes. From Viresh Kumar.
- New trace points and rework of some existing trace points for
system suspend/resume profiling from Todd Brandt.
- Assorted cpufreq fixes and cleanups from Stratos Karafotis and
Viresh Kumar.
- Copyright notice update for suspend-and-cpuhotplug.txt from
Srivatsa S Bhat"
* tag 'pm+acpi-3.16-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / hotplug / PCI: Add hotplug contexts to PCI host bridges
PM / sleep: trace events for device PM callbacks
cpufreq: cpufreq-cpu0: remove dependency on THERMAL and REGULATOR
cpufreq: tegra: update comment for clarity
cpufreq: intel_pstate: Remove duplicate CPU ID check
cpufreq: Mark CPU0 driver with CPUFREQ_NEED_INITIAL_FREQ_CHECK flag
PM / Documentation: Update copyright in suspend-and-cpuhotplug.txt
cpufreq: governor: remove copy_prev_load from 'struct cpu_dbs_common_info'
cpufreq: governor: Be friendly towards latency-sensitive bursty workloads
PM / sleep: trace events for suspend/resume
cpufreq: ppc-corenet-cpu-freq: do_div use quotient
Revert "cpufreq: Enable big.LITTLE cpufreq driver on arm64"
cpufreq: Tegra: implement intermediate frequency callbacks
cpufreq: add support for intermediate (stable) frequencies
ACPI / video: Change the default for video.use_native_backlight to 1
ACPI: Fix bug when ACPI reset register is implemented in system memory
Adds two trace events which supply the same info that initcall_debug
provides, but via ftrace instead of dmesg. The existing initcall_debug
calls require the pm_print_times_enabled var to be set (either via
sysfs or via the kernel cmd line). The new trace events provide all the
same info as the initcall_debug prints but with less overhead, and also
with coverage of device prepare and complete device callbacks.
These events replace the device_pm_report_time event (which has been
removed). device_pm_callback_start is called first and provides the device
and callback info. device_pm_callback_end is called after with the
device name and error info. The time and pid are gathered from the trace
data headers.
Signed-off-by: Todd Brandt <todd.e.brandt@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Merge tag 'for-f2fs-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this round, there is no special interesting feature, but we've
investigated a couple of tuning points with respect to the I/O flow.
Several major bug fixes and a bunch of clean-ups also have been made.
This patch-set includes the following major enhancement patches:
- enhance wait_on_page_writeback
- support SEEK_DATA and SEEK_HOLE
- enhance readahead flows
- enhance IO flushes
- support fiemap
- add some tracepoints
The other bug fixes are as follows:
- fix to support a large volume > 2TB correctly
- recovery bug fix wrt fallocated space
- fix recursive lock on xattr operations
- fix some cases on the remount flow
And, there are a bunch of cleanups"
* tag 'for-f2fs-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (52 commits)
f2fs: support f2fs_fiemap
f2fs: avoid not to call remove_dirty_inode
f2fs: recover fallocated space
f2fs: fix to recover data written by dio
f2fs: large volume support
f2fs: avoid crash when trace f2fs_submit_page_mbio event in ra_sum_pages
f2fs: avoid overflow when large directory feature is enabled
f2fs: fix recursive lock by f2fs_setxattr
MAINTAINERS: add a co-maintainer from samsung for F2FS
MAINTAINERS: change the email address for f2fs
f2fs: use inode_init_owner() to simplify codes
f2fs: avoid to use slab memory in f2fs_issue_flush for efficiency
f2fs: add a tracepoint for f2fs_read_data_page
f2fs: add a tracepoint for f2fs_write_{meta,node,data}_pages
f2fs: add a tracepoint for f2fs_write_{meta,node,data}_page
f2fs: add a tracepoint for f2fs_write_end
f2fs: add a tracepoint for f2fs_write_begin
f2fs: fix checkpatch warning
f2fs: deactivate inode page if the inode is evicted
f2fs: decrease the lock granularity during write_begin
...
Merge tag 'trace-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"Lots of tweaks, small fixes, optimizations, and some helper functions
to help out the rest of the kernel to ease their use of trace events.
The big change for this release is that other tracers, such as the
latency tracers, can now be used in the trace instances while function
or function graph tracing runs in the top level simultaneously"
* tag 'trace-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (44 commits)
tracing: Fix memory leak on instance deletion
tracing: Fix leak of ring buffer data when new instances creation fails
tracing/kprobes: Avoid self tests if tracing is disabled on boot up
tracing: Return error if ftrace_trace_arrays list is empty
tracing: Only calculate stats of tracepoint benchmarks for 2^32 times
tracing: Convert stddev into u64 in tracepoint benchmark
tracing: Introduce saved_cmdlines_size file
tracing: Add __get_dynamic_array_len() macro for trace events
tracing: Remove unused variable in trace_benchmark
tracing: Eliminate double free on failure of allocation on boot up
ftrace/x86: Call text_ip_addr() instead of the duplicated code
tracing: Print max callstack on stacktrace bug
tracing: Move locking of trace_cmdline_lock into start/stop seq calls
tracing: Try again for saved cmdline if failed due to locking
tracing: Have saved_cmdlines use the seq_read infrastructure
tracing: Add tracepoint benchmark tracepoint
tracing: Print nasty banner when trace_printk() is in use
tracing: Add funcgraph_tail option to print function name after closing braces
tracing: Eliminate duplicate TRACE_GRAPH_PRINT_xx defines
tracing: Add __bitmask() macro to trace events to cpumasks and other bitmasks
...
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"Clean ups and miscellaneous bug fixes, in particular for the new
collapse_range and zero_range fallocate functions. In addition,
improve the scalability of adding and removing inodes from the orphan
list"
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (25 commits)
ext4: handle symlink properly with inline_data
ext4: fix wrong assert in ext4_mb_normalize_request()
ext4: fix zeroing of page during writeback
ext4: remove unused local variable "stored" from ext4_readdir(...)
ext4: fix ZERO_RANGE test failure in data journalling
ext4: reduce contention on s_orphan_lock
ext4: use sbi in ext4_orphan_{add|del}()
ext4: use EXT_MAX_BLOCKS in ext4_es_can_be_merged()
ext4: add missing BUFFER_TRACE before ext4_journal_get_write_access
ext4: remove unnecessary double parentheses
ext4: do not destroy ext4_groupinfo_caches if ext4_mb_init() fails
ext4: make local functions static
ext4: fix block bitmap validation when bigalloc, ^flex_bg
ext4: fix block bitmap initialization under sparse_super2
ext4: find the group descriptors on a 1k-block bigalloc,meta_bg filesystem
ext4: avoid unneeded lookup when xattr name is invalid
ext4: fix data integrity sync in ordered mode
ext4: remove obsoleted check
ext4: add a new spinlock i_raw_lock to protect the ext4's raw inode
ext4: fix locking for O_APPEND writes
...
Adds trace events that give finer resolution into suspend/resume. These
events are graphed in the timelines generated by the analyze_suspend.py
script. They represent large areas of time consumed that are typical to
suspend and resume.
The event is triggered by calling the function "trace_suspend_resume"
with three arguments: a string (the name of the event to be displayed
in the timeline), an integer (case specific number, such as the power
state or cpu number), and a boolean (where true is used to denote the start
of the timeline event, and false to denote the end).
The suspend_resume trace event reproduces the data that the machine_suspend
trace event did, so the latter has been removed.
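A call site would bracket the timed region with a start/end pair. The sketch
below uses a userspace mock with the three arguments described above; the
in-memory log, bring_cpu_down() and the "CPU_OFF" label are illustrative
assumptions, not code from this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-memory sink standing in for the ftrace ring buffer. */
struct sr_event {
	const char *action;	/* name shown in the timeline */
	int val;		/* case-specific number, e.g. a CPU number */
	bool start;		/* true opens the interval, false closes it */
};

#define SR_LOG_MAX 16
static struct sr_event sr_log[SR_LOG_MAX];
static size_t sr_log_len;

/* Mock taking the same three arguments the description lists. */
void mock_trace_suspend_resume(const char *action, int val, bool start)
{
	assert(sr_log_len < SR_LOG_MAX);
	sr_log[sr_log_len++] = (struct sr_event){ action, val, start };
}

/* Example call site: bracket a hypothetical per-CPU step with a pair. */
int bring_cpu_down(int cpu)
{
	mock_trace_suspend_resume("CPU_OFF", cpu, true);
	/* ... the work being timed would go here ... */
	mock_trace_suspend_resume("CPU_OFF", cpu, false);
	return 0;
}

size_t sr_log_count(void)
{
	return sr_log_len;
}
```

A timeline tool can then pair events with the same name and number, using the
boolean to tell the opening event from the closing one.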
Signed-off-by: Todd Brandt <todd.e.brandt@intel.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
During compaction, update_nr_listpages() has been used to count remaining
non-migrated and free pages after a call to migrate_pages(). The
freepages counting has become unnecessary, and it turns out that
migratepages counting is also unnecessary in most cases.
The only situation when it's needed to count cc->migratepages is when
migrate_pages() returns with a negative error code. Otherwise, the
non-negative return value is the number of pages that were not migrated,
which is exactly the count of remaining pages in the cc->migratepages
list.
Furthermore, any non-zero count is only interesting for the tracepoint of
mm_compaction_migratepages events, because after that all remaining
unmigrated pages are put back and their count is set to 0.
This patch therefore removes update_nr_listpages() completely, and changes
the tracepoint definition so that the manual counting is done only when
the tracepoint is enabled, and only when migrate_pages() returns a
negative error code.
Furthermore, migrate_pages() and the tracepoints won't be called when
there's nothing to migrate. This potentially avoids some wasted cycles
and reduces the volume of uninteresting mm_compaction_migratepages events
where "nr_migrated=0 nr_failed=0". In the stress-highalloc mmtest, this
was about 75% of the events. The mm_compaction_isolate_migratepages event
is better for determining that nothing was isolated for migration, and
this one was just duplicating the info.
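The counting rule above can be sketched in userspace C. struct toy_page and
both helpers are hypothetical; only the return-value convention (negative
error code, otherwise the number of pages not migrated) is taken from the
description.

```c
#include <assert.h>
#include <stddef.h>

/* The cast in toy_nr_failed() must not truncate a non-negative int. */
static_assert(sizeof(size_t) >= sizeof(int), "size_t narrower than int");

/* Hypothetical stand-in for a page left on the cc->migratepages list. */
struct toy_page {
	struct toy_page *next;
};

/* Walk the list; only needed when the return value carries no count. */
size_t toy_count_list(const struct toy_page *head)
{
	size_t n = 0;

	for (; head; head = head->next)
		n++;
	return n;
}

/*
 * err follows the convention described above: negative means an error
 * code, non-negative is the number of pages that were not migrated.
 */
size_t toy_nr_failed(int err, const struct toy_page *remaining)
{
	if (err < 0)
		return toy_count_list(remaining);
	return (size_t)err;
}
```

The list walk is skipped on the common path, where the return value already
is the count.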
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rik van Riel <riel@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now that we are doing NUMA-aware shrinking, and can have shrinkers
running in parallel, or working on individual nodes, it seems like we
should also be sticking the node in the output.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Dave Chinner <david@fromorbit.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I was looking at a trace of the slab shrinkers (attachment in this comment):
https://bugs.freedesktop.org/show_bug.cgi?id=72742#c67
and noticed that "total_scan" can go negative in some cases. We
used to dump out the "total_scan" variable directly, but some of
the shrinker modifications along the way changed that.
This patch just dumps it out directly, again. It doesn't make
any sense to derive it from new_nr and nr any more since there
are now other shrinkers that can be running in parallel and
mucking with those values.
Here's an example of the negative numbers in the output:
> kswapd0-840 [000] 160.869398: mm_shrink_slab_end: i915_gem_inactive_scan+0x0 0xffff8800037cbc68: unused scan count 10 new scan count 39 total_scan 29 last shrinker return val 256
> kswapd0-840 [000] 160.869618: mm_shrink_slab_end: i915_gem_inactive_scan+0x0 0xffff8800037cbc68: unused scan count 39 new scan count 102 total_scan 63 last shrinker return val 256
> kswapd0-840 [000] 160.870031: mm_shrink_slab_end: i915_gem_inactive_scan+0x0 0xffff8800037cbc68: unused scan count 102 new scan count 47 total_scan -55 last shrinker return val 768
> kswapd0-840 [000] 160.870464: mm_shrink_slab_end: i915_gem_inactive_scan+0x0 0xffff8800037cbc68: unused scan count 47 new scan count 45 total_scan -2 last shrinker return val 768
> kswapd0-840 [000] 163.384144: mm_shrink_slab_end: i915_gem_inactive_scan+0x0 0xffff8800037cbc68: unused scan count 45 new scan count 56 total_scan 11 last shrinker return val 0
> kswapd0-840 [000] 163.384297: mm_shrink_slab_end: i915_gem_inactive_scan+0x0 0xffff8800037cbc68: unused scan count 56 new scan count 15 total_scan -41 last shrinker return val 256
> kswapd0-840 [000] 163.384414: mm_shrink_slab_end: i915_gem_inactive_scan+0x0 0xffff8800037cbc68: unused scan count 15 new scan count 117 total_scan 102 last shrinker return val 0
> kswapd0-840 [000] 163.384657: mm_shrink_slab_end: i915_gem_inactive_scan+0x0 0xffff8800037cbc68: unused scan count 117 new scan count 36 total_scan -81 last shrinker return val 512
> kswapd0-840 [000] 163.384880: mm_shrink_slab_end: i915_gem_inactive_scan+0x0 0xffff8800037cbc68: unused scan count 36 new scan count 111 total_scan 75 last shrinker return val 256
> kswapd0-840 [000] 163.385256: mm_shrink_slab_end: i915_gem_inactive_scan+0x0 0xffff8800037cbc68: unused scan count 111 new scan count 34 total_scan -77 last shrinker return val 768
> kswapd0-840 [000] 163.385598: mm_shrink_slab_end: i915_gem_inactive_scan+0x0 0xffff8800037cbc68: unused scan count 34 new scan count 122 total_scan 88 last shrinker return val 512
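The arithmetic behind the negative values can be reproduced from the numbers
in the trace lines above. A small sketch, where derived_total_scan() is a
hypothetical helper and the formula is inferred from the quoted output:

```c
#include <assert.h>

/* Derived reporting: subtract the "unused" count from the "new" count. */
long derived_total_scan(long unused_scan_cnt, long new_scan_cnt)
{
	return new_scan_cnt - unused_scan_cnt;
}

/* Replay three of the trace lines quoted above. */
void check_trace_samples(void)
{
	assert(derived_total_scan(10, 39) == 29);
	assert(derived_total_scan(102, 47) == -55);
	assert(derived_total_scan(47, 45) == -2);
}
```

Whenever another shrinker lowers the deferred count between the two samples,
the difference goes negative, so recording total_scan directly is more
reliable.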
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Dave Chinner <david@fromorbit.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently to allocate a page that should be charged to kmemcg (e.g.
threadinfo), we pass __GFP_KMEMCG flag to the page allocator. The page
allocated is then to be freed by free_memcg_kmem_pages. Apart from
looking asymmetrical, this also requires intrusion to the general
allocation path. So let's introduce separate functions that will
alloc/free pages charged to kmemcg.
The new functions are called alloc_kmem_pages and free_kmem_pages. They
should be used when the caller actually would like to use kmalloc, but
has to fall back to the page allocator because the allocation is large.
They only differ from alloc_pages and free_pages in that besides
allocating or freeing pages they also charge them to the kmem resource
counter of the current memory cgroup.
[sfr@canb.auug.org.au: export kmalloc_order() to modules]
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Greg Thelen <gthelen@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Glauber Costa <glommer@gmail.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If a trace event uses a dynamic array for something other than a string
then there's currently no way the TP_printk() can figure out what size
it is. A __get_dynamic_array_len() is required to know the length.
This also simplifies the __get_bitmask() macro which required it as well,
but instead just hardcoded it.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
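A minimal userspace sketch of the idea behind __get_dynamic_array_len(),
assuming the locator layout ftrace uses for dynamic array fields: a 32-bit
word stored in the event record, with the payload offset in the low 16 bits
and the payload length in bytes in the high 16 bits. The helper names below
are hypothetical and exist only for illustration; only the offset/length
split mirrors the kernel macros.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers modelling a 32-bit dynamic-array locator:
 * low 16 bits  = byte offset of the payload from the start of the record,
 * high 16 bits = payload length in bytes. */
static uint32_t data_loc_pack(uint16_t offset, uint16_t len_bytes)
{
	return (uint32_t)offset | ((uint32_t)len_bytes << 16);
}

/* Counterpart of __get_dynamic_array(): locate the payload in the record. */
static const void *dyn_array_ptr(const void *entry, uint32_t data_loc)
{
	return (const unsigned char *)entry + (data_loc & 0xffff);
}

/* Counterpart of __get_dynamic_array_len(): recover the payload length. */
static size_t dyn_array_len(uint32_t data_loc)
{
	return (data_loc >> 16) & 0xffff;
}
```

With both halves available, a print-time formatter can handle a non-string
payload such as a bitmask without the length being hardcoded elsewhere.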
Merge tag 'sound-3.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound into next
Pull sound updates from Takashi Iwai:
"At this time, majority of changes come from ASoC world while we got a
few new drivers in other places for FireWire and USB. There have been
lots of ASoC core cleanups / refactoring, but very little visible to
external users.
ASoC:
- Support for specifying aux CODECs in DT
- Removal of the deprecated mux and enum macros
- More moves towards full componentisation
- Removal of some unused I/O code
- Lots of cleanups, fixes and enhancements to the davinci, Freescale,
Haswell and Realtek drivers
- Several drivers exposed directly in Kconfig for use with
simple-card
- GPIO descriptor support for jacks
- More updates and fixes to the Freescale SSI, Intel and rsnd drivers
- New drivers for Cirrus CS42L56, Realtek RT5639, RT5642 and RT5651
and ST STA350, Analog Devices ADAU1361, ADAU1381, ADAU1761 and
ADAU1781, and Realtek RT5677
HD-audio:
- Clean up Dell headset quirks
- Noise fixes for Dell and Sony laptops
- Thinkpad T440 dock fix
- Realtek codec updates (ALC293,ALC233,ALC3235)
- Tegra HD-audio HDMI support
FireWire-audio:
- FireWire audio stack enhancement (AMDTP, MIDI), support for
incoming isochronous stream and duplex streams with timestamp
synchronization
- BeBoB-based devices support
- Fireworks-based device support
USB-audio:
- Behringer BCD2000 USB device support
Misc:
- Clean up of a few old drivers, atmel, fm801, etc"
* tag 'sound-3.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (480 commits)
ASoC: Fix wrong argument for card remove callbacks
ASoC: free jack GPIOs before the sound card is freed
ALSA: firewire-lib: Remove a comment about restriction of asynchronous operation
ASoC: cache: Fix error code when not using ASoC level cache
ALSA: hda/realtek - Fix COEF widget NID for ALC260 replacer fixup
ALSA: hda/realtek - Correction of fixup codes for PB V7900 laptop
ALSA: firewire-lib: Use IEC 61883-6 compliant labels for Raw Audio data
ASoC: add RT5677 CODEC driver
ASoC: intel: The Baytrail/MAX98090 driver depends on I2C
ASoC: rt5640: Add the function "get_clk_info" to RL6231 shared support
ASoC: rt5640: Add the function of the PLL clock calculation to RL6231 shared support
ASoC: rt5640: Add RL6231 class device shared support for RT5640, RT5645 and RT5651
ASoC: cache: Fix possible ZERO_SIZE_PTR pointer dereferencing error.
ASoC: Add helper functions to cast from DAPM context to CODEC/platform
ALSA: bebob: sizeof() vs ARRAY_SIZE() typo
ASoC: wm9713: correct mono out PGA sources
ALSA: synth: emux: soundfont.c: Cleaning up memory leak
ASoC: fsl: Remove dependencies of boards for SND_SOC_EUKREA_TLV320
ASoC: fsl-ssi: Use regmap
ASoC: fsl-ssi: reorder and document fsl_ssi_private
...
v2: add a __break_lease tracepoint for non-blocking case
Recently, I needed these to help track down a softlockup when recalling a
delegation, but they might be helpful in other situations as well.
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@poochiereds.net>
Being able to show a cpumask of events can be useful as some events
may affect only some CPUs. There is no standard way to record the
cpumask and converting it to a string is rather expensive during
the trace as traces happen in hotpaths. It would be better to record
the raw event mask and be able to parse it at print time.
The following macros were added for use with the TRACE_EVENT() macro:
__bitmask()
__assign_bitmask()
__get_bitmask()
To test this, I added this to the sched_migrate_task event, which
looked like this:
    TRACE_EVENT(sched_migrate_task,

        TP_PROTO(struct task_struct *p, int dest_cpu, const struct cpumask *cpus),

        TP_ARGS(p, dest_cpu, cpus),

        TP_STRUCT__entry(
            __array(   char,  comm,     TASK_COMM_LEN )
            __field(   pid_t, pid                     )
            __field(   int,   prio                    )
            __field(   int,   orig_cpu                )
            __field(   int,   dest_cpu                )
            __bitmask( cpumask, num_possible_cpus()   )
        ),

        TP_fast_assign(
            memcpy(__entry->comm, p->comm, TASK_COMM_LEN);
            __entry->pid      = p->pid;
            __entry->prio     = p->prio;
            __entry->orig_cpu = task_cpu(p);
            __entry->dest_cpu = dest_cpu;
            __assign_bitmask(cpumask, cpumask_bits(cpus), num_possible_cpus());
        ),

        TP_printk("comm=%s pid=%d prio=%d orig_cpu=%d dest_cpu=%d cpumask=%s",
                  __entry->comm, __entry->pid, __entry->prio,
                  __entry->orig_cpu, __entry->dest_cpu,
                  __get_bitmask(cpumask))
    );
With the output of:
ksmtuned-3613 [003] d..2 485.220508: sched_migrate_task: comm=ksmtuned pid=3615 prio=120 orig_cpu=3 dest_cpu=2 cpumask=00000000,0000000f
migration/1-13 [001] d..5 485.221202: sched_migrate_task: comm=ksmtuned pid=3614 prio=120 orig_cpu=1 dest_cpu=0 cpumask=00000000,0000000f
awk-3615 [002] d.H5 485.221747: sched_migrate_task: comm=rcu_preempt pid=7 prio=120 orig_cpu=0 dest_cpu=1 cpumask=00000000,000000ff
migration/2-18 [002] d..5 485.222062: sched_migrate_task: comm=ksmtuned pid=3615 prio=120 orig_cpu=2 dest_cpu=3 cpumask=00000000,0000000f
Link: http://lkml.kernel.org/r/1399377998-14870-6-git-send-email-javi.merino@arm.com
Link: http://lkml.kernel.org/r/20140506132238.22e136d1@gandalf.local.home
Suggested-by: Javi Merino <javi.merino@arm.com>
Tested-by: Javi Merino <javi.merino@arm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
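The cpumask values in the output above (for example 00000000,0000000f) are
the raw mask rendered at print time as comma-separated 32-bit hex chunks,
most significant chunk first. The following is a rough userspace sketch of
that rendering step only, assuming the mask is held as an array of 32-bit
words with the least significant word first. format_bitmask() is a
hypothetical helper, not the kernel's implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: render a bitmask held as 32-bit words (least
 * significant word first) as comma-separated hex chunks, most significant
 * chunk first. Returns the number of characters written, or -1 if the
 * buffer is too small. */
static int format_bitmask(char *buf, size_t size,
			  const uint32_t *words, size_t nwords)
{
	size_t used = 0;

	if (size == 0)
		return -1;
	buf[0] = '\0';

	/* Walk from the most significant word down to word 0. */
	for (size_t i = nwords; i-- > 0; ) {
		int n = snprintf(buf + used, size - used, "%08x%s",
				 (unsigned int)words[i], i ? "," : "");
		if (n < 0 || (size_t)n >= size - used)
			return -1;
		used += (size_t)n;
	}
	return (int)used;
}
```

Called with the two words { 0xf, 0 }, the helper produces the same string as
the first trace line above. Deferring this conversion to print time is what
keeps the hot path down to a plain copy of the raw mask.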
Merge tag 'trace-fixes-v3.15-rc4-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"This contains two fixes.
The first is a long standing bug that causes bogus data to show up in
the refcnt field of the module_refcnt tracepoint. It was introduced
by a merge conflict resolution back in 2.6.35-rc days.
The result should be 'refcnt = incs - decs', but instead it did
'refcnt = incs + decs'.
The second fix is to a bug that was introduced in this merge window
that allowed for a tracepoint funcs pointer to be used after it was
freed. Moving the location of where the probes are released solved
the problem"
* tag 'trace-fixes-v3.15-rc4-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracepoint: Fix use of tracepoint funcs after rcu free
trace: module: Maintain a valid user count
The replacement of the 'count' variable by two variables 'incs' and
'decs' to resolve some race conditions during module unloading was done
in parallel with some cleanup in the trace subsystem, and was integrated
as a merge.
Unfortunately, the formula for this replacement was wrong in the tracing
code, and the refcount in the traces was not usable as a result.
Use 'count = incs - decs' to compute the user count.
Link: http://lkml.kernel.org/p/1393924179-9147-1-git-send-email-romain.izard.pro@gmail.com
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: stable@vger.kernel.org # 2.6.35
Fixes: c1ab9cab75 "merge conflict resolution"
Signed-off-by: Romain Izard <romain.izard.pro@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
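A small sketch of the arithmetic the fix above restores, assuming a
simplified model in which each CPU keeps its own 'incs' and 'decs' counters.
The struct and function names are hypothetical, and the real tracepoint may
read only the current CPU's counters; the point is just the sign of the
second term.

```c
/* Hypothetical model of split module reference counters:
 * one incs/decs pair per CPU. */
struct module_ref_sketch {
	unsigned long incs;
	unsigned long decs;
};

/* User count = total increments minus total decrements. */
static unsigned long module_refcount_sketch(const struct module_ref_sketch *refs,
					    unsigned int ncpus)
{
	unsigned long incs = 0, decs = 0;

	for (unsigned int i = 0; i < ncpus; i++) {
		incs += refs[i].incs;
		decs += refs[i].decs;
	}
	return incs - decs;	/* the buggy trace code computed incs + decs */
}
```

For counters { 3, 1 } on one CPU and { 2, 2 } on another, the count is
5 - 3 = 2, whereas adding the two totals would report 8.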
This patch adds a tracepoint for f2fs_read_data_page to trace when a page is
read by the user.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
This patch adds a tracepoint for f2fs_write_{meta,node,data}_pages to trace when
pages are being fsynced or flushed.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
This patch adds a tracepoint for f2fs_write_{meta,node,data}_page to trace when
a page is being written out.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
This patch adds a tracepoint for f2fs_write_end to trace user write operations.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
This patch adds a tracepoint for f2fs_write_begin to trace user write operations.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
In the following commit:
commit 57673c2b0b
Author: Rusty Russell <rusty@rustcorp.com.au>
Date: Mon Mar 31 14:39:57 2014 +1030
Use 'E' instead of 'X' for unsigned module taint flag.
One site has been forgotten in trace events module.h.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The ASoC framework is in the process of migrating all IO operations to regmap.
regmap has its own more sophisticated tracing infrastructure for IO operations,
which means that the ASoC level IO tracing becomes redundant, hence this patch
removes them. There are still a handful of ASoC drivers left that do not use
regmap yet, but hopefully the removal of the ASoC IO tracing will be an
additional incentive to switch to regmap.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Mark Brown <broonie@linaro.org>
Currently in ext4 there is quite a mess when it comes to naming
unwritten extents. Sometimes we call it uninitialized and sometimes we
refer to it as unwritten.
The right name for the extent which has been allocated but does not
contain any written data is _unwritten_. Other file systems are
using this name consistently, even the buffer head state refers to it as
unwritten. We need to fix this confusion in ext4.
This commit changes every reference to an uninitialized extent (meaning
allocated but unwritten) to unwritten extent. This includes comments,
function names and variable names. It even covers abbreviation of the
word uninitialized (such as uninit) and some misspellings.
This commit does not change any of the code paths at all. This has been
confirmed by comparing md5sums of the assembly code of each object file
after all the function names were stripped from it.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Currently EXT4_MAP_UNINIT is used in dioread_nolock case to mark the
cases where we're using dioread_nolock and we're writing into either
unallocated, or unwritten extent, because we need to make sure that
any DIO write into that inode will wait for the extent conversion.
However EXT4_MAP_UNINIT is not only entirely misleading name but also
unnecessary because we can check for EXT4_MAP_UNWRITTEN in the
dioread_nolock case instead.
This commit removes EXT4_MAP_UNINIT flag.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"These are regression and bug fixes for ext4.
We had a number of new features in ext4 during this merge window
(ZERO_RANGE and COLLAPSE_RANGE fallocate modes, renameat, etc.) so
there were many more regression and bug fixes this time around. It
didn't help that xfstests hadn't been fully updated to fully stress
test COLLAPSE_RANGE until after -rc1"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (31 commits)
ext4: disable COLLAPSE_RANGE for bigalloc
ext4: fix COLLAPSE_RANGE failure with 1KB block size
ext4: use EINVAL if not a regular file in ext4_collapse_range()
ext4: enforce we are operating on a regular file in ext4_zero_range()
ext4: fix extent merging in ext4_ext_shift_path_extents()
ext4: discard preallocations after removing space
ext4: no need to truncate pagecache twice in collapse range
ext4: fix removing status extents in ext4_collapse_range()
ext4: use filemap_write_and_wait_range() correctly in collapse range
ext4: use truncate_pagecache() in collapse range
ext4: remove temporary shim used to merge COLLAPSE_RANGE and ZERO_RANGE
ext4: fix ext4_count_free_clusters() with EXT4FS_DEBUG and bigalloc enabled
ext4: always check ext4_ext_find_extent result
ext4: fix error handling in ext4_ext_shift_extents
ext4: silence sparse check warning for function ext4_trim_extent
ext4: COLLAPSE_RANGE only works on extent-based files
ext4: fix byte order problems introduced by the COLLAPSE_RANGE patches
ext4: use i_size_read in ext4_unaligned_aio()
fs: disallow all fallocate operation on active swapfile
fs: move falloc collapse range check into the filesystem methods
...
In retrospect, this was a bad way to handle things, since it limited
testing of these patches. We should just get the VFS level changes
merged in first.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Merge tag 'trace-3.15-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull more tracing updates from Steven Rostedt:
"This includes the final patch to clean up and fix the issue with the
design of tracepoints and how a user could register a tracepoint and
have that tracepoint not be activated but no error was shown.
The design was for an out of tree module but broke in tree users. The
clean up was to remove the saving of the hash table of tracepoint
names such that they can be enabled before they exist (enabling a
module tracepoint before that module is loaded). This added more
complexity than needed. The clean up was to remove that code and just
enable tracepoints that exist or fail if they do not.
This removed a lot of code as well as the complexity that it brought.
As a side effect, instead of registering a tracepoint by its name, the
tracepoint needs to be registered with the tracepoint descriptor.
This removes having to duplicate the tracepoint names that are
enabled.
The second patch was added that simplified the way modules were
searched for.
This cleanup required changes that were in the 3.15 queue as well as
some changes that were added late in the 3.14-rc cycle. This final
change waited till the two were merged in upstream and then the change
was added and full tests were run. Unfortunately, the test found some
errors, but after it was already submitted to the for-next branch and
not to be rebased. Sparse errors were detected by Fengguang Wu's bot
tests, and my internal tests discovered that the anonymous union
initialization triggered a bug in older gcc compilers. Luckily, there
was a bugzilla for the gcc bug which gave a work around to the
problem. The third and fourth patch handled the sparse error and the
gcc bug respectively.
A final patch was tagged along to fix a missing documentation for the
README file"
* tag 'trace-3.15-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Add missing function triggers dump and cpudump to README
tracing: Fix anonymous unions in struct ftrace_event_call
tracepoint: Fix sparse warnings in tracepoint.c
tracepoint: Simplify tracepoint module search
tracepoint: Use struct pointer instead of name hash for reg/unreg tracepoints
Pull i2c updates from Wolfram Sang:
"Here is the pull request from the i2c subsystem. It got a little
delayed because I needed to wait for a dependency to be included
(commit b424080a9e: "reset: Add optional resets and stubs"). Plus,
I had some email problems. All done now, the highlights are:
- drivers can now deprecate their use of i2c classes. That shouldn't
be used on embedded platforms anyhow and was often blindly
copy&pasted. This mechanism gives users time to switch away and
ultimately boot faster once the use of classes for those drivers is
gone for good.
- new drivers for QUP, Cadence, efm32
- tracepoint support for I2C and SMBus
- bigger cleanups for the mv64xxx, nomadik, and designware drivers
And the usual bugfixes, cleanups, feature additions. Most stuff has
been in linux-next for a while. Just some hot fixes and new drivers
were added a bit more recently."
* 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (63 commits)
i2c: cadence: fix Kconfig dependency
i2c: Add driver for Cadence I2C controller
i2c: cadence: Document device tree bindings
Documentation: i2c: improve section about flags mangling the protocol
i2c: qup: use proper type fro clk_freq
i2c: qup: off by ones in qup_i2c_probe()
i2c: efm32: fix binding doc
MAINTAINERS: update I2C web resources
i2c: qup: New bus driver for the Qualcomm QUP I2C controller
i2c: qup: Add device tree bindings information
i2c: i2c-xiic: deprecate class based instantiation
i2c: i2c-sirf: deprecate class based instantiation
i2c: i2c-mv64xxx: deprecate class based instantiation
i2c: i2c-designware-platdrv: deprecate class based instantiation
i2c: i2c-davinci: deprecate class based instantiation
i2c: i2c-bcm2835: deprecate class based instantiation
i2c: mv64xxx: Fix reset controller handling
i2c: omap: fix usage of IS_ERR_VALUE with pm_runtime_get_sync
i2c: efm32: new bus driver
i2c: exynos5: remove unnecessary cast of void pointer
...
Fix the following sparse warnings:
CHECK kernel/tracepoint.c
kernel/tracepoint.c:184:18: warning: incorrect type in assignment (different address spaces)
kernel/tracepoint.c:184:18: expected struct tracepoint_func *tp_funcs
kernel/tracepoint.c:184:18: got struct tracepoint_func [noderef] <asn:4>*funcs
kernel/tracepoint.c:216:18: warning: incorrect type in assignment (different address spaces)
kernel/tracepoint.c:216:18: expected struct tracepoint_func *tp_funcs
kernel/tracepoint.c:216:18: got struct tracepoint_func [noderef] <asn:4>*funcs
kernel/tracepoint.c:392:24: error: return expression in void function
CC kernel/tracepoint.o
kernel/tracepoint.c: In function tracepoint_module_going:
kernel/tracepoint.c:491:6: warning: symbol 'syscall_regfunc' was not declared. Should it be static?
kernel/tracepoint.c:508:6: warning: symbol 'syscall_unregfunc' was not declared. Should it be static?
Link: http://lkml.kernel.org/r/1397049883-28692-1-git-send-email-mathieu.desnoyers@efficios.com
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Register/unregister tracepoint probes with struct tracepoint pointer
rather than tracepoint name.
This change, which vastly simplifies tracepoint.c, has been proposed by
Steven Rostedt. It also removes 8.8kB (mostly of text) from the vmlinux
size.
From this point on, the tracers need to pass a struct tracepoint pointer
to probe register/unregister. A probe can now only be connected to a
tracepoint that exists. Moreover, tracers are responsible for
unregistering the probe before the module containing its associated
tracepoint is unloaded.
text data bss dec hex filename
10443444 4282528 10391552 25117524 17f4354 vmlinux.orig
10434930 4282848 10391552 25109330 17f2352 vmlinux
Link: http://lkml.kernel.org/r/1396992381-23785-2-git-send-email-mathieu.desnoyers@efficios.com
CC: Ingo Molnar <mingo@kernel.org>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Frank Ch. Eigler <fche@redhat.com>
CC: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
[ SDR - fixed return val in void func in tracepoint_module_going() ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
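To illustrate the interface change described above, here is a rough
userspace model of registering a probe against a tracepoint object rather
than a name. Every identifier in it (struct tracepoint_sketch,
tp_probe_register() and so on) is hypothetical, and the kernel's actual
structures and locking are not reproduced. It only shows why a probe can no
longer be attached to a tracepoint that does not exist: the caller must
already hold a pointer to it.

```c
#include <stddef.h>

#define MAX_PROBES_SKETCH 4

typedef void (*probe_fn)(void *data);

/* Hypothetical stand-in for a tracepoint descriptor. */
struct tracepoint_sketch {
	const char *name;
	probe_fn probes[MAX_PROBES_SKETCH];
	void *data[MAX_PROBES_SKETCH];
	size_t nprobes;
};

/* Register by descriptor: no name lookup, so a missing tracepoint is an
 * immediate error instead of a silently inactive probe. */
static int tp_probe_register(struct tracepoint_sketch *tp, probe_fn fn, void *data)
{
	if (!tp || !fn || tp->nprobes == MAX_PROBES_SKETCH)
		return -1;
	tp->probes[tp->nprobes] = fn;
	tp->data[tp->nprobes] = data;
	tp->nprobes++;
	return 0;
}

/* The tracer must unregister before the tracepoint's owner goes away. */
static int tp_probe_unregister(struct tracepoint_sketch *tp, probe_fn fn, void *data)
{
	if (!tp)
		return -1;
	for (size_t i = 0; i < tp->nprobes; i++) {
		if (tp->probes[i] != fn || tp->data[i] != data)
			continue;
		/* Close the gap left by the removed probe. */
		for (size_t j = i + 1; j < tp->nprobes; j++) {
			tp->probes[j - 1] = tp->probes[j];
			tp->data[j - 1] = tp->data[j];
		}
		tp->nprobes--;
		return 0;
	}
	return -1;
}

/* Invoke every registered probe, as a tracepoint hit would. */
static void tp_fire(const struct tracepoint_sketch *tp)
{
	for (size_t i = 0; i < tp->nprobes; i++)
		tp->probes[i](tp->data[i]);
}

/* Example probe: counts how many times the tracepoint fired. */
static void count_probe(void *data)
{
	++*(int *)data;
}
```

Registering count_probe, firing once, unregistering and firing again leaves
the counter at 1, and registration with a NULL descriptor fails outright.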
Starting from commit c4ad8f98be ("execve: use 'struct filename *' for
executable name passing") bprm->filename can not go away after
flush_old_exec(), so we do not need to save the binary name in
bprm->tcomm[] added by 96e02d1586 ("exec: fix use-after-free bug in
setup_new_exec()").
And there was never need for filename_to_taskname-like code, we can
simply do set_task_comm(kbasename(filename)).
This patch has to change set_task_comm() and trace_task_rename() to
accept "const char *", but I think this change is also good.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
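The simplification above amounts to taking the last path component and
copying it into the fixed-size comm buffer. A userspace sketch of that, with
hypothetical helper names and assuming the usual 16-byte comm buffer
(TASK_COMM_LEN) with truncation of longer names:

```c
#include <stddef.h>
#include <string.h>

#define TASK_COMM_LEN_SKETCH 16

/* Return the part of the path after the last '/', or the whole string
 * if it contains no '/'. */
static const char *kbasename_sketch(const char *path)
{
	const char *tail = strrchr(path, '/');

	return tail ? tail + 1 : path;
}

/* Copy a name into a fixed-size comm buffer, truncating if needed and
 * always NUL-terminating. */
static void set_comm_sketch(char comm[TASK_COMM_LEN_SKETCH], const char *name)
{
	size_t n = strlen(name);

	if (n >= TASK_COMM_LEN_SKETCH)
		n = TASK_COMM_LEN_SKETCH - 1;
	memcpy(comm, name, n);
	comm[n] = '\0';
}
```

Both helpers take "const char *", which matches the reason the patch changes
set_task_comm() and trace_task_rename() to accept const strings: the
basename pointer refers into the caller's filename and is never modified.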
Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull module updates from Rusty Russell:
"Nothing major: the stricter permissions checking for sysfs broke a
staging driver; fix included. Greg KH said he'd take the patch but
hadn't as the merge window opened, so it's included here to avoid
breaking build"
* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
staging: fix up speakup kobject mode
Use 'E' instead of 'X' for unsigned module taint flag.
VERIFY_OCTAL_PERMISSIONS: stricter checking for sysfs perms.
kallsyms: fix percpu vars on x86-64 with relocation.
kallsyms: generalize address range checking
module: LLVMLinux: Remove unused function warning from __param_check macro
Fix: module signature vs tracepoints: add new TAINT_UNSIGNED_MODULE
module: remove MODULE_GENERIC_TABLE
module: allow multiple calls to MODULE_DEVICE_TABLE() per module
module: use pr_cont
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"Major changes for 3.14 include support for the newly added ZERO_RANGE
and COLLAPSE_RANGE fallocate operations, and scalability improvements
in the jbd2 layer and in xattr handling when the extended attributes
spill over into an external block.
Other than that, the usual clean ups and minor bug fixes"
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (42 commits)
ext4: fix premature freeing of partial clusters split across leaf blocks
ext4: remove unneeded test of ret variable
ext4: fix comment typo
ext4: make ext4_block_zero_page_range static
ext4: atomically set inode->i_flags in ext4_set_inode_flags()
ext4: optimize Hurd tests when reading/writing inodes
ext4: kill i_version support for Hurd-castrated file systems
ext4: each filesystem creates and uses its own mb_cache
fs/mbcache.c: doucple the locking of local from global data
fs/mbcache.c: change block and index hash chain to hlist_bl_node
ext4: Introduce FALLOC_FL_ZERO_RANGE flag for fallocate
ext4: refactor ext4_fallocate code
ext4: Update inode i_size after the preallocation
ext4: fix partial cluster handling for bigalloc file systems
ext4: delete path dealloc code in ext4_ext_handle_uninitialized_extents
ext4: only call sync_filesystm() when remounting read-only
fs: push sync_filesystem() down to the file system's remount_fs()
jbd2: improve error messages for inconsistent journal heads
jbd2: minimize region locked by j_list_lock in jbd2_journal_forget()
jbd2: minimize region locked by j_list_lock in journal_get_create_access()
...
Pull btrfs changes from Chris Mason:
"This is a pretty long stream of bug fixes and performance fixes.
Qu Wenruo has replaced the btrfs async threads with regular kernel
workqueues. We'll keep an eye out for performance differences, but
it's nice to be using more generic code for this.
We still have some corruption fixes and other patches coming in for
the merge window, but this batch is tested and ready to go"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (108 commits)
Btrfs: fix a crash of clone with inline extents's split
btrfs: fix uninit variable warning
Btrfs: take into account total references when doing backref lookup
Btrfs: part 2, fix incremental send's decision to delay a dir move/rename
Btrfs: fix incremental send's decision to delay a dir move/rename
Btrfs: remove unnecessary inode generation lookup in send
Btrfs: fix race when updating existing ref head
btrfs: Add trace for btrfs_workqueue alloc/destroy
Btrfs: less fs tree lock contention when using autodefrag
Btrfs: return EPERM when deleting a default subvolume
Btrfs: add missing kfree in btrfs_destroy_workqueue
Btrfs: cache extent states in defrag code path
Btrfs: fix deadlock with nested trans handles
Btrfs: fix possible empty list access when flushing the delalloc inodes
Btrfs: split the global ordered extents mutex
Btrfs: don't flush all delalloc inodes when we doesn't get s_umount lock
Btrfs: reclaim delalloc metadata more aggressively
Btrfs: remove unnecessary lock in may_commit_transaction()
Btrfs: remove the unnecessary flush when preparing the pages
Btrfs: just do dirty page flush for the inode with compression before direct IO
...
Pull media updates from Mauro Carvalho Chehab:
"The main series of patches for the media subsystem, including:
- document RC sysfs class
- added an API to set up a scancode to allow waking up systems using
the Remote Controller
- add API for SDR devices. Drivers are still in staging
- some API improvements for getting EDID data from media
inputs/outputs
- new DVB frontend driver for drx-j (ATSC)
- one driver (it913x/it9137) got removed, in favor of an improvement
on another driver (af9035)
- added a skeleton V4L2 PCI driver to the documentation
- added a dual flash driver (lm3646)
- added a new IR driver (img-ir)
- added an IR scancode decoder for the Sharp protocol
- some improvements to the usbtv driver, to allow its core to be
reused.
- added a new SDR driver (rtl2832u_sdr)
- added a new tuner driver (msi001)
- several improvements to the em28xx driver to fix PM support, device
removal and to split the V4L2 specific bits into a separate
sub-driver
- one driver got converted to videobuf2 (s2255drv)
- the e4000 tuner driver now follows an improved binding model
- some fixes to the V4L2 compat32 code
- several fixes and enhancements to the videobuf2 code
- some cleanups to the V4L2 API documentation
- usual driver enhancements, new board additions and misc fixups"
[ NOTE! This merge effectively drops commit 4329b93b28 ("of: Reduce
indentation in of_graph_get_next_endpoint").
The of_graph_get_next_endpoint() function was moved and renamed by
commit fd9fdb78a9 ("[media] of: move graph helpers from
drivers/media/v4l2-core to drivers/of"). It was originally called
v4l2_of_get_next_endpoint() and lived in the file
drivers/media/v4l2-core/v4l2-of.c.
In that original location, it was then fixed to support empty port
nodes by commit b9db140c1e ("[media] v4l: of: Support empty port
nodes"), and that commit clashes badly with the dropped "Reduce
indentation" commit. I had to choose one or the other, and decided
that the "Support empty port nodes" commit was more important ]
* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (426 commits)
[media] em28xx-dvb: fix PCTV 461e tuner I2C binding
Revert "[media] em28xx-dvb: fix PCTV 461e tuner I2C binding"
[media] em28xx: fix PCTV 290e LNA oops
[media] em28xx-dvb: fix PCTV 461e tuner I2C binding
[media] m88ds3103: fix bug on .set_tone()
[media] saa7134: fix WARN_ON during resume
[media] v4l2-dv-timings: add module name, description, license
[media] videodev2.h: add parenthesis around macro arguments
[media] saa6752hs: depends on CRC32
[media] si4713: fix Kconfig dependencies
[media] Sensoray 2255 uses videobuf2
[media] adv7180: free an interrupt on failure paths in init_device()
[media] e4000: make VIDEO_V4L2 dependency optional
[media] af9033: Don't export functions for the hardware filter
[media] af9035: use af9033 PID filters
[media] af9033: implement PID filter
[media] rtl2832_sdr: do not use dynamic stack allocation
[media] e4000: fix 32-bit build error
[media] em28xx-audio: make sure audio is unmuted on open()
[media] DocBook media: v4l2_format_sdr was renamed to v4l2_sdr_format
...
Merge tag 'trace-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"Most of the changes were cleanups, along with some documentation.
But there were a few features that were added:
Uprobes now work with event triggers and multi buffers and have
support under ftrace and perf.
The big feature is that the function tracer can now be used within the
multi buffer instances. That is, you can now trace some functions in
one buffer, others in another buffer, all functions in a third buffer
and so on. They are basically independent of each other. This only
works for the function tracer and not for the function graph tracer,
although you can have the function graph tracer running in the top
level buffer (or any tracer for that matter) and have different
function tracing going on in the sub buffers"
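The per-instance function tracing described above can be driven from user space through the tracing control files. The following is a minimal C sketch, assuming the tracing directory is mounted at a path such as /sys/kernel/debug/tracing and that each instance directory exposes current_tracer and set_ftrace_filter files. The helper names write_control() and setup_function_instance() are hypothetical and are not part of any kernel or library API.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Write a short string to <dir>/<file>; returns 0 on success, -1 on error. */
static int write_control(const char *dir, const char *file, const char *value)
{
    char path[512];
    FILE *f;
    int n, ok;

    n = snprintf(path, sizeof(path), "%s/%s", dir, file);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    ok = (fputs(value, f) != EOF);
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/*
 * Create one trace instance under <tracing_root>/instances (assumed layout)
 * and restrict its function tracer to the functions matching `filter`.
 * Returns 0 on success, -1 on error.
 */
static int setup_function_instance(const char *tracing_root,
                                   const char *name, const char *filter)
{
    char dir[400];
    int n;

    assert(tracing_root != NULL && name != NULL && filter != NULL);
    n = snprintf(dir, sizeof(dir), "%s/instances/%s", tracing_root, name);
    if (n < 0 || (size_t)n >= sizeof(dir))
        return -1;
    /* Creating the directory is what creates the sub buffer. */
    if (mkdir(dir, 0755) != 0 && errno != EEXIST)
        return -1;
    /* Set the filter first so the tracer starts out already restricted. */
    if (write_control(dir, "set_ftrace_filter", filter) != 0)
        return -1;
    return write_control(dir, "current_tracer", "function");
}
```

Calling setup_function_instance() twice with different names and filters (for example "net" with "tcp_*" and "fs" with "vfs_*") would give two buffers tracing different sets of functions. On a real system this needs root privileges and a kernel built with dynamic function tracing.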
* tag 'trace-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (45 commits)
tracing: Add BUG_ON when stack end location is over written
tracepoint: Remove unused API functions
Revert "tracing: Move event storage for array from macro to standalone function"
ftrace: Constify ftrace_text_reserved
tracepoints: API doc update to tracepoint_probe_register() return value
tracepoints: API doc update to data argument
ftrace: Fix compilation warning about control_ops_free
ftrace/x86: BUG when ftrace recovery fails
ftrace: Warn on error when modifying ftrace function
ftrace: Remove freelist from struct dyn_ftrace
ftrace: Do not pass data to ftrace_dyn_arch_init
ftrace: Pass retval through return in ftrace_dyn_arch_init()
ftrace: Inline the code from ftrace_dyn_table_alloc()
ftrace: Cleanup of global variables ftrace_new_pgs and ftrace_update_cnt
tracing: Evaluate len expression only once in __dynamic_array macro
tracing: Correctly expand len expressions from __dynamic_array macro
tracing/module: Replace include of tracepoint.h with jump_label.h in module.h
tracing: Fix event header migrate.h to include tracepoint.h
tracing: Fix event header writeback.h to include tracepoint.h
tracing: Warn if a tracepoint is not set via debugfs
...
Pull networking updates from David Miller:
"Here is my initial pull request for the networking subsystem during
this merge window:
1) Support for ESN in AH (RFC 4302) from Fan Du.
2) Add full kernel doc for ethtool command structures, from Ben
Hutchings.
3) Add BCM7xxx PHY driver, from Florian Fainelli.
4) Export computed TCP rate information in netlink socket dumps, from
Eric Dumazet.
5) Allow IPSEC SA to be dumped partially using a filter, from Nicolas
Dichtel.
6) Convert many drivers to pci_enable_msix_range(), from Alexander
Gordeev.
7) Record SKB timestamps more efficiently, from Eric Dumazet.
8) Switch to microsecond resolution for TCP round trip times, also
from Eric Dumazet.
9) Clean up and fix 6lowpan fragmentation handling by making use of
the existing inet_frag API for its implementation.
10) Add TX grant mapping to xen-netback driver, from Zoltan Kiss.
11) Auto size SKB lengths when composing netlink messages based upon
past message sizes used, from Eric Dumazet.
12) qdisc dumps can take a long time, so add a cond_resched(), from
Eric Dumazet.
13) Sanitize netpoll core and drivers wrt. SKB handling semantics.
Get rid of never-used-in-tree netpoll RX handling. From Eric W
Biederman.
14) Support inter-address-family and namespace changing in VTI tunnel
driver(s). From Steffen Klassert.
15) Add Altera TSE driver, from Vince Bridgers.
16) Optimize csum_replace2() so that it doesn't adjust the checksum
by checksumming the entire header, from Eric Dumazet.
17) Expand BPF internal implementation for faster interpreting, more
direct translations into JIT'd code, and much cleaner uses of BPF
filtering in non-socket contexts. From Daniel Borkmann and Alexei
Starovoitov"
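Item 16 relies on the fact that an Internet checksum can be adjusted incrementally when a single 16-bit field changes, without summing the whole header again. The following is a user-space sketch of the RFC 1624 update rule HC' = ~(~HC + ~m + m'). It is not the kernel's csum_replace2() itself; fold16(), full_checksum() and incremental_update() are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit ones' complement sum. */
static uint16_t fold16(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Reference path: checksum computed over every 16-bit word. */
static uint16_t full_checksum(const uint16_t *words, size_t n)
{
    uint32_t sum = 0;
    size_t i;

    assert(words != NULL || n == 0);
    for (i = 0; i < n; i++)
        sum += words[i];
    return (uint16_t)~fold16(sum);
}

/*
 * Incremental path (RFC 1624, equation 3): given the old checksum and the
 * old and new values of one 16-bit word, return the new checksum.
 */
static uint16_t incremental_update(uint16_t check, uint16_t old_word,
                                   uint16_t new_word)
{
    uint32_t sum = (uint16_t)~check;

    sum += (uint16_t)~old_word;
    sum += new_word;
    return (uint16_t)~fold16(sum);
}
```

For a header in which one word changes (for example a TTL decrement), incremental_update() applied to the old checksum gives the same value as running full_checksum() over the modified header, at the cost of three additions instead of a pass over all words.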
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1976 commits)
netpoll: Use skb_irq_freeable to make zap_completion_queue safe.
net: Add a test to see if a skb is freeable in irq context
qlcnic: Fix build failure due to undefined reference to `vxlan_get_rx_port'
net: ptp: move PTP classifier in its own file
net: sxgbe: make "core_ops" static
net: sxgbe: fix logical vs bitwise operation
net: sxgbe: sxgbe_mdio_register() frees the bus
Call efx_set_channels() before efx->type->dimension_resources()
xen-netback: disable rogue vif in kthread context
net/mlx4: Set proper build dependancy with vxlan
be2net: fix build dependency on VxLAN
mac802154: make csma/cca parameters per-wpan
mac802154: allow only one WPAN to be up at any given time
net: filter: minor: fix kdoc in __sk_run_filter
netlink: don't compare the nul-termination in nla_strcmp
can: c_can: Avoid led toggling for every packet.
can: c_can: Simplify TX interrupt cleanup
can: c_can: Store dlc private
can: c_can: Reduce register access
can: c_can: Make the code readable
...
Pull block driver update from Jens Axboe:
"On top of the core pull request, here's the pull request for the
driver related changes for 3.15. It contains:
- Improvements for msi-x registration for block drivers (mtip32xx,
skd, cciss, nvme) from Alexander Gordeev.
- A round of cleanups and improvements for drbd from Andreas
Gruenbacher and Rashika Kheria.
- A round of cleanups and improvements for bcache from Kent.
- Removal of sleep_on() and friends in DAC960, ataflop, swim3 from
Arnd Bergmann.
- Bug fix for a bug in the mtip32xx async completion code from Sam
Bradshaw.
- Bug fix for accidentally bouncing IO on 32-bit platforms with
mtip32xx from Felipe Franciosi"
* 'for-3.15/drivers' of git://git.kernel.dk/linux-block: (103 commits)
bcache: remove nested function usage
bcache: Kill bucket->gc_gen
bcache: Kill unused freelist
bcache: Rework btree cache reserve handling
bcache: Kill btree_io_wq
bcache: btree locking rework
bcache: Fix a race when freeing btree nodes
bcache: Add a real GC_MARK_RECLAIMABLE
bcache: Add bch_keylist_init_single()
bcache: Improve priority_stats
bcache: Better alloc tracepoints
bcache: Kill dead cgroup code
bcache: stop moving_gc marking buckets that can't be moved.
bcache: Fix moving_pred()
bcache: Fix moving_gc deadlocking with a foreground write
bcache: Fix discard granularity
bcache: Fix another bug recovering from unclean shutdown
bcache: Fix a bug recovering from unclean shutdown
bcache: Fix a journalling reclaim after recovery bug
bcache: Fix a null ptr deref in journal replay
...
Pull core block layer updates from Jens Axboe:
"This is the pull request for the core block IO bits for the 3.15
kernel. It's a smaller round this time; it contains:
- Various little blk-mq fixes and additions from Christoph and
myself.
- Cleanup of the IPI usage from the block layer, and associated
helper code. From Frederic Weisbecker and Jan Kara.
- Duplicate code cleanup in bio-integrity from Gu Zheng. This will
give you a merge conflict, but that should be easy to resolve.
- blk-mq notify spinlock fix for RT from Mike Galbraith.
- A blktrace partial accounting bug fix from Roman Pen.
- Missing REQ_SYNC detection fix for blk-mq from Shaohua Li"
* 'for-3.15/core' of git://git.kernel.dk/linux-block: (25 commits)
blk-mq: add REQ_SYNC early
rt,blk,mq: Make blk_mq_cpu_notify_lock a raw spinlock
blk-mq: support partial I/O completions
blk-mq: merge blk_mq_insert_request and blk_mq_run_request
blk-mq: remove blk_mq_alloc_rq
blk-mq: don't dump CPU -> hw queue map on driver load
blk-mq: fix wrong usage of hctx->state vs hctx->flags
blk-mq: allow blk_mq_init_commands() to return failure
block: remove old blk_iopoll_enabled variable
blktrace: fix accounting of partially completed requests
smp: Rename __smp_call_function_single() to smp_call_function_single_async()
smp: Remove wait argument from __smp_call_function_single()
watchdog: Simplify a little the IPI call
smp: Move __smp_call_function_single() below its safe version
smp: Consolidate the various smp_call_function_single() declensions
smp: Teach __smp_call_function_single() to check for offline cpus
smp: Remove unused list_head from csd
smp: Iterate functions through llist_for_each_entry_safe()
block: Stop abusing rq->csd.list in blk-softirq
block: Remove useless IPI struct initialization
...
Merge tag 'sound-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound updates from Takashi Iwai:
"There have been lots of changes in ALSA core, HD-audio and ASoC, and
most of the PCI drivers were touched by the printk conversions. All
of these resulted in high-volume, wide-ranging patch sets in this
release. Many changes are fairly trivial, but there are also lots of
nice cleanups and refactors. There are a few new drivers, most
notably the Intel Haswell and Baytrail ASoC driver.
Core changes:
- A bit of modernization; embed the device struct into snd_card struct,
so that it may be referenced from the beginning. A new
snd_card_new() function is introduced for that, and all drivers
have been converted.
- Simplification in the device management code in ALSA core; now
managed by a simple priority list instead
- Converted many kernel messages to use the standard dev_err() & co;
this will be a quite visible difference, especially for
HD-audio.
HD-audio:
- Conexant codecs use the auto-parser as default now; the old static
code still remains in case of regressions. Some old quirks have
been rewritten with the fixups for auto-parser.
- C-Media codecs use the auto-parser as default now, too.
- A device struct is assigned to each HD-audio codec, and the
former hwdep attributes are accessible over the codec sysfs, too.
The hwdep attributes still remain for compatibility.
- Split the PCI-specific stuff for HD-audio controller into a
separate module, and make a helper module for the generic
controller driver. This is a preliminary change for supporting
the Tegra HDMI controller in the near future, which slipped from
the 3.15 merge.
- Device-specific fixes: mute LED support for Lenovo Ideapad, mic LED
fix for HP laptops, more ASUS subwoofer quirks, yet more Dell
laptop headset quirks
- Make the HD-audio codec response a bit more robust
- A few improvements for the pop noises on Realtek ALC282 / 283
- A couple of Intel HDMI fixes
ASoC:
- Lots of cleanups for enumerations; refactored lots of error-prone
original code to use more modern APIs
- Elimination of the ASoC level wrappers for I2C and SPI moving us
closer to converting to regmap completely and avoiding some
randconfig hassle
- Provide both manually and transparently locked DAPM APIs rather
than a mix of the two fixing some concurrency issues
- Start converting CODEC drivers to use separate bus interface
drivers rather than having them all in one file helping avoid
dependency issues
- DPCM support for Intel Haswell and Bay Trail platforms, lots of
fixes
- Lots of work on improvements for simple-card, DaVinci and the
Renesas rcar drivers.
- New drivers for Analog Devices ADAU1977, TI PCM512x and parts of
the CSR SiRF SoC, TLV320AIC31XXX, Armada 370 DB, Cirrus cs42xx8
- Fixes for the simple-card DAI format DT mess
- DT support for a couple more devices.
- Use of the tdm_slot mapping in a few drivers
Others:
- Support of reset_resume callback for improved S4 in USB-audio
driver; devices with boot quirks have seen little testing, so we
need to watch out for them in this development cycle
- Add PM support for ICE1712 driver (finally!); it's still pretty
partial support, only for M-Audio devices"
* tag 'sound-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (610 commits)
ALSA: ice1712: Add suspend support for M-Audio ICE1712-based cards
ALSA: ice1712: add suspend support for ICE1712 chip
ALSA: hda - Enable beep for ASUS 1015E
ALSA: asihpi: fix some indenting in snd_card_asihpi_pcm_new()
ALSA: hda - add headset mic detect quirks for three Dell laptops
ASoC: tegra: move AC97 clock handling to the machine driver
ASoC: simple-card: Handle many DAI links
ASoC: simple-card: Add DT documentation for multi-DAI links
ASoC: simple-card: dynamically allocate the DAI link and properties
ASoC: imx-ssi: Add .xlate_tdm_slot_mask() support.
ASoC: fsl-esai: Add .xlate_tdm_slot_mask() support.
ASoC: fsl-utils: Add fsl_asoc_xlate_tdm_slot_mask() support.
ASoC: core: remove the 'of_' prefix of of_xlate_tdm_slot_mask.
ASoC: rcar: subnode tidyup for renesas,rsnd.txt
ASoC: Remove name_prefix unset during DAI link init hack
ALSA: hda - Inform the unexpectedly ignored pins by auto-parser
ASoC: rcar: bugfix: it cares about the non-src case
ARM: bockw: fixup SND_SOC_DAIFMT_CBx_CFx flags
ASoC: pcm: Drop incorrect double/extra frees
ASoC: mfld_machine: Fix compile error
...
Merge tag 'pm+acpi-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and power management updates from Rafael Wysocki:
"The majority of this material spent some time in linux-next, some of
it even several weeks. There are a few relatively fresh commits in
it, but they are mostly fixes and simple cleanups.
ACPI took the lead this time, both in terms of the number of commits
and the number of modified lines of code, cpufreq follows and there
are a few changes in the PM core and in cpuidle too.
A new feature that has already received some attention from LWN.net is
the device PM QoS extension allowing latency tolerance requirements to be
propagated from leaf devices to their ancestors with hardware
interfaces for specifying latency tolerance. That should help systems
with hardware-driven power management to avoid going too far with it
in cases when there are latency tolerance constraints.
There also are some significant changes in the ACPI core related to
the way in which hotplug notifications are handled. They affect PCI
hotplug (ACPIPHP) and the ACPI dock station code too. The bottom line
is that all those notifications now go through the root notify handler
and are propagated to the interested subsystems by means of callbacks
instead of having to install a notify handler for each device object
that we can potentially get hotplug notifications for.
In addition to that ACPICA will now advertise "Windows 2013"
compatibility for _OSI, because some systems out there don't work
correctly if that is not done (some of them don't even boot).
On the system suspend side of things, all of the device suspend and
resume callbacks, except for ->prepare() and ->complete(), are now
going to be executed asynchronously as that turns out to speed up
system suspend and resume on some platforms quite significantly and we
have a few more optimizations in that area.
Apart from that, there are some new device IDs and fixes and cleanups
all over. In particular, the system suspend and resume handling by
cpufreq should be improved and the cpuidle menu governor should be a
bit more robust now.
Specifics:
- Device PM QoS support for latency tolerance constraints on systems
with hardware interfaces allowing such constraints to be specified.
That is necessary to prevent hardware-driven power management from
becoming overly aggressive on some systems and to prevent power
management features leading to excessive latencies from being used
in some cases.
- Consolidation of the handling of ACPI hotplug notifications for
device objects. This causes all device hotplug notifications to go
through the root notify handler (that was executed for all of them
anyway before) that propagates them to individual subsystems, if
necessary, by executing callbacks provided by those subsystems
(those callbacks are associated with struct acpi_device objects
during device enumeration). As a result, the code in question
becomes both smaller in size and more straightforward, and none of
those changes should affect users.
- ACPICA update, including fixes related to the handling of _PRT in
cases when it is broken and the addition of "Windows 2013" to the
list of supported "features" for _OSI (which is necessary to
support systems that work incorrectly or don't even boot without
it). Changes from Bob Moore and Lv Zheng.
- Consolidation of ACPI _OST handling from Jiang Liu.
- ACPI battery and AC fixes allowing unusual system configurations to
be handled by that code from Alexander Mezin.
- New device IDs for the ACPI LPSS driver from Chiau Ee Chew.
- ACPI fan and thermal optimizations related to system suspend and
resume from Aaron Lu.
- Cleanups related to ACPI video from Jean Delvare.
- Assorted ACPI fixes and cleanups from Al Stone, Hanjun Guo, Lan
Tianyu, Paul Bolle, Tomasz Nowicki.
- Intel RAPL (Running Average Power Limits) driver cleanups from
Jacob Pan.
- intel_pstate fixes and cleanups from Dirk Brandewie.
- cpufreq fixes related to system suspend/resume handling from Viresh
Kumar.
- cpufreq core fixes and cleanups from Viresh Kumar, Stratos
Karafotis, Saravana Kannan, Rashika Kheria, Joe Perches.
- cpufreq drivers updates from Viresh Kumar, Zhuoyu Zhang, Rob
Herring.
- cpuidle fixes related to the menu governor from Tuukka Tikkanen.
- cpuidle fix related to coupled CPUs handling from Paul Burton.
- Asynchronous execution of all device suspend and resume callbacks,
except for ->prepare and ->complete, during system suspend and
resume from Chuansheng Liu.
- Delayed resuming of runtime-suspended devices during system suspend
for the PCI bus type and ACPI PM domain.
- New set of PM helper routines to allow device runtime PM callbacks
to be used during system suspend and resume more easily from Ulf
Hansson.
- Assorted fixes and cleanups in the PM core from Geert Uytterhoeven,
Prabhakar Lad, Philipp Zabel, Rashika Kheria, Sebastian Capella.
- devfreq fix from Saravana Kannan"
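The consolidated hotplug notification path described in the specifics above can be pictured as a single dispatcher that forwards each event to a callback attached to the target device. This is a simplified user-space sketch under that reading. struct demo_device and the demo_* functions are hypothetical stand-ins and do not reproduce the kernel's struct acpi_device or any ACPI core interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a device object; only what the sketch needs. */
struct demo_device {
    const char *name;
    /* Subsystem callback attached at enumeration time; may be NULL. */
    int (*hotplug_notify)(struct demo_device *dev, unsigned int event);
    unsigned int last_event;
};

/* Example subsystem callback: records the event it was handed. */
static int demo_dock_notify(struct demo_device *dev, unsigned int event)
{
    dev->last_event = event;
    return 0;
}

/*
 * Single root handler: every notification arrives here and is forwarded
 * to the callback of the target device, if one was registered.
 * Returns the callback's result, or -1 if no subsystem is interested.
 */
static int demo_root_notify(struct demo_device *dev, unsigned int event)
{
    assert(dev != NULL);
    if (dev->hotplug_notify == NULL)
        return -1;
    return dev->hotplug_notify(dev, event);
}
```

With this shape, only one handler has to be installed, and a device without a callback simply ignores the event instead of needing a notify handler of its own.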
* tag 'pm+acpi-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (162 commits)
PM / devfreq: Rewrite devfreq_update_status() to fix multiple bugs
PM / sleep: Correct whitespace errors in <linux/pm.h>
intel_pstate: Set core to min P state during core offline
cpufreq: Add stop CPU callback to cpufreq_driver interface
cpufreq: Remove unnecessary braces
cpufreq: Fix checkpatch errors and warnings
cpufreq: powerpc: add cpufreq transition latency for FSL e500mc SoCs
MAINTAINERS: Reorder maintainer addresses for PM and ACPI
PM / Runtime: Update runtime_idle() documentation for return value meaning
video / output: Drop display output class support
fujitsu-laptop: Drop unneeded include
acer-wmi: Stop selecting VIDEO_OUTPUT_CONTROL
ACPI / gpu / drm: Stop selecting VIDEO_OUTPUT_CONTROL
ACPI / video: fix ACPI_VIDEO dependencies
cpufreq: remove unused notifier: CPUFREQ_{SUSPENDCHANGE|RESUMECHANGE}
cpufreq: Do not allow ->setpolicy drivers to provide ->target
cpufreq: arm_big_little: set 'physical_cluster' for each CPU
cpufreq: arm_big_little: make vexpress driver depend on bL core driver
ACPI / button: Add ACPI Button event via netlink routine
ACPI: Remove duplicate definitions of PREFIX
...
The packet hash can be considered a property of the packet, not just
on RX path.
This patch changes name of rxhash and l4_rxhash skbuff fields to be
hash and l4_hash respectively. This includes changing uses of the
field in the code which don't call the access functions.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJTIOhJAAoJELSic+t+oim90CoP/3CVTm9cWv1qhPSU6jjn6RJG
/djmhntJfHd/GXo+0TiiwNK9WmZjFrJUr+5ofkDTCqSzFz1Suc90B6oHxY4dFbgF
IyIpTexGwTLv3H6yDjadYAfmGDSsE9sM2dkID9oXy6aEzjNby/a1VEiBnRgx16X1
YGvMVK8AGFn/AyC/zOV6EcKJxUjdDogqZ5wkR2XHzwDoYjl9ufxK9BnSIygYABOW
ABAjyrZf3xx97AH82BB6iqcZMh5GxGNTvI3hQd/vjx0r7RFUDNLqmF2cPZAMTRW/
bXWxVmtNHie1+lCldyMFm8pV/Pv09zuqDAQKbPY2TeHj2zF8CM548NlkFHqwHlp0
S9K5E1N+/2wcXMjQa1wBELohUdl6dVh1OFOAz7M8o0TJdSOZyR6PJ9r0NprP8NgS
67FBU+ZqnWIK159m9rKkFfPhnaDuDzk+rpwyK0fQxQgpdGGjLyv7OK3GhS30oTnA
Z2GjEyUySM1BcEEWAtfUD5fHbjN28e1Icn53q5q4JK4gvx4DXBy08uY/vumvjXjO
8oum3q3RjRvqIhzMrJoVgs+c8RHwS/bZQhlu9Q3qNTsDNDyMnaZWHFAnP8RDqHjv
ojZiMJkJdpqceZ3z1k5ZG8GWJ2JaZBikSbeNk2Ltg17/0nackq2r8ekrIoEUPVk2
ph4DJNC2s1qCFtx7tzQj
=C5oo
-----END PGP SIGNATURE-----
Merge tag 'asoc-v3.15' into asoc-next
ASoC: Updates for v3.15
Quite a busy release for ASoC this time, more on janitorial work than
exciting new features but welcome nonetheless:
- Lots of cleanups from Takashi for enumerations; the original API for
these was error prone so he's refactored lots of code to use more
modern APIs which avoid issues.
- Elimination of the ASoC level wrappers for I2C and SPI moving us
closer to converting to regmap completely and avoiding some
randconfig hassle.
- Provide both manually and transparently locked DAPM APIs rather than
a mix of the two fixing some concurrency issues.
- Start converting CODEC drivers to use separate bus interface drivers
rather than having them all in one file helping avoid dependency
issues.
- DPCM support for Intel Haswell and Bay Trail platforms.
- Lots of work on improvements for simple-card, DaVinci and the Renesas
rcar drivers.
- New drivers for Analog Devices ADAU1977, TI PCM512x and parts of the
CSR SiRF SoC.
# gpg: Signature made Wed 12 Mar 2014 23:05:45 GMT using RSA key ID 7EA229BD
# gpg: Good signature from "Mark Brown <broonie@sirena.org.uk>"
# gpg: aka "Mark Brown <broonie@debian.org>"
# gpg: aka "Mark Brown <broonie@kernel.org>"
# gpg: aka "Mark Brown <broonie@tardis.ed.ac.uk>"
# gpg: aka "Mark Brown <broonie@linaro.org>"
# gpg: aka "Mark Brown <Mark.Brown@linaro.org>"
I originally wrote commit 35bb4399bd to shrink the size of the overhead of
tracepoints by several kilobytes. Later, I received a patch from Vaibhav
Nagarnaik that fixed a bug in the same code that this commit touches. Not
only did it fix a bug, it also removed code and shrunk the size of the
overhead of trace events even more than this commit did.
Since this commit is scheduled for 3.15 and Vaibhav's patch is already in
mainline, I need to revert this patch in order to keep it from conflicting
with Vaibhav's patch. Not to mention, Vaibhav's patch makes this patch
obsolete.
Link: http://lkml.kernel.org/r/20140320225637.0226041b@gandalf.local.home
Cc: Vaibhav Nagarnaik <vnagarnaik@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Since most of the btrfs_workqueue is printed as a pointer address,
add traces for btrfs_workqueue alloc/destroy to make analysis easier.
This makes it possible to determine the workqueue that a given work belongs
to (by comparing the wq pointer address with the alloc trace event).
Signed-off-by: Qu Wenruo <quenruo@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
In event format strings, the array size is reported in two locations.
One in array subscript and then via the "size:" attribute. The values
reported there have a mismatch.
For example, in sched:sched_switch the prev_comm and next_comm character
arrays have subscript values of [32], whereas the actual field size is
16.
name: sched_switch
ID: 301
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1;signed:0;
field:int common_pid; offset:4; size:4; signed:1;
field:char prev_comm[32]; offset:8; size:16; signed:1;
field:pid_t prev_pid; offset:24; size:4; signed:1;
field:int prev_prio; offset:28; size:4; signed:1;
field:long prev_state; offset:32; size:8; signed:1;
field:char next_comm[32]; offset:40; size:16; signed:1;
field:pid_t next_pid; offset:56; size:4; signed:1;
field:int next_prio; offset:60; size:4; signed:1;
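The mismatch described above can be spotted mechanically. The following is a minimal sketch, not part of the patch: the regular expression and the helper name `mismatched_char_arrays` are assumptions made for illustration, and only `char` arrays are checked because their element size is known to be 1.

```python
import re

# Field lines copied from the sched_switch format shown above.
FORMAT_LINES = [
    "field:char prev_comm[32]; offset:8; size:16; signed:1;",
    "field:pid_t prev_pid; offset:24; size:4; signed:1;",
    "field:char next_comm[32]; offset:40; size:16; signed:1;",
]

# Hypothetical pattern for this sketch: "<type> <name>[<n>];" followed by "size:<n>;".
FIELD_RE = re.compile(
    r"field:(?P<type>.+?)\s+(?P<name>\w+)\[(?P<count>\d+)\];.*?size:(?P<size>\d+);"
)


def mismatched_char_arrays(lines):
    """Return (name, subscript, size) for char arrays whose two sizes differ."""
    bad = []
    for line in lines:
        m = FIELD_RE.search(line)
        if m is None or m.group("type") != "char":
            continue  # not an array field, or element size is not known to be 1
        count, size = int(m.group("count")), int(m.group("size"))
        if count != size:
            bad.append((m.group("name"), count, size))
    return bad


print(mismatched_char_arrays(FORMAT_LINES))
```

Run against the three sample lines, this reports prev_comm and next_comm, the two fields named in the description.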
After bisection, the following commit was blamed:
92edca0 tracing: Use direct field, type and system names
This commit removes the duplication of strings for field->name and
field->type assuming that all the strings passed in
__trace_define_field() are immutable. This is not true for arrays, where
the type string is created in event_storage variable and field->type for
all array fields points to event_storage.
Use __stringify() to create a string constant for the type string.
Also, get rid of event_storage and event_storage_mutex that are not
needed anymore.
Also, an added benefit is that this reduces the overhead of events a bit more:
    text     data      bss       dec     hex  filename
 8424787  2036472  1302528  11763787  b3804b  vmlinux
 8420814  2036408  1302528  11759750  b37086  vmlinux.patched
Link: http://lkml.kernel.org/r/1392349908-29685-1-git-send-email-vnagarnaik@google.com
Cc: Laurent Chavey <chavey@google.com>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Introduce new FALLOC_FL_ZERO_RANGE flag for fallocate. This has the same
functionality as xfs ioctl XFS_IOC_ZERO_RANGE.
It can be used to convert a range of a file to zeros, preferably without
issuing data IO. Blocks should be preallocated for the regions that span
holes in the file, and the entire range is preferably converted to
unwritten extents.
This can also be used to preallocate blocks past EOF in the same way as
with fallocate. The FALLOC_FL_KEEP_SIZE flag should cause the inode
size to remain the same.
Also add appropriate tracepoints.
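The visible effect of the flag on file contents and size can be modelled in a few lines. This is only a byte-level sketch under stated assumptions: `zero_range` and its `keep_size` parameter are hypothetical names, a Python bytes object stands in for the file, and block preallocation and unwritten extents are not modelled at all.

```python
def zero_range(data, offset, length, keep_size=False):
    """Model how zeroing [offset, offset + length) changes file contents.

    keep_size mimics FALLOC_FL_KEEP_SIZE: the size never grows, so only the
    part of the range that lies inside the current size is visible.
    """
    end = offset + length
    buf = bytearray(data)
    if end > len(buf) and not keep_size:
        buf.extend(bytes(end - len(buf)))  # file grows; new bytes read as zero
    stop = min(end, len(buf))
    buf[offset:stop] = bytes(max(0, stop - offset))
    return bytes(buf)


data = b"abcdefgh"
print(zero_range(data, 2, 3))                  # zero a range inside the file
print(zero_range(data, 6, 4))                  # range crosses EOF, size grows
print(zero_range(data, 6, 4, keep_size=True))  # range crosses EOF, size kept
```

In the third call the size stays at 8 bytes; on a real filesystem the blocks past EOF would still be preallocated, which this model cannot show.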
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
This was originally added as an optimization that for various reasons isn't
needed anymore, but it does add a lot of nasty corner cases (and it was
responsible for some recently fixed bugs). Just get rid of it now.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Change the invalidate tracepoint to indicate how much data we're invalidating,
and change the alloc tracepoints to indicate what offset they're for.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
The SMBUS tracepoints can be enabled thusly:
echo 1 >/sys/kernel/debug/tracing/events/i2c/enable
and will dump messages that can be viewed in /sys/kernel/debug/tracing/trace
that look like:
... smbus_read: i2c-0 a=051 f=0000 c=fa BYTE_DATA
... smbus_reply: i2c-0 a=051 f=0000 c=fa BYTE_DATA l=1 [39]
... smbus_result: i2c-0 a=051 f=0000 c=fa BYTE_DATA rd res=0
formatted as:
i2c-<adapter-nr>
a=<addr>
f=<flags>
c=<command>
<protocol-name>
<rd|wr>
res=<result>
l=<data-len>
[<data-block>]
The adapters to be traced can be selected by something like:
echo adapter_nr==1 >/sys/kernel/debug/tracing/events/i2c/filter
Note that this shares the same filter and enablement as i2c.
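For post-processing, the fields listed above can be split into a dictionary. The sketch below is an illustration only: `parse_smbus_event` is a hypothetical helper, the data block is assumed to be hyphen-separated hex bytes, and all other values are kept as strings rather than interpreted.

```python
def parse_smbus_event(line):
    """Split one smbus_* trace line into a dict of its fields."""
    head, _, rest = line.partition(": ")
    event = {"event": head.split()[-1]}
    for tok in rest.split():
        if tok.startswith("i2c-"):
            event["adapter_nr"] = int(tok[4:])
        elif tok.startswith("[") and tok.endswith("]"):
            # Assumption: the data block is hyphen-separated hex bytes.
            event["data"] = bytes.fromhex(tok[1:-1].replace("-", ""))
        elif "=" in tok:
            key, value = tok.split("=", 1)
            event[key] = value  # kept as text: a, f, c, l, res
        elif tok in ("rd", "wr"):
            event["direction"] = tok
        else:
            event["protocol"] = tok
    return event


reply = parse_smbus_event("... smbus_reply: i2c-0 a=051 f=0000 c=fa BYTE_DATA l=1 [39]")
print(reply["event"], reply["adapter_nr"], reply["protocol"], reply["data"])
```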
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Add tracepoints into the I2C message transfer function to retrieve the message
sent or received. The following config options must be turned on to make use
of the facility:
CONFIG_FTRACE
CONFIG_ENABLE_DEFAULT_TRACERS
The I2C tracepoint can be enabled thusly:
echo 1 >/sys/kernel/debug/tracing/events/i2c/enable
and will dump messages that can be viewed in /sys/kernel/debug/tracing/trace
that look like:
... i2c_write: i2c-5 #0 a=044 f=0000 l=2 [02-14]
... i2c_read: i2c-5 #1 a=044 f=0001 l=4
... i2c_reply: i2c-5 #1 a=044 f=0001 l=4 [33-00-00-00]
... i2c_result: i2c-5 n=2 ret=2
formatted as:
i2c-<adapter-nr>
#<message-array-index>
a=<addr>
f=<flags>
l=<datalen>
n=<message-array-size>
ret=<result>
[<data>]
The operation is done between the i2c_write/i2c_read lines and the i2c_reply
and i2c_result lines so that if the hardware hangs, the trace buffer can be
consulted to determine the problematic operation.
The adapters to be traced can be selected by something like:
echo adapter_nr==1 >/sys/kernel/debug/tracing/events/i2c/filter
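Because the request events are emitted before the operation and i2c_result after it, a trace that ends without i2c_result points at the transfer that hung. A minimal sketch of that check follows; the helper `unfinished_transfers` and its regular expression are assumptions for illustration, and the sample lines are the ones shown above.

```python
import re

# Hypothetical pattern: event name, then the adapter number after "i2c-".
EVENT_RE = re.compile(r"(?P<event>i2c_\w+): i2c-(?P<adapter>\d+)")

TRACE = [
    "... i2c_write: i2c-5 #0 a=044 f=0000 l=2 [02-14]",
    "... i2c_read: i2c-5 #1 a=044 f=0001 l=4",
    "... i2c_reply: i2c-5 #1 a=044 f=0001 l=4 [33-00-00-00]",
    "... i2c_result: i2c-5 n=2 ret=2",
]


def unfinished_transfers(lines):
    """Return, per adapter, the events seen since the last i2c_result."""
    pending = {}
    for line in lines:
        m = EVENT_RE.search(line)
        if m is None:
            continue
        adapter = int(m.group("adapter"))
        if m.group("event") == "i2c_result":
            pending.pop(adapter, None)  # transfer finished, forget it
        else:
            pending.setdefault(adapter, []).append(line.strip())
    return pending


print(unfinished_transfers(TRACE))      # complete transfer: nothing pending
print(unfinished_transfers(TRACE[:2]))  # trace cut off before the result
```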
These changes are based on code from Steven Rostedt.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
[wsa: adapted path for 'enable' in the commit msg]
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Users have reported being unable to trace non-signed modules loaded
within a kernel supporting module signature.
This is caused by tracepoint.c:tracepoint_module_coming() refusing to
take into account tracepoints sitting within force-loaded modules
(TAINT_FORCED_MODULE). The reason for this check, in the first place, is
that a force-loaded module may have a struct module incompatible with
the layout expected by the kernel, and can thus cause a kernel crash
upon forced load of that module on a kernel with CONFIG_TRACEPOINTS=y.
Tracepoints, however, specifically accept TAINT_OOT_MODULE and
TAINT_CRAP, since those modules do not lead to the "very likely system
crash" issue cited above for force-loaded modules.
With kernels having CONFIG_MODULE_SIG=y (signed modules), a non-signed
module is tainted re-using the TAINT_FORCED_MODULE taint flag.
Unfortunately, this means that Tracepoints treat that module as a
force-loaded module, and thus silently refuse to consider any tracepoint
within this module.
Since an unsigned module does not fit within the "very likely system
crash" category of tainting, add a new TAINT_UNSIGNED_MODULE taint flag
to specifically address this taint behavior, and accept those modules
within Tracepoints. We use the letter 'X' as a taint flag character for
a module being loaded that doesn't know how to sign its name (proposed
by Steven Rostedt).
Also add the missing 'O' entry to trace event show_module_flags() list
for the sake of completeness.
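The acceptance rule described above amounts to a mask test on the module's taint bits. The sketch below models it outside the kernel; the bit positions and the helper name `tracepoints_allowed` are illustrative assumptions, not taken from this patch.

```python
# Bit positions are assumptions made for this sketch.
TAINT_FORCED_MODULE = 1
TAINT_CRAP = 10
TAINT_OOT_MODULE = 12
TAINT_UNSIGNED_MODULE = 13

# Taints that do not fall into the "very likely system crash" category.
ACCEPTED_TAINTS = (
    (1 << TAINT_OOT_MODULE) | (1 << TAINT_CRAP) | (1 << TAINT_UNSIGNED_MODULE)
)


def tracepoints_allowed(taints):
    """True if the module carries no taint outside the accepted set."""
    return (taints & ~ACCEPTED_TAINTS) == 0


print(tracepoints_allowed(1 << TAINT_UNSIGNED_MODULE))  # unsigned only
print(tracepoints_allowed(1 << TAINT_FORCED_MODULE))    # force-loaded
```

With a separate bit for unsigned modules, an unsigned module passes the test while a force-loaded one is still refused.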
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
NAKed-by: Ingo Molnar <mingo@redhat.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: David Howells <dhowells@redhat.com>
CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Add ftrace for btrfs_workqueue for further workqueue tuning.
This patch needs to be applied after the workqueue replace patchset.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Merge tag 'nfs-for-3.14-5' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
"Highlights include:
- Fix another nfs4_sequence corruptor in RELEASE_LOCKOWNER
- Fix an Oopsable delegation callback race
- Fix another bad stateid infinite loop
- Fail the data server I/O if the stateid represents a lost lock
- Fix an Oopsable sunrpc trace event"
* tag 'nfs-for-3.14-5' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
SUNRPC: Fix oops when trace sunrpc_task events in nfs client
NFSv4: Fail the truncate() if the lock/open stateid is invalid
NFSv4.1 Fail data server I/O if stateid represents a lost lock
NFSv4: Fix the return value of nfs4_select_rw_stateid
NFSv4: nfs4_stateid_is_current should return 'true' for an invalid stateid
NFS: Fix a delegation callback race
NFSv4: Fix another nfs4_sequence corruptor
Use a temporary variable to store the expansion of the len expression.
If the evaluation is expensive, this commit will ensure it is evaluated
only once inside ftrace_get_offsets_<call>.
Link: http://lkml.kernel.org/r/1393651938-16418-3-git-send-email-filbranden@google.com
Signed-off-by: Filipe Brandenburger <filbranden@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
This fixes expansion of the len argument in __dynamic_array macros.
The previous code from commit 7d536cb3f would not fully evaluate the
expression before multiplying its result by the size of the type.
This went unnoticed because the length stored in the high 16 bits of the
offset (which is the one that was broken here) is only used by
filter_pred_strloc which only acts on strings for which the size of the
type is 1.
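The effect of the missing parentheses can be reproduced by treating the macro as plain text substitution. This is a sketch of the general pitfall rather than the kernel macro itself: the two `expand_*` helpers and the sample expression `a + b` are hypothetical, and `eval` is used only to evaluate the substituted text.

```python
def expand_unparenthesized(len_expr, type_size):
    # Models a macro body of the form: len * sizeof(type)
    return f"{len_expr} * {type_size}"


def expand_parenthesized(len_expr, type_size):
    # Models the corrected form: (len) * sizeof(type)
    return f"({len_expr}) * {type_size}"


env = {"a": 3, "b": 5}
len_expr = "a + b"  # a length argument that is itself an expression

for type_size in (1, 4):
    broken = eval(expand_unparenthesized(len_expr, type_size), {}, env)
    fixed = eval(expand_parenthesized(len_expr, type_size), {}, env)
    print(type_size, broken, fixed)
```

With an element size of 1 both forms give the same value, which matches the observation that the bug stayed hidden for strings; with a size of 4 they differ.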
Link: http://lkml.kernel.org/r/1393651938-16418-2-git-send-email-filbranden@google.com
Signed-off-by: Filipe Brandenburger <filbranden@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The trace event headers are required to include tracepoint.h. The only reason
they work now is that module.h includes tracepoint.h, and that will soon
change.
Link: http://lkml.kernel.org/r/20140226190644.591040764@goodmis.org
Fixes: 7b2a2d4a18 "mm: migrate: Add a tracepoint for migrate_pages"
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The trace event headers are required to include tracepoint.h. The only reason
they work now is that module.h includes tracepoint.h, and that will soon
change.
Link: http://lkml.kernel.org/r/20140226190644.442886305@goodmis.org
Fixes: 455b286468 "writeback: Initial tracing support"
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Jens Axboe <jaxboe@fusionio.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The functions that assign the contents for the ftrace events are
defined by the TRACE_EVENT() macros. Each event has its own unique
way to assign data to its buffer. When you have over 500 events,
that means there's 500 functions assigning data uniquely for each
event (not really that many, as DECLARE_EVENT_CLASS() and multiple
DEFINE_EVENT()s will only need a single function).
By making helper functions in the core kernel to do some of the work
instead, we can shrink the size of the kernel down a bit.
With a kernel configured with 502 events, the change in size was:
     text     data      bss       dec      hex  filename
 12987390  1913504  9785344  24686238  178ae9e  /tmp/vmlinux
 12959102  1913504  9785344  24657950  178401e  /tmp/vmlinux.patched
That's a total of 28288 bytes, which comes down to 56 bytes per event.
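The figures above can be cross-checked with a short script. It is a sketch only: the variable names are made up, and the numbers are copied from the size output quoted in this message.

```python
# (text, data, bss, dec, hex) rows copied from the size output above.
ROWS = {
    "/tmp/vmlinux": (12987390, 1913504, 9785344, 24686238, "178ae9e"),
    "/tmp/vmlinux.patched": (12959102, 1913504, 9785344, 24657950, "178401e"),
}
EVENTS = 502

for name, (text, data, bss, dec, hex_) in ROWS.items():
    assert text + data + bss == dec, name  # dec is the sum of the sections
    assert int(hex_, 16) == dec, name      # hex is the same total in base 16

saved = ROWS["/tmp/vmlinux"][3] - ROWS["/tmp/vmlinux.patched"][3]
per_event = saved // EVENTS
print(saved, per_event)
```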
Link: http://lkml.kernel.org/r/20120810034708.370808175@goodmis.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The code that shows array fields for events is defined for all events.
This can add up quite a bit when you have over 500 events.
By making helper functions in the core kernel to do the work
instead, we can shrink the size of the kernel down a bit.
With a kernel configured with 502 events, the change in size was:
     text     data      bss       dec      hex  filename
 12990946  1913568  9785344  24689858  178bcc2  /tmp/vmlinux
 12987390  1913504  9785344  24686238  178ae9e  /tmp/vmlinux.patched
That's a total of 3556 bytes, which comes down to 7 bytes per event.
Although it's not much, this code is just called at initialization of
the events.
Link: http://lkml.kernel.org/r/20120810034708.084036335@goodmis.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The code for trace events to format the raw recorded event data
into human readable format in the 'trace' file is repeated for every
event in the system. When you have over 500 events, this can add up
quite a bit.
By making helper functions in the core kernel to do the work
instead, we can shrink the size of the kernel down a bit.
With a kernel configured with 502 events, the change in size was:
     text     data      bss       dec      hex  filename
 12991007  1913568  9785344  24689919  178bcff  /tmp/vmlinux.orig
 12990946  1913568  9785344  24689858  178bcc2  /tmp/vmlinux.patched
Note, this version does not save as much as the version of this patch
I had a few years ago. That is because in the meantime, commit
f71130de5c ("tracing: Add a helper function for event print functions")
did a lot of the work my original patch did. But this change helps
slightly, and is part of a larger clean up to reduce the size much further.
Link: http://lkml.kernel.org/r/20120810034707.378538034@goodmis.org
Cc: Li Zefan <lizefan@huawei.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
trace_block_rq_complete does not take into account that a request can
be partially completed, so we can get the following incorrect output
from blkparser:
C R 232 + 240 [0]
C R 240 + 232 [0]
C R 248 + 224 [0]
C R 256 + 216 [0]
but should be:
C R 232 + 8 [0]
C R 240 + 8 [0]
C R 248 + 8 [0]
C R 256 + 8 [0]
Also, the whole output summary statistics of completed requests and
final throughput will be incorrect.
This patch takes into account the real completion size of the request and
fixes wrong completion accounting.
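The two outputs above can be reproduced with a small model of a request that completes in pieces. This is a sketch under assumptions: `completion_events` is a hypothetical helper, and the request (240 sectors starting at sector 232, completed 8 sectors at a time) is inferred from the sample output rather than stated in the patch.

```python
def completion_events(start, nr_sectors, chunk, fixed):
    """Return (sector, length) pairs as a completion trace would report them."""
    events = []
    pos, remaining = start, nr_sectors
    while remaining > 0:
        done = min(chunk, remaining)
        # The buggy variant reports everything still outstanding;
        # the fixed variant reports only what completed in this step.
        events.append((pos, done if fixed else remaining))
        pos += done
        remaining -= done
    return events


buggy = completion_events(232, 240, 8, fixed=False)
good = completion_events(232, 240, 8, fixed=True)
print(buggy[:4])
print(good[:4])
print(sum(n for _, n in buggy), sum(n for _, n in good))
```

Summing the reported lengths shows why the summary statistics were off: only the fixed variant adds up to the 240 sectors actually transferred.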
Signed-off-by: Roman Pen <r.peniaev@gmail.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: linux-kernel@vger.kernel.org
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
Add new V4L2 stream format definition, V4L2_BUF_TYPE_SDR_CAPTURE,
for SDR receiver.
Signed-off-by: Antti Palosaari <crope@iki.fi>
Acked-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
This patch implements fallocate's FALLOC_FL_COLLAPSE_RANGE for Ext4.
The semantics of this flag are as follows:
1) It collapses the range lying between offset and length by removing any data
blocks which are present in this range and then updates all the logical
offsets of extents beyond "offset + len" to nullify the hole created by
removing blocks. In short, it does not leave a hole.
2) It should be used exclusively. No other fallocate flag in combination.
3) Offset and length supplied to fallocate should be fs block size aligned
in case of xfs and ext4.
4) Collapse range does not work beyond i_size.
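The rules above can be illustrated on an in-memory byte string. This is a sketch, not the ext4 implementation: `collapse_range` and its `block_size` parameter are hypothetical, and rejecting a range that reaches exactly to the end of the file is an assumption of this model.

```python
def collapse_range(data, offset, length, block_size=4096):
    """Remove [offset, offset + length) and shift the tail down, leaving no hole."""
    if offset % block_size or length % block_size:
        raise ValueError("offset and length must be block size aligned")
    # Assumption of this sketch: a range reaching or passing EOF is rejected.
    if offset + length >= len(data):
        raise ValueError("range must end before the end of the file")
    return data[:offset] + data[offset + length:]


# Three 4-byte "blocks"; collapsing the middle one joins the outer two.
data = b"AAAABBBBCCCC"
print(collapse_range(data, 4, 4, block_size=4))
```

The result is shorter than the input by exactly `length` bytes, which is the difference from punching a hole, where the size would stay the same.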
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com>
Tested-by: Dongsu Park <dongsu.park@profitbricks.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
This reverts commit c4a391b53a. Dave
Chinner <david@fromorbit.com> has reported the commit may cause some
inodes to be left out from sync(2). This is because we can call
redirty_tail() for some inode (which sets i_dirtied_when to current time)
after sync(2) has started or similarly requeue_inode() can set
i_dirtied_when to current time if writeback had to skip some pages. The
real problem is in the functions clobbering i_dirtied_when but fixing
that isn't trivial, so a revert is the safer choice for now.
CC: stable@vger.kernel.org # >= 3.13
Signed-off-by: Jan Kara <jack@suse.cz>
Remove the reporting of energy since it does not provide any useful
information about the state of the driver and will be a maintenance
headache going forward since the RAPL energy units register is not
architectural and subject to change between micro-architectures.
References: https://bugzilla.kernel.org/show_bug.cgi?id=69831
Fixes: b69880f9cc (intel_pstate: Add trace point to report internal state.)
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Rename symbols, variables, functions and structure fields related to
the resume latency device PM QoS type so that it is clear where they
belong (in particular, to avoid confusion with the latency tolerance
device PM QoS type introduced by a subsequent changeset).
Update the PM QoS documentation to better reflect its current state.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull media updates from Mauro Carvalho Chehab:
- a new jpeg codec driver for Samsung Exynos (jpeg-hw-exynos4)
- a new dvb frontend for ds2103 chipset (m88ds2103)
- a new sensor driver for Samsung S5K5BAF UXGA (s5k5baf)
- new drivers for R-Car VSP1
- a new radio driver: radio-raremono
- a new tuner driver for ts2022 chipset (m88ts2022)
- the analog part of em28xx is now a separate module that only
loads/runs if the device is not a pure digital TV device
- added a staging driver for bcm2048 radio devices
- the omap 2 video driver (omap24xx) was moved to staging. This driver
is for old hardware and uses a deprecated kernel internal API. If
nobody cares enough to fix it, it will be removed in a couple of kernel
releases
- the sn9c102 driver was moved to staging. This driver was replaced by
gspca, and disabled on some distros, as almost all devices are known
to work properly with gspca. It should be removed from the kernel in a
couple of kernel releases
- lots of driver fixes, improvements and cleanups
* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (421 commits)
[media] media: v4l2-dev: fix video device index assignment
[media] rc-core: reuse device numbers
[media] em28xx-cards: properly initialize the device bitmap
[media] Staging: media: Fix line length exceeding 80 characters in as102_drv.c
[media] Staging: media: Fix line length exceeding 80 characters in as102_fe.c
[media] Staging: media: Fix quoted string split across line in as102_fe.c
[media] media: st-rc: Add reset support
[media] m2m-deinterlace: fix allocated struct type
[media] radio-usb-si4713: fix sparse non static symbol warnings
[media] em28xx-audio: remove needless check before usb_free_coherent()
[media] au0828: Fix sparse non static symbol warning
Revert "[media] go7007-usb: only use go->dev after allocated"
[media] em28xx-audio: provide an error code when URB submit fails
[media] em28xx: fix check for audio only usb interfaces when changing the usb alternate setting
[media] em28xx: fix usb alternate setting for analog and digital video endpoints > 0
[media] em28xx: make 'em28xx_ctrl_ops' static
em28xx-alsa: Fix error patch for init/fini
[media] em28xx-audio: flush work at .fini
[media] drxk: remove the option to load firmware asynchronously
[media] em28xx: adjust period size at runtime
...
Pull btrfs updates from Chris Mason:
"This is a pretty big pull, and most of these changes have been
floating in btrfs-next for a long time. Filipe's properties work is a
cool building block for inheriting attributes like compression down on
a per inode basis.
Jeff Mahoney kicked in code to export filesystem info into sysfs.
Otherwise, lots of performance improvements, cleanups and bug fixes.
Looks like there are still a few other small pending incrementals, but
I wanted to get the bulk of this in first"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (149 commits)
Btrfs: fix spin_unlock in check_ref_cleanup
Btrfs: setup inode location during btrfs_init_inode_locked
Btrfs: don't use ram_bytes for uncompressed inline items
Btrfs: fix btrfs_search_slot_for_read backwards iteration
Btrfs: do not export ulist functions
Btrfs: rework ulist with list+rb_tree
Btrfs: fix memory leaks on walking backrefs failure
Btrfs: fix send file hole detection leading to data corruption
Btrfs: add a reschedule point in btrfs_find_all_roots()
Btrfs: make send's file extent item search more efficient
Btrfs: fix to catch all errors when resolving indirect ref
Btrfs: fix protection between walking backrefs and root deletion
btrfs: fix warning while merging two adjacent extents
Btrfs: fix infinite path build loops in incremental send
btrfs: undo sysfs when open_ctree() fails
Btrfs: fix snprintf usage by send's gen_unique_name
btrfs: fix defrag 32-bit integer overflow
btrfs: sysfs: list the NO_HOLES feature
btrfs: sysfs: don't show reserved incompat feature
btrfs: call permission checks earlier in ioctls and return EPERM
...
Pull block IO driver changes from Jens Axboe:
- bcache update from Kent Overstreet.
- two bcache fixes from Nicholas Swenson.
- cciss pci init error fix from Andrew.
- underflow fix in the parallel IDE pg_write code from Dan Carpenter.
I'm sure the 1 (or 0) users of that are now happy.
- two PCI related fixes for sx8 from Jingoo Han.
- floppy init fix for first block read from Jiri Kosina.
- pktcdvd error return miss fix from Julia Lawall.
- removal of IRQF_SHARED from the SEGA Dreamcast CD-ROM code from
Michael Opdenacker.
- comment typo fix for the loop driver from Olaf Hering.
- potential oops fix for null_blk from Raghavendra K T.
- two fixes from Sam Bradshaw (Micron) for the mtip32xx driver, fixing
an OOM problem and a problem with handling security locked conditions
* 'for-3.14/drivers' of git://git.kernel.dk/linux-block: (47 commits)
mg_disk: Spelling s/finised/finished/
null_blk: Null pointer deference problem in alloc_page_buffers
mtip32xx: Correctly handle security locked condition
mtip32xx: Make SGL container per-command to eliminate high order dma allocation
drivers/block/loop.c: fix comment typo in loop_config_discard
drivers/block/cciss.c:cciss_init_one(): use proper errnos
drivers/block/paride/pg.c: underflow bug in pg_write()
drivers/block/sx8.c: remove unnecessary pci_set_drvdata()
drivers/block/sx8.c: use module_pci_driver()
floppy: bail out in open() if drive is not responding to block0 read
bcache: Fix auxiliary search trees for key size > cacheline size
bcache: Don't return -EINTR when insert finished
bcache: Improve bucket_prio() calculation
bcache: Add bch_bkey_equal_header()
bcache: update bch_bkey_try_merge
bcache: Move insert_fixup() to btree_keys_ops
bcache: Convert sorting to btree_keys
bcache: Convert debug code to btree_keys
bcache: Convert btree_iter to struct btree_keys
bcache: Refactor bset_tree sysfs stats
...
Pull core block IO changes from Jens Axboe:
"The major piece in here is the immutable bio_vec series from Kent, the
rest is fairly minor. It was supposed to go in last round, but
various issues pushed it to this release instead. The pull request
contains:
- Various smaller blk-mq fixes from different folks. Nothing major
here, just minor fixes and cleanups.
- Fix for a memory leak in the error path in the block ioctl code
from Christian Engelmayer.
- Header export fix from CaiZhiyong.
- Finally the immutable biovec changes from Kent Overstreet. This
enables some nice future work on making arbitrarily sized bios
possible, and splitting more efficient. Related fixes to immutable
bio_vecs:
- dm-cache immutable fixup from Mike Snitzer.
- btrfs immutable fixup from Muthu Kumar.
- bio-integrity fix from Nic Bellinger, which is also going to stable"
* 'for-3.14/core' of git://git.kernel.dk/linux-block: (44 commits)
xtensa: fixup simdisk driver to work with immutable bio_vecs
block/blk-mq-cpu.c: use hotcpu_notifier()
blk-mq: for_each_* macro correctness
block: Fix memory leak in rw_copy_check_uvector() handling
bio-integrity: Fix bio_integrity_verify segment start bug
block: remove unrelated header files and export symbol
blk-mq: uses page->list incorrectly
blk-mq: use __smp_call_function_single directly
btrfs: fix missing increment of bi_remaining
Revert "block: Warn and free bio if bi_end_io is not set"
block: Warn and free bio if bi_end_io is not set
blk-mq: fix initializing request's start time
block: blk-mq: don't export blk_mq_free_queue()
block: blk-mq: make blk_sync_queue support mq
block: blk-mq: support draining mq queue
dm cache: increment bi_remaining when bi_end_io is restored
block: fixup for generic bio chaining
block: Really silence spurious compiler warnings
block: Silence spurious compiler warnings
block: Kill bio_pair_split()
...
Flag BTRFS_ORDERED_TRUNCATED is a new one, update the tracepoint to
support it.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <clm@fb.com>
We use set_bit() to assign an ordered extent's flags, but the related
tracepoint does not do the same thing, so the trace output does not
parse the flags correctly.
Also, since the flags are individual bits, we change to use __print_flags with
a 'delim' instead of __print_symbolic.
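The difference can be shown with a loose model of the two decoding styles. Everything here is an assumption for illustration: the flag names and bit numbers are made up, and `print_symbolic` and `print_flags` only approximate the behaviour of the kernel macros of similar name.

```python
# Hypothetical bit numbers, set with set_bit() as in the description.
ORDERED_IO_DONE, ORDERED_COMPLETE, ORDERED_NOCOW = 0, 1, 2
NAMES = [(ORDERED_IO_DONE, "IO_DONE"), (ORDERED_COMPLETE, "COMPLETE"),
         (ORDERED_NOCOW, "NOCOW")]

# Wrong: table keyed by raw bit number. Right: table keyed by (1 << bit).
BY_NUMBER = [(nr, name) for nr, name in NAMES]
BY_MASK = [(1 << nr, name) for nr, name in NAMES]


def set_bit(nr, flags):
    return flags | (1 << nr)


def print_symbolic(value, table):
    """Name of the single entry equal to value, else the value in hex."""
    for key, name in table:
        if value == key:
            return name
    return hex(value)


def print_flags(value, delim, table):
    """Names of all entries whose mask is set in value, joined by delim."""
    return delim.join(name for mask, name in table if value & mask)


flags = set_bit(ORDERED_NOCOW, set_bit(ORDERED_IO_DONE, 0))
print(print_symbolic(flags, BY_NUMBER))  # no single entry matches
print(print_flags(flags, "|", BY_NUMBER))  # raw bit numbers decode wrongly
print(print_flags(flags, "|", BY_MASK))    # masks decode both flags
```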
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <clm@fb.com>
Merge tag 'nfs-for-3.14-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
"Highlights include:
- stable fix for an infinite loop in RPC state machine
- stable fix for a use after free situation in the NFSv4 trunking discovery
- stable fix for error handling in the NFSv4 trunking discovery
- stable fix for the page write update code
- stable fix for the NFSv4.1 mount time security negotiation
- stable fix for the NFSv4 open code.
- O_DIRECT locking fixes
- fix an Oops in the pnfs file commit code
- RPC layer needs finer grained handling of connection errors
- more RPC GSS upcall fixes"
* tag 'nfs-for-3.14-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (30 commits)
pnfs: Proper delay for NFS4ERR_RECALLCONFLICT in layout_get_done
pnfs: fix BUG in filelayout_recover_commit_reqs
nfs4: fix discover_server_trunking use after free
NFSv4.1: Handle errors correctly in nfs41_walk_client_list
nfs: always make sure page is up-to-date before extending a write to cover the entire page
nfs: page cache invalidation for dio
nfs: take i_mutex during direct I/O reads
nfs: merge nfs_direct_write into nfs_file_direct_write
nfs: merge nfs_direct_read into nfs_file_direct_read
nfs: increment i_dio_count for reads, too
nfs: defer inode_dio_done call until size update is done
nfs: fix size updates for aio writes
nfs4.1: properly handle ENOTSUP in SECINFO_NO_NAME
NFSv4.1: Fix a race in nfs4_write_inode
NFSv4.1: Don't trust attributes if a pNFS LAYOUTCOMMIT is outstanding
point to the right include file in a comment (left over from a9004abc3)
NFS: dprintk() should not print negative fileids and inode numbers
nfs: fix dead code of ipv6_addr_scope
sunrpc: Fix infinite loop in RPC state machine
SUNRPC: Add tracepoint for socket errors
...
Pull networking updates from David Miller:
1) BPF debugger and asm tool by Daniel Borkmann.
2) Speed up create/bind in AF_PACKET, also from Daniel Borkmann.
3) Correct reciprocal_divide and update users, from Hannes Frederic
Sowa and Daniel Borkmann.
4) Currently we only have a "set" operation for the hw timestamp socket
ioctl, add a "get" operation to match. From Ben Hutchings.
5) Add better trace events for debugging driver datapath problems, also
from Ben Hutchings.
6) Implement auto corking in TCP, from Eric Dumazet. Basically, if we
have a small send and a previous packet is already in the qdisc or
device queue, defer until TX completion or we get more data.
7) Allow userspace to manage ipv6 temporary addresses, from Jiri Pirko.
8) Add a qdisc bypass option for AF_PACKET sockets, from Daniel
Borkmann.
9) Share IP header compression code between Bluetooth and IEEE802154
layers, from Jukka Rissanen.
10) Fix ipv6 router reachability probing, from Jiri Benc.
11) Allow packets to be captured on macvtap devices, from Vlad Yasevich.
12) Support tunneling in GRO layer, from Jerry Chu.
13) Allow bonding to be configured fully using netlink, from Scott
Feldman.
14) Allow AF_PACKET users to obtain the VLAN TPID, just like they can
already get the TCI. From Atzm Watanabe.
15) New "Heavy Hitter" qdisc, from Terry Lam.
16) Significantly improve the IPSEC support in pktgen, from Fan Du.
17) Allow ipv4 tunnels to cache routes, just like sockets. From Tom
Herbert.
18) Add Proportional Integral Enhanced packet scheduler, from Vijay
Subramanian.
19) Allow openvswitch to use mmap'd netlink, from Thomas Graf.
20) Key TCP metrics blobs also by source address, not just destination
address. From Christoph Paasch.
21) Support 10G in generic phylib. From Andy Fleming.
22) Try to short-circuit GRO flow compares using device provided RX
hash, if provided. From Tom Herbert.
The wireless and netfilter folks have been busy little bees too.
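As an illustration of item 6, the auto corking decision can be sketched roughly as follows. This is a simplified model and not the kernel's implementation: the function name, its parameters and the 1460-byte threshold are assumptions made up for the example.

```python
def should_cork(send_size, packets_still_queued, small_send_threshold=1460):
    """Return True if a small send should be deferred (corked).

    Simplified model of the rule described in item 6: a send is deferred
    only when it is small AND an earlier packet from the same socket is
    still sitting in the qdisc or device queue. All names and the
    threshold value are hypothetical.
    """
    is_small = send_size < small_send_threshold
    earlier_packet_queued = packets_still_queued > 0
    return is_small and earlier_packet_queued


# A deferred send would later be pushed out on TX completion or merged
# with more data from the application; that part is not modelled here.
print(should_cork(100, 1))
```

In this model a large send, or a small send with nothing queued ahead of it, goes out immediately, so only back-to-back small writes are coalesced.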
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2064 commits)
net/cxgb4: Fix referencing freed adapter
ipv6: reallocate addrconf router for ipv6 address when lo device up
fib_frontend: fix possible NULL pointer dereference
rtnetlink: remove IFLA_BOND_SLAVE definition
rtnetlink: remove check for fill_slave_info in rtnl_have_link_slave_info
qlcnic: update version to 5.3.55
qlcnic: Enhance logic to calculate msix vectors.
qlcnic: Refactor interrupt coalescing code for all adapters.
qlcnic: Update poll controller code path
qlcnic: Interrupt code cleanup
qlcnic: Enhance Tx timeout debugging.
qlcnic: Use bool for rx_mac_learn.
bonding: fix u64 division
rtnetlink: add missing IFLA_BOND_AD_INFO_UNSPEC
sfc: Use the correct maximum TX DMA ring size for SFC9100
Add Shradha Shah as the sfc driver maintainer.
net/vxlan: Share RX skb de-marking and checksum checks with ovs
tulip: cleanup by using ARRAY_SIZE()
ip_tunnel: clear IPCB in ip_tunnel_xmit() in case dst_link_failure() is called
net/cxgb4: Don't retrieve stats during recovery
...
- ACPI core changes to make it create a struct acpi_device object for every
device represented in the ACPI tables during all namespace scans regardless
of the current status of that device. In accordance with this, ACPI hotplug
operations will not delete those objects, unless the underlying ACPI tables
go away.
- On top of the above, new sysfs attribute for ACPI device objects allowing
user space to check device status by triggering the execution of _STA for
its ACPI object. From Srinivas Pandruvada.
- ACPI core hotplug changes reducing code duplication, integrating the
PCI root hotplug with the core and reworking container hotplug.
- ACPI core simplifications making it use ACPI_COMPANION() in the code
"glueing" ACPI device objects to "physical" devices.
- ACPICA update to upstream version 20131218. This adds support for the
DBG2 and PCCT tables to ACPICA, fixes some bugs and improves debug
facilities. From Bob Moore, Lv Zheng and Betty Dall.
- Init code change to carry out the early ACPI initialization earlier.
That should allow us to use ACPI during the timekeeping initialization
and possibly to simplify the EFI initialization too. From Chun-Yi Lee.
- Cleanups of the inclusions of ACPI headers in many places all over from
Lv Zheng and Rashika Kheria (work in progress).
- New helper for ACPI _DSM execution and rework of the code in drivers
that uses _DSM to execute it via the new helper. From Jiang Liu.
- New Win8 OSI blacklist entries from Takashi Iwai.
- Assorted ACPI fixes and cleanups from Al Stone, Emil Goode, Hanjun Guo,
Lan Tianyu, Masanari Iida, Oliver Neukum, Prarit Bhargava, Rashika Kheria,
Tang Chen, Zhang Rui.
- intel_pstate driver updates, including proper Baytrail support, from
Dirk Brandewie and intel_pstate documentation from Ramkumar Ramachandra.
- Generic CPU boost ("turbo") support for cpufreq from Lukasz Majewski.
- powernow-k6 cpufreq driver fixes from Mikulas Patocka.
- cpufreq core fixes and cleanups from Viresh Kumar, Jane Li, Mark Brown.
- Assorted cpufreq drivers fixes and cleanups from Anson Huang, John Tobias,
Paul Bolle, Paul Walmsley, Sachin Kamat, Shawn Guo, Viresh Kumar.
- cpuidle cleanups from Bartlomiej Zolnierkiewicz.
- Support for hibernation APM events from Bin Shi.
- Hibernation fix to avoid bringing up nonboot CPUs with ACPI EC disabled
during thaw transitions from Bjørn Mork.
- PM core fixes and cleanups from Ben Dooks, Leonardo Potenza, Ulf Hansson.
- PNP subsystem fixes and cleanups from Dmitry Torokhov, Levente Kurusa,
Rashika Kheria.
- New tool for profiling system suspend from Todd E Brandt and a cpupower
tool cleanup from One Thousand Gnomes.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJS3a1eAAoJEILEb/54YlRxnTgP/iGawvgjKWm6Qqp7WSIvd5gQ
zZ6q75C6Pc/W2fq1+OzVGnpCF8WYFy+nFDAXOvUHjIXuoxSwFcuW5l4aMckgl/0a
TXEWe9MJrCHHRfDApfFacCJ44U02bjJAD5vTyL/hKA+IHeinq4WCSojryYC+8jU0
cBrUIV0aNH8r5JR2WJNAyv/U29rXsDUOu0I4qTqZ4YaZT6AignMjtLXn1e9AH1Pn
DPZphTIo/HMnb+kgBOjt4snMk+ahVO9eCOxh/hH8ecnWExw9WynXoU5Nsna0tSZs
ssyHC7BYexD3oYsG8D52cFUpp4FCsJ0nFQNa2kw0LY+0FBNay43LySisKYHZPXEs
2WpESDv+/t7yhtnrvM+TtA7aBheKm2XMWGFSu/aERLE17jIidOkXKH5Y7ryYLNf/
uyRKxNS0NcZWZ0G+/wuY02jQYNkfYz3k/nTr8BAUItRBjdporGIRNEnR9gPzgCUC
uQhjXWMPulqubr8xbyefPWHTEzU2nvbXwTUWGjrBxSy8zkyy5arfqizUj+VG6afT
NsboANoMHa9b+xdzigSFdA3nbVK6xBjtU6Ywntk9TIpODKF5NgfARx0H+oSH+Zrj
32bMzgZtHw/lAbYsnQ9OnTY6AEWQYt6NMuVbTiLXrMHhM3nWwfg/XoN4nZqs6jPo
IYvE6WhQZU6L6fptGHFC
=dRf6
-----END PGP SIGNATURE-----
Merge tag 'pm+acpi-3.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and power management updates from Rafael Wysocki:
"As far as the number of commits goes, the top spot belongs to ACPI
this time with cpufreq in the second position and a handful of PM
core, PNP and cpuidle updates. They are fixes and cleanups mostly, as
usual, with a couple of new features in the mix.
The most visible change is probably that we will create struct
acpi_device objects (visible in sysfs) for all devices represented in
the ACPI tables regardless of their status and there will be a new
sysfs attribute under those objects allowing user space to check that
status via _STA.
Consequently, ACPI device eject or generally hot-removal will not
delete those objects, unless the table containing the corresponding
namespace nodes is unloaded, which is extremely rare. Also ACPI
container hotplug will be handled quite a bit differently and cpufreq
will support CPU boost ("turbo") generically and not only in the
acpi-cpufreq driver.
Specifics:
- ACPI core changes to make it create a struct acpi_device object for
every device represented in the ACPI tables during all namespace
scans regardless of the current status of that device. In
accordance with this, ACPI hotplug operations will not delete those
objects, unless the underlying ACPI tables go away.
- On top of the above, new sysfs attribute for ACPI device objects
allowing user space to check device status by triggering the
execution of _STA for its ACPI object. From Srinivas Pandruvada.
- ACPI core hotplug changes reducing code duplication, integrating
the PCI root hotplug with the core and reworking container hotplug.
- ACPI core simplifications making it use ACPI_COMPANION() in the
code "glueing" ACPI device objects to "physical" devices.
- ACPICA update to upstream version 20131218. This adds support for
the DBG2 and PCCT tables to ACPICA, fixes some bugs and improves
debug facilities. From Bob Moore, Lv Zheng and Betty Dall.
- Init code change to carry out the early ACPI initialization
earlier. That should allow us to use ACPI during the timekeeping
initialization and possibly to simplify the EFI initialization too.
From Chun-Yi Lee.
- Cleanups of the inclusions of ACPI headers in many places all over
from Lv Zheng and Rashika Kheria (work in progress).
- New helper for ACPI _DSM execution and rework of the code in
drivers that uses _DSM to execute it via the new helper. From
Jiang Liu.
- New Win8 OSI blacklist entries from Takashi Iwai.
- Assorted ACPI fixes and cleanups from Al Stone, Emil Goode, Hanjun
Guo, Lan Tianyu, Masanari Iida, Oliver Neukum, Prarit Bhargava,
Rashika Kheria, Tang Chen, Zhang Rui.
- intel_pstate driver updates, including proper Baytrail support,
from Dirk Brandewie and intel_pstate documentation from Ramkumar
Ramachandra.
- Generic CPU boost ("turbo") support for cpufreq from Lukasz
Majewski.
- powernow-k6 cpufreq driver fixes from Mikulas Patocka.
- cpufreq core fixes and cleanups from Viresh Kumar, Jane Li, Mark
Brown.
- Assorted cpufreq drivers fixes and cleanups from Anson Huang, John
Tobias, Paul Bolle, Paul Walmsley, Sachin Kamat, Shawn Guo, Viresh
Kumar.
- cpuidle cleanups from Bartlomiej Zolnierkiewicz.
- Support for hibernation APM events from Bin Shi.
- Hibernation fix to avoid bringing up nonboot CPUs with ACPI EC
disabled during thaw transitions from Bjørn Mork.
- PM core fixes and cleanups from Ben Dooks, Leonardo Potenza, Ulf
Hansson.
- PNP subsystem fixes and cleanups from Dmitry Torokhov, Levente
Kurusa, Rashika Kheria.
- New tool for profiling system suspend from Todd E Brandt and a
cpupower tool cleanup from One Thousand Gnomes"
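For reference, the value returned by _STA is a bit field, so user space reading the new status attribute has to decode it. A minimal sketch, assuming the attribute yields a plain integer; the bit meanings follow the ACPI specification's definition of _STA, while the function and table names are hypothetical.

```python
# Bit positions of the _STA return value as defined by the ACPI
# specification. The short names are made up for this example.
STA_BITS = {
    0: "present",
    1: "enabled",
    2: "shown_in_ui",
    3: "functioning",
    4: "battery_present",
}


def decode_sta(value):
    """Return the names of the _STA bits that are set in value."""
    return [name for bit, name in sorted(STA_BITS.items()) if value & (1 << bit)]


print(decode_sta(0x0F))
```

A device that is present, enabled, visible and functioning reports 0x0F; a value of 0 means the device is not present, which is exactly the case where the struct acpi_device object now continues to exist.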
* tag 'pm+acpi-3.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (153 commits)
thermal: exynos: boost: Automatic enable/disable of BOOST feature (at Exynos4412)
cpufreq: exynos4x12: Change L0 driver data to CPUFREQ_BOOST_FREQ
Documentation: cpufreq / boost: Update BOOST documentation
cpufreq: exynos: Extend Exynos cpufreq driver to support boost
cpufreq / boost: Kconfig: Support for software-managed BOOST
acpi-cpufreq: Adjust the code to use the common boost attribute
cpufreq: Add boost frequency support in core
intel_pstate: Add trace point to report internal state.
cpufreq: introduce cpufreq_generic_get() routine
ARM: SA1100: Create dummy clk_get_rate() to avoid build failures
cpufreq: stats: create sysfs entries when cpufreq_stats is a module
cpufreq: stats: free table and remove sysfs entry in a single routine
cpufreq: stats: remove hotplug notifiers
cpufreq: stats: handle cpufreq_unregister_driver() and suspend/resume properly
cpufreq: speedstep: remove unused speedstep_get_state
platform: introduce OF style 'modalias' support for platform bus
PM / tools: new tool for suspend/resume performance optimization
ACPI: fix module autoloading for ACPI enumerated devices
ACPI: add module autoloading support for ACPI enumerated devices
ACPI: fix create_modalias() return value handling
...
This patch-set includes the following major enhancement patches.
o support inline_data
o refactor bio operations such as merge operations and rw type assignment
o enhance the direct IO path
o enhance bio operations
o truncate a node page when it becomes obsolete
o add sysfs entries: small_discards, max_victim_search, and in-place-update
o add a sysfs entry to control max_victim_search
The other bug fixes are as follows.
o fix a bug in truncate_partial_nodes
o avoid warnings during sparse and build process
o fix error handling flows
o fix potential bit overflows
And, there are a bunch of cleanups.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJS4HQfAAoJEEAUqH6CSFDSyyMP/iUXSMC9yw6eOmSjAh3boc6+
C7e4zrhdovekGTuZgg41SLdr83cpbEohv11wcXAfxB+eYFEz0zrAVzt54zMi7uOL
9JmFJ6XVL/T3omI5hpEwWHg6S6tOynN6mcjacsrvypEekgjHbbpLudSw6SCu3dKz
Lpc3z6CxrWbhvX8Iyf1j8mCceWkTO6eRv7u2H4Njtsq4Tukw3BHiBsURXt6kGwpx
CvRBgCFdQhv4GAtbDosmVjNWOUxvik7w2epHAPQGddFTgaCL9uS+gfweHK6H9EDp
1e3BDhmn5r9IhiLY8KVXRc8+po9kQeO1jNQATBuWggfjJSGbEBmrEQX4MFE3uCi9
q84hGV9+yaJxoT2A21qIeWgorF9gjqNbnrrENKHyKhOqXJSrh48u5LUV8KqIyz1Y
Qw62cypEB+PQxWegN76vwX/OrHMCLYMQ6c78bYLSwkBKonOrF5sN2+kJW5+zEj6n
q2cYi1PLMJe7LTcULUrxJTSPFLKM5yA2oYZq3LN4sUYBeN6USaouaIqcZBqRBTCO
adqlTa3sWytkDMAHsTpwrHABKK7pwiZoPLDVwjo0TIJ6Us4JhDtTktp5pj24fQ7Y
6lC9w4VbfAKtq8fMV17rZYD0lQFlmZk4uQRJ8XYicCRFx11kMPKYzdGmP5aVXWru
wxcztktnABtCAXK0PFLf
=gVDh
-----END PGP SIGNATURE-----
Merge tag 'for-f2fs-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this round, a couple of sysfs entries were introduced to tune the
f2fs at runtime.
In addition, f2fs starts to support inline_data and improves the
read/write performance in some workloads by refactoring bio-related
flows.
This patch-set includes the following major enhancement patches.
- support inline_data
- refactor bio operations such as merge operations and rw type
assignment
- enhance the direct IO path
- enhance bio operations
- truncate a node page when it becomes obsolete
- add sysfs entries: small_discards, max_victim_search, and
in-place-update
- add a sysfs entry to control max_victim_search
The other bug fixes are as follows.
- fix a bug in truncate_partial_nodes
- avoid warnings during sparse and build process
- fix error handling flows
- fix potential bit overflows
And, there are a bunch of cleanups"
* tag 'for-f2fs-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (95 commits)
f2fs: drop obsolete node page when it is truncated
f2fs: introduce NODE_MAPPING for code consistency
f2fs: remove the orphan block page array
f2fs: add help function META_MAPPING
f2fs: move a branch for code redability
f2fs: call mark_inode_dirty to flush dirty pages
f2fs: clean checkpatch warnings
f2fs: missing REQ_META and REQ_PRIO when sync_meta_pages(META_FLUSH)
f2fs: avoid f2fs_balance_fs call during pageout
f2fs: add delimiter to seperate name and value in debug phrase
f2fs: use spinlock rather than mutex for better speed
f2fs: move alloc new orphan node out of lock protection region
f2fs: move grabing orphan pages out of protection region
f2fs: remove the needless parameter of f2fs_wait_on_page_writeback
f2fs: update documents and a MAINTAINERS entry
f2fs: add a sysfs entry to control max_victim_search
f2fs: improve write performance under frequent fsync calls
f2fs: avoid to read inline data except first page
f2fs: avoid to left uninitialized data in page when read inline data
f2fs: fix truncate_partial_nodes bug
...
triggers by Tom Zanussi. A trigger is a way to enable an action when an
event is hit. The actions are:
o trace on/off - enable or disable tracing
o snapshot - save the current trace buffer in the snapshot
o stacktrace - dump the current stack trace to the ringbuffer
o enable/disable events - enable or disable another event
Namhyung Kim added updates to the tracing uprobes code, making
uprobes support fetch methods.
The rest are various bug fixes with the new code, and minor ones for
the old code.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.15 (GNU/Linux)
iQEcBAABAgAGBQJS3Z9fAAoJEKQekfcNnQGuFf0H/0CteaN+BJjpif6Tnxia15Sp
pcftzU0lgqfNzsfitmbjiVTgXWqCghoZo8UI9tQZvBZ9wmDIxeXQR73uoBgVlSCQ
ovyBO/R8r+lq+7EsDCwntZvrLbcdn6s/jzoruRvt7r35ghK5pH81DNR1BOzTQBhW
x+361Xtc13aok7N7JN8KR96VDUP9f8KU6PWqJ5lgS2Zl+wbVw6b0p8OV8IMCHczP
MdYrx8y4Jv4QWW7rMShAAVBe9qJQ56JWiWA17ysa4kY8BkKQ7QtlEFr+r1YY0nX5
67brXiL8u0NFzRx5y2VRpGc25BbImnVBFpoLQ5Itluq9OdZE3aOQubzXlY70R6g=
=Hkho
-----END PGP SIGNATURE-----
Merge tag 'trace-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"This pull request has a new feature to ftrace, namely the trace event
triggers by Tom Zanussi. A trigger is a way to enable an action when
an event is hit. The actions are:
o trace on/off - enable or disable tracing
o snapshot - save the current trace buffer in the snapshot
o stacktrace - dump the current stack trace to the ringbuffer
o enable/disable events - enable or disable another event
Namhyung Kim added updates to the tracing uprobes code, making
uprobes support fetch methods.
The rest are various bug fixes with the new code, and minor ones for
the old code"
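A trigger is set by writing a short command string to an event's trigger file. The sketch below only builds such strings. The "action[:count] [if filter]" shape and the enable_event/disable_event spelling are given here as I understand the trigger documentation and should be treated as assumptions, as should the helper name and its parameters.

```python
ACTIONS = {"traceon", "traceoff", "snapshot", "stacktrace",
           "enable_event", "disable_event"}


def trigger_command(action, target=None, count=None, event_filter=None):
    """Build a trigger command string (hypothetical helper).

    target is a (system, event) pair and is only used by the
    enable_event/disable_event actions.
    """
    if action not in ACTIONS:
        raise ValueError("unknown trigger action: %s" % action)
    parts = [action]
    if action in ("enable_event", "disable_event"):
        if target is None:
            raise ValueError("%s needs a (system, event) target" % action)
        parts.extend(target)
    if count is not None:
        # Optional limit on how many times the trigger fires.
        parts.append(str(count))
    command = ":".join(parts)
    if event_filter:
        # Optional condition evaluated against the event's fields.
        command += " if " + event_filter
    return command


print(trigger_command("traceoff", count=1, event_filter="nr_rq > 1"))
```

The resulting string would then be written to the trigger file of the event that should fire the action.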
* tag 'trace-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (38 commits)
tracing: Fix buggered tee(2) on tracing_pipe
tracing: Have trace buffer point back to trace_array
ftrace: Fix synchronization location disabling and freeing ftrace_ops
ftrace: Have function graph only trace based on global_ops filters
ftrace: Synchronize setting function_trace_op with ftrace_trace_function
tracing: Show available event triggers when no trigger is set
tracing: Consolidate event trigger code
tracing: Fix counter for traceon/off event triggers
tracing: Remove double-underscore naming in syscall trigger invocations
tracing/kprobes: Add trace event trigger invocations
tracing/probes: Fix build break on !CONFIG_KPROBE_EVENT
tracing/uprobes: Add @+file_offset fetch method
uprobes: Allocate ->utask before handler_chain() for tracing handlers
tracing/uprobes: Add support for full argument access methods
tracing/uprobes: Fetch args before reserving a ring buffer
tracing/uprobes: Pass 'is_return' to traceprobe_parse_probe_arg()
tracing/probes: Implement 'memory' fetch method for uprobes
tracing/probes: Add fetch{,_size} member into deref fetch method
tracing/probes: Move 'symbol' fetch method to kprobes
tracing/probes: Implement 'stack' fetch method for uprobes
...
The broad goal of the series is to improve allocation success rates for
huge pages through memory compaction, while trying not to increase the
compaction overhead. The original objective was to reintroduce
capturing of high-order pages freed by the compaction, before they are
split by concurrent activity. However, several bugs and opportunities
for simple improvements were found in the current implementation, mostly
through extra tracepoints (which are however too ugly for now to be
considered for sending).
The patches mostly deal with two mechanisms that reduce compaction
overhead, which is caching the progress of migrate and free scanners,
and marking pageblocks where isolation failed to be skipped during
further scans.
Patch 1 (from mgorman) adds tracepoints that allow calculating the time spent
in compaction and potentially debugging scanner pfn values.
Patch 2 encapsulates some functionality for handling deferred compactions
for better maintainability, without a functional change.
Patch 3 fixes a bug where cached scanner pfn's are sometimes reset only after
they have been read to initialize a compaction run.
Patch 4 fixes a bug where scanners meeting is sometimes not properly detected
and can lead to multiple compaction attempts quitting early without
doing any work.
Patch 5 improves the chances of sync compaction to process pageblocks that
async compaction has skipped due to being !MIGRATE_MOVABLE.
Patch 6 improves the chances of sync direct compaction to actually do anything
when called after async compaction fails during allocation slowpath.
The impact of the patches was validated using mmtests's stress-highalloc
benchmark on an x86_64 machine with 4GB memory.
Due to instability of the results (mostly related to the bugs fixed by
patches 2 and 3), 10 iterations were performed, taking min, mean and max
values for success rates and mean values for time and vmstat-based
metrics.
First, the default GFP_HIGHUSER_MOVABLE allocations were tested with the
patches stacked on top of v3.13-rc2. Patch 2 is OK to serve as baseline
due to no functional changes in 1 and 2. Comments below.
stress-highalloc
                          3.13-rc2            3.13-rc2            3.13-rc2            3.13-rc2            3.13-rc2
                           2-nothp             3-nothp             4-nothp             5-nothp             6-nothp
Success 1 Min     9.00 (    0.00%)   10.00 (  -11.11%)   43.00 ( -377.78%)   43.00 ( -377.78%)   33.00 ( -266.67%)
Success 1 Mean   27.50 (    0.00%)   25.30 (    8.00%)   45.50 (  -65.45%)   45.90 (  -66.91%)   46.30 (  -68.36%)
Success 1 Max    36.00 (    0.00%)   36.00 (    0.00%)   47.00 (  -30.56%)   48.00 (  -33.33%)   52.00 (  -44.44%)
Success 2 Min    10.00 (    0.00%)    8.00 (   20.00%)   46.00 ( -360.00%)   45.00 ( -350.00%)   35.00 ( -250.00%)
Success 2 Mean   26.40 (    0.00%)   23.50 (   10.98%)   47.30 (  -79.17%)   47.60 (  -80.30%)   48.10 (  -82.20%)
Success 2 Max    34.00 (    0.00%)   33.00 (    2.94%)   48.00 (  -41.18%)   50.00 (  -47.06%)   54.00 (  -58.82%)
Success 3 Min    65.00 (    0.00%)   63.00 (    3.08%)   85.00 (  -30.77%)   84.00 (  -29.23%)   85.00 (  -30.77%)
Success 3 Mean   76.70 (    0.00%)   70.50 (    8.08%)   86.20 (  -12.39%)   85.50 (  -11.47%)   86.00 (  -12.13%)
Success 3 Max    87.00 (    0.00%)   86.00 (    1.15%)   88.00 (   -1.15%)   87.00 (    0.00%)   87.00 (    0.00%)
                              3.13-rc2   3.13-rc2   3.13-rc2   3.13-rc2   3.13-rc2
                               2-nothp    3-nothp    4-nothp    5-nothp    6-nothp
User                           6437.72    6459.76    5960.32    5974.55    6019.67
System                         1049.65    1049.09    1029.32    1031.47    1032.31
Elapsed                        1856.77    1874.48    1949.97    1994.22    1983.15
                              3.13-rc2   3.13-rc2   3.13-rc2   3.13-rc2   3.13-rc2
                               2-nothp    3-nothp    4-nothp    5-nothp    6-nothp
Minor Faults                 253952267  254581900  250030122  250507333  250157829
Major Faults                       420        407        506        530        530
Swap Ins                             4          9          9          6          6
Swap Outs                          398        375        345        346        333
Direct pages scanned            197538     189017     298574     287019     299063
Kswapd pages scanned           1809843    1801308    1846674    1873184    1861089
Kswapd pages reclaimed         1806972    1798684    1844219    1870509    1858622
Direct pages reclaimed          197227     188829     298380     286822     298835
Kswapd efficiency                  99%        99%        99%        99%        99%
Kswapd velocity                953.382    970.449    952.243    934.569    922.286
Direct efficiency                  99%        99%        99%        99%        99%
Direct velocity                104.058    101.832    153.961    143.200    148.205
Percentage direct scans             9%         9%        13%        13%        13%
Zone normal velocity           347.289    359.676    348.063    339.933    332.983
Zone dma32 velocity            710.151    712.605    758.140    737.835    737.507
Zone dma velocity                0.000      0.000      0.000      0.000      0.000
Page writes by reclaim         557.600    429.000    353.600    426.400    381.800
Page writes file                   159         53          7         79         48
Page writes anon                   398        375        345        346        333
Page reclaim immediate             825        644        411        575        420
Sector Reads                   2781750    2769780    2878547    2939128    2910483
Sector Writes                 12080843   12083351   12012892   12002132   12010745
Page rescued immediate               0          0          0          0          0
Slabs scanned                  1575654    1545344    1778406    1786700    1794073
Direct inode steals               9657      10037      15795      14104      14645
Kswapd inode steals              46857      46335      50543      50716      51796
Kswapd skipped wait                  0          0          0          0          0
THP fault alloc                     97         91         81         71         77
THP collapse alloc                 456        506        546        544        565
THP splits                           6          5          5          4          4
THP fault fallback                   0          1          0          0          0
THP collapse fail                   14         14         12         13         12
Compaction stalls                 1006        980       1537       1536       1548
Compaction success                 303        284        562        559        578
Compaction failures                702        696        974        976        969
Page migrate success           1177325    1070077    3927538    3781870    3877057
Page migrate failure                 0          0          0          0          0
Compaction pages isolated      2547248    2306457    8301218    8008500    8200674
Compaction migrate scanned    42290478   38832618  153961130  154143900  159141197
Compaction free scanned       89199429   79189151  356529027  351943166  356326727
Compaction cost                   1566       1426       5312       5156       5294
NUMA PTE updates                     0          0          0          0          0
NUMA hint faults                     0          0          0          0          0
NUMA hint local faults               0          0          0          0          0
NUMA hint local percent            100        100        100        100        100
NUMA pages migrated                  0          0          0          0          0
AutoNUMA cost                        0          0          0          0          0
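A note on reading the Success rows: the percentage in parentheses appears to be the change relative to the baseline column, computed so that a negative number means a higher (better) success rate. The formula below is inferred from the printed numbers rather than taken from mmtests itself, and the function name is hypothetical.

```python
def relative_change(baseline, value):
    """Percentage as printed next to each Success value.

    Inferred convention: (baseline - value) / baseline * 100, so values
    above the baseline come out negative.
    """
    return round((baseline - value) / baseline * 100, 2)


# "Success 1 Min": baseline 9.00, patch 4 reaches 43.00
print(relative_change(9.00, 43.00))
```

So the large negative percentages from patch 4 onwards are improvements, not regressions.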
Observations:
- The "Success 3" line is allocation success rate with system idle
(phases 1 and 2 are with background interference). I used to get stable
values around 85% with vanilla 3.11. The lower min and mean values came
with 3.12. This was bisected to commit 81c0a2bb ("mm: page_alloc: fair
zone allocator policy"). As explained in the comment for patch 3, I don't
think the commit is wrong, but that it makes the effect of compaction
bugs worse. From patch 3 onwards, the results are OK and match the 3.11
results.
- Patch 4 also clearly helps phases 1 and 2, and exceeds any results
I've seen with 3.11 (I didn't measure it that thoroughly then, but it
was never above 40%).
- Compaction cost and the number of scanned pages are higher, especially due
to patch 4. However, keep in mind that patches 3 and 4 fix existing
bugs in the current design of compaction overhead mitigation, they do
not change it. If overhead is found unacceptable, then it should be
decreased differently (and consistently, not due to random conditions)
than the current implementation does. In contrast, patches 5 and 6
(which are not strictly bug fixes) do not increase the overhead (but
also not success rates). This might be a limitation of the
stress-highalloc benchmark as it's quite uniform.
Another set of results is when configuring stress-highalloc to allocate
with similar flags as THP uses:
(GFP_HIGHUSER_MOVABLE|__GFP_NOMEMALLOC|__GFP_NORETRY|__GFP_NO_KSWAPD)
stress-highalloc
                          3.13-rc2            3.13-rc2            3.13-rc2            3.13-rc2            3.13-rc2
                             2-thp               3-thp               4-thp               5-thp               6-thp
Success 1 Min     2.00 (    0.00%)    7.00 ( -250.00%)   18.00 ( -800.00%)   19.00 ( -850.00%)   26.00 (-1200.00%)
Success 1 Mean   19.20 (    0.00%)   17.80 (    7.29%)   29.20 (  -52.08%)   29.90 (  -55.73%)   32.80 (  -70.83%)
Success 1 Max    27.00 (    0.00%)   29.00 (   -7.41%)   35.00 (  -29.63%)   36.00 (  -33.33%)   37.00 (  -37.04%)
Success 2 Min     3.00 (    0.00%)    8.00 ( -166.67%)   21.00 ( -600.00%)   21.00 ( -600.00%)   32.00 ( -966.67%)
Success 2 Mean   19.30 (    0.00%)   17.90 (    7.25%)   32.20 (  -66.84%)   32.60 (  -68.91%)   35.70 (  -84.97%)
Success 2 Max    27.00 (    0.00%)   30.00 (  -11.11%)   36.00 (  -33.33%)   37.00 (  -37.04%)   39.00 (  -44.44%)
Success 3 Min    62.00 (    0.00%)   62.00 (    0.00%)   85.00 (  -37.10%)   75.00 (  -20.97%)   64.00 (   -3.23%)
Success 3 Mean   66.30 (    0.00%)   65.50 (    1.21%)   85.60 (  -29.11%)   83.40 (  -25.79%)   83.50 (  -25.94%)
Success 3 Max    70.00 (    0.00%)   69.00 (    1.43%)   87.00 (  -24.29%)   86.00 (  -22.86%)   87.00 (  -24.29%)
                              3.13-rc2   3.13-rc2   3.13-rc2   3.13-rc2   3.13-rc2
                                 2-thp      3-thp      4-thp      5-thp      6-thp
User                           6547.93    6475.85    6265.54    6289.46    6189.96
System                         1053.42    1047.28    1043.23    1042.73    1038.73
Elapsed                        1835.43    1821.96    1908.67    1912.74    1956.38
                              3.13-rc2   3.13-rc2   3.13-rc2   3.13-rc2   3.13-rc2
                                 2-thp      3-thp      4-thp      5-thp      6-thp
Minor Faults                 256805673  253106328  253222299  249830289  251184418
Major Faults                       395        375        423        434        448
Swap Ins                            12         10         10         12          9
Swap Outs                          530        537        487        455        415
Direct pages scanned             71859      86046     153244     152764     190713
Kswapd pages scanned           1900994    1870240    1898012    1892864    1880520
Kswapd pages reclaimed         1897814    1867428    1894939    1890125    1877924
Direct pages reclaimed           71766      85908     153167     152643     190600
Kswapd efficiency                  99%        99%        99%        99%        99%
Kswapd velocity               1029.000   1067.782   1000.091    991.049    951.218
Direct efficiency                  99%        99%        99%        99%        99%
Direct velocity                 38.897     49.127     80.747     79.983     96.468
Percentage direct scans             3%         4%         7%         7%         9%
Zone normal velocity           351.377    372.494    348.910    341.689    335.310
Zone dma32 velocity            716.520    744.414    731.928    729.343    712.377
Zone dma velocity                0.000      0.000      0.000      0.000      0.000
Page writes by reclaim         669.300    604.000    545.700    538.900    429.900
Page writes file                   138         66         58         83         14
Page writes anon                   530        537        487        455        415
Page reclaim immediate             806        655        772        548        517
Sector Reads                   2711956    2703239    2811602    2818248    2839459
Sector Writes                 12163238   12018662   12038248   11954736   11994892
Page rescued immediate               0          0          0          0          0
Slabs scanned                  1385088    1388364    1507968    1513292    1558656
Direct inode steals               1739       2564       4622       5496       6007
Kswapd inode steals              47461      46406      47804      48013      48466
Kswapd skipped wait                  0          0          0          0          0
THP fault alloc                    110         82         84         69         70
THP collapse alloc                 445        482        467        462        539
THP splits                           6          5          4          5          3
THP fault fallback                   3          0          0          0          0
THP collapse fail                   15         14         14         14         13
Compaction stalls                  659        685       1033       1073       1111
Compaction success                 222        225        410        427        456
Compaction failures                436        460        622        646        655
Page migrate success            446594     439978    1085640    1095062    1131716
Page migrate failure                 0          0          0          0          0
Compaction pages isolated      1029475    1013490    2453074    2482698    2565400
Compaction migrate scanned     9955461   11344259   24375202   27978356   30494204
Compaction free scanned       27715272   28544654   80150615   82898631   85756132
Compaction cost                    552        555       1344       1379       1436
NUMA PTE updates                     0          0          0          0          0
NUMA hint faults                     0          0          0          0          0
NUMA hint local faults               0          0          0          0          0
NUMA hint local percent            100        100        100        100        100
NUMA pages migrated                  0          0          0          0          0
AutoNUMA cost                        0          0          0          0          0
There are some differences from the previous results for THP-like allocations:
- Here, the bad result for the unpatched kernel in phase 3 is much more
consistent, staying between 65-70%, and is not related to the "regression"
in 3.12. Still there is the improvement from patch 4 onwards, which brings
it on par with simple GFP_HIGHUSER_MOVABLE allocations.
- Compaction costs have increased, but nowhere near as much as in the
non-THP case. Again, the patches should be worth the gained
determinism.
- Patches 5 and 6 somewhat increase the number of migrate-scanned pages.
This is most likely due to __GFP_NO_KSWAPD flag, which means the cached
pfn's and pageblock skip bits are not reset by kswapd that often (at
least in phase 3 where no concurrent activity would wake up kswapd) and
the patches thus help the sync-after-async compaction. It doesn't
however show that the sync compaction would help so much with success
rates, which can be again seen as a limitation of the benchmark
scenario.
This patch (of 6):
Add two tracepoints for compaction begin and end of a zone. Using this it
is possible to calculate how much time a workload is spending within
compaction and potentially debug problems related to cached pfns for
scanning. In combination with the direct reclaim and slab trace points it
should be possible to estimate most allocation-related overhead for a
workload.
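As a rough sketch of the time calculation mentioned above, the begin/end events can be paired per task and the intervals summed. The event names and the (timestamp, pid, name) tuple layout are assumptions for illustration and do not reproduce the real trace record format.

```python
def time_in_compaction(events):
    """Sum the time between begin/end compaction events, per task.

    events is an iterable of (timestamp_seconds, pid, name) tuples in
    chronological order. The tuple layout and the event names are
    hypothetical.
    """
    open_since = {}  # pid -> timestamp of the pending begin event
    total = 0.0
    for timestamp, pid, name in events:
        if name == "mm_compaction_begin":
            open_since[pid] = timestamp
        elif name == "mm_compaction_end" and pid in open_since:
            total += timestamp - open_since.pop(pid)
    return total


sample = [
    (1.0, 10, "mm_compaction_begin"),
    (1.5, 11, "mm_compaction_begin"),
    (2.0, 10, "mm_compaction_end"),
    (2.5, 11, "mm_compaction_end"),
]
print(time_in_compaction(sample))
```

An end event without a matching begin (for example when tracing started mid-run) is ignored rather than counted.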
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Rik van Riel <riel@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch adds three tracepoints
o trace_sched_move_numa when a task is moved to a node
o trace_sched_swap_numa when a task is swapped with another task
o trace_sched_stick_numa when a numa-related migration fails
The tracepoints allow the NUMA scheduler activity to be monitored and the
following high-level metrics can be calculated
o NUMA migrated stuck nr trace_sched_stick_numa
o NUMA migrated idle nr trace_sched_move_numa
o NUMA migrated swapped nr trace_sched_swap_numa
o NUMA local swapped trace_sched_swap_numa src_nid == dst_nid (should never happen)
o NUMA remote swapped trace_sched_swap_numa src_nid != dst_nid (should == NUMA migrated swapped)
o NUMA group swapped trace_sched_swap_numa src_ngid == dst_ngid
Maybe a small number of these are acceptable
but a high number would be a major surprise.
It would be even worse if bounces are frequent.
o NUMA avg task migs. Average number of migrations for tasks
o NUMA stddev task mig Self-explanatory
o NUMA max task migs. Maximum number of migrations for a single task
In general the intent of the tracepoints is to help diagnose problems
where automatic NUMA balancing appears to be doing an excessive amount
of useless work.
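A minimal sketch of how some of the metrics above could be derived from a list of recorded events. The dict layout of an event and the function name are hypothetical, and for a swap only the task named in the event is counted as migrated, which is a simplification.

```python
from collections import Counter


def numa_metrics(events):
    """Summarise NUMA balancing activity from a list of event dicts.

    Each event is assumed to look like
    {"name": "sched_swap_numa", "pid": 1, "src_nid": 0, "dst_nid": 1};
    this layout is made up for the example.
    """
    counts = Counter()
    migrations = Counter()  # successful migrations per task, keyed by pid
    for event in events:
        name = event["name"]
        if name == "sched_stick_numa":
            counts["stuck"] += 1
        elif name == "sched_move_numa":
            counts["idle"] += 1
            migrations[event["pid"]] += 1
        elif name == "sched_swap_numa":
            counts["swapped"] += 1
            local = event["src_nid"] == event["dst_nid"]
            counts["local_swapped" if local else "remote_swapped"] += 1
            migrations[event["pid"]] += 1
    per_task = list(migrations.values())
    return {
        "stuck": counts["stuck"],
        "idle": counts["idle"],
        "swapped": counts["swapped"],
        "local_swapped": counts["local_swapped"],
        "remote_swapped": counts["remote_swapped"],
        "avg_task_migs": sum(per_task) / len(per_task) if per_task else 0.0,
        "max_task_migs": max(per_task, default=0),
    }
```

A non-zero local_swapped count, or a max_task_migs far above the average, would be the kind of signal the tracepoints are meant to expose.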
[akpm@linux-foundation.org: remove semicolon-after-if, repair coding-style]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A low local/remote numa hinting fault ratio is potentially explained by
failed migrations. This patch adds a tracepoint that fires when
migration fails due to migration rate limitation.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add perf trace event "power:pstate_sample" to report driver state to
aid in diagnosing issues reported against intel_pstate.
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The existing net/netif_rx and net/netif_receive_skb trace events
provide little information about the skb, nor do they indicate how it
entered the stack.
Add trace events at entry of each of the exported functions, including
most fields that are likely to be interesting for debugging driver
datapath behaviour. Split netif_rx() and netif_receive_skb() so that
internal calls are not traced.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The existing net/net_dev_xmit trace event provides little information
about the skb that has been passed to the driver, and it is not
simple to add more since the skb may already have been freed at
the point the event is emitted.
Add a separate trace event before the skb is passed to the driver,
including most fields that are likely to be interesting for debugging
driver datapath behaviour.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The event trigger code that checks for callback triggers before and
after recording of an event has lots of flags checks. This code is
duplicated throughout the ftrace events, kprobes and system calls.
They all do the exact same checks against the event flags.
Add the helper functions ftrace_trigger_soft_disabled(),
event_trigger_unlock_commit() and event_trigger_unlock_commit_regs(),
which consolidate these checks, and use them instead.
Link: http://lkml.kernel.org/r/20140106222703.5e7dbba2@gandalf.local.home
Acked-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
We need a reserve for allocating buckets for new btree nodes - and now that
we've got multiple btrees, it really needs to be per btree.
This reworks the reserves so we've got separate freelists for each reserve
instead of watermarks, which seems to make things a bit cleaner, and it adds
some code so that btree_split() can make sure the reserve is available before it
starts.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Merge tag 'v3.13-rc6' into for-3.14/core
Needed to bring blk-mq up to date, since changes have been going in
since for-3.14/core was established.
Fixup merge issues related to the immutable biovec changes.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Conflicts:
block/blk-flush.c
fs/btrfs/check-integrity.c
fs/btrfs/extent_io.c
fs/btrfs/scrub.c
fs/logfs/dev_bdev.c
This patch integrates redundant bio operations on read and write IOs.
1. Move bio-related code to the top of data.c.
2. Replace f2fs_submit_bio with f2fs_submit_merged_bio, which handles read
bios additionally.
3. Introduce __submit_merged_bio to submit the merged bio.
4. Change f2fs_readpage to f2fs_submit_page_bio.
5. Introduce f2fs_submit_page_mbio to integrate previous submit_read_page and
submit_write_page.
Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
This patch adds more detailed information about bio types, so that
REQ_META and REQ_PRIO are now visible too.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
This patch adds a tracepoint for f2fs_submit_read_bio.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
[Jaegeuk Kim: integrate tracepoints of f2fs_submit_read(_write)_bio]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
This patch adds a tracepoint for submit_read_page.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
[Jaegeuk Kim: integrate tracepoints of f2fs_submit_read(_write)_page]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Add a generic event_command.set_trigger_filter() op implementation and
have the current set of trigger commands use it - this essentially
gives them all support for filters.
Syntactically, filters are supported by adding 'if <filter>' just
after the command, in which case only events matching the filter will
invoke the trigger. For example, to add a filter to an
enable/disable_event command:
    echo 'enable_event:system:event if common_pid == 999' > \
        .../othersys/otherevent/trigger
The above command will only enable the system:event event if the
common_pid field in the othersys:otherevent event is 999.
As another example, to add a filter to a stacktrace command:
    echo 'stacktrace if common_pid == 999' > \
        .../somesys/someevent/trigger
The above command will only trigger a stacktrace if the common_pid
field in the event is 999.
The filter syntax is the same as that described in the 'Event
filtering' section of Documentation/trace/events.txt.
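The 'command if filter' syntax described above can be assembled programmatically before it is written to a trigger file. The following is only a sketch: format_trigger() is a hypothetical helper invented for this example, and the trigger file path is deliberately left out because the message elides it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "<command>" or "<command> if <filter>" into buf.
 * Returns 0 on success, -1 if the command is empty or the result
 * does not fit into size bytes (including the terminating NUL).
 */
int format_trigger(char *buf, size_t size, const char *command,
		   const char *filter)
{
	int n;

	assert(buf != NULL && command != NULL);
	if (strlen(command) == 0)
		return -1;

	if (filter != NULL)
		n = snprintf(buf, size, "%s if %s", command, filter);
	else
		n = snprintf(buf, size, "%s", command);

	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

A caller would then write the resulting string to the 'trigger' file of the event it wants to arm, exactly as the echo commands above do.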
Because triggers can now use filters, the trigger-invoking logic needs
to be moved in those cases - e.g. for ftrace_raw_event_calls, if a
trigger has a filter associated with it, the trigger invocation now
needs to happen after the { assign; } part of the call, in order for
the trigger condition to be tested.
There's still a SOFT_DISABLED-only check at the top of e.g. the
ftrace_raw_events function, so when an event is soft disabled but not
because of the presence of a trigger, the original SOFT_DISABLED
behavior remains unchanged.
There's also a bit of trickiness in that some triggers need to avoid
being invoked while an event is currently in the process of being
logged, since the trigger may itself log data into the trace buffer.
Thus we make sure the current event is committed before invoking those
triggers. To do that, we split the trigger invocation in two - the
first part (event_triggers_call()) checks the filter using the current
trace record; if a command has the post_trigger flag set, it sets a
bit for itself in the return value, otherwise it directly invokes the
trigger. Once all commands have been either invoked or set their
return flag, event_triggers_call() returns. The current record is
then either committed or discarded; if any commands have deferred
their triggers, those commands are finally invoked following the close
of the current event by event_triggers_post_call().
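The two-phase invocation described above can be modelled in userspace roughly as follows. This is a simplified sketch, not the kernel code: struct trigger, triggers_call() and triggers_post_call() are hypothetical names that only mirror the roles of event_triggers_call() and event_triggers_post_call(), and the sample callbacks exist purely so the model can be exercised.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one trigger command attached to an event. */
struct trigger {
	unsigned int type_bit;               /* unique bit for this command */
	bool post_trigger;                   /* must run after the record is closed */
	bool (*filter)(const void *record);  /* NULL means "always match" */
	void (*func)(void);                  /* the trigger action */
};

/* Phase 1: test filters against the live record, then run or defer. */
unsigned int triggers_call(const struct trigger *t, size_t n,
			   const void *record)
{
	unsigned int deferred = 0;

	for (size_t i = 0; i < n; i++) {
		if (t[i].filter && !t[i].filter(record))
			continue;
		if (t[i].post_trigger)
			deferred |= t[i].type_bit;  /* remember for phase 2 */
		else
			t[i].func();                /* safe to run right away */
	}
	return deferred;
}

/* Phase 2: the record is committed or discarded; run deferred triggers. */
void triggers_post_call(const struct trigger *t, size_t n,
			unsigned int deferred)
{
	for (size_t i = 0; i < n; i++)
		if (deferred & t[i].type_bit)
			t[i].func();
}

/* Sample callbacks used to exercise the model. */
int immediate_runs;
int deferred_runs;

void run_immediate(void) { immediate_runs++; }
void run_deferred(void) { deferred_runs++; }
bool pid_is_999(const void *record) { return *(const int *)record == 999; }
```

The point of the split is that a trigger which itself logs into the trace buffer never runs while the current record is still open.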
To simplify the above and make it more efficient, the TRIGGER_COND bit
is introduced, which is set only if a soft-disabled trigger needs to
use the log record for filter testing or needs to wait until the
current log record is closed.
The syscall event invocation code is also changed in analogous ways.
Because event triggers need to be able to create and free filters,
this also adds a couple external wrappers for the existing
create_filter and free_filter functions, which are too generic to be
made extern functions themselves.
Link: http://lkml.kernel.org/r/7164930759d8719ef460357f143d995406e4eead.1382622043.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Add a 'trigger' file for each trace event, enabling 'trace event
triggers' to be set for trace events.
'trace event triggers' are patterned after the existing 'ftrace
function triggers' implementation except that triggers are written to
per-event 'trigger' files instead of to a single file such as the
'set_ftrace_filter' used for ftrace function triggers.
The implementation is meant to be entirely separate from ftrace
function triggers, in order to keep the respective implementations
relatively simple and to allow them to diverge.
The event trigger functionality is built on top of SOFT_DISABLE
functionality. It adds a TRIGGER_MODE bit to the ftrace_event_file
flags which is checked when any trace event fires. Triggers set for a
particular event need to be checked regardless of whether that event
is actually enabled or not - getting an event to fire even if it's not
enabled is what's already implemented by SOFT_DISABLE mode, so trigger
mode directly reuses that. Event triggers essentially inherit the soft
disable logic in __ftrace_event_enable_disable() while adding a bit of
logic and trigger reference counting via tm_ref on top of that in a
new trace_event_trigger_enable_disable() function. Because the base
__ftrace_event_enable_disable() code now needs to be invoked from
outside trace_events.c, a wrapper is also added for those usages.
The triggers for an event are actually invoked via a new function,
event_triggers_call(), and code is also added to invoke them for
ftrace_raw_event calls as well as syscall events.
The main part of the patch creates a new trace_events_trigger.c file
to contain the trace event triggers implementation.
The standard open, read, and release file operations are implemented
here.
The open() implementation sets up for the various open modes of the
'trigger' file. It creates and attaches the trigger iterator and sets
up the command parser. If the file is opened for reading, it also sets
up the trigger seq_ops.
The write() implementation parses the event trigger written to the
'trigger' file, looks up the trigger command, and passes it along to
that event_command's func() implementation for command-specific
processing.
The release() implementation does whatever cleanup is needed to
release the 'trigger' file, like releasing the parser and trigger
iterator, etc.
A couple of functions for event command registration and
unregistration are added, along with a list to add them to and a mutex
to protect them, as well as an (initially empty) registration function
to add the set of commands that will be added by future commits, and a
call to it from the trace event initialization code.
Also added are a couple of trigger-specific data structures needed for
these implementations such as a trigger iterator and a struct for
trigger-specific data.
A couple of structs consisting mostly of functions meant to be implemented
in command-specific ways, event_command and event_trigger_ops, are
used by the generic event trigger command implementations. They're
being put into trace.h alongside the other trace_event data structures
and functions, in the expectation that they'll be needed in several
trace_event-related files such as trace_events_trigger.c and
trace_events.c.
The event_command.func() function is meant to be called by the trigger
parsing code in order to add a trigger instance to the corresponding
event. It essentially coordinates adding a live trigger instance to
the event, and arming the triggering event.
Every event_command func() implementation essentially does the
same thing for any command:
- choose ops - use the value of param to choose either a number or
count version of event_trigger_ops specific to the command
- do the register or unregister of those ops
- associate a filter, if specified, with the triggering event
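The "choose ops" step in the list above can be pictured with a small userspace model. Everything here is an assumption made for illustration: the struct layouts, the plain and counting variants, and the rule that a non-NULL param selects the counting variant are a sketch of the idea, not the definitions added by this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-trigger state. */
struct trigger_data {
	long count;  /* remaining firings; only used by the count variant */
	int fired;   /* how many times the action actually ran */
};

/* Hypothetical, cut-down ops table with just the probe function. */
struct trigger_ops {
	void (*func)(struct trigger_data *data);
};

/* Plain variant: fires every time the event is hit. */
void plain_trigger(struct trigger_data *data)
{
	assert(data != NULL);
	data->fired++;
}

/* Count variant: fires until the remaining count reaches zero. */
void count_trigger(struct trigger_data *data)
{
	assert(data != NULL);
	if (data->count == 0)
		return;
	data->count--;
	data->fired++;
}

const struct trigger_ops plain_ops = { plain_trigger };
const struct trigger_ops count_ops = { count_trigger };

/* A parameter (for example a count string) selects the count variant. */
const struct trigger_ops *get_trigger_ops(const char *param)
{
	return param ? &count_ops : &plain_ops;
}
```

Keeping the selection in one small function is what lets each command parameterize its ops without the generic parsing code knowing about the variants.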
The reg() and unreg() ops allow command-specific implementations for
event_trigger_op registration and unregistration, and the
get_trigger_ops() op allows command-specific event_trigger_ops
selection to be parameterized. When a trigger instance is added, the
reg() op essentially adds that trigger to the triggering event and
arms it, while unreg() does the opposite. The set_filter() function
is used to associate a filter with the trigger - if the command
doesn't specify a set_filter() implementation, the command will ignore
filters.
Each command has an associated trigger_type, which serves double duty,
both as a unique identifier for the command as well as a value that
can be used for setting a trigger mode bit during trigger invocation.
The signature of func() adds a pointer to the event_command struct,
used to invoke those functions, along with a command_data param that
can be passed to the reg/unreg functions. This allows func()
implementations to use command-specific blobs and supports code
re-use.
The event_trigger_ops.func() command corresponds to the trigger 'probe'
function that gets called when the triggering event is actually
invoked. The other functions are used to list the trigger when
needed, along with a couple mundane book-keeping functions.
This also moves event_file_data() into trace.h so it can be used
outside of trace_events.c.
Link: http://lkml.kernel.org/r/316d95061accdee070aac8e5750afba0192fa5b9.1382622043.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Idea-by: Steve Rostedt <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Merge tag 'ras_for_3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/ras
Pull RAS updates from Borislav Petkov:
* Add the functionality to override error reporting agents as some
machines are sporting a new extended error logging capability which, if
done properly in the BIOS, makes a corresponding EDAC module redundant,
from Gong Chen.
* PCIe AER tracepoint severity levels fix, from Rui Wang.
* Error path correction for the mce device init, from Levente Kurusa.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add tracepoints to the QBUF and DQBUF ioctls to enable rudimentary
performance measurements using standard kernel tracers.
[m.chehab@samsung.com: CodingStyle fixes (whitespacing)]
Signed-off-by: Wade Farnsworth <wade_farnsworth@mentor.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
There's inconsistency between dmesg and the trace event output.
When dmesg says "severity=Corrected", the trace event says
"severity=Fatal". What happens is that HW_EVENT_ERR_CORRECTED is
defined in edac.h:
    enum hw_event_mc_err_type {
        HW_EVENT_ERR_CORRECTED,
        HW_EVENT_ERR_UNCORRECTED,
        HW_EVENT_ERR_FATAL,
        HW_EVENT_ERR_INFO,
    };
while aer_print_error() uses aer_error_severity_string[] defined as:
    static const char *aer_error_severity_string[] = {
        "Uncorrected (Non-Fatal)",
        "Uncorrected (Fatal)",
        "Corrected"
    };
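The mismatch between the two tables can be reproduced with a small standalone sketch. The enum and the string table are copied from this message; label_via_edac_enum(), label_via_aer_table() and labels_agree() are hypothetical helpers written for this example, and the short labels returned by the first one are assumptions about how an edac-style decoder would name each value.

```c
#include <assert.h>
#include <string.h>

/* Copied from the message above. */
enum hw_event_mc_err_type {
	HW_EVENT_ERR_CORRECTED,
	HW_EVENT_ERR_UNCORRECTED,
	HW_EVENT_ERR_FATAL,
	HW_EVENT_ERR_INFO,
};

static const char *aer_error_severity_string[] = {
	"Uncorrected (Non-Fatal)",
	"Uncorrected (Fatal)",
	"Corrected"
};

/* Hypothetical stand-in: label a severity value using the edac.h enum. */
const char *label_via_edac_enum(int severity)
{
	switch (severity) {
	case HW_EVENT_ERR_CORRECTED:   return "Corrected";
	case HW_EVENT_ERR_UNCORRECTED: return "Uncorrected";
	case HW_EVENT_ERR_FATAL:       return "Fatal";
	default:                       return "Info";
	}
}

/* Label the same value the way the string table above does. */
const char *label_via_aer_table(int severity)
{
	assert(severity >= 0 && severity < 3);
	return aer_error_severity_string[severity];
}

/* Returns 1 when both decoders produce the same text for a value. */
int labels_agree(int severity)
{
	return strcmp(label_via_edac_enum(severity),
		      label_via_aer_table(severity)) == 0;
}
```

Feeding the value 2 through both helpers gives "Corrected" from the string table and "Fatal" from the enum, which is the discrepancy reported here.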
In this case dmesg is correct because info->severity is assigned in
aer_isr_one_error() using the definitions in include/linux/ras.h:
Signed-off-by: Rui Wang <rui.y.wang@intel.com>
Acked-by: Ethan Zhao <ethan.kernel@gmail.com>
Link: http://lkml.kernel.org/r/CANVTcTaP18CiGOSEcX5Ch_wPw9mEhkgokfp+d+ZOMFD+Ce4juA@mail.gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Pull perf fixes from Ingo Molnar:
"Misc kernel and tooling fixes"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tools lib traceevent: Fix conversion of pointer to integer of different size
perf/trace: Properly use u64 to hold event_id
perf: Remove fragile swevent hlist optimization
ftrace, perf: Avoid infinite event generation loop
tools lib traceevent: Fix use of multiple options in processing field
perf header: Fix possible memory leaks in process_group_desc()
perf header: Fix bogus group name
perf tools: Tag thread comm as overriden
Pull btrfs fixes from Chris Mason:
"Almost all of these are bug fixes. Dave Sterba's documentation update
is the big exception because he removed our promises to set any
machine running Btrfs on fire"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Documentation: filesystems: update btrfs tools section
Documentation: filesystems: add new btrfs mount options
btrfs: update kconfig help text
btrfs: fix bio_size_ok() for max_sectors > 0xffff
btrfs: Use trace condition for get_extent tracepoint
btrfs: fix typo in the log message
Btrfs: fix list delete warning when removing ordered root from the list
Btrfs: print bytenr instead of page pointer in check-int
Btrfs: remove dead codes from ctree.h
Btrfs: don't wait for ordered data outside desired range
Btrfs: fix lockdep error in async commit
Btrfs: avoid heavy operations in btrfs_commit_super
Btrfs: fix __btrfs_start_workers retval
Btrfs: disable online raid-repair on ro mounts
Btrfs: do not inc uncorrectable_errors counter on ro scrubs
Btrfs: only drop modified extents if we logged the whole inode
Btrfs: make sure to copy everything if we rename
Btrfs: don't BUG_ON() if we get an error walking backrefs
Doing an if statement to test some condition to know if we should
trigger a tracepoint is pointless when tracing is disabled. This just
adds overhead and wastes a branch prediction. This is why the
TRACE_EVENT_CONDITION() was created. It places the check inside the jump
label so that the branch does not happen unless tracing is enabled.
That is, instead of doing:
    if (em)
        trace_btrfs_get_extent(root, em);
Which is basically this:
    if (em)
        if (static_key(trace_btrfs_get_extent)) {
Using a TRACE_EVENT_CONDITION() we can just do:
    trace_btrfs_get_extent(root, em);
And the condition trace event will do:
    if (static_key(trace_btrfs_get_extent)) {
        if (em) {
            ...
The static key is an unconditional jump (or nop) that is faster than
having to check if em is NULL or not.
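The difference in ordering can be made visible with a userspace model that counts how often the condition is evaluated. This is only a sketch: the static key is replaced by an ordinary boolean (an assumption, since a real static key is a patched jump or nop), and all names below are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the static key; here it is just a flag. */
bool tracing_enabled;
int condition_checks;  /* how many times "em != NULL" was evaluated */
int events_recorded;   /* how many events were actually emitted */

void record_event(const void *em)
{
	(void)em;
	events_recorded++;
}

/* Open-coded form: the condition is tested even when tracing is off. */
void trace_open_coded(const void *em)
{
	condition_checks++;
	if (em != NULL)
		if (tracing_enabled)
			record_event(em);
}

/* Conditional-event form: the condition is tested only when tracing is on. */
void trace_conditional(const void *em)
{
	if (tracing_enabled) {
		condition_checks++;
		if (em != NULL)
			record_event(em);
	}
}
```

With tracing disabled, trace_conditional() never touches the condition at all, which is the saving the message argues for.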
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Vince's perf-trinity fuzzer found yet another 'interesting' problem.
When we sample the irq_work_exit tracepoint with period==1 (or
PERF_SAMPLE_PERIOD) and we add an fasync SIGNAL handler we create an
infinite event generation loop:
,-> <IPI>
| irq_work_exit() ->
| trace_irq_work_exit() ->
| ...
| __perf_event_overflow() -> (due to fasync)
| irq_work_queue() -> (irq_work_list must be empty)
'--------- arch_irq_work_raise()
Similar things can happen due to regular poll() wakeups if we exceed
the ring-buffer wakeup watermark, or have an event_limit.
To avoid this, disallow sampling this particular tracepoint.
In order to achieve this, create a special perf_perm function pointer
for each event and call this (when set) on trying to create a
tracepoint perf event.
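The idea of an optional per-event permission hook can be sketched as below. This is a hypothetical userspace model: struct event_desc, deny_sampling() and perf_event_create_check() are names invented for illustration, and the choice of -EPERM as the refusal code is an assumption rather than something stated in this message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of an event with an optional permission hook. */
struct event_desc {
	const char *name;
	/* NULL means the event places no extra restriction on perf use. */
	int (*perf_perm)(const struct event_desc *ev, int is_sampling);
};

/* Hook that refuses sampling but still allows plain counting. */
int deny_sampling(const struct event_desc *ev, int is_sampling)
{
	(void)ev;
	return is_sampling ? -EPERM : 0;
}

/* Called when a perf event is being created for a tracepoint. */
int perf_event_create_check(const struct event_desc *ev, int is_sampling)
{
	assert(ev != NULL);
	if (ev->perf_perm)
		return ev->perf_perm(ev, is_sampling);
	return 0;
}
```

Only events that set the hook pay for the extra call, so ordinary tracepoints are unaffected.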
[ roasted: use expr... to allow for ',' in your expression ]
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Tested-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Dave Jones <davej@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20131114152304.GC5364@laptop.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Merge tag 'trace-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing update from Steven Rostedt:
"This batch of changes is mostly clean ups and small bug fixes. The
only real feature that was added this release is from Namhyung Kim,
who introduced "set_graph_notrace" filter that lets you run the
function graph tracer and not trace particular functions and their
call chain.
Tom Zanussi added some updates to the ftrace multibuffer tracing that
made it more consistent with the top level tracing.
One of the fixes for perf function tracing required an API change in
RCU; the addition of "rcu_is_watching()". As Paul McKenney is pushing
that change in this release too, he gave me a branch that included all
the changes to get that working, and I pulled that into my tree in
order to complete the perf function tracing fix"
* tag 'trace-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Add rcu annotation for syscall trace descriptors
tracing: Do not use signed enums with unsigned long long in fgragh output
tracing: Remove unused function ftrace_off_permanent()
tracing: Do not assign filp->private_data to freed memory
tracing: Add helper function tracing_is_disabled()
tracing: Open tracer when ftrace_dump_on_oops is used
tracing: Add support for SOFT_DISABLE to syscall events
tracing: Make register/unregister_ftrace_command __init
tracing: Update event filters for multibuffer
recordmcount.pl: Add support for __fentry__
ftrace: Have control op function callback only trace when RCU is watching
rcu: Do not trace rcu_is_watching() functions
ftrace/x86: skip over the breakpoint for ftrace caller
trace/trace_stat: use rbtree postorder iteration helper instead of opencoding
ftrace: Add set_graph_notrace filter
ftrace: Narrow down the protected area of graph_lock
ftrace: Introduce struct ftrace_graph_data
ftrace: Get rid of ftrace_graph_filter_enabled
tracing: Fix potential out-of-bounds in trace_get_user()
tracing: Show more exact help information about snapshot
Merge tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random
Pull /dev/random changes from Ted Ts'o:
"The /dev/random changes for 3.13 include a number of improvements in
the following areas: performance, avoiding waste of entropy, better
tracking of entropy estimates, support for non-x86 platforms that have
a register which can't be used for fine-grained timekeeping, but which
might be good enough for the random driver.
Also add some printk's so that we can see how quickly /dev/urandom can
get initialized, and when programs try to use /dev/urandom before it
is fully initialized (since this could be a security issue). This
shouldn't be an issue on x86 desktop/laptops --- a test on my Lenovo
T430s laptop shows that /dev/urandom is getting fully initialized
approximately two seconds before the root file system is mounted
read/write --- this may be an issue with ARM and MIPS embedded/mobile
systems, though. These printk's will be a useful canary before
potentially adding a future change to start blocking processes which
try to read from /dev/urandom before it is initialized, which is
something FreeBSD does already for security reasons, and which
security folks have been agitating for Linux to also adopt"
* tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
random: add debugging code to detect early use of get_random_bytes()
random: initialize the last_time field in struct timer_rand_state
random: don't zap entropy count in rand_initialize()
random: printk notifications for urandom pool initialization
random: make add_timer_randomness() fill the nonblocking pool first
random: convert DEBUG_ENT to tracepoints
random: push extra entropy to the output pools
random: drop trickle mode
random: adjust the generator polynomials in the mixing function slightly
random: speed up the fast_mix function by a factor of four
random: cap the rate which the /dev/urandom pool gets reseeded
random: optimize the entropy_store structure
random: optimize spinlock use in add_device_randomness()
random: fix the tracepoint for get_random_bytes(_arch)
random: account for entropy loss due to overwrites
random: allow fractional bits to be tracked
random: statically compute poolbitshift, poolbytes, poolbits
random: mix in architectural randomness earlier in extract_buf()
Pull second round of block driver updates from Jens Axboe:
"As mentioned in the original pull request, the bcache bits were pulled
because of their dependency on the immutable bio vecs. Kent re-did
this part and resubmitted it, so here's the 2nd round of (mostly)
driver updates for 3.13. It contains:
- The bcache work from Kent.
- Conversion of virtio-blk to blk-mq. This removes the bio and request
path and substitutes the blk-mq path instead. The end result is
almost 200 deleted lines. Patch is acked by Asias and Christoph, who
both did a bunch of testing.
- A removal of bootmem.h include from Grygorii Strashko, part of a
larger series of his killing the dependency on that header file.
- Removal of __cpuinit from blk-mq from Paul Gortmaker"
* 'for-linus' of git://git.kernel.dk/linux-block: (56 commits)
virtio_blk: blk-mq support
blk-mq: remove newly added instances of __cpuinit
bcache: defensively handle format strings
bcache: Bypass torture test
bcache: Delete some slower inline asm
bcache: Use ida for bcache block dev minor
bcache: Fix sysfs splat on shutdown with flash only devs
bcache: Better full stripe scanning
bcache: Have btree_split() insert into parent directly
bcache: Move spinlock into struct time_stats
bcache: Kill sequential_merge option
bcache: Kill bch_next_recurse_key()
bcache: Avoid deadlocking in garbage collection
bcache: Incremental gc
bcache: Add make_btree_freeing_key()
bcache: Add btree_node_write_sync()
bcache: PRECEDING_KEY()
bcache: bch_(btree|extent)_ptr_invalid()
bcache: Don't bother with bucket refcount for btree node allocations
bcache: Debug code improvements
...
Merge tag 'iommu-updates-v3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU updates from Joerg Roedel:
"This time the updates contain:
- Tracepoints for certain IOMMU-API functions to make their use
easier to debug
- A tracepoint for IOMMU page faults to make it easier to get them in
user space
- Updates and fixes for the new ARM SMMU driver after the first
hardware showed up
- Various other fixes and cleanups in other IOMMU drivers"
* tag 'iommu-updates-v3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (26 commits)
iommu/shmobile: Enable the driver on all ARM platforms
iommu/tegra-smmu: Staticize tegra_smmu_pm_ops
iommu/tegra-gart: Staticize tegra_gart_pm_ops
iommu/vt-d: Use list_for_each_entry_safe() for dmar_domain->devices traversal
iommu/vt-d: Use for_each_drhd_unit() instead of list_for_each_entry()
iommu/vt-d: Fixed interaction of VFIO_IOMMU_MAP_DMA with IOMMU address limits
iommu/arm-smmu: Clear global and context bank fault status registers
iommu/arm-smmu: Print context fault information
iommu/arm-smmu: Check for num_context_irqs > 0 to avoid divide by zero exception
iommu/arm-smmu: Refine check for proper size of mapped region
iommu/arm-smmu: Switch to subsys_initcall for driver registration
iommu/arm-smmu: use relaxed accessors where possible
iommu/arm-smmu: replace devm_request_and_ioremap by devm_ioremap_resource
iommu: Remove stack trace from broken irq remapping warning
iommu: Change iommu driver to call io_page_fault trace event
iommu: Add iommu_error class event to iommu trace
iommu/tegra: gart: cleanup devm_* functions usage
iommu/tegra: Print phys_addr_t using %pa
iommu: No need to pass '0x' when '%pa' is used
iommu: Change iommu driver to call unmap trace event
...
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM changes from Paolo Bonzini:
"Here are the 3.13 KVM changes. There was a lot of work on the PPC
side: the fact that the HV and emulation flavors can now coexist in a
single kernel is probably the most interesting change from a user point of view.
On the x86 side there are nested virtualization improvements and a few
bugfixes.
ARM got transparent huge page support, improved overcommit, and
support for big endian guests.
Finally, there is a new interface to connect KVM with VFIO. This
helps with devices that use NoSnoop PCI transactions, letting the
driver in the guest execute WBINVD instructions. This includes some
nVidia cards on Windows, that fail to start without these patches and
the corresponding userspace changes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (146 commits)
kvm, vmx: Fix lazy FPU on nested guest
arm/arm64: KVM: PSCI: propagate caller endianness to the incoming vcpu
arm/arm64: KVM: MMIO support for BE guest
kvm, cpuid: Fix sparse warning
kvm: Delete prototype for non-existent function kvm_check_iopl
kvm: Delete prototype for non-existent function complete_pio
hung_task: add method to reset detector
pvclock: detect watchdog reset at pvclock read
kvm: optimize out smp_mb after srcu_read_unlock
srcu: API for barrier after srcu read unlock
KVM: remove vm mmap method
KVM: IOMMU: hva align mapping page size
KVM: x86: trace cpuid emulation when called from emulator
KVM: emulator: cleanup decode_register_operand() a bit
KVM: emulator: check rex prefix inside decode_register()
KVM: x86: fix emulation of "movzbl %bpl, %eax"
kvm_host: typo fix
KVM: x86: emulate SAHF instruction
MAINTAINERS: add tree for kvm.git
Documentation/kvm: add a 00-INDEX file
...
Merge tag 'stable/for-linus-3.13-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull Xen updates from Konrad Rzeszutek Wilk:
"This has tons of fixes and two major features which are concentrated
around the Xen SWIOTLB library.
The short <blurb> is that the tracing facility (just one function) has
been added to SWIOTLB to make it easier to track I/O progress.
Additionally under Xen and ARM (32 & 64) the Xen-SWIOTLB driver
"is used to translate physical to machine and machine to physical
addresses of foreign[guest] pages for DMA operations" (Stefano) when
booting under hardware without proper IOMMU.
There are also bug-fixes, cleanups, compile warning fixes, etc.
The commit times for some of the commits are a bit fresh - that is b/c
we wanted to make sure we have the Ack's from the ARM folks - which
with the string of back-to-back conferences took a bit of time. Rest
assured - the code has been stewing in #linux-next for some time.
Features:
- SWIOTLB has tracing added when doing bounce buffer.
- Xen ARM/ARM64 can use Xen-SWIOTLB. This work allows Linux to
safely program real devices for DMA operations when running as a
guest on Xen on ARM, without IOMMU support. [*1]
- xen_raw_printk works with PVHVM guests if needed.
Bug-fixes:
- Make memory ballooning work under HVM with large MMIO region.
- Inform hypervisor of MCFG regions found in ACPI DSDT.
- Remove deprecated IRQF_DISABLED.
- Remove deprecated __cpuinit.
[*1]:
"On arm and arm64 all Xen guests, including dom0, run with second
stage translation enabled. As a consequence, when dom0 programs a
device for a DMA operation it is going to use (pseudo) physical
addresses instead of machine addresses. This work introduces two trees
to track physical to machine and machine to physical mappings of
foreign pages. Local pages are assumed mapped 1:1 (physical address
== machine address). It enables the SWIOTLB-Xen driver on ARM and
ARM64, so that Linux can translate physical addresses to machine
addresses for dma operations when necessary. " (Stefano)"
* tag 'stable/for-linus-3.13-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (32 commits)
xen/arm: pfn_to_mfn and mfn_to_pfn return the argument if nothing is in the p2m
arm,arm64/include/asm/io.h: define struct bio_vec
swiotlb-xen: missing include dma-direction.h
pci-swiotlb-xen: call pci_request_acs only ifdef CONFIG_PCI
arm: make SWIOTLB available
xen: delete new instances of added __cpuinit
xen/balloon: Set balloon's initial state to number of existing RAM pages
xen/mcfg: Call PHYSDEVOP_pci_mmcfg_reserved for MCFG areas.
xen: remove deprecated IRQF_DISABLED
x86/xen: remove deprecated IRQF_DISABLED
swiotlb-xen: fix error code returned by xen_swiotlb_map_sg_attrs
swiotlb-xen: static inline xen_phys_to_bus, xen_bus_to_phys, xen_virt_to_bus and range_straddles_page_boundary
grant-table: call set_phys_to_machine after mapping grant refs
arm,arm64: do not always merge biovec if we are running on Xen
swiotlb: print a warning when the swiotlb is full
swiotlb-xen: use xen_dma_map/unmap_page, xen_dma_sync_single_for_cpu/device
xen: introduce xen_dma_map/unmap_page and xen_dma_sync_single_for_cpu/device
tracing/events: Fix swiotlb tracepoint creation
swiotlb-xen: use xen_alloc/free_coherent_pages
xen: introduce xen_alloc/free_coherent_pages
...
Pull core locking changes from Ingo Molnar:
"The biggest changes:
- add lockdep support for seqcount/seqlocks structures, this
unearthed both bugs and required extra annotation.
- move the various kernel locking primitives to the new
kernel/locking/ directory"
* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
block: Use u64_stats_init() to initialize seqcounts
locking/lockdep: Mark __lockdep_count_forward_deps() as static
lockdep/proc: Fix lock-time avg computation
locking/doc: Update references to kernel/mutex.c
ipv6: Fix possible ipv6 seqlock deadlock
cpuset: Fix potential deadlock w/ set_mems_allowed
seqcount: Add lockdep functionality to seqcount/seqlock structures
net: Explicitly initialize u64_stats_sync structures for lockdep
locking: Move the percpu-rwsem code to kernel/locking/
locking: Move the lglocks code to kernel/locking/
locking: Move the rwsem code to kernel/locking/
locking: Move the rtmutex code to kernel/locking/
locking: Move the semaphore core to kernel/locking/
locking: Move the spinlock code to kernel/locking/
locking: Move the lockdep code to kernel/locking/
locking: Move the mutex code to kernel/locking/
hung_task debugging: Add tracepoint to report the hang
x86/locking/kconfig: Update paravirt spinlock Kconfig description
lockstat: Report avg wait and hold times
lockdep, x86/alternatives: Drop ancient lockdep fixup message
...
Pull ARM updates from Russell King:
"Included in this series are:
1. BE8 (modern big endian) changes for ARM from Ben Dooks
2. big.LITTLE support from Nicolas Pitre and Dave Martin
3. support for LPAE systems with all system memory above 4GB
4. Perf updates from Will Deacon
5. Additional prefetching and other performance improvements from Will.
6. Neon-optimised AES implementation from Ard.
7. A number of smaller fixes scattered around the place.
There is a rather horrid merge conflict in tools/perf - I was never
notified of the conflict because it originally occurred between Will's
tree and other stuff. Consequently I have a resolution which Will
forwarded me, which I'll forward on immediately after sending this
mail.
The other notable thing is I'm expecting some build breakage in the
crypto stuff on ARM only with Ard's AES patches. These were merged
into a stable git branch which others had already pulled, so there's
little I can do about this. The problem is caused because these
patches have a dependency on some code in the crypto git tree - I
tried requesting a branch I can pull to resolve these, and all I got
each time from the crypto people was "we'll revert our patches then"
which would only make things worse since I still don't have the
dependent patches. I've no idea what's going on there or how to
resolve that, and since I can't split these patches from the rest of
this pull request, I'm rather stuck with pushing this as-is or
reverting Ard's patches.
Since it should "come out in the wash" I've left them in - the only
build problems they seem to cause at the moment are with randconfigs,
and it's a new feature anyway. However, if by -rc1 the
dependencies aren't in, I think it'd be best to revert Ard's patches"
I resolved the perf conflict roughly as per the patch sent by Russell,
but there may be some differences. Any errors are likely mine. Let's
see how the crypto issues work out..
* 'for-linus' of git://git.linaro.org/people/rmk/linux-arm: (110 commits)
ARM: 7868/1: arm/arm64: remove atomic_clear_mask() in "include/asm/atomic.h"
ARM: 7867/1: include: asm: use 'int' instead of 'unsigned long' for 'oldval' in atomic_cmpxchg().
ARM: 7866/1: include: asm: use 'long long' instead of 'u64' within atomic.h
ARM: 7871/1: amba: Extend number of IRQS
ARM: 7887/1: Don't smp_cross_call() on UP devices in arch_irq_work_raise()
ARM: 7872/1: Support arch_irq_work_raise() via self IPIs
ARM: 7880/1: Clear the IT state independent of the Thumb-2 mode
ARM: 7878/1: nommu: Implement dummy early_paging_init()
ARM: 7876/1: clear Thumb-2 IT state on exception handling
ARM: 7874/2: bL_switcher: Remove cpu_hotplug_driver_{lock,unlock}()
ARM: footbridge: fix build warnings for netwinder
ARM: 7873/1: vfp: clear vfp_current_hw_state for dying cpu
ARM: fix misplaced arch_virt_to_idmap()
ARM: 7848/1: mcpm: Implement cpu_kill() to synchronise on powerdown
ARM: 7847/1: mcpm: Factor out logical-to-physical CPU translation
ARM: 7869/1: remove unused XSCALE_PMU Kconfig param
ARM: 7864/1: Handle 64-bit memory in case of 32-bit phys_addr_t
ARM: 7863/1: Let arm_add_memory() always use 64-bit arguments
ARM: 7862/1: pcpu: replace __get_cpu_var_uses
ARM: 7861/1: cacheflush: consolidate single-CPU ARMv7 cache disabling code
...
Merge first patch-bomb from Andrew Morton:
"Quite a lot of other stuff is banked up awaiting further
next->mainline merging, but this batch contains:
- Lots of random misc patches
- OCFS2
- Most of MM
- backlight updates
- lib/ updates
- printk updates
- checkpatch updates
- epoll tweaking
- rtc updates
- hfs
- hfsplus
- documentation
- procfs
- update gcov to gcc-4.7 format
- IPC"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (269 commits)
ipc, msg: fix message length check for negative values
ipc/util.c: remove unnecessary work pending test
devpts: plug the memory leak in kill_sb
./Makefile: export initial ramdisk compression config option
init/Kconfig: add option to disable kernel compression
drivers: w1: make w1_slave::flags long to avoid memory corruption
drivers/w1/masters/ds1wm.c: use dev_get_platdata()
drivers/memstick/core/ms_block.c: fix unreachable state in h_msb_read_page()
drivers/memstick/core/mspro_block.c: fix attributes array allocation
drivers/pps/clients/pps-gpio.c: remove redundant of_match_ptr
kernel/panic.c: reduce 1 byte usage for print tainted buffer
gcov: reuse kbasename helper
kernel/gcov/fs.c: use pr_warn()
kernel/module.c: use pr_foo()
gcov: compile specific gcov implementation based on gcc version
gcov: add support for gcc 4.7 gcov format
gcov: move gcov structs definitions to a gcc version specific file
kernel/taskstats.c: return -ENOMEM when alloc memory fails in add_del_listener()
kernel/taskstats.c: add nla_nest_cancel() for failure processing between nla_nest_start() and nla_nest_end()
kernel/sysctl_binary.c: use scnprintf() instead of snprintf()
...
Merge tag 'for-f2fs-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"This patch-set includes the following major enhancement patches.
- add a sysfs to control reclaiming free segments
- enhance the f2fs global lock procedures
- enhance the victim selection flow
- wait for selected node blocks during fsync
- add some tracepoints
- add a config to remove abundant BUG_ONs
The other bug fixes are as follows.
- fix deadlock on acl operations
- fix some bugs with respect to orphan inodes
And, there are a bunch of cleanups"
* tag 'for-f2fs-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (42 commits)
f2fs: issue more large discard command
f2fs: fix memory leak after kobject init failed in fill_super
f2fs: cleanup waiting routine for writeback pages in cp
f2fs: avoid to use a NULL point in destroy_segment_manager
f2fs: remove unnecessary TestClearPageError when wait pages writeback
f2fs: update f2fs document
f2fs: avoid to wait all the node blocks during fsync
f2fs: check all ones or zeros bitmap with bitops for better mount performance
f2fs: change the method of calculating the number summary blocks
f2fs: fix calculating incorrect free size when update xattr in __f2fs_setxattr
f2fs: add an option to avoid unnecessary BUG_ONs
f2fs: introduce CONFIG_F2FS_CHECK_FS for BUG_ON control
f2fs: fix a deadlock during init_acl procedure
f2fs: clean up acl flow for better readability
f2fs: remove unnecessary segment bitmap updates
f2fs: add tracepoint for vm_page_mkwrite
f2fs: add tracepoint for set_page_dirty
f2fs: remove redundant set_page_dirty from write_compacted_summaries
f2fs: add reclaiming control by sysfs
f2fs: introduce f2fs_balance_fs_bg for some background jobs
...
In general, every tracepoint should have zero overhead when it is disabled.
However, trace_mm_page_alloc_extfrag() is an exception: it evaluates
"new_type == start_migratetype" even if the tracepoint is disabled.
The comparison can be moved into the tracepoint's TP_fast_assign(), which
exists for exactly this purpose. This patch does so.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When there are processes heavily creating small files while sync(2) is
running, it can easily happen that quite a few new files are created
between the WB_SYNC_NONE and WB_SYNC_ALL passes of sync(2). That can happen
especially if there are several busy filesystems (remember that sync
traverses filesystems sequentially and waits in WB_SYNC_ALL phase on one
fs before starting it on another fs). Because WB_SYNC_ALL pass is slow
(e.g. causes a transaction commit and cache flush for each inode in
ext3), resulting sync(2) times are rather large.
The following script reproduces the problem:
  function run_writers
  {
    for (( i = 0; i < 10; i++ )); do
      mkdir $1/dir$i
      for (( j = 0; j < 40000; j++ )); do
        dd if=/dev/zero of=$1/dir$i/$j bs=4k count=4 &>/dev/null
      done &
    done
  }

  for dir in "$@"; do
    run_writers $dir
  done

  sleep 40
  time sync
Fix the problem by disregarding inodes dirtied after sync(2) was called
in the WB_SYNC_ALL pass. To allow for this, sync_inodes_sb() now takes
a time stamp when sync has started which is used for setting up work for
flusher threads.
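The idea of the fix can be sketched outside the kernel. The snippet below
is only an illustration, not the kernel implementation: the helper name
inodes_to_sync and the "inode dirtied_at" input format are hypothetical,
and it assumes that an inode dirtied exactly at the sync start time is
still included.

```shell
# Hypothetical helper, not kernel code: reads "inode dirtied_at" pairs on
# stdin and prints the inodes that a WB_SYNC_ALL pass would still wait on,
# given the time at which sync(2) started. Inodes dirtied after that time
# are skipped and left for a later writeback pass.
inodes_to_sync() {
    sync_start=$1
    awk -v start="$sync_start" '$2 <= start { print $1 }'
}

# Example: sync starts at t=100; inode 12 is dirtied later and is skipped.
printf '10 90\n11 100\n12 130\n' | inodes_to_sync 100
```

With a time stamp taken once at the start of sync, the amount of work in
the slow pass no longer grows while writers keep dirtying new inodes.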
To give some numbers, when the above script is run on two ext4 filesystems
on a simple SATA drive, the average sync time from 10 runs is 267.549
seconds with standard deviation 104.799426. With the patched kernel,
the average sync time from 10 runs is 2.995 seconds with standard
deviation 0.096.
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Fengguang Wu <fengguang.wu@intel.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge tag 'sound-3.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound updates from Takashi Iwai:
"There are no too intrusive changes in this update batch. The biggest
LOC is found in the new DICE driver, and other small changes are
scattered over the whole sound subtree (which is a common pattern).
Below are highlights:
- ALSA core:
* Memory allocation support with genpool
* Fix blocking in drain ioctl of compress_offload
- HD-audio:
* Improved AMD HDMI support
* Intel HDMI detection improvements
* thinkpad_acpi mute-key integration
* New PCI ID, New ALC255,285,293 codecs, CX20952
- USB-audio:
* New buffer size management
* Clean up endpoint handling code
- ASoC:
* Further work on the dmaengine helpers, including support for
configuring the parameters for DMA by reading the capabilities of
the DMA controller which removes some guesswork and magic numbers
from drivers.
* A refresh of the documentation.
* Conversions of many drivers to direct regmap API usage in order
to allow the ASoC level register I/O code to be removed, this
will hopefully be completed by v3.14.
* Support for using async register I/O in DAPM, reducing the time
taken to implement power transitions on systems that support it.
- Firewire: DICE driver
- Lots of small fixes for bugs reported by Coverity"
* tag 'sound-3.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (382 commits)
ALSA: hda/realtek - Add new codec ALC255/ALC3234 UAJ supported
ALSA: hda - Apply MacBook fixups for CS4208 correctly
ASoC: fsl: imx-wm8962: remove an unneeded check
ASoC: fsl: imx-pcm-fiq: Remove unused 'runtime' variable
ALSA: hda/realtek - Make fixup regs persist after resume
ALSA: hda_intel: ratelimit "spurious response" message
ASoC: generic-dmaengine-pcm: Use SNDRV_DMA_TYPE_DEV_IRAM as default
ASoC: dapm: Use WARN_ON() instead of BUG_ON()
ASoC: wm_adsp: Fix BUG_ON() and WARN_ON() usages
ASoC: Replace BUG() with WARN()
ASoC: wm_hubs: Replace BUG() with WARN()
ASoC: wm8996: Replace BUG() with WARN()
ASoC: wm8962: Replace BUG() with WARN()
ASoC: wm8958: Replace BUG() with WARN()
ASoC: wm8904: Replace BUG() with WARN()
ASoC: wm8900: Replace BUG() with WARN()
ASoC: wm8350: Replace BUG() with WARN()
ASoC: txx9: Use WARN_ON() instead of BUG_ON()
ASoC: sh: Use WARN_ON() instead of BUG_ON()
ASoC: rcar: Use WARN_ON() instead of BUG_ON()
...
Merge tag 'spi-v3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi updates from Mark Brown:
"As well as the usual driver updates and cleanups there's a few
improvements to the core here:
- The start of some improvements to factor out more of the SPI
message loop into the core. Right now this is just simplifying the
code a bit but hopefully next time around we'll also have managed
to roll out some noticeable performance improvements which drivers
can take advantage of.
- Support for loading modules for ACPI enumerated SPI devices.
- Managed registration for SPI controllers.
- Helper for another common I/O pattern"
* tag 'spi-v3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (116 commits)
spi/hspi: add device tree support
spi: atmel: fix return value check in atmel_spi_probe()
spi: spi-imx: only enable the clocks when we start to transfer a message
spi/s3c64xx: Fix doubled clock disable on suspend
spi/s3c64xx: Do not ignore return value of spi_master_resume/suspend
spi: spi-mxs: Use u32 instead of uint32_t
spi: spi-mxs: Don't set clock for each xfer
spi: spi-mxs: Clean up setup_transfer function
spi: spi-mxs: Remove check of spi mode bits
spi: spi-mxs: Fix race in setup method
spi: spi-mxs: Remove bogus setting of ssp clk rate field
spi: spi-mxs: Remove full duplex check, spi core already does it
spi: spi-mxs: Fix chip select control bits in DMA mode
spi: spi-mxs: Fix extra CS pulses and read mode in multi-transfer messages
spi: spi-mxs: Change flag arguments in txrx functions to bit flags
spi: spi-mxs: Always clear INGORE_CRC, to keep CS asserted
spi: spi-mxs: Remove mxs_spi_enable and mxs_spi_disable
spi: spi-mxs: Always set LOCK_CS
spi/s3c64xx: Add missing pm_runtime_put on setup fail
spi/s3c64xx: Add missing pm_runtime_set_active() call in probe()
...
Pull scheduler changes from Ingo Molnar:
"The main changes in this cycle are:
- (much) improved CONFIG_NUMA_BALANCING support from Mel Gorman, Rik
van Riel, Peter Zijlstra et al. Yay!
- optimize preemption counter handling: merge the NEED_RESCHED flag
into the preempt_count variable, by Peter Zijlstra.
- wait.h fixes and code reorganization from Peter Zijlstra
- cfs_bandwidth fixes from Ben Segall
- SMP load-balancer cleanups from Peter Zijlstra
- idle balancer improvements from Jason Low
- other fixes and cleanups"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (129 commits)
ftrace, sched: Add TRACE_FLAG_PREEMPT_RESCHED
stop_machine: Fix race between stop_two_cpus() and stop_cpus()
sched: Remove unnecessary iteration over sched domains to update nr_busy_cpus
sched: Fix asymmetric scheduling for POWER7
sched: Move completion code from core.c to completion.c
sched: Move wait code from core.c to wait.c
sched: Move wait.c into kernel/sched/
sched/wait: Fix __wait_event_interruptible_lock_irq_timeout()
sched: Avoid throttle_cfs_rq() racing with period_timer stopping
sched: Guarantee new group-entities always have weight
sched: Fix hrtimer_cancel()/rq->lock deadlock
sched: Fix cfs_bandwidth misuse of hrtimer_expires_remaining
sched: Fix race on toggling cfs_bandwidth_used
sched: Remove extra put_online_cpus() inside sched_setaffinity()
sched/rt: Fix task_tick_rt() comment
sched/wait: Fix build breakage
sched/wait: Introduce prepare_to_wait_event()
sched/wait: Add ___wait_cond_timeout() to wait_event*_timeout() too
sched: Remove get_online_cpus() usage
sched: Fix race in migrate_swap_stop()
...
Pull RCU updates from Ingo Molnar:
"The main RCU changes in this cycle are:
- Idle entry/exit changes, to throttle callback execution and other
refinements to speed up kbuild, primarily to address performance
issues located by Tibor Billes.
- Grace-period related changes, primarily to aid in debugging,
inspired by an -rt debugging session.
- Code reorganization moving RCU's source files into its own
kernel/rcu/ directory.
- RCU documentation updates
- Miscellaneous fixes.
Note, the following commit:
5c889690aa mm: Place preemption point in do_mlockall() loop
is identical to the commit already in your tree via email:
22356f447c mm: Place preemption point in do_mlockall() loop
[ Your version of the changelog nicely demonstrates how kernel oops
messages should be trimmed properly :-/ ]"
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits)
rcu: Move RCU-related source code to kernel/rcu directory
rcu: Fix occurrence of "the the" in checklist.txt
kthread: Add pointer to vmstat-avoidance patch
rcu: Update stall-warning documentation
rcu: Consistent rcu_is_watching() naming
rcu: Change EXPORT_SYMBOL() to EXPORT_SYMBOL_GPL()
rcu: Is it safe to enter an RCU read-side critical section?
rcu: Throttle invoke_rcu_core() invocations due to non-lazy callbacks
rcu: Throttle rcu_try_advance_all_cbs() execution
rcu: Remove redundant code from rcu_cleanup_after_idle()
rcu: Fix CONFIG_RCU_NOCB_CPU_ALL panic on machines with sparse CPU mask
rcu: Avoid sparse warnings in rcu_nocb_wake trace event
rcu: Track rcu_nocb_kthread()'s sleeping and awakening
rcu: Distinguish between NOCB and non-NOCB rcu_callback trace events
rcu: Add tracing for rcuo no-CBs CPU wakeup handshake
rcu: Add tracing of normal (non-NOCB) grace-period requests
rcu: Add tracing to rcu_gp_kthread()
rcu: Flag lockless access to ->gp_flags with ACCESS_ONCE()
rcu: Prevent spurious-wakeup DoS attack on rcu_gp_kthread()
rcu: Improve grace-period start logic
...
With all the recent refactoring around struct btree op, struct search has
gotten rather large.
But we can now easily break it up in a different way - we break out
struct btree_insert_op which is for inserting data into the cache, and
that's now what the copying gc code uses - struct search is now specific
to request.c
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Conflicts:
kernel/Makefile
There are conflicts in kernel/Makefile due to file moving in the
scheduler tree - resolve them.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There are conflicts in lockdep.c due to RCU changes, and also the RCU
tree changes kernel/Makefile - so pre-merge it to ease the moving of
locking related .c files to kernel/locking/.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The trace event filters are still tied to event calls rather than
event files, which means you don't get what you'd expect when using
filters in the multibuffer case:
Before:
# echo 'bytes_alloc > 8192' > /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
# cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
bytes_alloc > 8192
# mkdir /sys/kernel/debug/tracing/instances/test1
# echo 'bytes_alloc > 2048' > /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
# cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
bytes_alloc > 2048
# cat /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
bytes_alloc > 2048
Setting the filter in tracing/instances/test1/events shouldn't affect
the same event in tracing/events as it does above.
After:
# echo 'bytes_alloc > 8192' > /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
# cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
bytes_alloc > 8192
# mkdir /sys/kernel/debug/tracing/instances/test1
# echo 'bytes_alloc > 2048' > /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
# cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
bytes_alloc > 8192
# cat /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
bytes_alloc > 2048
We'd like to just move the filter directly from ftrace_event_call to
ftrace_event_file, but there are a couple cases that don't yet have
multibuffer support and therefore have to continue using the current
event_call-based filters. For those cases, a new USE_CALL_FILTER bit
is added to the event_call flags, whose main purpose is to keep the
old behavior for those cases until they can be updated with
multibuffer support; at that point, the USE_CALL_FILTER flag (and the
new associated call_filter_check_discard() function) can go away.
The multibuffer support also made filter_current_check_discard()
redundant, so this change removes that function as well and replaces
it with filter_check_discard() (or call_filter_check_discard() as
appropriate).
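The difference between per-call and per-file filtering can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions: the struct layouts, the single `min_bytes` threshold standing in for a parsed filter expression, and the helper names `effective_min_bytes()` and `check_discard()` are hypothetical and are not the kernel's definitions. Only the USE_CALL_FILTER fallback idea is taken from the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bit; the real flag lives in the event_call flags. */
#define USE_CALL_FILTER 0x1u

/* Simplified stand-in for ftrace_event_call: one per event type. */
struct event_call {
    unsigned int flags;
    size_t call_min_bytes;   /* legacy filter shared by every buffer */
};

/* Simplified stand-in for ftrace_event_file: one per event per instance. */
struct event_file {
    struct event_call *call;
    size_t file_min_bytes;   /* filter private to this instance */
};

/* Pick the threshold that applies to an event in a given instance. */
size_t effective_min_bytes(const struct event_file *file)
{
    assert(file && file->call);
    if (file->call->flags & USE_CALL_FILTER)
        return file->call->call_min_bytes;   /* old, shared behavior */
    return file->file_min_bytes;             /* new, per-instance behavior */
}

/* Return 1 if the event fails "bytes_alloc > N" and should be discarded. */
int check_discard(const struct event_file *file, size_t bytes_alloc)
{
    return !(bytes_alloc > effective_min_bytes(file));
}
```

With two event_file objects pointing at the same event_call, a 4096-byte allocation is discarded by a file filtering on 8192 and kept by a file filtering on 2048, which matches the "After" transcript.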
Link: http://lkml.kernel.org/r/f16e9ce4270c62f46b2e966119225e1c3cca7e60.1382620672.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Resolve cherry-picking conflicts:
Conflicts:
mm/huge_memory.c
mm/memory.c
mm/mprotect.c
See this upstream merge commit for more details:
52469b4fcd Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently check_hung_task() prints a warning if it detects the
problem, but it is not convenient to watch the system logs if
user-space wants to be notified about the hang.
Add the new trace_sched_process_hang() into check_hung_task(),
this way a user-space monitor can easily wait for the hang and
potentially resolve a problem.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Dave Sullivan <dsulliva@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20131019161828.GA7439@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Page pinning is not mandatory in kvm async page fault processing: after
the async page fault event is delivered to a guest, the guest accesses the
page once again and does its own GUP. Drop the FOLL_GET flag in GUP in
async_pf code, and do some simplifying in check/clear processing.
Suggested-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Gu zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: chai wen <chaiw.fnst@cn.fujitsu.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
The loops which SPI controller drivers use to process the list of transfers
in a spi_message are typically very similar and have some error prone areas
such as the handling of /CS. Help simplify drivers by factoring this code
out into the core - if drivers provide a transfer_one() function instead
of a transfer_one_message() function the core will handle processing at the
message level.
/CS can be controlled by either setting cs_gpio or providing a set_cs
function. If this is not possible for hardware reasons then both can be
omitted and the driver should continue to implement manual /CS handling.
This is a first step in refactoring and it is expected that there will be
further enhancements, for example factoring out of the mapping of transfers
for DMA and the initiation and completion of interrupt driven transfers.
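A rough user-space model of the division of labour described above: the core walks the transfer list and handles /CS, while the driver supplies only per-transfer callbacks. The types `struct xfer` and `struct ctlr`, the function `core_transfer_message()` and the demo callbacks are hypothetical simplifications for illustration and do not reproduce the real spi_transfer/spi_master definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the SPI core types. */
struct xfer {
    size_t len;
    struct xfer *next;
};

struct ctlr {
    int (*transfer_one)(struct ctlr *c, struct xfer *t);
    void (*set_cs)(struct ctlr *c, int asserted);
    int cs_asserted;     /* demo driver state */
    size_t bytes_done;   /* demo driver state */
};

/* Core-side loop: the driver only supplies transfer_one() and set_cs(). */
int core_transfer_message(struct ctlr *c, struct xfer *head)
{
    int ret = 0;

    assert(c && c->transfer_one && c->set_cs);
    c->set_cs(c, 1);                 /* assert /CS for the whole message */
    for (struct xfer *t = head; t; t = t->next) {
        ret = c->transfer_one(c, t);
        if (ret < 0)
            break;                   /* stop at the first failed transfer */
    }
    c->set_cs(c, 0);                 /* always release /CS, even on error */
    return ret;
}

/* Demo driver callbacks, for illustration only. */
void demo_set_cs(struct ctlr *c, int asserted)
{
    c->cs_asserted = asserted;
}

int demo_transfer_one(struct ctlr *c, struct xfer *t)
{
    if (!c->cs_asserted || t->len == 0)
        return -1;                   /* reject transfers without /CS or data */
    c->bytes_done += t->len;
    return 0;
}
```

Keeping the /CS release on the single exit path of the loop is the point of the sketch: the error-prone part lives in one place instead of in every driver.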
Signed-off-by: Mark Brown <broonie@linaro.org>
Instead of using the random driver's ad-hoc DEBUG_ENT() mechanism, use
tracepoints. This allows much more fine-grained control over which
debugging output a developer might need, and unifies the debugging
messages with all of the existing tracepoints.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
As the input pool gets filled, start transferring entropy to the output
pools until they get filled. This allows us to use the output pools
to store more system entropy. Waste not, want not....
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Fix a problem where get_random_bytes_arch() was calling the tracepoint
get_random_bytes(). So add a new tracepoint for
get_random_bytes_arch(), and make get_random_bytes() and
get_random_bytes_arch() call their correct tracepoint.
Also, add a new tracepoint for add_device_randomness().
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
iQEcBAABAgAGBQJSUc9zAAoJEHm+PkMAQRiG9DMH/AtpuAF6LlMRPjrCeuJQ1pyh
T0IUO+CsLKO6qtM5IyweP8V6zaasNjIuW1+B6IwVIl8aOrM+M7CwRiKvpey26ldM
I8G2ron7hqSOSQqSQs20jN2yGAqQGpYIbTmpdGLAjQ350NNNvEKthbP5SZR5PAmE
UuIx5OGEkaOyZXvCZJXU9AZkCxbihlMSt2zFVxybq2pwnGezRUYgCigE81aeyE0I
QLwzzMVdkCxtZEpkdJMpLILAz22jN4RoVDbXRa2XC7dA9I2PEEXI9CcLzqCsx2Ii
8eYS+no2K5N2rrpER7JFUB2B/2X8FaVDE+aJBCkfbtwaYTV9UYLq3a/sKVpo1Cs=
=xSFJ
-----END PGP SIGNATURE-----
Merge tag 'v3.12-rc4' into sched/core
Merge Linux v3.12-rc4 to fix a conflict and also to refresh the tree
before applying more scheduler patches.
Conflicts:
arch/avr32/include/asm/Kbuild
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The unpacked_lun field in the SCSI target tracepoints should be
initialized with cmd->orig_fe_lun rather than cmd->se_lun->unpacked_lun
for two reasons:
- most importantly, if we are in the cmd_complete tracepoint
returning a check condition due to no LUN found, cmd->se_lun will
be NULL and we'll crash trying to dereference it.
- also, in any case, cmd->se_lun->unpacked_lun is an internal index
into the target's internal set of LUNs; cmd->orig_fe_lun is much
more useful and interesting, since it's the value the initiator
actually sent.
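The first point can be illustrated with a minimal sketch. The structs below are hypothetical reductions of the target core's command and LUN structures to the two fields named above, and `trace_lun()` is an invented helper name standing in for the tracepoint's field assignment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reduction of the target LUN structure. */
struct lun {
    unsigned int unpacked_lun;   /* internal index into the target's LUNs */
};

/* Hypothetical reduction of the target command structure. */
struct cmd {
    unsigned int orig_fe_lun;    /* LUN value the initiator actually sent */
    struct lun *se_lun;          /* NULL when no matching LUN was found */
};

/* Value to record in the tracepoint's unpacked_lun field. */
unsigned int trace_lun(const struct cmd *cmd)
{
    assert(cmd);
    /* Never dereference cmd->se_lun here: it may be NULL. */
    return cmd->orig_fe_lun;
}
```

A command that failed LUN lookup has se_lun set to NULL, and the helper still returns the initiator's value without touching that pointer.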
Signed-off-by: Roland Dreier <roland@purestorage.com>
Cc: <stable@vger.kernel.org> # 3.11+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Provide tracepoints for the lifecycle of a message from submission to
completion and for the active time for masters to help with performance
analysis of SPI I/O.
Signed-off-by: Mark Brown <broonie@linaro.org>
Ftrace is currently not able to detect when SWIOTLB has to do double buffering.
Under Xen you can only see it indirectly in function_graph, when
xen_swiotlb_map_page() doesn't stop after range_straddles_page_boundary(), but
calls spinlock functions, memcpy() and xen_phys_to_bus() as well. This patch
introduces the swiotlb:swiotlb_bounced event, which also prints out the
following information to help you find out why bouncing happened:
dev_name: 0000:08:00.0 dma_mask=ffffffffffffffff dev_addr=9149f000 size=32768
swiotlb_force=0
If you use Xen, and (dev_addr + size + 1) > dma_mask, the buffer is out of the
device's DMA range. If swiotlb_force == 1, you should really change the kernel
parameters. Otherwise, the buffer is not contiguous in mfn space.
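The decision rules in the paragraph above can be written down as a small classifier over the fields the event prints. This is a sketch for reading trace output under Xen, not kernel code: the enum and the function `classify_bounce()` are hypothetical names, and the checks are applied in the order the text gives them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical labels for the three explanations given in the text. */
enum bounce_reason {
    BOUNCE_OUT_OF_DMA_RANGE,   /* buffer ends beyond the device's DMA mask */
    BOUNCE_FORCED,             /* swiotlb_force is set */
    BOUNCE_NOT_CONTIGUOUS,     /* buffer is not contiguous in mfn space */
};

/* Classify one swiotlb_bounced event from its printed fields. */
enum bounce_reason classify_bounce(uint64_t dev_addr, uint64_t size,
                                   uint64_t dma_mask, int swiotlb_force)
{
    assert(size > 0);
    /* Same test as in the text; 64-bit wraparound is ignored here. */
    if (dev_addr + size + 1 > dma_mask)
        return BOUNCE_OUT_OF_DMA_RANGE;
    if (swiotlb_force == 1)
        return BOUNCE_FORCED;
    return BOUNCE_NOT_CONTIGUOUS;
}
```

Applied to the sample line above (dev_addr=9149f000, size=32768, a full 64-bit dma_mask and swiotlb_force=0), neither of the first two conditions holds, so the remaining explanation is a buffer that is not contiguous in mfn space.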
Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
[v1: Don't print 'swiotlb_force=X', just print swiotlb_force if it is enabled]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
We need a few special preempt_count accessors:
- task_preempt_count() for when we're interested in the preemption
count of another (non-running) task.
- init_task_preempt_count() for properly initializing the preemption
count.
- init_idle_preempt_count() a special case of the above for the idle
threads.
With these no generic code ever touches thread_info::preempt_count
anymore and architectures could choose to remove it.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-jf5swrio8l78j37d06fzmo4r@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
iommu_error class event can be enabled to trigger when an iommu
error occurs. This trace event is intended to be called to report the
error information. Trace information includes driver name, device name,
iova, and flags.
iommu_error:io_page_fault
Signed-off-by: Shuah Khan <shuah.kh@samsung.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Add tracing feature to iommu to report various iommu events. Classes
iommu_group, iommu_device, and iommu_map_unmap are defined.
iommu_group class events can be enabled to trigger when devices get added
to and removed from an iommu group. Trace information includes iommu group
id and device name.
iommu:add_device_to_group
iommu:remove_device_from_group
iommu_device class events can be enabled to trigger when devices are attached
to and detached from a domain. Trace information includes device name.
iommu:attach_device_to_domain
iommu:detach_device_from_domain
iommu_map_unmap class events can be enabled to trigger on iommu map and
unmap operations. Trace information includes iova, physical address (map event
only), and size.
iommu:map
iommu:unmap
Signed-off-by: Shuah Khan <shuah.kh@samsung.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
When tracing switching, an external tracer needs a way to bootstrap
its knowledge of the logical<->physical CPU mapping.
This patch adds a sysfs attribute trace_trigger. A write to this
attribute will generate a power:cpu_migrate_current event for each
online CPU, indicating the current physical CPU for each logical
CPU.
Activating or deactivating the switcher also generates these
events, so that the tracer knows about the resulting remapping of
affected CPUs.
Signed-off-by: Dave Martin <dave.martin@linaro.org>
This patch adds simple trace events to the b.L switcher code
to allow tracing of CPU migration events.
To make use of the trace events, you will need:
CONFIG_FTRACE=y
CONFIG_ENABLE_DEFAULT_TRACERS=y
The following events are added:
* power:cpu_migrate_begin
* power:cpu_migrate_finish
each with the following data:
u64 timestamp;
u32 cpu_hwid;
power:cpu_migrate_begin occurs immediately before the
switcher-specific migration operations start.
power:cpu_migrate_finish occurs immediately when migration is
completed.
The cpu_hwid field contains the ID fields of the MPIDR.
* For power:cpu_migrate_begin, cpu_hwid is the ID of the outbound
physical CPU (equivalent to (from_phys_cpu,from_phys_cluster)).
* For power:cpu_migrate_finish, cpu_hwid is the ID of the inbound
physical CPU (equivalent to (to_phys_cpu,to_phys_cluster)).
By design, the cpu_hwid field is masked in the same way as the
device tree cpu node reg property, allowing direct correlation to
the DT description of the hardware.
The timestamp is added in order to minimise timing noise. An
accurate system-wide clock should be used for generating this
(hopefully getnstimeofday is appropriate, but it could be changed).
It could be any monotonic shared clock, since the aim is to allow
accurate deltas to be computed. We don't necessarily care about
accurate synchronisation with wall clock time.
In practice, each switch takes place on a single logical CPU,
and the trace infrastructure should guarantee that events are
well-ordered with respect to a single logical CPU.
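As an illustration of how a tracer might consume these events, the sketch below pairs a begin and a finish event and computes the switch duration from the timestamps. The struct mirrors the two fields listed above. The mask value 0x00ffffff for the MPIDR ID fields and the helper names `mpidr_to_hwid()` and `switch_duration()` are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Payload shared by both events, as listed in the text. */
struct cpu_migrate_event {
    uint64_t timestamp;
    uint32_t cpu_hwid;
};

/* Assumed mask selecting the MPIDR ID (affinity) fields on 32-bit ARM. */
#define HWID_MASK 0x00ffffffu

/* Mask a raw MPIDR value the same way as the DT cpu node reg property. */
uint32_t mpidr_to_hwid(uint32_t mpidr)
{
    return mpidr & HWID_MASK;
}

/* Time between a begin event and its matching finish event. */
uint64_t switch_duration(const struct cpu_migrate_event *begin,
                         const struct cpu_migrate_event *finish)
{
    assert(begin && finish);
    /* Only deltas matter, so any monotonic shared clock is sufficient. */
    assert(finish->timestamp >= begin->timestamp);
    return finish->timestamp - begin->timestamp;
}
```

Because only the difference is used, the clock does not need to be synchronised with wall clock time, which is the property the text asks for.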
Signed-off-by: Dave Martin <dave.martin@linaro.org>
Signed-off-by: Nicolas Pitre <nico@linaro.org>
The event-tracing macros do not like bool tracing arguments, so this
commit makes them be of type char. This change has the knock-on effect
of making it illegal to pass a pointer into one of these arguments, so
also change rcutiny's first call to trace_rcu_batch_end() to convert
from pointer to boolean, prefixing with "!!".
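A minimal sketch of the "!!" idiom follows. The function `trace_flag()` is a hypothetical stand-in for a tracepoint whose flag argument is char, and `struct cb` is an invented list node; neither is the real RCU code.

```c
#include <assert.h>
#include <stddef.h>

/* Invented callback-list node, for illustration only. */
struct cb {
    struct cb *next;
};

/* Stand-in for a tracepoint whose flag argument is char, not bool. */
char trace_flag(char callbacks_ready)
{
    assert(callbacks_ready == 0 || callbacks_ready == 1);
    return callbacks_ready;
}

/* "!!" collapses any non-NULL pointer to 1 and NULL to 0. */
char list_nonempty(const struct cb *head)
{
    return trace_flag(!!head);
}
```

Passing `head` directly would be a pointer-to-integer conversion into a char parameter, which is exactly what the change makes illegal. The double negation yields an int that is always 0 or 1.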
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit adds event traces to track all of rcu_nocb_kthread()'s
blocking and awakening.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Lost wakeups from call_rcu() to the rcuo kthreads can result in hangs
that are difficult to diagnose. This commit therefore adds tracing to
help pin down the cause of these hangs.
Reported-by: Clark Williams <williams@redhat.com>
Reported-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Add const per kbuild test robot's advice. ]
This commit adds tracing to the normal grace-period request points.
These are rcu_gp_cleanup(), which checks for the need for another
grace period at the end of the previous grace period, and
rcu_start_gp_advanced(), which restarts RCU's state machine after
an idle period. These trace events are intended to help track down
bugs where RCU remains idle despite there being work for it to do.
Reported-by: Clark Williams <williams@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit adds tracing to the rcu_gp_kthread() function in order to
help trace down hangs potentially involving this kthread.
Reported-by: Clark Williams <williams@redhat.com>
Reported-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Pull block IO fixes from Jens Axboe:
"After merge window, no new stuff this time only a collection of neatly
confined and simple fixes"
* 'for-3.12/core' of git://git.kernel.dk/linux-block:
cfq: explicitly use 64bit divide operation for 64bit arguments
block: Add nr_bios to block_rq_remap tracepoint
If the queue is dying then we only call the rq->end_io callout. This leaves bios setup on the request, because the caller assumes when the blk_execute_rq_nowait/blk_execute_rq call has completed that the rq->bios have been cleaned up.
bio-integrity: Fix use of bs->bio_integrity_pool after free
blkcg: relocate root_blkg setting and clearing
block: Convert kmalloc_node(...GFP_ZERO...) to kzalloc_node(...)
block: trace all devices plug operation
Pull btrfs fixes from Chris Mason:
"These are mostly bug fixes and two small performance fixes. The
most important of the bunch are Josef's fix for a snapshotting
regression and Mark's update to fix compile problems on arm"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (25 commits)
Btrfs: create the uuid tree on remount rw
btrfs: change extent-same to copy entire argument struct
Btrfs: dir_inode_operations should use btrfs_update_time also
btrfs: Add btrfs: prefix to kernel log output
btrfs: refuse to remount read-write after abort
Btrfs: btrfs_ioctl_default_subvol: Revert back to toplevel subvolume when arg is 0
Btrfs: don't leak transaction in btrfs_sync_file()
Btrfs: add the missing mutex unlock in write_all_supers()
Btrfs: iput inode on allocation failure
Btrfs: remove space_info->reservation_progress
Btrfs: kill delay_iput arg to the wait_ordered functions
Btrfs: fix worst case calculator for space usage
Revert "Btrfs: rework the overcommit logic to be based on the total size"
Btrfs: improve replacing nocow extents
Btrfs: drop dir i_size when adding new names on replay
Btrfs: replay dir_index items before other items
Btrfs: check roots last log commit when checking if an inode has been logged
Btrfs: actually log directory we are fsync()'ing
Btrfs: actually limit the size of delalloc range
Btrfs: allocate the free space by the existed max extent size when ENOSPC
...
Adding the number of bios in a remapped request to 'block_rq_remap'
tracepoint.
Request remapper clones bios in a request to track the completion
status of each bio. So the number of bios can be useful information
for investigation.
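Counting the bios attached to a request amounts to walking a singly linked list. The sketch below shows the idea with hypothetical, stripped-down versions of struct bio and struct request that keep only the link fields. The helper name `count_bios()` is invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Stripped-down stand-ins that keep only the list linkage. */
struct bio {
    struct bio *bi_next;
};

struct request {
    struct bio *bio;   /* head of the bio chain for this request */
};

/* Number of bios chained on a request, as reported in nr_bios. */
unsigned int count_bios(const struct request *rq)
{
    unsigned int n = 0;

    assert(rq);
    for (const struct bio *b = rq->bio; b; b = b->bi_next)
        n++;
    return n;
}
```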
Related discussions:
http://www.redhat.com/archives/dm-devel/2013-August/msg00084.html
http://www.redhat.com/archives/dm-devel/2013-September/msg00024.html
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Fix build so that asoc trace event header doesn't depend on other headers.
Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>
Signed-off-by: Mark Brown <broonie@linaro.org>
Pull vfs pile 4 from Al Viro:
"list_lru pile, mostly"
This came out of Andrew's pile, Al ended up doing the merge work so that
Andrew didn't have to.
Additionally, a few fixes.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (42 commits)
super: fix for destroy lrus
list_lru: dynamically adjust node arrays
shrinker: Kill old ->shrink API.
shrinker: convert remaining shrinkers to count/scan API
staging/lustre/libcfs: cleanup linux-mem.h
staging/lustre/ptlrpc: convert to new shrinker API
staging/lustre/obdclass: convert lu_object shrinker to count/scan API
staging/lustre/ldlm: convert to shrinkers to count/scan API
hugepage: convert huge zero page shrinker to new shrinker API
i915: bail out earlier when shrinker cannot acquire mutex
drivers: convert shrinkers to new count/scan API
fs: convert fs shrinkers to new scan/count API
xfs: fix dquot isolation hang
xfs-convert-dquot-cache-lru-to-list_lru-fix
xfs: convert dquot cache lru to list_lru
xfs: rework buffer dispose list tracking
xfs-convert-buftarg-lru-to-generic-code-fix
xfs: convert buftarg LRU to generic code
fs: convert inode and dentry shrinking to be node aware
vmscan: per-node deferred work
...
Pull btrfs updates from Chris Mason:
"This is against 3.11-rc7, but was pulled and tested against your tree
as of yesterday. We do have two small incrementals queued up, but I
wanted to get this bunch out the door before I hop on an airplane.
This is a fairly large batch of fixes, performance improvements, and
cleanups from the usual Btrfs suspects.
We've included Stefan Behrens' work to index subvolume UUIDs, which is
targeted at speeding up send/receive with many subvolumes or snapshots
in place. It closes a long standing performance issue that was built
in to the disk format.
Mark Fasheh's offline dedup work is also here. In this case offline
means the FS is mounted and active, but the dedup work is not done
inline during file IO. This is a building block where utilities are
able to ask the FS to dedup a series of extents. The kernel takes
care of verifying the data involved really is the same. Today this
involves reading both extents, but we'll continue to evolve the
patches"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (118 commits)
Btrfs: optimize key searches in btrfs_search_slot
Btrfs: don't use an async starter for most of our workers
Btrfs: only update disk_i_size as we remove extents
Btrfs: fix deadlock in uuid scan kthread
Btrfs: stop refusing the relocation of chunk 0
Btrfs: fix memory leak of uuid_root in free_fs_info
btrfs: reuse kbasename helper
btrfs: return btrfs error code for dev excl ops err
Btrfs: allow partial ordered extent completion
Btrfs: convert all bug_ons in free-space-cache.c
Btrfs: add support for asserts
Btrfs: adjust the fs_devices->missing count on unmount
Btrf: cleanup: don't check for root_refs == 0 twice
Btrfs: fix for patch "cleanup: don't check the same thing twice"
Btrfs: get rid of one BUG() in write_all_supers()
Btrfs: allocate prelim_ref with a slab allocater
Btrfs: pass gfp_t to __add_prelim_ref() to avoid always using GFP_ATOMIC
Btrfs: fix race conditions in BTRFS_IOC_FS_INFO ioctl
Btrfs: fix race between removing a dev and writing sbs
Btrfs: remove ourselves from the cluster list under lock
...
In the current code, the value of fallback_migratetype that is printed
using the mm_page_alloc_extfrag tracepoint, is the value of the
migratetype *after* it has been set to the preferred migratetype (if the
ownership was changed). Obviously that wouldn't have been the original
intent. (We already have a separate 'change_ownership' field to tell
whether the ownership of the pageblock was changed from the
fallback_migratetype to the preferred type.)
The intent of the fallback_migratetype field is to show the migratetype
from which we borrowed pages in order to satisfy the allocation request.
So fix the code to print that value correctly.
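The ordering problem can be shown with a small model: the value to report has to be captured before the pageblock's migratetype is overwritten. The enum constants, `struct extfrag_trace` and `steal_from_fallback()` below are hypothetical names used only for this sketch and do not correspond to the page allocator's real code.

```c
#include <assert.h>

/* Hypothetical migratetype values for the example. */
enum { MT_UNMOVABLE, MT_RECLAIMABLE, MT_MOVABLE };

/* Fields the tracepoint is expected to report. */
struct extfrag_trace {
    int fallback_migratetype;   /* type the pages were borrowed from */
    int change_ownership;       /* whether the pageblock changed type */
};

/* Borrow from a fallback pageblock and build the trace record. */
struct extfrag_trace steal_from_fallback(int *block_mt, int preferred_mt,
                                         int take_ownership)
{
    struct extfrag_trace ev;

    assert(block_mt);
    ev.fallback_migratetype = *block_mt;   /* capture before any update */
    ev.change_ownership = take_ownership;
    if (take_ownership)
        *block_mt = preferred_mt;          /* pageblock now has the preferred type */
    return ev;
}
```

Reading `*block_mt` after the assignment would report the preferred type instead, which is the behavior the fix removes.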
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Cody P Schafer <cody@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are no more users of this API, so kill it dead, dead, dead and
quietly bury the corpse in a shallow, unmarked grave in a dark forest deep
in the hills...
[glommer@openvz.org: added flowers to the grave]
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Glauber Costa <glommer@openvz.org>
Reviewed-by: Greg Thelen <gthelen@google.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Highlights include:
- Fix NFSv4 recovery so that it doesn't recover lost locks in cases such as
lease loss due to a network partition, where doing so may result in data
corruption. Add a kernel parameter to control choice of legacy behaviour
or not.
- Performance improvements when 2 processes are writing to the same file.
- Flush data to disk when an RPCSEC_GSS session timeout is imminent.
- Implement NFSv4.1 SP4_MACH_CRED state protection to prevent other
NFS clients from being able to manipulate our lease and file locking
state.
- Allow sharing of RPCSEC_GSS caches between different rpc clients
- Fix the broken NFSv4 security auto-negotiation between client and server
- Fix rmdir() to wait for outstanding sillyrename unlinks to complete
- Add a tracepoint framework for debugging NFSv4 state recovery issues.
- Add tracing to the generic NFS layer.
- Add tracing for the SUNRPC socket connection state.
- Clean up the rpc_pipefs mount/umount event management.
- Merge more patches from Chuck in preparation for NFSv4 migration support.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
iQIcBAABAgAGBQJSLelVAAoJEGcL54qWCgDyo2IQAKOfRJyZVnf4ipxi3xLNl1QF
w/70DVSIF1S1djWN7G3vgkxj/R8KCvJ8CcvkAD2BEgRDeZJ9TtyKAdM/jYLZ+W05
7k2QKk8fkwZmc1Y2qDqFwKHzP5ZgP5L2nGx7FNhi/99wEAe47yFG3qd3rUWKrcOf
mnd863zgGDE2Q10slhoq/bywwMJo6tKZNeaIE8kPjgFbBEh/jslpAWr8dSA4QgvJ
nZ8VB5XU8L+XJ0GpHHdjYm9LvQ51DbQ6omOF+0P4fI093azKmf4ZsrjMDWT8+iu3
XkXlnQmKLGTi7yB43hHtn2NiRqwGzCcZ1Amo9PpCFaHUt1RP9cc37UhG1T+x1xWJ
STEKDbvCdQ3FU9FvbgrGEwBR0e8fNS4fZY3ToDBflIcfwre0aWs5RCodZMUD0nUI
4wY5J9NsQR/bL+v8KeUR4V4cXK8YrgL0zB4u4WYzH5Npxr5KD0NEKDNqRPhrB9l2
LLF9Haql8j76Ff0ek6UGFIZjDE0h6Fs71wLBpLj+ZWArOJ7vBuLMBSOVqNpld9+9
f2fEG7qoGF4FGTY4myH/eakMPaWnk9Ol4Ls/svSIapJ9+rePD+a93e/qnmdofIMf
4TuEYk6ERib1qXgaeDRQuCsm2YE1Co5skGMaOsRFWgReE1c12QoJQVst2nMtEKp3
uV2w8LgX18aZOZXJVkCM
=ZuW+
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-3.12-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
"Highlights include:
- Fix NFSv4 recovery so that it doesn't recover lost locks in cases
such as lease loss due to a network partition, where doing so may
result in data corruption. Add a kernel parameter to control
choice of legacy behaviour or not.
- Performance improvements when 2 processes are writing to the same
file.
- Flush data to disk when an RPCSEC_GSS session timeout is imminent.
- Implement NFSv4.1 SP4_MACH_CRED state protection to prevent other
NFS clients from being able to manipulate our lease and file
locking state.
- Allow sharing of RPCSEC_GSS caches between different rpc clients.
- Fix the broken NFSv4 security auto-negotiation between client and
server.
- Fix rmdir() to wait for outstanding sillyrename unlinks to complete
- Add a tracepoint framework for debugging NFSv4 state recovery
issues.
- Add tracing to the generic NFS layer.
- Add tracing for the SUNRPC socket connection state.
- Clean up the rpc_pipefs mount/umount event management.
- Merge more patches from Chuck in preparation for NFSv4 migration
support"
* tag 'nfs-for-3.12-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (107 commits)
NFSv4: use mach cred for SECINFO_NO_NAME w/ integrity
NFS: nfs_compare_super shouldn't check the auth flavour unless 'sec=' was set
NFSv4: Allow security autonegotiation for submounts
NFSv4: Disallow security negotiation for lookups when 'sec=' is specified
NFSv4: Fix security auto-negotiation
NFS: Clean up nfs_parse_security_flavors()
NFS: Clean up the auth flavour array mess
NFSv4.1 Use MDS auth flavor for data server connection
NFS: Don't check lock owner compatability unless file is locked (part 2)
NFS: Don't check lock owner compatibility in writes unless file is locked
nfs4: Map NFS4ERR_WRONG_CRED to EPERM
nfs4.1: Add SP4_MACH_CRED write and commit support
nfs4.1: Add SP4_MACH_CRED stateid support
nfs4.1: Add SP4_MACH_CRED secinfo support
nfs4.1: Add SP4_MACH_CRED cleanup support
nfs4.1: Add state protection handler
nfs4.1: Minimal SP4_MACH_CRED implementation
SUNRPC: Replace pointer values with task->tk_pid and rpc_clnt->cl_clid
SUNRPC: Add an identifier for struct rpc_clnt
SUNRPC: Ensure rpc_task->tk_pid is available for tracepoints
...
Instead of the pointer values, use the task and client identifier values
for tracing purposes.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* Added aggressive extent caching using the extent status tree. This
can actually decrease memory usage in read-mostly workloads since
the information is much more compactly stored in the extent status
tree than if we had to keep the extent tree metadata blocks in the
buffer cache. This also improves Asynchronous I/O since it makes it
much less likely that we need to do metadata I/O to look up the
extent tree information.
* Improve the recovery after corrupted allocation bitmaps are found
when running in errors=ignore mode.
Also fixed some writeback vs. truncate races when using a blocksize
less than the page size.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
iQIcBAABCAAGBQJSJ1SxAAoJENNvdpvBGATw6xAP/250u0YggRHup5cxmkJ7x+EH
sv/Kbe8r1ftUY7aBQP1awHlVYnOZehh+kYUj+eIVPPXKananhu99qcJy99KFm8W9
gWVP5G0+zvKD++S8yHKhyKjqUtzZwhlYJU7oyqptPr903CVlfjsKx1OtGvUlbnde
Hh/e+XpbltICPIa/O6gsE3SyRakbPtI0gvC4GbsD6EvAl+Rj3l6l+Ty9IkDqGFs9
YCVA2MUly6ZFYNRS8wkOPRP8T8lLwqIa7CNc75bEJPrGQL1R0iiIez0yaoZ83SOu
HMC6wo3XjfgcsuMwJo/mtYsw06rjQy5oNPD5bISRaDtocI5v5Rv8t5EmANnoJFbu
gy+psJ0XcKimL1BfsQ4vFCNiAkskkCQaFr2yJbo6VTDtHS8XV39MeMZ6IvcSqO+6
DQafMcKNiltDbdsywncsee+8ecncv/ZEZDiA6pIUm0lbljPopuzf6sBvxWOFGiHM
xMBD0eyhns/TzfYHzzI+fTcR+GdBDqAkNOrA9i4medffS6iJDAJ6qC6ZhgQh32oR
MCfYosVQwxmCInqtCh51+od29rk7ZIuBrPjp1+uMHjHqG5jDKcANgB7g3VAeQOf0
zuEYTFvGk6cLKfuJtlnaItKXN+eRTtVtfHlLRRq1+wR9UK+dFONV0Jufzs7Y1URI
LbsmGkgxTL9xZEskZXgQ
=tosu
-----END PGP SIGNATURE-----
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"New features for 3.12:
- Added aggressive extent caching using the extent status tree. This
can actually decrease memory usage in read-mostly workloads since
the information is much more compactly stored in the extent status
tree than if we had to keep the extent tree metadata blocks in the
buffer cache. This also improves Asynchronous I/O since it makes it
much less likely that we need to do metadata I/O to look up
the extent tree information.
- Improve the recovery after corrupted allocation bitmaps are found
when running in errors=ignore mode.
Also fixed some writeback vs truncate races when using a blocksize
less than the page size"
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (25 commits)
ext4: allow specifying external journal by pathname mount option
ext4: mark group corrupt on group descriptor checksum
ext4: mark block group as corrupt on inode bitmap error
ext4: mark block group as corrupt on block bitmap error
ext4: fix type declaration of ext4_validate_block_bitmap
ext4: error out if verifying the block bitmap fails
jbd2: Fix endian mixing problems in the checksumming code
ext4: isolate ext4_extents.h file
ext4: Fix misspellings using 'codespell' tool
ext4: convert write_begin methods to stable_page_writes semantics
ext4: fix use of potentially uninitialized variables in debugging code
ext4: fix lost truncate due to race with writeback
ext4: simplify truncation code in ext4_setattr()
ext4: fix ext4_writepages() in presence of truncate
ext4: move test whether extent to map can be extended to one place
ext4: fix warning in ext4_da_update_reserve_space()
quota: provide interface for readding allocated space into reserved space
ext4: avoid reusing recently deleted inodes in no journal mode
ext4: allocate delayed allocation blocks before rename
ext4: start handle at least possible moment when renaming files
...
Pull timers/nohz changes from Ingo Molnar:
"It mostly contains fixes and full dynticks off-case optimizations, by
Frederic Weisbecker"
* 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
nohz: Include local CPU in full dynticks global kick
nohz: Optimize full dynticks's sched hooks with static keys
nohz: Optimize full dynticks state checks with static keys
nohz: Rename a few state variables
vtime: Always debug check snapshot source _before_ updating it
vtime: Always scale generic vtime accounting results
vtime: Optimize full dynticks accounting off case with static keys
vtime: Describe overriden functions in dedicated arch headers
m68k: hardirq_count() only need preempt_mask.h
hardirq: Split preempt count mask definitions
context_tracking: Split low level state headers
vtime: Fix racy cputime delta update
vtime: Remove a few unneeded generic vtime state checks
context_tracking: User/kernel broundary cross trace events
context_tracking: Optimize context switch off case with static keys
context_tracking: Optimize guest APIs off case with static key
context_tracking: Optimize main APIs off case with static key
context_tracking: Ground setup for static key use
context_tracking: Remove full dynticks' hacky dependency on wide context tracking
nohz: Only enable context tracking on full dynticks CPUs
...
Add client side debugging to help trace socket connection/disconnection
and unexpected state change issues.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Pull perf changes from Ingo Molnar:
"As a first remark I'd like to point out that the obsolete '-f'
(--force) option, which has not done anything for several releases,
has been removed from 'perf record' and related utilities. Everyone
please update muscle memory accordingly! :-)
Main changes on the perf kernel side:
- Performance optimizations:
. for trace events, by Steve Rostedt.
. for time values, by Peter Zijlstra
- New hardware support:
. for Intel Silvermont (22nm Atom) CPUs, by Zheng Yan
. for Intel SNB-EP uncore PMUs, by Zheng Yan
- Enhanced hardware support:
. for Intel uncore PMUs: add filter support for QPI boxes, by Zheng Yan
- Core perf events code enhancements and fixes:
. for full-nohz feature handling, by Frederic Weisbecker
. for group events, by Jiri Olsa
. for call chains, by Frederic Weisbecker
. for event stream parsing, by Adrian Hunter
- New ABI details:
. Add attr->mmap2 attribute, by Stephane Eranian
. Add PERF_EVENT_IOC_ID ioctl to return event ID, by Jiri Olsa
. Export u64 time_zero on the mmap header page to allow TSC
calculation, by Adrian Hunter
. Add dummy software event, by Adrian Hunter.
. Add a new PERF_SAMPLE_IDENTIFIER to make samples always
parseable, by Adrian Hunter.
. Make Power7 events available via sysfs, by Runzhen Wang.
- Code cleanups and refactorings:
. for nohz-full, by Frederic Weisbecker
. for group events, by Jiri Olsa
- Documentation updates:
. for perf_event_type, by Peter Zijlstra
Main changes on the perf tooling side (some of these tooling changes
utilize the above kernel side changes):
- Lots of 'perf trace' enhancements:
. Make 'perf trace' command line arguments consistent with
'perf record', by David Ahern.
. Allow specifying syscalls a la strace, by Arnaldo Carvalho de Melo.
. Add --verbose and -o/--output options, by Arnaldo Carvalho de Melo.
. Support ! in -e expressions, to filter a list of syscalls,
by Arnaldo Carvalho de Melo.
. Arg formatting improvements to allow masking arguments in
syscalls such as futex and open, where some arguments are
ignored and thus should not be printed depending on other args,
by Arnaldo Carvalho de Melo.
. Beautify open, openat, open_by_handle_at, lseek and futex
syscalls, by Arnaldo Carvalho de Melo.
. Add option to analyze events in a file versus live, so that
one can do:
[root@zoo ~]# perf record -a -e raw_syscalls:* sleep 1
[ perf record: Woken up 0 times to write data ]
[ perf record: Captured and wrote 25.150 MB perf.data (~1098836 samples) ]
[root@zoo ~]# perf trace -i perf.data -e futex --duration 1
17.799 ( 1.020 ms): 7127 futex(uaddr: 0x7fff3f6c6674, op: 393, val: 1, utime: 0x7fff3f6c6470, ua
113.344 (95.429 ms): 7127 futex(uaddr: 0x7fff3f6c6674, op: 393, val: 1, utime: 0x7fff3f6c6470, uaddr2: 0x7fff3f6c6648, val3: 4294967
133.778 ( 1.042 ms): 18004 futex(uaddr: 0x7fff3f6c6674, op: 393, val: 1, utime: 0x7fff3f6c6470, uaddr2: 0x7fff3f6c6648, val3: 429496
[root@zoo ~]#
By David Ahern.
. Honor target pid / tid options when analyzing a file, by David Ahern.
. Introduce better formatting of syscall arguments, including so
far beautifiers for mmap, madvise, syscall return values,
by Arnaldo Carvalho de Melo.
. Handle HUGEPAGE defines in the mmap beautifier, by David Ahern.
- 'perf report/top' enhancements:
. Do annotation using /proc/kcore and /proc/kallsyms when
available, removing the forced need for a vmlinux file for kernel
assembly annotation. This also improves this use case because
vmlinux has just the initial kernel image, not what is actually
in use after various code patchings by things like alternatives.
By Adrian Hunter.
. Add --ignore-callees=<regex> option to collapse undesired parts
of call graphs, by Greg Price.
. Simplify symbol filtering by doing it at machine class level,
by Adrian Hunter.
. Add support for callchains in the gtk UI, by Namhyung Kim.
. Add --objdump option to 'perf top', by Sukadev Bhattiprolu.
- 'perf kvm' enhancements:
. Add option to print only events that exceed a specified time
duration, by David Ahern.
. Improve stack trace printing, by David Ahern.
. Update documentation of the live command, by David Ahern
. Add perf kvm stat live mode that combines aspects of 'perf kvm
stat' record and report, by David Ahern.
. Add option to analyze specific VM in perf kvm stat report, by
David Ahern.
. Do not require /lib/modules/* on a guest, by Jason Wessel.
- 'perf script' enhancements:
. Fix symbol offset computation for some dsos, by David Ahern.
. Fix named threads support, by David Ahern.
. Don't install scripting files when perl/python support
is disabled, by Arnaldo Carvalho de Melo.
- 'perf test' enhancements:
. Add various improvements and fixes to the "vmlinux matches
kallsyms" 'perf test' entry, related to the /proc/kcore
annotation feature. By Adrian Hunter.
. Add sample parsing test, by Adrian Hunter.
. Add test for reading object code, by Adrian Hunter.
. Add attr record group sampling test, by Jiri Olsa.
. Misc testing infrastructure improvements and other details,
by Jiri Olsa.
- 'perf list' enhancements:
. Skip unsupported hardware events, by Namhyung Kim.
. List pmu events, by Andi Kleen.
- 'perf diff' enhancements:
. Add support for more than two files comparison, by Jiri Olsa.
- 'perf sched' enhancements:
. Various improvements, including removing reliance on some
scheduler tracepoints that provide the same information as the
PERF_RECORD_{FORK,EXIT} events. By David Ahern.
. Remove odd build stall by moving a large struct initialization
from a local variable to a global one, by Namhyung Kim.
- 'perf stat' enhancements:
. Add --initial-delay option to skip measuring for a defined
startup phase, by Andi Kleen.
- Generic perf tooling infrastructure/plumbing changes:
. Tidy up sample parsing validation, by Adrian Hunter.
. Fix up jobserver setup in libtraceevent Makefile,
by Arnaldo Carvalho de Melo.
. Debug improvements, by Adrian Hunter.
. Fix correlation of samples coming after PERF_RECORD_EXIT event,
by David Ahern.
. Improve robustness of the topology parsing code,
by Stephane Eranian.
. Add group leader sampling, that allows just one event in a group
to sample while the other events have just their values read,
by Jiri Olsa.
. Add support for a new modifier "D", which requests that the
event, or group of events, be pinned to the PMU.
By Michael Ellerman.
. Support callchain sorting based on addresses, by Andi Kleen
. Prep work for multi perf data file storage, by Jiri Olsa.
. libtraceevent cleanups, by Namhyung Kim.
And lots and lots of other fixes and code reorganizations that did not
make it into the list, see the shortlog, diffstat and the Git log for
details!"
[ Also merge a leftover from the 3.11 cycle ]
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Prevent race in unthrottling code
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (237 commits)
perf trace: Tell arg formatters the arg index
perf trace: Add beautifier for open's flags arg
perf trace: Add beautifier for lseek's whence arg
perf tools: Fix symbol offset computation for some dsos
perf list: Skip unsupported events
perf tests: Add 'keep tracking' test
perf tools: Add support for PERF_COUNT_SW_DUMMY
perf: Add a dummy software event to keep tracking
perf trace: Add beautifier for futex 'operation' parm
perf trace: Allow syscall arg formatters to mask args
perf: Convert kmalloc_node(...GFP_ZERO...) to kzalloc_node()
perf: Export struct perf_branch_entry to userspace
perf: Add attr->mmap2 attribute to an event
perf/x86: Add Silvermont (22nm Atom) support
perf/x86: use INTEL_UEVENT_EXTRA_REG to define MSR_OFFCORE_RSP_X
perf trace: Handle missing HUGEPAGE defines
perf trace: Honor target pid / tid options when analyzing a file
perf trace: Add option to analyze events in a file versus live
perf evlist: Add tracepoint lookup by name
perf tests: Add a sample parsing test
...
Pull RCU updates from Ingo Molnar:
"Main RCU changes this cycle were:
- Full-system idle detection. This is for use by Frederic
Weisbecker's adaptive-ticks mechanism. Its purpose is to allow the
timekeeping CPU to shut off its tick when all other CPUs are idle.
- Miscellaneous fixes.
- Improved rcutorture test coverage.
- Updated RCU documentation"
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits)
nohz_full: Force RCU's grace-period kthreads onto timekeeping CPU
nohz_full: Add full-system-idle state machine
jiffies: Avoid undefined behavior from signed overflow
rcu: Simplify _rcu_barrier() processing
rcu: Make rcutorture emit online failures if verbose
rcu: Remove unused variable from rcu_torture_writer()
rcu: Sort rcutorture module parameters
rcu: Increase rcutorture test coverage
rcu: Add duplicate-callback tests to rcutorture
doc: Fix memory-barrier control-dependency example
rcu: Update RTFP documentation
nohz_full: Add full-system-idle arguments to API
nohz_full: Add full-system idle states and variables
nohz_full: Add per-CPU idle-state tracking
nohz_full: Add rcu_dyntick data for scalable detection of all-idle state
nohz_full: Add Kconfig parameter for scalable detection of all-idle state
nohz_full: Add testing information to documentation
rcu: Eliminate unused APIs intended for adaptive ticks
rcu: Select IRQ_WORK from TREE_PREEMPT_RCU
rculist: list_first_or_null_rcu() should use list_entry_rcu()
...
Pull RCU updates from Paul E. McKenney:
"
* Update RCU documentation. These were posted to LKML at
https://lkml.org/lkml/2013/8/19/611.
* Miscellaneous fixes. These were posted to LKML at
https://lkml.org/lkml/2013/8/19/619.
* Full-system idle detection. This is for use by Frederic
Weisbecker's adaptive-ticks mechanism. Its purpose is
to allow the timekeeping CPU to shut off its tick when
all other CPUs are idle. These were posted to LKML at
https://lkml.org/lkml/2013/8/19/648.
* Improve rcutorture test coverage. These were posted to LKML at
https://lkml.org/lkml/2013/8/19/675.
"
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This shows exactly how btrfs processes the delayed refs onto disk,
which is very helpful for understanding the delayed ref mechanism and
debugging related bugs.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
After applying commit 4a092d73, we have reduced the number of
source files that need to #include ext4_extents.h. But we can do
better.
This commit defines ext4_zeroout_es() in extents.c and moves
EXT_MAX_BLOCKS into ext4.h so that indirect.c and ioctl.c no longer
need to include ext4_extents.h. Meanwhile, extent_status.c only needs
to include this file when ES_AGGRESSIVE_TEST is defined. In addition,
this commit removes a duplicated declaration in trace/events/ext4.h.
After applying this patch, we just need to include ext4_extents.h
in {super,migrate,move_extents,extents}.c, which makes it easy for us
to define a new extent disk layout.
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
When we read in an extent tree leaf block from disk, arrange to have
all of its entries cached. In nearly all cases the in-memory
representation will be more compact than the on-disk representation in
the buffer cache, and it allows us to get the information without
having to traverse the extent tree for successive extents.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Don't use an unsigned long long for the es_status flags; this requires
that we pass 64-bit values around which is painful on 32-bit systems.
Instead pass the extent status flags around using the low 4 bits of an
unsigned int, and shift them into place when we are reading or writing
es_pblk.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
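The flag handling described above can be modelled in a few lines. This is
only a rough Python sketch, not the ext4 code: the shift value (60) and the
flag names and values are assumptions chosen for illustration. The point is
that callers pass a small integer whose low 4 bits carry the status, and the
shift into the 64-bit word happens only where es_pblk is read or written.

```python
# Sketch only: ES_SHIFT and the flag values below are illustrative assumptions.
ES_SHIFT = 60                       # assumed: status occupies the top 4 bits
ES_FLAG_MASK = 0xF                  # low 4 bits of the small flags value
ES_PBLK_MASK = (1 << ES_SHIFT) - 1  # remaining bits hold the physical block

ES_WRITTEN = 1 << 0
ES_UNWRITTEN = 1 << 1
ES_DELAYED = 1 << 2
ES_HOLE = 1 << 3


def es_store(pblk: int, flags: int) -> int:
    """Pack a physical block number and 4 status bits into one 64-bit word."""
    return ((flags & ES_FLAG_MASK) << ES_SHIFT) | (pblk & ES_PBLK_MASK)


def es_status(es_pblk: int) -> int:
    """Return the status as a small integer (low 4 bits only)."""
    return (es_pblk >> ES_SHIFT) & ES_FLAG_MASK


def es_pblock(es_pblk: int) -> int:
    """Return the physical block number with the status bits masked off."""
    return es_pblk & ES_PBLK_MASK
```

Only es_store(), es_status() and es_pblock() touch the wide word; everything
else can work with a plain small integer.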
This can be useful to track all kernel/user round trips.
And it's also helpful to debug the context tracking subsystem.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Kevin Hilman <khilman@linaro.org>
perf_trace_buf_prepare() + perf_trace_buf_submit(task => NULL)
make no sense if hlist_empty(head). Change perf_trace_##call()
to check ->perf_events beforehand and do nothing if it is empty.
This removes the overhead for tasks without events associated
with them. For example, "perf record -e sched:sched_switch -p1"
attaches the counter(s) to the single task, but every task in
the system will do perf_trace_buf_prepare/submit() just to realize
that it was not attached to this event.
However, we can only do this if __task == NULL, so we also add
the __builtin_constant_p(__task) check.
With this patch "perf bench sched pipe" shows approximately 4%
improvement when "perf record -p1" runs in parallel, many thanks
to Steven for the testing.
Link: http://lkml.kernel.org/r/20130806160847.GA2746@redhat.com
Tested-by: David Ahern <dsahern@gmail.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
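The early-exit condition described above can be sketched as follows. This is
a simplified Python model, not the macro itself: `perf_events` stands in for
the per-CPU hlist, `task` for `__task`, and the `stats` dictionary is a
hypothetical counter added only to make the effect visible.

```python
def perf_trace_call(perf_events, task, stats):
    """Model of the fast path: bail out before any buffer work is done."""
    # Skipping is only safe when no explicit target task was given
    # (the __task == NULL case) and nothing is attached on this CPU.
    if task is None and not perf_events:
        stats["skipped"] += 1
        return
    # Stands in for perf_trace_buf_prepare() + perf_trace_buf_submit().
    stats["submitted"] += 1


stats = {"skipped": 0, "submitted": 0}
perf_trace_call([], None, stats)            # nothing attached: skipped
perf_trace_call(["counter"], None, stats)   # event attached: submitted
perf_trace_call([], "target-task", stats)   # explicit task: never skipped
```

The third call shows why the check is restricted to the no-task case: an
explicit task takes a separate path that the empty-list test does not cover.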
The next patch tries to avoid the costly perf_trace_buf_* calls
when possible, but there is a problem. We can only do this if
__task == NULL; perf_tp_event(task != NULL) has the additional
code for this case.
Unfortunately, TP_perf_assign/__perf_xxx, which changes the default
values of __count/__task variables for perf_trace_buf_submit(), is
called "too late", after we already did perf_trace_buf_prepare(),
and the optimization above can't work.
So this patch simply embeds __perf_xxx() into TP_ARGS(), this way
DECLARE_EVENT_CLASS() can use the result of assignments hidden in
"args" right after ftrace_get_offsets_##call() which is mostly
trivial. This allows us to have the fast-path "__task != NULL"
check at the start, see the next patch.
Link: http://lkml.kernel.org/r/20130806160844.GA2739@redhat.com
Tested-by: David Ahern <dsahern@gmail.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
To simplify the review of the next patches:
1. We are going to reimplement __perf_task/counter and embed them
into TP_ARGS(). Expand TRACE_EVENT(sched_stat_runtime) into
DECLARE_EVENT_CLASS() + DEFINE_EVENT(), this way they can use
different TP_ARGS's.
2. Change perf_trace_##call() macro to do perf_fetch_caller_regs()
right before perf_trace_buf_prepare().
This way it evaluates TP_ARGS() asap, the next patch exploits
this fact.
Note: after 87f44bbc perf_trace_buf_prepare() doesn't need
"struct pt_regs *regs", perhaps it makes sense to remove this
argument. And perhaps we can teach perf_trace_buf_submit()
to accept regs == NULL and do fetch_caller_regs(CALLER_ADDR1)
in this case.
3. Cosmetic, but the typecast from "void*" buys nothing. It just
adds noise, so remove it.
Link: http://lkml.kernel.org/r/20130806160841.GA2736@redhat.com
Acked-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
All the RCU tracepoints and functions that reference char pointers do
so with just 'char *' even though they do not modify the contents of
the string itself. This will cause warnings if a const char * is used
in one of these functions.
The RCU tracepoints store pointers to the strings to refer back to them
when the trace output is displayed. As this can be minutes, hours or
even days later, those strings had better be constant.
This change also opens the door to allow the RCU tracepoint strings and
their addresses to be exported so that userspace tracing tools can
translate the contents of the pointers of the RCU tracepoints.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
A new trace event is added to PM events to print the time it takes to
suspend and resume a device. It generates trace messages that
include device, driver, parent information in addition to the type of
PM ops invoked as well as the PM event and error status from the PM
ops. Example trace below:
bash-2239 [000] .... 290.883035: device_pm_report_time: backlight acpi_video0 parent=0000:00:02.0 state=freeze ops=class nsecs=332 err=0
bash-2239 [000] .... 290.883041: device_pm_report_time: rfkill rfkill3 parent=phy0 state=freeze ops=legacy class nsecs=216 err=0
bash-2239 [001] .... 290.973892: device_pm_report_time: ieee80211 phy0 parent=0000:01:00.0 state=freeze ops=legacy class nsecs=90846477 err=0
bash-2239 [001] .... 293.660129: device_pm_report_time: ieee80211 phy0 parent=0000:01:00.0 state=restore ops=legacy class nsecs=101295162 err=0
bash-2239 [001] .... 293.660147: device_pm_report_time: rfkill rfkill3 parent=phy0 state=restore ops=legacy class nsecs=1804 err=0
bash-2239 [001] .... 293.660157: device_pm_report_time: backlight acpi_video0 parent=0000:00:02.0 state=restore ops=class nsecs=757 err=0
Signed-off-by: Shuah Khan <shuah.kh@samsung.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
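As a rough illustration of how such trace output might be consumed, the
Python sketch below parses device_pm_report_time lines and picks the slowest
callback. The field layout is inferred from the example trace in this message
only; treating the first token as the driver or class name and the second as
the device name is an assumption, and the helper names are hypothetical.

```python
import re

# Layout inferred from the example trace; "ops" may contain a space.
LINE_RE = re.compile(
    r"device_pm_report_time:\s+(?P<driver>\S+)\s+(?P<device>\S+)\s+"
    r"parent=(?P<parent>\S+)\s+state=(?P<state>\S+)\s+"
    r"ops=(?P<ops>.+?)\s+nsecs=(?P<nsecs>\d+)\s+err=(?P<err>-?\d+)"
)

SAMPLE = [
    "bash-2239 [000] .... 290.883035: device_pm_report_time: backlight acpi_video0 parent=0000:00:02.0 state=freeze ops=class nsecs=332 err=0",
    "bash-2239 [000] .... 290.883041: device_pm_report_time: rfkill rfkill3 parent=phy0 state=freeze ops=legacy class nsecs=216 err=0",
    "bash-2239 [001] .... 293.660129: device_pm_report_time: ieee80211 phy0 parent=0000:01:00.0 state=restore ops=legacy class nsecs=101295162 err=0",
]


def parse_pm_line(line):
    """Return a dict of fields for one trace line, or None if it does not match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    event = match.groupdict()
    event["nsecs"] = int(event["nsecs"])
    event["err"] = int(event["err"])
    return event


def slowest_event(lines):
    """Return the parsed event with the largest nsecs value, or None."""
    events = [e for e in map(parse_pm_line, lines) if e is not None]
    return max(events, key=lambda e: e["nsecs"]) if events else None


slowest = slowest_event(SAMPLE)
```

Lines that do not contain the event are skipped, so the parser can be fed a
whole trace buffer dump.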
Merge tag 'trace-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes and cleanups from Steven Rostedt:
"This contains fixes, optimizations and some clean ups.
Some of the fixes need to go back to 3.10. They are minor, and deal
mostly with incorrect ref counting in accessing event files.
There were a couple of optimizations that should have perf perform a
bit better when accessing trace events.
There are also various clean ups. Some of the clean ups are necessary to
help in a fix to a theoretical race between opening an event file and
deleting that event"
* tag 'trace-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Kill the unbalanced tr->ref++ in tracing_buffers_open()
tracing: Kill trace_array->waiter
tracing: Do not (ab)use trace_seq in event_id_read()
tracing: Simplify the iteration logic in f_start/f_next
tracing: Add ref_data to function and fgraph tracer structs
tracing: Miscellaneous fixes for trace_array ref counting
tracing: Fix error handling to ensure instances can always be removed
tracing/kprobe: Wait for disabling all running kprobe handlers
tracing/perf: Move the PERF_MAX_TRACE_SIZE check into perf_trace_buf_prepare()
tracing/syscall: Avoid perf_trace_buf_*() if sys_data->perf_events is empty
tracing/function: Avoid perf_trace_buf_*() if event_function.perf_events is empty
tracing: Typo fix on ring buffer comments
tracing: Use trace_seq_puts()/trace_seq_putc() where possible
tracing: Use correct config guard CONFIG_STACK_TRACER
Pull block IO driver bits from Jens Axboe:
"As I mentioned in the core block pull request, due to real life
circumstances the driver pull request would be late. Now it looks
like -rc2 late... On the plus side, apart from the rsxx update, these
are all things that I could argue could go in later in the cycle as
they are fixes and not features. So even though things are late, it's
not ALL bad.
The pull request contains:
- Updates to bcache, all bug fixes, from Kent.
- A pile of drbd bug fixes (no big features this time!).
- xen blk front/back fixes.
- rsxx driver updates, some of them deferred from 3.10. So should be
well cooked by now"
* 'for-3.11/drivers' of git://git.kernel.dk/linux-block: (63 commits)
bcache: Allocation kthread fixes
bcache: Fix GC_SECTORS_USED() calculation
bcache: Journal replay fix
bcache: Shutdown fix
bcache: Fix a sysfs splat on shutdown
bcache: Advertise that flushes are supported
bcache: check for allocation failures
bcache: Fix a dumb race
bcache: Use standard utility code
bcache: Update email address
bcache: Delete fuzz tester
bcache: Document shrinker reserve better
bcache: FUA fixes
drbd: Allow online change of al-stripes and al-stripe-size
drbd: Constants should be UPPERCASE
drbd: Ignore the exit code of a fence-peer handler if it returns too late
drbd: Fix rcu_read_lock balance on error path
drbd: fix error return code in drbd_init()
drbd: Do not sleep inside rcu
bcache: Refresh usage docs
...
Every perf_trace_buf_prepare() caller does
WARN_ONCE(size > PERF_MAX_TRACE_SIZE, message) and "message" is
almost the same.
Shift this WARN_ONCE() into perf_trace_buf_prepare(). This changes
the meaning of _ONCE, but I think this is fine.
- 4947014 2932448 10104832 17984294 1126b26 vmlinux
+ 4948422 2932448 10104832 17985702 11270a6 vmlinux
on my build.
Link: http://lkml.kernel.org/r/20130617170211.GA19813@redhat.com
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
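The change in the meaning of _ONCE mentioned above can be illustrated with a
small Python model. It is a sketch, not kernel code: the WarnOnce class
imitates a warn-once site, the PERF_MAX_TRACE_SIZE value and the caller names
are assumptions for the example. Before the change each caller owns a site, so
each can warn once; afterwards the single site inside the shared helper warns
once in total.

```python
PERF_MAX_TRACE_SIZE = 2048  # assumed value, for illustration only


class WarnOnce:
    """Imitates one WARN_ONCE() site: it reports at most one time."""

    def __init__(self, log):
        self.fired = False
        self.log = log

    def check(self, cond, msg):
        if cond and not self.fired:
            self.fired = True
            self.log.append(msg)
        return cond


def run_before(sizes_by_caller):
    """One warn-once site per caller, as before the patch."""
    log = []
    for caller, sizes in sizes_by_caller.items():
        site = WarnOnce(log)
        for size in sizes:
            site.check(size > PERF_MAX_TRACE_SIZE, f"{caller}: buffer too small")
    return log


def run_after(sizes_by_caller):
    """A single warn-once site inside the shared prepare helper."""
    log = []
    site = WarnOnce(log)
    for sizes in sizes_by_caller.values():
        for size in sizes:
            site.check(size > PERF_MAX_TRACE_SIZE, "perf buffer too small")
    return log


calls = {"kprobe": [4096, 8192], "syscall": [4096]}
before = run_before(calls)
after = run_after(calls)
```

With two misbehaving callers the old layout reports twice and the new one
reports once, which is the trade-off the message accepts.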
Pull SCSI target updates from Nicholas Bellinger:
"Lots of activity this round on performance improvements in target-core
while benchmarking the prototype scsi-mq initiator code with
vhost-scsi fabric ports, along with a number of iscsi/iser-target
improvements and hardening fixes for exception path cases post v3.10
merge.
The highlights include:
- Make persistent reservations APTPL buffer allocated on-demand, and
drop per t10_reservation buffer. (grover)
- Make virtual LUN=0 a NULLIO device, and skip allocation of NULLIO
device pages (grover)
- Add transport_cmd_check_stop write_pending bit to avoid extra
access of ->t_state_lock in WRITE I/O submission fast-path. (nab)
- Drop unnecessary CMD_T_DEV_ACTIVE check from
transport_lun_remove_cmd to avoid extra access of ->t_state_lock in
release fast-path. (nab)
- Avoid extra t_state_lock access in __target_execute_cmd fast-path
(nab)
- Drop unnecessary vhost-scsi wait_for_tasks=true usage +
->t_state_lock access in release fast-path. (nab)
- Convert vhost-scsi to use modern se_cmd->cmd_kref
TARGET_SCF_ACK_KREF usage (nab)
- Add tracepoints for SCSI commands being processed (roland)
- Refactoring of iscsi-target handling of ISCSI_OP_NOOP +
ISCSI_OP_TEXT to be transport independent (nab)
- Add iscsi-target SendTargets=$IQN support for in-band discovery
(nab)
- Add iser-target support for in-band discovery (nab + Or)
- Add iscsi-target demo-mode TPG authentication context support (nab)
- Fix isert_put_reject payload buffer post (nab)
- Fix iscsit_add_reject* usage for iser (nab)
- Fix iscsit_sequence_cmd reject handling for iser (nab)
- Fix ISCSI_OP_SCSI_TMFUNC handling for iser (nab)
- Fix session reset bug with RDMA_CM_EVENT_DISCONNECTED (nab)
The last five iscsi/iser-target items are CC'ed to stable, as they do
address issues present in v3.10 code. They are certainly larger than
I'd like for a stable patch set, but are important to ensure proper
REJECT exception handling in iser-target for 3.10.y"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (51 commits)
iser-target: Ignore non TEXT + LOGOUT opcodes for discovery
target: make queue_tm_rsp() return void
target: remove unused codes from enum tcm_tmrsp_table
iscsi-target: kstrtou* configfs attribute parameter cleanups
iscsi-target: Fix tfc_tpg_auth_cit configfs length overflow
iscsi-target: Fix tfc_tpg_nacl_auth_cit configfs length overflow
iser-target: Add support for ISCSI_OP_TEXT opcode + payload handling
iser-target: Rename sense_buf_[dma,len] to pdu_[dma,len]
iser-target: Add vendor_err debug output
target: Add (obsolete) checking for PMI/LBA fields in READ CAPACITY(10)
target: Return correct sense data for IO past the end of a device
target: Add tracepoints for SCSI commands being processed
iser-target: Fix session reset bug with RDMA_CM_EVENT_DISCONNECTED
iscsi-target: Fix ISCSI_OP_SCSI_TMFUNC handling for iser
iscsi-target: Fix iscsit_sequence_cmd reject handling for iser
iscsi-target: Fix iscsit_add_reject* usage for iser
iser-target: Fix isert_put_reject payload buffer post
iscsi-target: missing kfree() on error path
iscsi-target: Drop left-over iscsi_conn->bad_hdr
target: Make core_scsi3_update_and_write_aptpl return sense_reason_t
...
Merge tag 'trace-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing changes from Steven Rostedt:
"The majority of the changes here are cleanups for the large changes
that were added to 3.10, which includes several bug fixes that have
been marked for stable.
As for new features, there were a few, but nothing to write to LWN
about. These include:
New function triggers called "dump" and "cpudump" that will cause
ftrace to dump its buffer to the console when the function is called.
The difference between "dump" and "cpudump" is that "dump" will dump
the entire contents of the ftrace buffer, whereas "cpudump" will only
dump the contents of the ftrace buffer for the CPU that called the
function.
Another small enhancement is a new sysctl switch called
"traceoff_on_warning" which, when enabled, will disable tracing if any
WARN_ON() is triggered. This is useful if you want to debug what
caused a warning and do not want to risk losing your trace data by the
ring buffer overwriting the data before you can disable it. There's
also a kernel command line option of the same name that enables this
at boot up"
* tag 'trace-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (34 commits)
tracing: Make tracing_open_generic_{tr,tc}() static
tracing: Remove ftrace() function
tracing: Remove TRACE_EVENT_TYPE enum definition
tracing: Make tracer_tracing_{off,on,is_on}() static
tracing: Fix irqs-off tag display in syscall tracing
uprobes: Fix return value in error handling path
tracing: Fix race between deleting buffer and setting events
tracing: Add trace_array_get/put() to event handling
tracing: Get trace_array ref counts when accessing trace files
tracing: Add trace_array_get/put() to handle instance refs better
tracing: Protect ftrace_trace_arrays list in trace_events.c
tracing: Make trace_marker use the correct per-instance buffer
ftrace: Do not run selftest if command line parameter is set
tracing/kprobes: Don't pass addr=ip to perf_trace_buf_submit()
tracing: Use flag buffer_disabled for irqsoff tracer
tracing/kprobes: Turn trace_probe->files into list_head
tracing: Fix disabling of soft disable
tracing: Add missing syscall_metadata comment
tracing: Simplify code for showing of soft disabled flag
tracing/kprobes: Kill probe_enable_lock
...
* optional security enhancements
* fix path coverage in MAINTAINERS
* switch to using most used protocol and transport as default
* clean up buffer dumps in trace code
Held off on RDMA patches as they need to be cleaned up a bit, but
will try to get them cleaned, checked, and pushed by mid-week.
(attempt 2, hopefully this one won't screw up the history)
Merge tag 'for-linus-3.11-merge-window-part-1' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
Pull 9p update from Eric Van Hensbergen:
"Grab bag of little fixes and enhancements:
- optional security enhancements
- fix path coverage in MAINTAINERS
- switch to using most used protocol and transport as default
- clean up buffer dumps in trace code
Held off on RDMA patches as they need to be cleaned up a bit, but will
try to get them cleaned, checked, and pushed by mid-week"
* tag 'for-linus-3.11-merge-window-part-1' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
9p: Add rest of 9p files to MAINTAINERS entry
9p: trace: use %*ph to dump buffer
net/9p: Handle error in zero copy request correctly for 9p2000.u
net/9p: Use virtio transpart as the default transport
net/9p: Make 9P2000.L the default protocol for 9p file system
Pull btrfs update from Chris Mason:
"These are the usual mixture of bugs, cleanups and performance fixes.
Miao has some really nice tuning of our crc code as well as our
transaction commits.
Josef is peeling off more and more problems related to early enospc,
and has a number of important bug fixes in here too"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (81 commits)
Btrfs: wait ordered range before doing direct io
Btrfs: only do the tree_mod_log_free_eb if this is our last ref
Btrfs: hold the tree mod lock in __tree_mod_log_rewind
Btrfs: make backref walking code handle skinny metadata
Btrfs: fix crash regarding to ulist_add_merge
Btrfs: fix several potential problems in copy_nocow_pages_for_inode
Btrfs: cleanup the code of copy_nocow_pages_for_inode()
Btrfs: fix oops when recovering the file data by scrub function
Btrfs: make the chunk allocator completely tree lockless
Btrfs: cleanup orphaned root orphan item
Btrfs: fix wrong mirror number tuning
Btrfs: cleanup redundant code in btrfs_submit_direct()
Btrfs: remove btrfs_sector_sum structure
Btrfs: check if we can nocow if we don't have data space
Btrfs: stop using try_to_writeback_inodes_sb_nr to flush delalloc
Btrfs: use a percpu to keep track of possibly pinned bytes
Btrfs: check for actual acls rather than just xattrs when caching no acl
Btrfs: move btrfs_truncate_page to btrfs_cont_expand instead of btrfs_truncate
Btrfs: optimize reada_for_balance
Btrfs: optimize read_block_for_search
...
This patch adds tracepoints to the target code for commands being
received and being completed, which is quite useful for debugging
interactions with initiators. For example, one can do something like the
following to watch commands that are completing unsuccessfully:
# echo 'scsi_status!=0' > /sys/kernel/debug/tracing/events/target/target_cmd_complete/filter
# echo 1 > /sys/kernel/debug/tracing/events/target/target_cmd_complete/enable
<run command that fails>
# cat /sys/kernel/debug/tracing/trace
iscsi_trx-0-1902 [003] ...1 990185.810385: target_cmd_complete: iqn.1993-08.org.debian:01:e51ede6aacfd <- LUN 001 status CHECK CONDITION (sense len 18 / 70 00 05 00 00 00 00 0a 00 00 00 00 20 00 00 00 00 00) 0x95 data_length 512 CDB 95 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 (TA:SIMPLE C:00)
(v2: Drop undefined COMPARE_AND_WRITE)
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Merge first patch-bomb from Andrew Morton:
- various misc bits
- I've been patchmonkeying ocfs2 for a while, as Joel and Mark have been
distracted. There has been quite a bit of activity.
- About half the MM queue
- Some backlight bits
- Various lib/ updates
- checkpatch updates
- zillions more little rtc patches
- ptrace
- signals
- exec
- procfs
- rapidio
- nbd
- aoe
- pps
- memstick
- tools/testing/selftests updates
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (445 commits)
tools/testing/selftests: don't assume the x bit is set on scripts
selftests: add .gitignore for kcmp
selftests: fix clean target in kcmp Makefile
selftests: add .gitignore for vm
selftests: add hugetlbfstest
self-test: fix make clean
selftests: exit 1 on failure
kernel/resource.c: remove the unneeded assignment in function __find_resource
aio: fix wrong comment in aio_complete()
drivers/w1/slaves/w1_ds2408.c: add magic sequence to disable P0 test mode
drivers/memstick/host/r592.c: convert to module_pci_driver
drivers/memstick/host/jmb38x_ms: convert to module_pci_driver
pps-gpio: add device-tree binding and support
drivers/pps/clients/pps-gpio.c: convert to module_platform_driver
drivers/pps/clients/pps-gpio.c: convert to devm_* helpers
drivers/parport/share.c: use kzalloc
Documentation/accounting/getdelays.c: avoid strncpy in accounting tool
aoe: update internal version number to v83
aoe: update copyright date
aoe: perform I/O completions in parallel
...
Andrew Perepechko reported a problem whereby pages are being prematurely
evicted as the mark_page_accessed() hint is ignored for pages that are
currently on a pagevec --
http://www.spinics.net/lists/linux-ext4/msg37340.html .
Alexey Lyahkov and Robin Dong have also reported problems recently that
could be due to hot pages reaching the end of the inactive list too
quickly and being reclaimed.
Rather than addressing this on a per-filesystem basis, this series aims
to fix the mark_page_accessed() interface by deferring the decision of
which LRU a page is added to until pagevec drain time, and by allowing
mark_page_accessed() to call SetPageActive on a pagevec page.
Patch 1 adds two tracepoints for LRU page activation and insertion. Using
these tracepoints it's possible to build a model of pages in the
LRU that can be processed offline.
Patch 2 defers making the decision on what LRU to add a page to until when
the pagevec is drained.
Patch 3 searches the local pagevec for pages to mark PageActive on
mark_page_accessed. The changelog explains why only the local
pagevec is examined.
Patches 4 and 5 tidy up the API.
postmark, a dd-based test and fs-mark (in both single and threaded mode) were
run but none of them showed any performance degradation or gain as a
result of the patch.
Using patch 1, I built a *very* basic model of the LRU to examine
offline what the average age of different page types on the LRU was in
milliseconds. Of course, capturing the trace distorts the test as it's
written to local disk but it does not matter for the purposes of this
test. The average ages of pages, in milliseconds, were:
                                vanilla  deferdrain
Average age mapped anon:           1454        1250
Average age mapped file:         127841      155552
Average age unmapped anon:           85         235
Average age unmapped file:        73633       38884
Average age unmapped buffers:     74054      116155
The LRU activity was mostly files which you'd expect for a dd-based
workload. Note that the average age of buffer pages is increased by the
series and it is expected this is due to the fact that the buffer pages
are now getting added to the active list when drained from the pagevecs.
Note that the average age of the unmapped file data is decreased as they
are still added to the inactive list and are reclaimed before the
buffers.
There is no guarantee this is a universal win for all workloads and it
would be nice if the filesystem people gave some thought as to whether
this decision is generally a win or a loss.
This patch:
Using these tracepoints it is possible to model LRU activity and the
average residency of pages of different types. This can be used to
debug problems related to premature reclaim of pages of particular
types.
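As a rough illustration of the kind of offline model these tracepoints
make possible, the sketch below replays a list of insertion and
activation events and reports the average page age per page type and
LRU list. The tuple layout, the event names "insert" and "activate",
and the function name are assumptions made for this example; they are
not the format the tracepoints actually emit.

```python
from collections import defaultdict


def average_age_ms(events, now_ms):
    """Return {(page_type, lru_list): average age in ms} at time now_ms.

    events is an iterable of (time_ms, kind, pfn, page_type) tuples
    (a hypothetical layout), where kind is "insert" or "activate".
    """
    pages = {}  # pfn -> [inserted_ms, page_type, lru_list]
    for time_ms, kind, pfn, page_type in sorted(events, key=lambda e: e[0]):
        if kind == "insert":
            # A (re)insertion restarts the page's age on the inactive list.
            pages[pfn] = [time_ms, page_type, "inactive"]
        elif kind == "activate" and pfn in pages:
            # Activation moves a known page to the active list.
            pages[pfn][2] = "active"

    sums = defaultdict(int)
    counts = defaultdict(int)
    for inserted_ms, page_type, lru_list in pages.values():
        key = (page_type, lru_list)
        sums[key] += now_ms - inserted_ms
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

A real model would also need to account for pages leaving the LRU,
which this sketch ignores.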
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Alexey Lyahkov <alexey.lyashkov@gmail.com>
Cc: Andrew Perepechko <anserper@ya.ru>
Cc: Robin Dong <sanbai@taobao.com>
Cc: Theodore Tso <tytso@mit.edu>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Bernd Schubert <bernd.schubert@fastmail.fm>
Cc: David Howells <dhowells@redhat.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Hotplug changes allowing device hot-removal operations to fail
gracefully (instead of crashing the kernel) if they cannot be
carried out completely. From Rafael J Wysocki and Toshi Kani.
- Freezer update from Colin Cross and Mandeep Singh Baines targeted
at making the freezing of tasks a less heavyweight operation.
- cpufreq resume fix from Srivatsa S Bhat for a regression introduced
during the 3.10 cycle causing some cpufreq sysfs attributes to
return wrong values to user space after resume.
- New freqdomain_cpus sysfs attribute for the acpi-cpufreq driver to
provide information previously available via related_cpus from
Lan Tianyu.
- cpufreq fixes and cleanups from Viresh Kumar, Jacob Shin,
Heiko Stübner, Xiaoguang Chen, Ezequiel Garcia, Arnd Bergmann, and
Tang Yuantian.
- Fix for an ACPICA regression causing suspend/resume issues to
appear on some systems introduced during the 3.4 development cycle
from Lv Zheng.
- ACPICA fixes and cleanups from Bob Moore, Tomasz Nowicki, Lv Zheng,
Chao Guan, and Zhang Rui.
- New cpuidle driver for Xilinx Zynq processors from Michal Simek.
- cpuidle fixes and cleanups from Daniel Lezcano.
- Changes to make suspend/resume work correctly in Xen guests from
Konrad Rzeszutek Wilk.
- ACPI device power management fixes and cleanups from Fengguang Wu
and Rafael J Wysocki.
- ACPI documentation updates from Lv Zheng, Aaron Lu and Hanjun Guo.
- Fix for the IA-64 issue that was the reason for reverting commit
9f29ab1 and updates of the ACPI scan code from Rafael J Wysocki.
- Mechanism for adding CMOS RTC address space handlers from Lan Tianyu
(to allow some EC-related breakage to be fixed on some systems).
- Spec-compliant implementation of acpi_os_get_timer() from
Mika Westerberg.
- Modification of do_acpi_find_child() to execute _STA in order to
avoid situations in which a pointer to a disabled device object
is returned instead of an enabled one with the same _ADR value.
From Jeff Wu.
- Intel BayTrail PCH (Platform Controller Hub) support for the ACPI
Intel Low-Power Subsystems (LPSS) driver and modifications of that
driver to work around a couple of known BIOS issues from
Mika Westerberg and Heikki Krogerus.
- EC driver fix from Vasiliy Kulikov to make it use get_user() and
put_user() instead of dereferencing user space pointers blindly.
- Assorted ACPI code cleanups from Bjorn Helgaas, Nicholas Mazzuca and
Toshi Kani.
- Modification of the "runtime idle" helper routine to take the return
values of the callbacks executed by it into account and to call
rpm_suspend() if they return 0, which allows some code bloat
reduction to be done, from Rafael J Wysocki and Alan Stern.
- New trace points for PM QoS from Sahara <keun-o.park@windriver.com>.
- PM QoS documentation update from Lan Tianyu.
- Assorted core PM code cleanups and changes from Bernie Thompson,
Bjorn Helgaas, Julius Werner, and Shuah Khan.
- New devfreq driver for the Exynos5-bus device from Abhilash Kesavan.
- Minor devfreq cleanups, fixes and MAINTAINERS update from
MyungJoo Ham, Abhilash Kesavan, Paul Bolle, Rajagopal Venkat, and
Wei Yongjun.
- OMAP Adaptive Voltage Scaling (AVS) SmartReflex voltage control
driver updates from Andrii Tseglytskyi and Nishanth Menon.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iQIcBAABAgAGBQJR0ZNOAAoJEKhOf7ml8uNsDLYP/0EU4rmvw0TWTITfp6RS1KDE
9GwBn96ZR4Q5bJd9gBCTPSqhHOYMqxWEUp99sn/M2wehG1pk/jw5LO56+2IhM3UZ
g1HDcJ7te2nVT/iXsKiAGTVhU9Rk0aYwoVSknwk27qpIBGxW9w/s5tLX8pY3Q3Zq
wL/7aTPjyL+PFFFEaxgH7qLqsl3DhbtYW5AriUBTkXout/tJ4eO1b7MNBncLDh8X
VQ/0DNCKE95VEJfkO4rk9RKUyVp9GDn0i+HXCD/FS4IA5oYzePdVdNDmXf7g+swe
CGlTZq8pB+oBpDiHl4lxzbNrKQjRNbGnDUkoRcWqn0nAw56xK+vmYnWJhW99gQ/I
fKnvxeLca5po1aiqmC4VSJxZIatFZqLrZAI4dzoCLWY+bGeTnCKmj0/F8ytFnZA2
8IuLLs7/dFOaHXV/pKmpg6FAlFa9CPxoqRFoyqb4M0GjEarADyalXUWsPtG+6xCp
R/p0CISpwk+guKZR/qPhL7M654S7SHrPwd2DPF0KgGsvk+G2GhoB8EzvD8BVp98Z
9siCGCdgKQfJQVI6R0k9aFmn/4gRQIAgyPhkhv9tqULUUkiaXki+/t8kPfnb8O/d
zep+CA57E2G8MYLkDJfpFeKS7GpPD6TIdgFdGmOUC0Y6sl9iTdiw4yTx8O2JM37z
rHBZfYGkJBrbGRu+Q1gs
=VBBq
-----END PGP SIGNATURE-----
Merge tag 'pm+acpi-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management and ACPI updates from Rafael Wysocki:
"This time the total number of ACPI commits is slightly greater than
the number of cpufreq commits, but Viresh Kumar (who works on cpufreq)
remains the most active patch submitter.
To me, the most significant change is the addition of offline/online
device operations to the driver core (with Greg's blessing) and
the related modifications of the ACPI core hotplug code. Next are the
freezer updates from Colin Cross that should make the freezing of
tasks a bit less heavy weight.
We also have a couple of regression fixes, a number of fixes for
issues that have not been identified as regressions, two new drivers
and a bunch of cleanups all over.
Highlights:
- Hotplug changes to support graceful hot-removal failures.
It sometimes is necessary to fail device hot-removal operations
gracefully if they cannot be carried out completely. For example,
if memory from a memory module being hot-removed has been allocated
for the kernel's own use and cannot be moved elsewhere, it's
desirable to fail the hot-removal operation in a graceful way
rather than to crash the kernel, but currently a success or a kernel
crash are the only possible outcomes of an attempted memory
hot-removal. Needless to say, that is not a very attractive
alternative and it had to be addressed.
However, in order to make it work for memory, I first had to make
it work for CPUs and for this purpose I needed to modify the ACPI
processor driver. It's been split into two parts, a resident one
handling the low-level initialization/cleanup and a modular one
playing the actual driver's role (but it binds to the CPU system
device objects rather than to the ACPI device objects representing
processors). That's been sort of like a live brain surgery on a
patient who's riding a bike.
So this is a little scary, but since we found and fixed a couple of
regressions it caused to happen during the early linux-next testing
(a month ago), nobody has complained.
As a bonus we remove some duplicated ACPI hotplug code, because the
ACPI-based CPU hotplug is now going to use the common ACPI hotplug
code.
- Lighter weight freezing of tasks.
These changes from Colin Cross and Mandeep Singh Baines are
targeted at making the freezing of tasks a less heavyweight
operation. They reduce the number of tasks woken up every time
during the freezing, by using the observation that the freezer
simply doesn't need to wake up some of them and wait for them all
to call refrigerator(). The time needed for the freezer to decide
to report a failure is reduced too.
Also reintroduced is the check causing a lockdep warning to
trigger when try_to_freeze() is called with locks held (which is
generally unsafe and shouldn't happen).
- cpufreq updates
First off, a commit from Srivatsa S Bhat fixes a resume regression
introduced during the 3.10 cycle causing some cpufreq sysfs
attributes to return wrong values to user space after resume. The
fix is kind of fresh, but also it's pretty obvious once Srivatsa
has identified the root cause.
Second, we have a new freqdomain_cpus sysfs attribute for the
acpi-cpufreq driver to provide information previously available via
related_cpus. From Lan Tianyu.
Finally, we fix a number of issues, mostly related to the
CPUFREQ_POSTCHANGE notifier and cpufreq Kconfig options and clean
up some code. The majority of changes from Viresh Kumar with bits
from Jacob Shin, Heiko Stübner, Xiaoguang Chen, Ezequiel Garcia,
Arnd Bergmann, and Tang Yuantian.
- ACPICA update
A usual bunch of updates from the ACPICA upstream.
During the 3.4 cycle we introduced support for ACPI 5 extended
sleep registers, but they are only supposed to be used if the
HW-reduced mode bit is set in the FADT flags and the code attempted
to use them without checking that bit. That caused suspend/resume
regressions to happen on some systems. Fix from Lv Zheng causes
those registers to be used only if the HW-reduced mode bit is set.
Apart from this some other ACPICA bugs are fixed and code cleanups
are made by Bob Moore, Tomasz Nowicki, Lv Zheng, Chao Guan, and
Zhang Rui.
- cpuidle updates
New driver for Xilinx Zynq processors is added by Michal Simek.
Multidriver support simplification, addition of some missing
kerneldoc comments and Kconfig-related fixes come from Daniel
Lezcano.
- ACPI power management updates
Changes to make suspend/resume work correctly in Xen guests from
Konrad Rzeszutek Wilk, sparse warning fix from Fengguang Wu and
cleanups and fixes of the ACPI device power state selection
routine.
- ACPI documentation updates
Some previously missing pieces of ACPI documentation are added by
Lv Zheng and Aaron Lu (hopefully, that will help people to
understand how the ACPI subsystem works) and one outdated doc is
updated by Hanjun Guo.
- Assorted ACPI updates
We finally nailed down the IA-64 issue that was the reason for
reverting commit 9f29ab11dd ("ACPI / scan: do not match drivers
against objects having scan handlers"), so we can fix it and move
the ACPI scan handler check added to the ACPI video driver back to
the core.
A mechanism for adding CMOS RTC address space handlers is
introduced by Lan Tianyu to allow some EC-related breakage to be
fixed on some systems.
A spec-compliant implementation of acpi_os_get_timer() is added by
Mika Westerberg.
The evaluation of _STA is added to do_acpi_find_child() to avoid
situations in which a pointer to a disabled device object is
returned instead of an enabled one with the same _ADR value. From
Jeff Wu.
Intel BayTrail PCH (Platform Controller Hub) support is added to
the ACPI driver for Intel Low-Power Subsystems (LPSS) and that
driver is modified to work around a couple of known BIOS issues.
Changes from Mika Westerberg and Heikki Krogerus.
The EC driver is fixed by Vasiliy Kulikov to use get_user() and
put_user() instead of dereferencing user space pointers blindly.
Code cleanups are made by Bjorn Helgaas, Nicholas Mazzuca and Toshi
Kani.
- Assorted power management updates
The "runtime idle" helper routine is changed to take the return
values of the callbacks executed by it into account and to call
rpm_suspend() if they return 0, which allows us to reduce the
overall code bloat a bit (by dropping some code that's not
necessary any more after that modification).
The runtime PM documentation is updated by Alan Stern (to reflect
the "runtime idle" behavior change).
New trace points for PM QoS are added by Sahara
(<keun-o.park@windriver.com>).
PM QoS documentation is updated by Lan Tianyu.
Code cleanups are made and minor issues are addressed by Bernie
Thompson, Bjorn Helgaas, Julius Werner, and Shuah Khan.
- devfreq updates
New driver for the Exynos5-bus device from Abhilash Kesavan.
Minor cleanups, fixes and MAINTAINERS update from MyungJoo Ham,
Abhilash Kesavan, Paul Bolle, Rajagopal Venkat, and Wei Yongjun.
- OMAP power management updates
Adaptive Voltage Scaling (AVS) SmartReflex voltage control driver
updates from Andrii Tseglytskyi and Nishanth Menon."
* tag 'pm+acpi-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (162 commits)
cpufreq: Fix cpufreq regression after suspend/resume
ACPI / PM: Fix possible NULL pointer deref in acpi_pm_device_sleep_state()
PM / Sleep: Warn about system time after resume with pm_trace
cpufreq: don't leave stale policy pointer in cdbs->cur_policy
acpi-cpufreq: Add new sysfs attribute freqdomain_cpus
cpufreq: make sure frequency transitions are serialized
ACPI: implement acpi_os_get_timer() according the spec
ACPI / EC: Add HP Folio 13 to ec_dmi_table in order to skip DSDT scan
ACPI: Add CMOS RTC Operation Region handler support
ACPI / processor: Drop unused variable from processor_perflib.c
cpufreq: tegra: call CPUFREQ_POSTCHANGE notfier in error cases
cpufreq: s3c64xx: call CPUFREQ_POSTCHANGE notfier in error cases
cpufreq: omap: call CPUFREQ_POSTCHANGE notfier in error cases
cpufreq: imx6q: call CPUFREQ_POSTCHANGE notfier in error cases
cpufreq: exynos: call CPUFREQ_POSTCHANGE notfier in error cases
cpufreq: dbx500: call CPUFREQ_POSTCHANGE notfier in error cases
cpufreq: davinci: call CPUFREQ_POSTCHANGE notfier in error cases
cpufreq: arm-big-little: call CPUFREQ_POSTCHANGE notfier in error cases
cpufreq: powernow-k8: call CPUFREQ_POSTCHANGE notfier in error cases
cpufreq: pcc: call CPUFREQ_POSTCHANGE notfier in error cases
...
A small but useful set of regmap updates this time around:
- An abstraction for bitfields within a register map contributed by
Srinivas Kandagatla, allowing drivers to cope more easily when
hardware designers randomly move things about (mainly when talking
to things like system controllers).
- Changes from Lars-Peter Clausen to allow the MMIO regmap to be used from
hard IRQ context.
- Small improvements to the cache infrastructure and performance,
including a default cache sync operation so now all regmaps can sync
easily.
There's also a pinctrl driver making use of the new bitfield API, merged
here for dependency reasons. There will be a simple add/add conflict
with the pinctrl tree as a result.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJR0BkmAAoJELSic+t+oim94wgP/A+a0uJNxkQ3VK2myUU42VzA
LkiSgmpV/IsywyMJjV+/WgSPXv5BALjWdoHqaPGxEzbVTrQdxTVWhrlPsFAu7rLo
dQXoAXckvyhaw+GlJNpUkpIrNV3qxZN9eT8/Lm16pehXzllZif7CynJk6F5NQgMw
32HKuNFJxig+NMDzbeID1aSTg5yCsU+TCB40J7naYDAGIBXwNsXwGmVwoTJi6513
xWEJ8KvQ5F2C5PCUass+9Cozil/H95V1Vvei5qyo7aVG1Z2SF4ueC8sRZgULvTr/
wpPt/ia8TnjQcjYvnFVWyiiCGDmmYB+CQHxtIjsLVYoaBb2FsLEVfscYD+84+EAz
mQqEKxLIPfYvzZmU8zxcdXzDkD+Ztm0T8HJWrKwIWfBiKgrSk6R2kegFOrCrqmLX
cVHW3RXVZM3oW8G9T5FGR5fzh9acnAvvTKstSPnpMXTRLKozPG6G61+FtjDQNvxI
0IGgNnkZCxGFmVLAxzX/Z4WmuwARO+dSbY2t92qlOhfRLVJ8VR5WVu+ECDYDSBUD
U0EhXfmu2UJdClY2T+lw3TRo3F7hKHx5+C6cS6pNZC43lKtGWu8qClFmdJ+Y2Pzp
4yRUvKXjfnyuRNSYaIRcjxJQ7dPVfxsUz3w9cak4V/Gi2u/1cbbTjS+Wob1+jdEu
9ldwQ9d3gMMVWR5yb/Z4
=8WLH
-----END PGP SIGNATURE-----
Merge tag 'regmap-v3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
Pull regmap updates from Mark Brown:
"A small but useful set of regmap updates this time around:
- An abstraction for bitfields within a register map contributed by
Srinivas Kandagatla, allowing drivers to cope more easily when
hardware designers randomly move things about (mainly when talking
to things like system controllers).
- Changes from Lars-Peter Clausen to allow the MMIO regmap to be used
from hard IRQ context.
- Small improvements to the cache infrastructure and performance,
including a default cache sync operation so now all regmaps can
sync easily.
There's also a pinctrl driver making use of the new bitfield API,
merged here for dependency reasons. There will be a simple add/add
conflict with the pinctrl tree as a result."
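To illustrate the bitfield idea, here is a small userspace model. It is
not the regmap API itself. A field is described by a register address
plus its least and most significant bit positions, and reads and writes
go through a mask and shift derived from that description. The RegField
class, the helper names and the dictionary standing in for a register
map are all assumptions made for this sketch.

```python
class RegField:
    """Describe a bitfield as (register, lowest bit, highest bit)."""

    def __init__(self, reg, lsb, msb):
        self.reg = reg
        self.shift = lsb
        # Mask covering bits lsb..msb inclusive.
        self.mask = ((1 << (msb - lsb + 1)) - 1) << lsb


def field_read(regs, field):
    """Extract the field's value from the register it lives in."""
    return (regs[field.reg] & field.mask) >> field.shift


def field_write(regs, field, value):
    """Update only the field's bits, leaving the rest of the register alone."""
    cleared = regs[field.reg] & ~field.mask
    regs[field.reg] = cleared | ((value << field.shift) & field.mask)


# Example: a 4-bit field in bits 4..7 of register 0x10.
regs = {0x10: 0xFFFF}
mode = RegField(0x10, 4, 7)
field_write(regs, mode, 0x3)
```

If the hardware designers move the field to another register or bit
position, only the RegField description changes; callers of
field_read() and field_write() stay the same.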
* tag 'regmap-v3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
pinctrl: st: Remove unnecessary use of of_match_ptr macro
pinctrl: st: fix return value check
pinctrl: st: Add pinctrl and pinconf support.
regmap: debugfs: Suppress cache for partial register files
regmap: Add regmap_field APIs
regmap: core: Cache all registers by default when cache is enabled
regmap: Implemented default cache sync operation
regmap: Make regmap-mmio usable from atomic contexts
regmap: regcache: Fixup locking for custom lock callbacks
regmap: debugfs: Fix return from regmap_debugfs_get_dump_start
regmap: debugfs: Don't mark lockdep as broken due to debugfs write
regmap: rbtree: Use range information to allocate nodes
regmap: rbtree: Factor out node allocation
regmap: Make regmap_check_range_table() a public API
regmap: Add support for discarding parts of the register cache
Pull x86 tracing updates from Ingo Molnar:
"This tree adds IRQ vector tracepoints that are named after the handler
and which output the vector #, based on a zero-overhead approach that
relies on changing the IDT entries, by Seiji Aguchi.
The new tracepoints look like this:
# perf list | grep -i irq_vector
irq_vectors:local_timer_entry [Tracepoint event]
irq_vectors:local_timer_exit [Tracepoint event]
irq_vectors:reschedule_entry [Tracepoint event]
irq_vectors:reschedule_exit [Tracepoint event]
irq_vectors:spurious_apic_entry [Tracepoint event]
irq_vectors:spurious_apic_exit [Tracepoint event]
irq_vectors:error_apic_entry [Tracepoint event]
irq_vectors:error_apic_exit [Tracepoint event]
[...]"
* 'x86-tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/tracing: Add config option checking to the definitions of mce handlers
trace,x86: Do not call local_irq_save() in load_current_idt()
trace,x86: Move creation of irq tracepoints from apic.c to irq.c
x86, trace: Add irq vector tracepoints
x86: Rename variables for debugging
x86, trace: Introduce entering/exiting_irq()
tracing: Add DEFINE_EVENT_FN() macro
Pull perf updates from Ingo Molnar:
"Kernel improvements:
- watchdog driver improvements by Li Zefan
- Power7 CPI stack events related improvements by Sukadev Bhattiprolu
- event multiplexing via hrtimers and other improvements by Stephane
Eranian
- kernel stack use optimization by Andrew Hunter
- AMD IOMMU uncore PMU support by Suravee Suthikulpanit
- NMI handling rate-limits by Dave Hansen
- various hw_breakpoint fixes by Oleg Nesterov
- hw_breakpoint overflow period sampling and related signal handling
fixes by Jiri Olsa
- Intel Haswell PMU support by Andi Kleen
Tooling improvements:
- Reset SIGTERM handler in workload child process, fix from David
Ahern.
- Makefile reorganization, prep work for Kconfig patches, from Jiri
Olsa.
- Add automated make test suite, from Jiri Olsa.
- Add --percent-limit option to 'top' and 'report', from Namhyung
Kim.
- Sorting improvements, from Namhyung Kim.
- Expand definition of sysfs format attribute, from Michael Ellerman.
Tooling fixes:
- 'perf tests' fixes from Jiri Olsa.
- Make Power7 CPI stack events available in sysfs, from Sukadev
Bhattiprolu.
- Handle death by SIGTERM in 'perf record', fix from David Ahern.
- Fix printing of perf_event_paranoid message, from David Ahern.
- Handle realloc failures in 'perf kvm', from David Ahern.
- Fix divide by 0 in variance, from David Ahern.
- Save parent pid in thread struct, from David Ahern.
- Handle JITed code in shared memory, from Andi Kleen.
- Fixes for 'perf diff', from Jiri Olsa.
- Remove some unused struct members, from Jiri Olsa.
- Add missing liblk.a dependency for python/perf.so, fix from Jiri
Olsa.
- Respect CROSS_COMPILE in liblk.a, from Rabin Vincent.
- No need to do locking when adding hists in perf report, only 'top'
needs that, from Namhyung Kim.
- Fix alignment of symbol column in the hists browser (top,
report) when -v is given, from Namhyung Kim.
- Fix 'perf top' -E option behavior, from Namhyung Kim.
- Fix bug in isupper() and islower(), from Sukadev Bhattiprolu.
- Fix compile errors in bp_signal 'perf test', from Sukadev
Bhattiprolu.
... and more things"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (102 commits)
perf/x86: Disable PEBS-LL in intel_pmu_pebs_disable()
perf/x86: Fix shared register mutual exclusion enforcement
perf/x86/intel: Support full width counting
x86: Add NMI duration tracepoints
perf: Drop sample rate when sampling is too slow
x86: Warn when NMI handlers take large amounts of time
hw_breakpoint: Introduce "struct bp_cpuinfo"
hw_breakpoint: Simplify *register_wide_hw_breakpoint()
hw_breakpoint: Introduce cpumask_of_bp()
hw_breakpoint: Simplify the "weight" usage in toggle_bp_slot() paths
hw_breakpoint: Simplify list/idx mess in toggle_bp_slot() paths
perf/x86/intel: Add mem-loads/stores support for Haswell
perf/x86/intel: Support Haswell/v4 LBR format
perf/x86/intel: Move NMI clearing to end of PMI handler
perf/x86/intel: Add Haswell PEBS support
perf/x86/intel: Add simple Haswell PMU support
perf/x86/intel: Add Haswell PEBS record support
perf/x86/intel: Fix sparse warning
perf/x86/amd: AMD IOMMU Performance Counter PERF uncore PMU implementation
perf/x86/amd: Add IOMMU Performance Counter resource management
...
Translate the bitfields used in various flags argument to strings to
make the tracepoint output more human-readable.
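A minimal sketch of this kind of translation, written as a standalone
helper rather than as tracepoint code. The function name, the separator
and the flag names in the table are hypothetical; bits without a known
name are appended in hex so that nothing is silently dropped.

```python
def flags_to_string(value, names, sep="|"):
    """Render the set bits of value using a list of (bit, name) pairs."""
    parts = []
    for bit, name in names:
        if value & bit:
            parts.append(name)
            value &= ~bit  # remember that this bit has been reported
    if value:
        # Leftover bits have no symbolic name; show them numerically.
        parts.append(hex(value))
    return sep.join(parts)


# Hypothetical flag table, for illustration only.
EXAMPLE_FLAGS = [(0x1, "FLAG_A"), (0x2, "FLAG_B"), (0x4, "FLAG_C")]
```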
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Old gcc doesn't like the struct hack, and it is kind of ugly. So finish
off the work to convert pr_debug() statements to tracepoints, and delete
pkey()/pbtree().
Signed-off-by: Kent Overstreet <koverstreet@google.com>
The tracepoints were reworked to be more sensible, and fixed a null
pointer deref in one of the tracepoints.
Converted some of the pr_debug()s to tracepoints - this is partly a
performance optimization; it used to be that without DEBUG or
CONFIG_DYNAMIC_DEBUG pr_debug() was an empty macro; but at some point it
was changed to an empty inline function.
Some of the pr_debug() statements had rather expensive function calls as
part of the arguments, so this code was getting run unnecessarily even
on non debug kernels - in some fast paths, too.
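The cost being described is that of arguments that are evaluated before
a do-nothing function is called. The sketch below shows the same effect
in a language-neutral way. It is not the kernel code: debug_noop()
stands in for a debug helper that compiles to an empty function, and
trace_if_enabled() for a tracepoint-style guard that builds its message
only when enabled. All names here are assumptions for the example.

```python
def demo():
    """Return how often the expensive helper ran after each call style."""
    calls = 0

    def expensive_summary():
        nonlocal calls
        calls += 1
        return "summary"

    def debug_noop(message):
        """Does nothing, but its argument was already computed by the caller."""

    def trace_if_enabled(enabled, make_message):
        """Invoke make_message only when the trace point is switched on."""
        if enabled:
            return make_message()
        return None

    debug_noop(expensive_summary())             # helper runs despite the no-op
    after_noop = calls
    trace_if_enabled(False, expensive_summary)  # helper is never invoked
    after_guard = calls
    return after_noop, after_guard
```

The first call pays for expensive_summary() even though nothing is
logged; the guarded form does not.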
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Adds tracepoints to dev_pm_qos_add_request, dev_pm_qos_update_request,
and dev_pm_qos_remove_request. It's useful for checking device name,
dev_pm_qos_request_type, and value.
Signed-off-by: Sahara <keun-o.park@windriver.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Adds tracepoints to pm_qos_add_request, pm_qos_update_request,
pm_qos_remove_request, and pm_qos_update_request_timeout.
It's useful for checking pm_qos_class, value, and timeout_us.
Signed-off-by: Sahara <keun-o.park@windriver.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This patch adds tracepoints to pm_qos_update_target and
pm_qos_update_flags. It's useful for checking pm qos action,
previous value and current value.
Signed-off-by: Sahara <keun-o.park@windriver.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This patch has been invaluable in my adventures finding
issues in the perf NMI handler. I'm as big a fan of
printk() as anybody is, but using printk() in NMIs is
deadly when they're happening frequently.
Even hacking in trace_printk() ended up eating enough
CPU to throw off some of the measurements I was making.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: paulus@samba.org
Cc: acme@ghostprotocols.net
Cc: Dave Hansen <dave@sr71.net>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Each TRACE_EVENT() adds several helper functions. If two or more trace events
share the same structure and print format, they can also share most of these
helper functions and save a lot of space from duplicate code. This is why the
DECLARE_EVENT_CLASS() and DEFINE_EVENT() were created.
Some events require a trigger to be called at registering and unregistering of
the event and to do so they use TRACE_EVENT_FN().
If multiple events require a trigger, they currently have no choice but to use
TRACE_EVENT_FN() as there's no DEFINE_EVENT_FN() available. This unfortunately
causes a lot of wasted duplicate code to be created.
By adding a DEFINE_EVENT_FN(), these events can still use a
DECLARE_EVENT_CLASS() and then define their own triggers.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/51C3236C.8030508@hds.com
Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Adding new flags to keep tracepoints consistent with btrfs.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Rename ext4_da_writepages() to ext4_writepages() and use it for all
modes. We still need to iterate over all the pages in the case of
data=journalling, but in the case of nodelalloc/data=ordered (which is
what file systems mounted using ext3 backwards compatibility will use)
this will allow us to use a much more efficient I/O submission path.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
There are two issues with current writeback path in ext4. For one we
don't necessarily map complete pages when blocksize < pagesize and
thus needn't do any writeback in one iteration. We always map some
blocks though, so we will eventually finish mapping the page. However, if
writeback races with other operations on the file, forward progress is
not really guaranteed. The second problem is that current code
structure makes it hard to associate all the bios to some range of
pages with one io_end structure so that unwritten extents can be
converted after all the bios are finished. This will be especially
difficult later when io_end is associated with a reserved
transaction handle.
We restructure the writeback path to a relatively simple loop which
first prepares an extent of pages, then maps one or more extents so that
no page is partially mapped, and once a page is fully mapped it is
submitted for IO. We keep all the mapping and IO submission
information in the mpage_da_data structure to somewhat reduce stack usage.
The resulting code is somewhat shorter than the old one and hopefully also
easier to read.
Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Additionally change the cast from long to unsigned long to match the format specifier.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Currently punch hole is disabled in file systems with bigalloc
feature enabled. However the recent changes in punch hole patch should
make it easier to support punching holes on bigalloc enabled file
systems.
This commit changes partial_cluster handling in ext4_remove_blocks(),
ext4_ext_rm_leaf() and ext4_ext_remove_space(). Currently
partial_cluster is unsigned long long type and it makes sure that we
will free the partial cluster if all extents have been released from that
cluster. However it has been specifically designed only for truncate.
With punch hole we can be freeing just some extents in the cluster
leaving the rest untouched. So we have to make sure that we will notice
a cluster which still has some extents. To do this I've changed
partial_cluster to be signed long long type. The only scenario where
this could be a problem is when cluster_size == block size, however in
that case there would not be any partial clusters so we're safe. For
bigger clusters the signed type is enough. Now we use the negative value
in partial_cluster to mark such a cluster as used, hence we know that we must
not free it even if all other extents have been freed from such a cluster.
This scenario can be described in simple diagram:
|FFF...FF..FF.UUU|
 ^----------^
  punch hole
. - free space
| - cluster boundary
F - freed extent
U - used extent
Also update respective tracepoints to use signed long long type for
partial_cluster.
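The sign convention described above can be modelled in a few lines.
This is a simplified sketch and not the ext4 code: the helper names are
hypothetical, cluster numbers are assumed to start at 1, and 0 is
assumed to mean "no partial cluster recorded".

```python
def mark_free_candidate(cluster):
    """Positive value: the cluster may be freed once nothing else uses it."""
    return cluster


def mark_still_used(cluster):
    """Negative value: part of the cluster is still in use, so keep it."""
    return -cluster


def may_free(partial_cluster, cluster):
    """True only when partial_cluster names this cluster as a free candidate."""
    return partial_cluster > 0 and partial_cluster == cluster
```

With punch hole, a cluster that still holds a used extent (the U blocks
in the diagram) would be recorded with mark_still_used(), so a later
may_free() check on it fails even after the freed extents are gone.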
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
->invalidatepage() aop now accepts range to invalidate so we can make
use of it in journal_invalidatepage() and all of its users in the ext3
file system. Also update the ext3 tracepoint to print out the length
argument.
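As a rough illustration of what range-based invalidation means, here is
a userspace sketch. It is a toy model, not the kernel code: the page and
buffer sizes, the struct and the function name are all assumptions made
for the example. Only buffers lying entirely inside the requested range
are invalidated, and the whole page is treated as gone only when the
range covers all of it.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PAGE_SIZE     4096u  /* assumed page size */
#define TOY_BUF_SIZE      1024u  /* assumed block (buffer) size */
#define TOY_BUFS_PER_PAGE (TOY_PAGE_SIZE / TOY_BUF_SIZE)

/* Hypothetical stand-in for a page with per-buffer state. */
struct toy_page {
    bool buf_valid[TOY_BUFS_PER_PAGE];
};

/*
 * Invalidate the buffers that lie entirely inside
 * [offset, offset + length). Returns true when the range covered the
 * whole page, i.e. when the page itself could be released.
 */
static bool toy_invalidate_range(struct toy_page *page,
                                 unsigned int offset, unsigned int length)
{
    unsigned int stop = offset + length;

    assert(stop <= TOY_PAGE_SIZE);
    for (unsigned int i = 0; i < TOY_BUFS_PER_PAGE; i++) {
        unsigned int start = i * TOY_BUF_SIZE;
        unsigned int end = start + TOY_BUF_SIZE;

        if (end > stop)
            break;                      /* buffer reaches past the range */
        if (start >= offset)
            page->buf_valid[i] = false; /* buffer fully inside the range */
    }
    return offset == 0 && length == TOY_PAGE_SIZE;
}
```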
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
->invalidatepage() aop now accepts range to invalidate so we can make
use of it in all ext4 invalidatepage routines.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 update from Ted Ts'o:
"Fixed regressions (two stability regressions and a performance
regression) introduced during the 3.10-rc1 merge window.
Also included is a bug fix relating to allocating blocks after
resizing an ext3 file system when using the ext4 file system driver"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
jbd,jbd2: fix oops in jbd2_journal_put_journal_head()
ext4: revert "ext4: use io_end for multiple bios"
ext4: limit group search loop for non-extent files
ext4: fix fio regression
Allow drivers to discard parts of the register cache, for example if part
of the hardware has been reset.
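The idea can be sketched with a toy userspace cache. Everything below
is hypothetical (struct, function names, register count) and only
models the behaviour described: after a region is dropped, reads of
those registers miss the cache and would have to go back to the
hardware.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NUM_REGS 16u  /* assumed register count for the example */

/* Hypothetical register cache: a value plus a "present" flag per register. */
struct toy_regcache {
    unsigned int value[TOY_NUM_REGS];
    bool present[TOY_NUM_REGS];
};

/* Store a value in the cache and mark the register as cached. */
static void toy_cache_write(struct toy_regcache *c, unsigned int reg,
                            unsigned int val)
{
    assert(reg < TOY_NUM_REGS);
    c->value[reg] = val;
    c->present[reg] = true;
}

/* Forget the cached values of registers min..max (inclusive). */
static void toy_cache_drop_region(struct toy_regcache *c,
                                  unsigned int min, unsigned int max)
{
    assert(min <= max && max < TOY_NUM_REGS);
    for (unsigned int reg = min; reg <= max; reg++)
        c->present[reg] = false;
}

/* Return true and fill *val on a cache hit, false on a miss. */
static bool toy_cache_read(const struct toy_regcache *c, unsigned int reg,
                           unsigned int *val)
{
    assert(reg < TOY_NUM_REGS);
    if (!c->present[reg])
        return false;
    *val = c->value[reg];
    return true;
}
```

A driver-like caller would drop the range belonging to the block that
was reset and leave the rest of the cache untouched.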
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
This patch-set includes the following major enhancement patches.
o introduce a new global lock scheme
o add tracepoints on several major functions
o fix the overall cleaning process focused on victim selection
o apply the block plugging to merge IOs as much as possible
o enhance management of free nids and its list
o enhance the readahead mode for node pages
o address several critical deadlock conditions
o reduce lock_page calls
The other minor bug fixes and enhancements are as follows.
o calculation mistakes: overflow
o bio types: READ, READA, and READ_SYNC
o fix the recovery flow, data races, and null pointer errors
Merge tag 'f2fs-for-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"This patch-set includes the following major enhancement patches.
- introduce a new global lock scheme
- add tracepoints on several major functions
- fix the overall cleaning process focused on victim selection
- apply the block plugging to merge IOs as much as possible
- enhance management of free nids and its list
- enhance the readahead mode for node pages
- address several critical deadlock conditions
- reduce lock_page calls
The other minor bug fixes and enhancements are as follows.
- calculation mistakes: overflow
- bio types: READ, READA, and READ_SYNC
- fix the recovery flow, data races, and null pointer errors"
* tag 'f2fs-for-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (68 commits)
f2fs: cover free_nid management with spin_lock
f2fs: optimize scan_nat_page()
f2fs: code cleanup for scan_nat_page() and build_free_nids()
f2fs: bugfix for alloc_nid_failed()
f2fs: recover when journal contains deleted files
f2fs: continue to mount after failing recovery
f2fs: avoid deadlock during evict after f2fs_gc
f2fs: modify the number of issued pages to merge IOs
f2fs: remove useless #include <linux/proc_fs.h> as we're now using sysfs as debug entry.
f2fs: fix inconsistent using of NM_WOUT_THRESHOLD
f2fs: check truncation of mapping after lock_page
f2fs: enhance alloc_nid and build_free_nids flows
f2fs: add a tracepoint on f2fs_new_inode
f2fs: check nid == 0 in add_free_nid
f2fs: add REQ_META about metadata requests for submit
f2fs: give a chance to merge IOs by IO scheduler
f2fs: avoid frequent background GC
f2fs: add tracepoints to debug checkpoint request
f2fs: add tracepoints for write page operations
f2fs: add tracepoints to debug the block allocation
...
Pull block driver updates from Jens Axboe:
"It might look big in volume, but when categorized, not a lot of
drivers are touched. The pull request contains:
- mtip32xx fixes from Micron.
- A slew of drbd updates, this time in a nicer series.
- bcache, a flash/ssd caching framework from Kent.
- Fixes for cciss"
* 'for-3.10/drivers' of git://git.kernel.dk/linux-block: (66 commits)
bcache: Use bd_link_disk_holder()
bcache: Allocator cleanup/fixes
cciss: bug fix to prevent cciss from loading in kdump crash kernel
cciss: add cciss_allow_hpsa module parameter
drivers/block/mg_disk.c: add CONFIG_PM_SLEEP to suspend/resume functions
mtip32xx: Workaround for unaligned writes
bcache: Make sure blocksize isn't smaller than device blocksize
bcache: Fix merge_bvec_fn usage for when it modifies the bvm
bcache: Correctly check against BIO_MAX_PAGES
bcache: Hack around stuff that clones up to bi_max_vecs
bcache: Set ra_pages based on backing device's ra_pages
bcache: Take data offset from the bdev superblock.
mtip32xx: mtip32xx: Disable TRIM support
mtip32xx: fix a smatch warning
bcache: Disable broken btree fuzz tester
bcache: Fix a format string overflow
bcache: Fix a minor memory leak on device teardown
bcache: Documentation updates
bcache: Use WARN_ONCE() instead of __WARN()
bcache: Add missing #include <linux/prefetch.h>
...
Pull block core updates from Jens Axboe:
- Major bit is Kent's prep work for immutable bio vecs.
- Stable candidate fix for a scheduling-while-atomic in the queue
bypass operation.
- Fix for the hang on exceeded rq->datalen 32-bit unsigned when merging
discard bios.
- Tejun's changes to convert the writeback thread pool to the generic
workqueue mechanism.
- Runtime PM framework; SCSI patches exist on top of these in James'
tree.
- A few random fixes.
* 'for-3.10/core' of git://git.kernel.dk/linux-block: (40 commits)
relay: move remove_buf_file inside relay_close_buf
partitions/efi.c: replace useless kzalloc's by kmalloc's
fs/block_dev.c: fix iov_shorten() criteria in blkdev_aio_read()
block: fix max discard sectors limit
blkcg: fix "scheduling while atomic" in blk_queue_bypass_start
Documentation: cfq-iosched: update documentation help for cfq tunables
writeback: expose the bdi_wq workqueue
writeback: replace custom worker pool implementation with unbound workqueue
writeback: remove unused bdi_pending_list
aoe: Fix unitialized var usage
bio-integrity: Add explicit field for owner of bip_buf
block: Add an explicit bio flag for bios that own their bvec
block: Add bio_alloc_pages()
block: Convert some code to bio_for_each_segment_all()
block: Add bio_for_each_segment_all()
bounce: Refactor __blk_queue_bounce to not use bi_io_vec
raid1: use bio_copy_data()
pktcdvd: Use bio_reset() in disabled code to kill bi_idx usage
pktcdvd: use bio_copy_data()
block: Add bio_copy_data()
...
Pull kvm updates from Gleb Natapov:
"Highlights of the updates are:
general:
- new emulated device API
- legacy device assignment is now optional
- irqfd interface is more generic and can be shared between arches
x86:
- VMCS shadow support and other nested VMX improvements
- APIC virtualization and Posted Interrupt hardware support
- Optimize mmio spte zapping
ppc:
- BookE: in-kernel MPIC emulation with irqfd support
- Book3S: in-kernel XICS emulation (incomplete)
- Book3S: HV: migration fixes
- BookE: more debug support preparation
- BookE: e6500 support
ARM:
- reworking of Hyp idmaps
s390:
- ioeventfd for virtio-ccw
And many other bug fixes, cleanups and improvements"
* tag 'kvm-3.10-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (204 commits)
kvm: Add compat_ioctl for device control API
KVM: x86: Account for failing enable_irq_window for NMI window request
KVM: PPC: Book3S: Add API for in-kernel XICS emulation
kvm/ppc/mpic: fix missing unlock in set_base_addr()
kvm/ppc: Hold srcu lock when calling kvm_io_bus_read/write
kvm/ppc/mpic: remove users
kvm/ppc/mpic: fix mmio region lists when multiple guests used
kvm/ppc/mpic: remove default routes from documentation
kvm: KVM_CAP_IOMMU only available with device assignment
ARM: KVM: iterate over all CPUs for CPU compatibility check
KVM: ARM: Fix spelling in error message
ARM: KVM: define KVM_ARM_MAX_VCPUS unconditionally
KVM: ARM: Fix API documentation for ONE_REG encoding
ARM: KVM: promote vfp_host pointer to generic host cpu context
ARM: KVM: add architecture specific hook for capabilities
ARM: KVM: perform HYP initilization for hotplugged CPUs
ARM: KVM: switch to a dual-step HYP init code
ARM: KVM: rework HYP page table freeing
ARM: KVM: enforce maximum size for identity mapped code
ARM: KVM: move to a KVM provided HYP idmap
...
Pull 'full dynticks' support from Ingo Molnar:
"This tree from Frederic Weisbecker adds a new, (exciting! :-) core
kernel feature to the timer and scheduler subsystems: 'full dynticks',
or CONFIG_NO_HZ_FULL=y.
This feature extends the nohz variable-size timer tick feature from
idle to busy CPUs (running at most one task) as well, potentially
reducing the number of timer interrupts significantly.
This feature got motivated by real-time folks and the -rt tree, but
the general utility and motivation of full-dynticks runs wider than
that:
- HPC workloads get faster: CPUs running a single task should be able
to utilize a maximum amount of CPU power. A periodic timer tick at
HZ=1000 can cause a constant overhead of up to 1.0%. This feature
removes that overhead - and speeds up the system by 0.5%-1.0% on
typical distro configs even on modern systems.
- Real-time workload latency reduction: CPUs running critical tasks
should experience as little jitter as possible. The last remaining
source of kernel-related jitter was the periodic timer tick.
- A single task executing on a CPU is a pretty common situation,
especially with an increasing number of cores/CPUs, so this feature
helps desktop and mobile workloads as well.
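To see how an overhead of "up to 1.0%" at HZ=1000 can arise, here is a
back-of-the-envelope sketch. The per-tick cost of 10 microseconds used
in the note below is an assumed, illustrative figure and not a
measurement from this text; tick_overhead() is a hypothetical helper.

```c
#include <assert.h>

/*
 * Fraction of one CPU's time spent servicing the periodic tick:
 * ticks per second multiplied by the cost of one tick in seconds.
 * tick_cost_us is an assumed per-tick cost in microseconds.
 */
static double tick_overhead(unsigned int hz, double tick_cost_us)
{
    assert(tick_cost_us >= 0.0);
    return hz * tick_cost_us / 1e6;
}
```

With the assumed 10 us per tick, tick_overhead(1000, 10.0) is 0.01,
i.e. 1% of the CPU, while a residual 1 Hz tick costs
tick_overhead(1, 10.0), i.e. 0.001%.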
The cost of the feature is mainly related to increased timer
reprogramming overhead when a CPU switches its tick period, and thus
slightly longer to-idle and from-idle latency.
Configuration-wise a third mode of operation is added to the existing
two NOHZ kconfig modes:
- CONFIG_HZ_PERIODIC: [formerly !CONFIG_NO_HZ], now explicitly named
as a config option. This is the traditional Linux periodic tick
design: there's a HZ tick going on all the time, regardless of
whether a CPU is idle or not.
- CONFIG_NO_HZ_IDLE: [formerly CONFIG_NO_HZ=y], this turns off the
periodic tick when a CPU enters idle mode.
- CONFIG_NO_HZ_FULL: this new mode, in addition to turning off the
tick when a CPU is idle, also slows the tick down to 1 Hz (one
timer interrupt per second) when only a single task is running on a
CPU.
The .config behavior is compatible: existing !CONFIG_NO_HZ and
CONFIG_NO_HZ=y settings get translated to the new values, without the
user having to configure anything. CONFIG_NO_HZ_FULL is turned off by
default.
This feature is based on a lot of infrastructure work that has been
steadily going upstream in the last 2-3 cycles: related RCU support
and non-periodic cputime support in particular is upstream already.
This tree adds the final pieces and activates the feature. The pull
request is marked RFC because:
- it's marked 64-bit only at the moment - the 32-bit support patch is
small but did not get ready in time.
- it has a number of fresh commits that came in after the merge
window. The overwhelming majority of commits are from before the
merge window, but still some aspects of the tree are fresh and so I
marked it RFC.
- it's a pretty wide-reaching feature with lots of effects - and
while the components have been in testing for some time, the full
combination is still not very widely used. That it's default-off
should reduce its regression abilities and obviously there are no
known regressions with CONFIG_NO_HZ_FULL=y enabled either.
- the feature is not completely idempotent: there is no 100%
equivalent replacement for a periodic scheduler/timer tick. In
particular there's ongoing work to map out and reduce its effects
on scheduler load-balancing and statistics. This should not impact
correctness though, there are no known regressions related to this
feature at this point.
- it's a pretty ambitious feature that with time will likely be
enabled by most Linux distros, and we'd like your input on its
design/implementation if you dislike some aspect we missed.
Without flaming us to a crisp! :-)
Future plans:
- there's ongoing work to reduce 1Hz to 0Hz, to essentially shut off
the periodic tick altogether when there's a single busy task on a
CPU. We'd first like 1 Hz to be exposed more widely before we go
for the 0 Hz target though.
- once we reach 0 Hz we can remove the periodic tick assumption from
nr_running>=2 as well, by essentially interrupting busy tasks only
as frequently as the sched_latency constraints require us to do -
once every 4-40 msecs, depending on nr_running.
I am personally leaning towards biting the bullet and doing this in
v3.10; like the -rt tree, this effort has been going on for too long -
but the final word is up to you as usual.
More technical details can be found in Documentation/timers/NO_HZ.txt"
* 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
sched: Keep at least 1 tick per second for active dynticks tasks
rcu: Fix full dynticks' dependency on wide RCU nocb mode
nohz: Protect smp_processor_id() in tick_nohz_task_switch()
nohz_full: Add documentation.
cputime_nsecs: use math64.h for nsec resolution conversion helpers
nohz: Select VIRT_CPU_ACCOUNTING_GEN from full dynticks config
nohz: Reduce overhead under high-freq idling patterns
nohz: Remove full dynticks' superfluous dependency on RCU tree
nohz: Fix unavailable tick_stop tracepoint in dynticks idle
nohz: Add basic tracing
nohz: Select wide RCU nocb for full dynticks
nohz: Disable the tick when irq resume in full dynticks CPU
nohz: Re-evaluate the tick for the new task after a context switch
nohz: Prepare to stop the tick on irq exit
nohz: Implement full dynticks kick
nohz: Re-evaluate the tick from the scheduler IPI
sched: New helper to prevent from stopping the tick in full dynticks
sched: Kick full dynticks CPU that have more than one task enqueued.
perf: New helper to prevent full dynticks CPUs from stopping tick
perf: Kick full dynticks CPU if events rotation is needed
...
We (Linux Kernel Performance project) found a regression introduced
by commit:
f7fec032aa ext4: track all extent status in extent status tree
The commit causes about a 20% performance decrease in the fio random
write test. The profiler shows that rb_next() uses a lot of CPU time.
The call stack is:
rb_next
ext4_es_find_delayed_extent
ext4_map_blocks
_ext4_get_block
ext4_get_block_write
__blockdev_direct_IO
ext4_direct_IO
generic_file_direct_write
__generic_file_aio_write
ext4_file_write
aio_rw_vect_retry
aio_run_iocb
do_io_submit
sys_io_submit
system_call_fastpath
io_submit
td_io_getevents
io_u_queued_complete
thread_main
main
__libc_start_main
The cause is that ext4_es_find_delayed_extent() doesn't have an
upper bound; it keeps searching until a delayed extent is found.
When there are a lot of non-delayed entries in the extent status
tree, ext4_es_find_delayed_extent() may use a lot of CPU time.
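The difference an upper bound makes can be shown with a simplified
userspace sketch. It is not the ext4 code: the struct, the function
name and the sorted array (standing in for the rbtree) are assumptions
made for illustration. The search gives up as soon as it reaches an
extent that starts beyond the end of the requested range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical extent record; the array is sorted by lblk, no overlaps. */
struct toy_extent {
    unsigned int lblk;  /* first logical block */
    unsigned int len;   /* number of blocks, assumed > 0 */
    bool delayed;       /* true for a delayed extent */
};

/*
 * Return the index of the first delayed extent overlapping
 * [start, end], or -1 if there is none. The "lblk > end" check is the
 * upper bound: without it the loop would walk every remaining entry.
 */
static int find_delayed_in_range(const struct toy_extent *es, size_t n,
                                 unsigned int start, unsigned int end)
{
    for (size_t i = 0; i < n; i++) {
        unsigned int last = es[i].lblk + es[i].len - 1;

        if (last < start)
            continue;       /* extent lies before the range */
        if (es[i].lblk > end)
            break;          /* past the range: stop searching */
        if (es[i].delayed)
            return (int)i;
    }
    return -1;
}
```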
Reported-by: LKP project <lkp@linux.intel.com>
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Pull drm updates from Dave Airlie:
"This is the main drm pull request for 3.10.
Weird bits:
- OMAP drm changes required OMAP dss changes, in drivers/video, so I
took them in here.
- one more fbcon fix for font handover
- VT switch avoidance in pm code
- scatterlist helpers for gpu drivers - have acks from akpm
Highlights:
- qxl kms driver - driver for the spice qxl virtual GPU
Nouveau:
- fermi/kepler VRAM compression
- GK110/nvf0 modesetting support.
Tegra:
- host1x core merged with 2D engine support
i915:
- vt switchless resume
- more valleyview support
- vblank fixes
- modesetting pipe config rework
radeon:
- UVD engine support
- SI chip tiling support
- GPU registers initialisation from golden values.
exynos:
- device tree changes
- fimc block support
Otherwise:
- bunches of fixes all over the place."
* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (513 commits)
qxl: update to new idr interfaces.
drm/nouveau: fix build with nv50->nvc0
drm/radeon: fix handling of v6 power tables
drm/radeon: clarify family checks in pm table parsing
drm/radeon: consolidate UVD clock programming
drm/radeon: fix UPLL_REF_DIV_MASK definition
radeon: add bo tracking debugfs
drm/radeon: add new richland pci ids
drm/radeon: add some new SI PCI ids
drm/radeon: fix scratch reg handling for UVD fence
drm/radeon: allocate SA bo in the requested domain
drm/radeon: fix possible segfault when parsing pm tables
drm/radeon: fix endian bugs in atom_allocate_fb_scratch()
OMAPDSS: TFP410: return EPROBE_DEFER if the i2c adapter not found
OMAPDSS: VENC: Add error handling for venc_probe_pdata
OMAPDSS: HDMI: Add error handling for hdmi_probe_pdata
OMAPDSS: RFBI: Add error handling for rfbi_probe_pdata
OMAPDSS: DSI: Add error handling for dsi_probe_pdata
OMAPDSS: SDI: Add error handling for sdi_probe_pdata
OMAPDSS: DPI: Add error handling for dpi_probe_pdata
...
The full dynticks tree needs the latest RCU and sched
upstream updates in order to fix some dependencies.
Merge a common upstream merge point that has these
updates.
Conflicts:
include/linux/perf_event.h
kernel/rcutree.h
kernel/rcutree_plugin.h
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"Mostly performance and bug fixes, plus some cleanups. The one new
feature this merge window is a new ioctl EXT4_IOC_SWAP_BOOT which
allows installation of a hidden inode designed for boot loaders."
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (50 commits)
ext4: fix type-widening bug in inode table readahead code
ext4: add check for inodes_count overflow in new resize ioctl
ext4: fix Kconfig documentation for CONFIG_EXT4_DEBUG
ext4: fix online resizing for ext3-compat file systems
jbd2: trace when lock_buffer in do_get_write_access takes a long time
ext4: mark metadata blocks using bh flags
buffer: add BH_Prio and BH_Meta flags
ext4: mark all metadata I/O with REQ_META
ext4: fix readdir error in case inline_data+^dir_index.
ext4: fix readdir error in the case of inline_data+dir_index
jbd2: use kmem_cache_zalloc instead of kmem_cache_alloc/memset
ext4: mext_insert_extents should update extent block checksum
ext4: move quota initialization out of inode allocation transaction
ext4: reserve xattr index for Rich ACL support
jbd2: reduce journal_head size
ext4: clear buffer_uninit flag when submitting IO
ext4: use io_end for multiple bios
ext4: make ext4_bio_write_page() use BH_Async_Write flags
ext4: Use kstrtoul() instead of parse_strtoul()
ext4: defragmentation code cleanup
...
Pull RCU updates from Ingo Molnar:
"The main changes in this cycle are mostly related to preparatory work
for the full-dynticks work:
- Remove restrictions on no-CBs CPUs, make RCU_FAST_NO_HZ take
advantage of numbered callbacks, do callback accelerations based on
numbered callbacks. Posted to LKML at
https://lkml.org/lkml/2013/3/18/960
- RCU documentation updates. Posted to LKML at
https://lkml.org/lkml/2013/3/18/570
- Miscellaneous fixes. Posted to LKML at
https://lkml.org/lkml/2013/3/18/594"
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
rcu: Make rcu_accelerate_cbs() note need for future grace periods
rcu: Abstract rcu_start_future_gp() from rcu_nocb_wait_gp()
rcu: Rename n_nocb_gp_requests to need_future_gp
rcu: Push lock release to rcu_start_gp()'s callers
rcu: Repurpose no-CBs event tracing to future-GP events
rcu: Rearrange locking in rcu_start_gp()
rcu: Make RCU_FAST_NO_HZ take advantage of numbered callbacks
rcu: Accelerate RCU callbacks at grace-period end
rcu: Export RCU_FAST_NO_HZ parameters to sysfs
rcu: Distinguish "rcuo" kthreads by RCU flavor
rcu: Add event tracing for no-CBs CPUs' grace periods
rcu: Add event tracing for no-CBs CPUs' callback registration
rcu: Introduce proper blocking to no-CBs kthreads GP waits
rcu: Provide compile-time control for no-CBs CPUs
rcu: Tone down debugging during boot-up and shutdown.
rcu: Add softirq-stall indications to stall-warning messages
rcu: Documentation update
rcu: Make bugginess of code sample more evident
rcu: Fix hlist_bl_set_first_rcu() annotation
rcu: Delete unused rcu_node "wakemask" field
...
Merge second batch of fixes from Andrew Morton:
- various misc bits
- some printk updates
- a new "SRAM" driver.
- MAINTAINERS updates
- the backlight driver queue
- checkpatch updates
- a few init/ changes
- a huge number of drivers/rtc changes
- fatfs updates
- some lib/idr.c work
- some renaming of the random driver interfaces
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (285 commits)
net: rename random32 to prandom
net/core: remove duplicate statements by do-while loop
net/core: rename random32() to prandom_u32()
net/netfilter: rename random32() to prandom_u32()
net/sched: rename random32() to prandom_u32()
net/sunrpc: rename random32() to prandom_u32()
scsi: rename random32() to prandom_u32()
lguest: rename random32() to prandom_u32()
uwb: rename random32() to prandom_u32()
video/uvesafb: rename random32() to prandom_u32()
mmc: rename random32() to prandom_u32()
drbd: rename random32() to prandom_u32()
kernel/: rename random32() to prandom_u32()
mm/: rename random32() to prandom_u32()
lib/: rename random32() to prandom_u32()
x86: rename random32() to prandom_u32()
x86: pageattr-test: remove srandom32 call
uuid: use prandom_bytes()
raid6test: use prandom_bytes()
sctp: convert sctp_assoc_set_id() to use idr_alloc_cyclic()
...
Commit 7ff9554bb5 ("printk: convert byte-buffer to variable-length
record buffer") removed start and end parameters from
call_console_drivers, but those parameters still exist in
include/trace/events/printk.h.
Without the start and end parameters to handle, printk tracing becomes
as simple as: trace_console(text, len);
Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Kay Sievers <kay@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge first batch of fixes from Andrew Morton:
- A couple of kthread changes
- A few minor audit patches
- A number of fbdev patches. Florian remains AWOL so I'm picking up
some of these.
- A few kbuild things
- ocfs2 updates
- Almost all of the MM queue
(And in the meantime, I already have the second big batch from Andrew
pending in my mailbox ;^)
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (149 commits)
memcg: take reference before releasing rcu_read_lock
mem hotunplug: fix kfree() of bootmem memory
mmKconfig: add an option to disable bounce
mm, nobootmem: do memset() after memblock_reserve()
mm, nobootmem: clean-up of free_low_memory_core_early()
fs/buffer.c: remove unnecessary init operation after allocating buffer_head.
numa, cpu hotplug: change links of CPU and node when changing node number by onlining CPU
mm: fix memory_hotplug.c printk format warning
mm: swap: mark swap pages writeback before queueing for direct IO
swap: redirty page if page write fails on swap file
mm, memcg: give exiting processes access to memory reserves
thp: fix huge zero page logic for page with pfn == 0
memcg: avoid accessing memcg after releasing reference
fs: fix fsync() error reporting
memblock: fix missing comment of memblock_insert_region()
mm: Remove unused parameter of pages_correctly_reserved()
firmware, memmap: fix firmware_map_entry leak
mm/vmstat: add note on safety of drain_zonestat
mm: thp: add split tail pages to shrink page list in page reclaim
mm: allow for outstanding swap writeback accounting
...
In user visible terms just a couple of enhancements here, though there
was a moderate amount of refactoring required in order to support the
register cache sync performance improvements.
- Support for block and asynchronous I/O during register cache syncing;
this provides a use case dependent performance improvement.
- Additional debugfs information on the memory consumption and register
set.
Merge tag 'regmap-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
Pull regmap updates from Mark Brown:
"In user visible terms just a couple of enhancements here, though there
was a moderate amount of refactoring required in order to support the
register cache sync performance improvements.
- Support for block and asynchronous I/O during register cache
syncing; this provides a use case dependent performance
improvement.
- Additional debugfs information on the memory consumption and
register set"
* tag 'regmap-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap: (23 commits)
regmap: don't corrupt work buffer in _regmap_raw_write()
regmap: cache: Fix format specifier in dev_dbg
regmap: cache: Make regcache_sync_block_raw static
regmap: cache: Write consecutive registers in a single block write
regmap: cache: Split raw and non-raw syncs
regmap: cache: Factor out block sync
regmap: cache: Factor out reg_present support from rbtree cache
regmap: cache: Use raw I/O to sync rbtrees if we can
regmap: core: Provide regmap_can_raw_write() operation
regmap: cache: Provide a get address of value operation
regmap: Cut down on the average # of nodes in the rbtree cache
regmap: core: Make raw write available to regcache
regmap: core: Warn on invalid operation combinations
regmap: irq: Clarify error message when we fail to request primary IRQ
regmap: rbtree Expose total memory consumption in the rbtree debugfs entry
regmap: debugfs: Add a registers `range' file
regmap: debugfs: Simplify calculation of `c->max_reg'
regmap: cache: Store caches in native register format where possible
regmap: core: Split out in place value parsing
regmap: cache: Use regcache_get_value() to check if we updated
...
Use the events API to trace filemap loading and unloading of file pieces
into the page cache.
This patch aims at tracing the eviction reload cycle of executable and
shared library pages in a memory constrained environment.
The typical usage is to spot a specific device and inode (for example
/lib/libc.so) to see the eviction cycles, and find out if frequently
used code is rather spread across many pages (bad) or coalesced (good).
Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Along with the usual minor fixes and clean ups there are a few major
changes with this pull request.
1) Multiple buffers for the ftrace facility
This feature has been requested by many people over the last few years.
I even heard that Google was about to implement it themselves. I finally
had time and cleaned up the code such that you can now create multiple
instances of the ftrace buffer and have different events go to different
buffers. This way, a low frequency event will not be lost in the noise
of a high frequency event.
Note, currently only events can go to different buffers; the tracers
(i.e. function, function_graph and the latency tracers) can still only
write to the main buffer.
2) The function tracer triggers have now been extended.
The function tracer had two triggers. One to enable tracing when a
function is hit, and one to disable tracing. Now you can record a
stack trace on a single (or many) function(s), take a snapshot of the
buffer (copy it to the snapshot buffer), and you can enable or disable
an event to be traced when a function is hit.
3) A perf clock has been added.
A "perf" clock can be chosen to be used when tracing. This will cause
ftrace to use the same clock as perf uses, and hopefully this will make
it easier to interleave the perf and ftrace data for analysis.
Merge tag 'trace-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"Along with the usual minor fixes and clean ups there are a few major
changes with this pull request.
1) Multiple buffers for the ftrace facility
This feature has been requested by many people over the last few
years. I even heard that Google was about to implement it themselves.
I finally had time and cleaned up the code such that you can now
create multiple instances of the ftrace buffer and have different
events go to different buffers. This way, a low frequency event will
not be lost in the noise of a high frequency event.
Note, currently only events can go to different buffers, the tracers
(ie function, function_graph and the latency tracers) still can only
be written to the main buffer.
2) The function tracer triggers have now been extended.
The function tracer had two triggers. One to enable tracing when a
function is hit, and one to disable tracing. Now you can record a
stack trace on a single (or many) function(s), take a snapshot of the
buffer (copy it to the snapshot buffer), and you can enable or disable
an event to be traced when a function is hit.
3) A perf clock has been added.
A "perf" clock can be chosen to be used when tracing. This will cause
ftrace to use the same clock as perf uses, and hopefully this will
make it easier to interleave the perf and ftrace data for analysis."
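The three features above are all driven through control files in the tracing directory. What follows is a minimal sketch, not taken from the pull request itself: the helper names (make_instance, function_trigger, write_control) are hypothetical, and a temporary directory stands in for the real tracing directory (commonly /sys/kernel/debug/tracing) so the example runs without root.

```python
import os
import tempfile

def make_instance(tracing_root, name):
    """Create a buffer instance; on a real system mkdir under instances/ does this."""
    path = os.path.join(tracing_root, "instances", name)
    os.makedirs(path, exist_ok=True)
    return path

def function_trigger(func, action, *args):
    """Build a set_ftrace_filter trigger string, e.g. 'schedule:snapshot'."""
    return ":".join((func, action) + args)

def write_control(directory, filename, value, append=False):
    """Write a value to a control file (hypothetical helper)."""
    target = os.path.join(directory, filename)
    # Only needed for the stand-in directory; the kernel creates these files itself.
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with open(target, "a" if append else "w") as f:
        f.write(value + "\n")

# Stand-in for the real tracing directory (assumption for the demo).
demo_root = tempfile.mkdtemp()

# 1) A separate buffer that only receives the sched_switch event.
inst = make_instance(demo_root, "sched_only")
write_control(inst, "events/sched/sched_switch/enable", "1")

# 2) Function triggers: stack trace, snapshot, and enabling an event on a hit.
triggers = [
    function_trigger("schedule", "stacktrace"),
    function_trigger("schedule", "snapshot"),
    function_trigger("schedule", "enable_event", "sched", "sched_switch"),
]
for t in triggers:
    write_control(demo_root, "set_ftrace_filter", t, append=True)

# 3) Use the same clock as perf.
write_control(demo_root, "trace_clock", "perf")
```

On a real system the same writes would target the live tracing directory and require root privileges.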
* tag 'trace-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (82 commits)
tracepoints: Prevent null probe from being added
tracing: Compare to 1 instead of zero for is_signed_type()
tracing: Remove obsolete macro guard _TRACE_PROFILE_INIT
ftrace: Get rid of ftrace_profile_bits
tracing: Check return value of tracing_init_dentry()
tracing: Get rid of unneeded key calculation in ftrace_hash_move()
tracing: Reset ftrace_graph_filter_enabled if count is zero
tracing: Fix off-by-one on allocating stat->pages
kernel: tracing: Use strlcpy instead of strncpy
tracing: Update debugfs README file
tracing: Fix ftrace_dump()
tracing: Rename trace_event_mutex to trace_event_sem
tracing: Fix comment about prefix in arch_syscall_match_sym_name()
tracing: Convert trace_destroy_fields() to static
tracing: Move find_event_field() into trace_events.c
tracing: Use TRACE_MAX_PRINT instead of constant
tracing: Use pr_warn_once instead of open coded implementation
ring-buffer: Add ring buffer startup selftest
tracing: Bring Documentation/trace/ftrace.txt up to date
tracing: Add "perf" trace_clock
...
Conflicts:
kernel/trace/ftrace.c
kernel/trace/trace.c
This can help when debugging the free nid allocation flows.
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
The current irq_comm.c file contains pieces of code that are generic
across different irqchip implementations, as well as code that is
fully IOAPIC specific.
Split the generic bits out into irqchip.c.
Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
The trace_tick_stop() tracepoint is only available in full
dynticks. But it's also used by dynticks-idle so let's build
it for the latter config as well.
This fixes:
kernel/time/tick-sched.c: In function tick_nohz_stop_sched_tick:
kernel/time/tick-sched.c:644: error: implicit declaration of function trace_tick_stop
make[2]: *** [kernel/time/tick-sched.o] Erreur 1
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Add tracepoints to debug the various page write operations,
such as data pages and meta pages.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
[Jaegeuk: remove unnecessary tracepoints]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Add tracepoints for tracing the garbage collector
threads in f2fs with status of collection & type.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
[Jaegeuk: modify slightly to show information]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Add tracepoints for tracing the truncate operations
like truncate node/data blocks, f2fs_truncate etc.
Tracepoints are added at the entry and exit of each operation
to trace its success or failure.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
[Jaegeuk: combine and modify the tracepoint structures]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Add tracepoints in f2fs for tracing the syncing
operations like filesystem sync, file sync enter/exit.
It will help to trace the code under debugging scenarios.
Also add tracepoints for tracing the various inode operations
like building an inode, eviction of an inode, and link/unlink of
inodes.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
[Jaegeuk: combine and modify the tracepoint structures]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
It's not easy to find out why the full dynticks subsystem
doesn't always stop the tick: whether this is due to kthreads,
posix timers, perf events, etc...
These new tracepoints are here to help the user diagnose
the failures and test this feature.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Add support for host1x client modules, and host1x channels to submit
work to the clients.
Signed-off-by: Arto Merilainen <amerilainen@nvidia.com>
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: Thierry Reding <thierry.reding@avionic-design.de>
Tested-by: Thierry Reding <thierry.reding@avionic-design.de>
Tested-by: Erik Faye-Lund <kusmabite@gmail.com>
Signed-off-by: Thierry Reding <thierry.reding@avionic-design.de>
Add host1x, the driver for host1x and its client unit 2D. The Tegra
host1x module is the DMA engine for register access to Tegra's
graphics- and multimedia-related modules. The modules served by
host1x are referred to as clients. host1x includes some other
functionality, such as synchronization.
Signed-off-by: Arto Merilainen <amerilainen@nvidia.com>
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: Thierry Reding <thierry.reding@avionic-design.de>
Tested-by: Thierry Reding <thierry.reding@avionic-design.de>
Tested-by: Erik Faye-Lund <kusmabite@gmail.com>
Signed-off-by: Thierry Reding <thierry.reding@avionic-design.de>
While investigating interactivity problems it was clear that processes
sometimes stall for long periods of time if an attempt is made to
lock a buffer which is undergoing writeback. It would stall in
a trace looking something like:
[<ffffffff811a39de>] __lock_buffer+0x2e/0x30
[<ffffffff8123a60f>] do_get_write_access+0x43f/0x4b0
[<ffffffff8123a7cb>] jbd2_journal_get_write_access+0x2b/0x50
[<ffffffff81220f79>] __ext4_journal_get_write_access+0x39/0x80
[<ffffffff811f3198>] ext4_reserve_inode_write+0x78/0xa0
[<ffffffff811f3209>] ext4_mark_inode_dirty+0x49/0x220
[<ffffffff811f57d1>] ext4_dirty_inode+0x41/0x60
[<ffffffff8119ac3e>] __mark_inode_dirty+0x4e/0x2d0
[<ffffffff8118b9b9>] update_time+0x79/0xc0
[<ffffffff8118ba98>] file_update_time+0x98/0x100
[<ffffffff81110ffc>] __generic_file_aio_write+0x17c/0x3b0
[<ffffffff811112aa>] generic_file_aio_write+0x7a/0xf0
[<ffffffff811ea853>] ext4_file_write+0x83/0xd0
[<ffffffff81172b23>] do_sync_write+0xa3/0xe0
[<ffffffff811731ae>] vfs_write+0xae/0x180
[<ffffffff8117361d>] sys_write+0x4d/0x90
[<ffffffff8159d62d>] system_call_fastpath+0x1a/0x1f
[<ffffffffffffffff>] 0xffffffffffffffff
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
This reverts commit 3a366e614d.
Wanlong Gao reports that it causes a kernel panic on his machine several
minutes after boot. Reverting it removes the panic.
Jens says:
"It's not quite clear why that is yet, so I think we should just revert
the commit for 3.9 final (which I'm assuming is pretty close).
The wifi is crap at the LSF hotel, so sending this email instead of
queueing up a revert and pull request."
Reported-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Requested-by: Jens Axboe <axboe@kernel.dk>
Cc: Tejun Heo <tj@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The macro _TRACE_PROFILE_INIT was removed a long time ago,
but an "#undef" guard was left behind. Remove it.
Link: http://lkml.kernel.org/r/514684EE.6000805@huawei.com
Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The smpboot threads rely on the park/unpark mechanism which binds per
cpu threads on a particular core. Though the functionality is racy:
CPU0                    CPU1                    CPU2
unpark(T)                                       wake_up_process(T)
  clear(SHOULD_PARK)    T runs
                        leave parkme() due to !SHOULD_PARK
  bind_to(CPU2)         BUG_ON(wrong CPU)
We cannot let the tasks move themselves to the target CPU as one of
those tasks is actually the migration thread itself, which requires
that it starts running on the target cpu right away.
The solution to this problem is to prevent wakeups in park mode which
are not from unpark(). That way we can guarantee that the association
of the task to the target cpu is working correctly.
Add a new task state (TASK_PARKED) which prevents other wakeups and
use this state explicitly for the unpark wakeup.
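The effect of a dedicated parked state can be sketched as a state-mask check on wakeup. This is a simplified single-threaded model, not the scheduler code: the numeric state values and the Python function bodies are illustrative assumptions, and only the idea (a wakeup succeeds only if the task's state is in the caller's mask) is taken from the description above.

```python
# Illustrative state bits (assumed values for this model).
TASK_RUNNING = 0
TASK_INTERRUPTIBLE = 1
TASK_UNINTERRUPTIBLE = 2
TASK_PARKED = 512
TASK_NORMAL = TASK_INTERRUPTIBLE | TASK_UNINTERRUPTIBLE

class Task:
    def __init__(self, name):
        self.name = name
        self.state = TASK_RUNNING

def wake_up_state(task, mask):
    """Wake the task only if its current state matches the mask."""
    if task.state & mask:
        task.state = TASK_RUNNING
        return True
    return False

def wake_up_process(task):
    """Ordinary wakeup: does not include TASK_PARKED in its mask."""
    return wake_up_state(task, TASK_NORMAL)

def unpark(task):
    """Only the unpark path wakes a parked task."""
    return wake_up_state(task, TASK_PARKED)

t = Task("migration/2")
t.state = TASK_PARKED

stray = wake_up_process(t)      # stray wakeup from another CPU is ignored
still_parked = t.state == TASK_PARKED
woken = unpark(t)               # explicit unpark succeeds
```

Because the stray wakeup no longer makes the task runnable, the task cannot leave parkme() before the unpark path has bound it to the target CPU.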
Peter noticed: Also, since the task state is visible to userspace and
all the parked tasks are still in the PID space, it's a good hint in ps
and friends that these tasks aren't really there for the moment.
The migration thread has another related issue.
CPU0                         CPU1
Bring up CPU2
create_thread(T)
park(T)
 wait_for_completion()
                             parkme()
                             complete()
sched_set_stop_task()
                             schedule(TASK_PARKED)
The sched_set_stop_task() call is issued while the task is on the
runqueue of CPU1 and that confuses the hell out of the stop_task class
on that cpu. So we need the same synchronization before
sched_set_stop_task().
Reported-by: Dave Jones <davej@redhat.com>
Reported-and-tested-by: Dave Hansen <dave@sr71.net>
Reported-and-tested-by: Borislav Petkov <bp@alien8.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: dhillf@gmail.com
Cc: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1304091635430.21884@ionos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The only difference between how we handle data=ordered and
data=writeback is a single call to ext4_jbd2_file_inode(). Eliminate
code duplication by factoring out the redundant code paths.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Tejun writes:
-----
This is the pull request for the earlier patchset[1] with the same
name. It's only three patches (the first one was committed to
workqueue tree) but the merge strategy is a bit involved due to the
dependencies.
* Because the conversion needs features from wq/for-3.10,
block/for-3.10/core is based on rc3, and wq/for-3.10 has conflicts
with rc3, I pulled mainline (rc5) into wq/for-3.10 to prevent those
workqueue conflicts from flaring up in block tree.
* Resolving the issue that Jan and Dave raised about debugging
requires arch-wide changes. The patchset is being worked on[2] but
it'll have to go through -mm after these changes show up in -next,
and not included in this pull request.
The three commits are located in the following git branch.
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git writeback-workqueue
Pulling it into block/for-3.10/core produces a conflict in
drivers/md/raid5.c between the following two commits.
e3620a3ad5 ("MD RAID5: Avoid accessing gendisk or queue structs when not available")
2f6db2a707 ("raid5: use bio_reset()")
The conflict is trivial - one removes an "if ()" conditional while the
other removes "rbi->bi_next = NULL" right above it. We just need to
remove both. The merged branch is available at
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git block-test-merge
so that you can use it for verification. The test merge commit has
proper merge description.
While these changes are a bit of a pain to route, they make the code
simpler and even yield a measurable, if minute, performance gain[3] on
a workload which isn't particularly favorable to showing the benefits
of this conversion.
----
Fixed up the conflict.
Conflicts:
drivers/md/raid5.c
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Writeback implements its own worker pool - each bdi can be associated
with a worker thread which is created and destroyed dynamically. The
worker thread for the default bdi is always present and serves as the
"forker" thread which forks off worker threads for other bdis.
There's no reason for writeback to implement its own worker pool when
using unbound workqueue instead is much simpler and more efficient.
This patch replaces custom worker pool implementation in writeback
with an unbound workqueue.
The conversion isn't too complicated but the following points are worth
mentioning.
* bdi_writeback->last_active, task and wakeup_timer are removed.
delayed_work ->dwork is added instead. Explicit timer handling is
no longer necessary. Everything works by either queueing / modding
/ flushing / canceling the delayed_work item.
* bdi_writeback_thread() becomes bdi_writeback_workfn() which runs off
bdi_writeback->dwork. On each execution, it processes
bdi->work_list and reschedules itself if there are more things to
do.
The function also handles low-mem condition, which used to be
handled by the forker thread. If the function is running off a
rescuer thread, it only writes out limited number of pages so that
the rescuer can serve other bdis too. This preserves the flusher
creation failure behavior of the forker thread.
* INIT_LIST_HEAD(&bdi->bdi_list) is used to tell
bdi_writeback_workfn() about on-going bdi unregistration so that it
always drains work_list even if it's running off the rescuer. Note
that the original code was broken in this regard. Under memory
pressure, a bdi could finish unregistration with non-empty
work_list.
* The default bdi is no longer special. It now is treated the same as
any other bdi and bdi_cap_flush_forker() is removed.
* BDI_pending is no longer used. Removed.
* Some tracepoints become non-applicable. The following TPs are
removed - writeback_nothread, writeback_wake_thread,
writeback_wake_forker_thread, writeback_thread_start,
writeback_thread_stop.
Everything, including devices coming and going away and rescuer
operation under simulated memory pressure, seems to work fine in my
test setup.
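The self-rescheduling work function described above can be modelled in a few lines. This is a toy, single-threaded sketch: ToyWorkqueue, Bdi and the batch parameter are hypothetical stand-ins, with mod_delayed_work() here only imitating the idea of replacing a pending delayed item; none of it is the kernel implementation.

```python
import heapq
from collections import deque

class ToyWorkqueue:
    """Toy stand-in for a workqueue that runs delayed work items in time order."""
    def __init__(self):
        self.now = 0
        self._heap = []
        self._seq = 0  # tie-breaker so callables are never compared

    def mod_delayed_work(self, fn, delay):
        # Replace any pending instance of fn, then queue it for now + delay.
        self._heap = [e for e in self._heap if e[2] != fn]
        heapq.heapify(self._heap)
        heapq.heappush(self._heap, (self.now + delay, self._seq, fn))
        self._seq += 1

    def run(self):
        while self._heap:
            when, _, fn = heapq.heappop(self._heap)
            self.now = max(self.now, when)
            fn()

class Bdi:
    """Toy backing device with a work_list drained by a work function."""
    def __init__(self, wq, batch=2):
        self.wq = wq
        self.batch = batch        # items handled per execution (assumption)
        self.work_list = deque()
        self.written = []
        self.runs = 0

    def queue_work(self, item):
        self.work_list.append(item)
        self.wq.mod_delayed_work(self.workfn, 0)

    def workfn(self):
        # Process a bounded amount of work, then reschedule if more remains.
        self.runs += 1
        for _ in range(min(self.batch, len(self.work_list))):
            self.written.append(self.work_list.popleft())
        if self.work_list:
            self.wq.mod_delayed_work(self.workfn, 0)

wq = ToyWorkqueue()
bdi = Bdi(wq)
for page in range(5):
    bdi.queue_work(page)
wq.run()
```

No dedicated thread or explicit timer appears in the model: everything happens by queueing or re-queueing the one work item.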
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Dyntick-idle CPUs need to be able to pre-announce their need for grace
periods. This can be done using something similar to the mechanism used
by no-CB CPUs to announce their need for grace periods. This commit
moves in this direction by renaming the no-CBs grace-period event tracing
to suit the new future-grace-period needs.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Does writethrough and writeback caching, handles unclean shutdown, and
has a bunch of other nifty features motivated by real world usage.
See the wiki at http://bcache.evilpiepirate.org for more.
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Bunch of places in the code weren't using it where they could be -
this'll reduce the size of the patch that puts bi_sector/bi_size/bi_idx
into a struct bvec_iter.
Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: "Ed L. Cashin" <ecashin@coraid.com>
CC: Nick Piggin <npiggin@kernel.dk>
CC: Jiri Kosina <jkosina@suse.cz>
CC: Jim Paris <jim@jtan.com>
CC: Geoff Levand <geoff@infradead.org>
CC: Alasdair Kergon <agk@redhat.com>
CC: dm-devel@redhat.com
CC: Neil Brown <neilb@suse.de>
CC: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Ed Cashin <ecashin@coraid.com>
In order to let triggers enable or disable events, we need a 'soft'
method for doing so. For example, if a function probe is added that
lets a user enable or disable events when a function is called, that
change must be done without taking locks or a mutex, and definitely
it can't sleep. But the full enabling of a tracepoint is expensive.
By adding a 'SOFT_DISABLE' flag, and converting the flags to be updated
without the protection of a mutex (using set/clear_bit()), this soft
disable flag can be used to allow critical sections to enable or disable
events from being traced (after the event has been placed into "SOFT_MODE").
Some caveats though: The comm recorder (to map pids with a comm) can not
be soft disabled (yet). If you disable an event with a "soft"
disable and wait a while before reading the trace, the comm cache may be
replaced and you'll get a bunch of <...> for comms in the trace.
Reading the "enable" file for an event that is disabled will now give
you "0*" where the '*' denotes that the tracepoint is still active but
the event itself is "disabled".
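A small model may help show how the soft flag interacts with full enabling. The flag names, the SoftEventFile class and its methods are hypothetical, and plain integer operations stand in for the kernel's set/clear_bit(); only the behaviour described above (a cheap flag flip, and "0*" when the tracepoint is active but the event is disabled) is modelled.

```python
ENABLED = 1 << 0        # tracepoint is registered (expensive to change)
SOFT_MODE = 1 << 1      # event has been placed into soft mode
SOFT_DISABLED = 1 << 2  # cheap flag checked at record time

class SoftEventFile:
    def __init__(self):
        self.flags = 0
        self.recorded = []

    def enter_soft_mode(self):
        # Slow path, done once: activate the tracepoint but keep the event quiet.
        self.flags |= ENABLED | SOFT_MODE | SOFT_DISABLED

    def soft_enable(self):
        # Fast path usable from a trigger: a single flag update, no locks.
        self.flags &= ~SOFT_DISABLED

    def soft_disable(self):
        self.flags |= SOFT_DISABLED

    def tracepoint_hit(self, data):
        if not self.flags & ENABLED:
            return
        if self.flags & SOFT_DISABLED:
            return  # tracepoint fired, but the event is soft disabled
        self.recorded.append(data)

    def read_enable(self):
        if self.flags & ENABLED:
            return "0*" if self.flags & SOFT_DISABLED else "1"
        return "0"

ev = SoftEventFile()
states = [ev.read_enable()]
ev.enter_soft_mode()
states.append(ev.read_enable())
ev.tracepoint_hit("dropped")
ev.soft_enable()
states.append(ev.read_enable())
ev.tracepoint_hit("kept")
ev.soft_disable()
states.append(ev.read_enable())
```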
[ fixed _BIT used in & operation : thanks to Dan Carpenter and smatch ]
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
As we've added __init annotation to field-defining functions, we should
add __refdata annotation to event_call variables, which reference those
functions.
Link: http://lkml.kernel.org/r/51343C1F.2050502@huawei.com
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Move duplicate code in event print functions to a helper function.
This shrinks the size of the kernel by ~13K.
   text    data      bss      dec     hex filename
6596137 1743966 10138672 18478775 119f6b7 vmlinux.o.old
6583002 1743849 10138672 18465523 119c2f3 vmlinux.o.new
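The "~13K" figure can be checked directly from the size output above. The short calculation below only restates those numbers; the variable names are illustrative.

```python
old = {"text": 6596137, "data": 1743966, "bss": 10138672}
new = {"text": 6583002, "data": 1743849, "bss": 10138672}

# Per-section savings in bytes.
saved = {section: old[section] - new[section] for section in old}
total_saved = sum(saved.values())

# The totals must match the "dec" column of each row.
old_total = sum(old.values())
new_total = sum(new.values())
```

Almost all of the saving is in the text section, with a small reduction in data and none in bss.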
Link: http://lkml.kernel.org/r/51258746.2060304@huawei.com
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Pass the struct ftrace_event_file *ftrace_file to the
trace_event_buffer_lock_reserve() (new function that replaces the
trace_current_buffer_lock_reserve()).
The ftrace_file holds a pointer to the trace_array that is in use.
In the case of multiple buffers with different trace_arrays, this
allows different events to be recorded into different buffers.
Also fixed some of the stale comments in include/trace/ftrace.h
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The trace events for ftrace are all defined via global variables.
The arrays of events and event systems are linked to a global list.
This prevents multiple users of the event system (what to enable and
what not to).
Adding descriptors to represent the event/file relation, as well
as which trace_array descriptor they are associated with, allows
for more than one set of events to be defined. Once the trace events
files have a link between the trace event and the trace_array they
are associated with, we can create multiple trace_arrays that can
record separate events in separate buffers.
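The event/file relation can be pictured as a small per-buffer descriptor that links one event to one trace_array. The sketch below is a simplified model with hypothetical Python classes (TraceArrayModel, EventFileLink); it is not the kernel's data layout, only the linking idea described above.

```python
class EventFileLink:
    """Links one event definition to one trace array, with its own enable state."""
    def __init__(self, event, trace_array):
        self.event = event
        self.tr = trace_array
        self.enabled = False

class TraceArrayModel:
    """One buffer plus its own set of event files."""
    def __init__(self, name, events):
        self.name = name
        self.buffer = []
        self.files = {ev: EventFileLink(ev, self) for ev in events}

def fire(trace_arrays, event, payload):
    """Record the event in every trace array whose file for it is enabled."""
    for tr in trace_arrays:
        link = tr.files[event]
        if link.enabled:
            link.tr.buffer.append((event, payload))

EVENTS = ["sched_switch", "irq_handler_entry"]  # the single global event list
main = TraceArrayModel("main", EVENTS)
quiet = TraceArrayModel("quiet", EVENTS)

# Each buffer chooses its own events.
main.files["sched_switch"].enabled = True
quiet.files["irq_handler_entry"].enabled = True

arrays = [main, quiet]
fire(arrays, "sched_switch", "prev=a next=b")
fire(arrays, "irq_handler_entry", "irq=42")
fire(arrays, "sched_switch", "prev=b next=c")
```

The event definitions stay global, while the enable state and the destination buffer live in the per-array descriptor.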
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Trace when we start and complete async writes, and when we start and
finish blocking for their completion. This is useful for performance
analysis of the resulting I/O patterns.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
extent cache's slab shrinker which can cause significant, user-visible
pauses when the system is under memory pressure.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABCAAGBQJRMpM3AAoJENNvdpvBGATwiXgP/3eSg3C+M0ZUeL6lH3aXRxQO
PHxUL/Di5cfFs3GX4DksVzsD1KkTIz8B424AhdahrrgGh1jTx4/J23OrEdu9nK24
JGU5hmowoCyG8PZG1kGMbX6EYcblYTx+O2tX/RInnRExm9ajkfxb0S1g0Vl340qw
58WTSWfl2+J/3RnJ9TYX/qNVeCJdxLH3GkpFbvQbLGyylfM9hsUD5MZMAR1bpOJF
U2vNdK3n65W0AtKhLo7TYnoJ4ll2PoFRvffS0rqhEpIAcRxpVsNThFJLBcOQ1a79
6cCN5uhrJOlL5jLN/fYCViU1+03y7itCMJmtSpuyV8DtUGjf4r1tzlvWGeiSmpB9
NprZ/MgO1ROnzO/gzPM2s4nWWeGZiGaf7vMDyScIDtqF1ckfHN17jqazuSJcybN8
U83O9+KyhHkvr/+zqlySXiBX2MUSUdSE37CsMC7R+mAz7C46yjXEPuG8pLkLCWiG
gjMD30D1f6+h+K646WN497+Crxl1CurEH+ON7k158cNvVNlX1FfFHUprRHeNUXkV
tEKjiCUCf5WjNeFEc93nC/nDi4OIISD25N9LyHzp2CcV/XXRjpsrNPBFDAZjwgiK
YVUQIwocVUVlRaACzrM9sDFtSELqNzy/GLuERITu1Mb2R4sMXIyvvJkjc+EuQS0F
XVQ3BU5ypWyxJGrSGCPd
=+vcC
-----END PGP SIGNATURE-----
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 bug fixes from Ted Ts'o:
"Various bug fixes for ext4. The most important is a fix for the new
extent cache's slab shrinker which can cause significant, user-visible
pauses when the system is under memory pressure."
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: enable quotas before orphan cleanup
ext4: don't allow quota mount options when quota feature enabled
ext4: fix a warning from sparse check for ext4_dir_llseek
ext4: convert number of blocks to clusters properly
ext4: fix possible memory leak in ext4_remount()
jbd2: fix ERR_PTR dereference in jbd2__journal_start
ext4: use percpu counter for extent cache count
ext4: optimize ext4_es_shrink()
When the system is under memory pressure, ext4_es_shrink() will get
called very often. So optimize returning the number of items in the
file system's extent status cache by keeping a per-filesystem count,
instead of calculating it each time by scanning all of the inodes in
the extent status cache.
Also rename the slab used for the extent status cache to be
"ext4_extent_status" so it's obvious the slab in question is created
by ext4.
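The optimization amounts to maintaining a running count at insert and remove time instead of recomputing it by scanning. A minimal sketch follows, with hypothetical class and method names (InodeModel, ExtentStatusCacheModel) rather than the ext4 structures:

```python
class InodeModel:
    def __init__(self):
        self.extents = []

class ExtentStatusCacheModel:
    """Toy per-filesystem cache that keeps a running item count."""
    def __init__(self):
        self.inodes = []
        self.count = 0  # per-filesystem count, updated on every change

    def insert(self, inode, extent):
        if inode not in self.inodes:
            self.inodes.append(inode)
        inode.extents.append(extent)
        self.count += 1

    def remove(self, inode, extent):
        inode.extents.remove(extent)
        self.count -= 1

    def count_by_scan(self):
        # The slow way: walk every inode each time the shrinker asks.
        return sum(len(inode.extents) for inode in self.inodes)

cache = ExtentStatusCacheModel()
a, b = InodeModel(), InodeModel()
cache.insert(a, (0, 8))
cache.insert(a, (8, 4))
cache.insert(b, (0, 16))
cache.remove(a, (8, 4))
```

Reading cache.count costs the same regardless of how many inodes are cached, which matters when the shrinker is called very often.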
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Zheng Liu <gnehzuil.liu@gmail.com>
Pull block IO core bits from Jens Axboe:
"Below are the core block IO bits for 3.9. It was delayed a few days
since my workstation kept crashing every 2-8h after pulling it into
current -git, but turns out it is a bug in the new pstate code (divide
by zero, will report separately). In any case, it contains:
- The big cfq/blkcg update from Tejun and Vivek.
- Additional block and writeback tracepoints from Tejun.
- Improvement of the should sort (based on queues) logic in the plug
flushing.
- _io() variants of the wait_for_completion() interface, using
io_schedule() instead of schedule() to contribute to io wait
properly.
- Various little fixes.
You'll get two trivial merge conflicts, which should be easy enough to
fix up"
Fix up the trivial conflicts due to hlist traversal cleanups (commit
b67bfe0d42: "hlist: drop the node parameter from iterators").
* 'for-3.9/core' of git://git.kernel.dk/linux-block: (39 commits)
block: remove redundant check to bd_openers()
block: use i_size_write() in bd_set_size()
cfq: fix lock imbalance with failed allocations
drivers/block/swim3.c: fix null pointer dereference
block: don't select PERCPU_RWSEM
block: account iowait time when waiting for completion of IO request
sched: add wait_for_completion_io[_timeout]
writeback: add more tracepoints
block: add block_{touch|dirty}_buffer tracepoint
buffer: make touch_buffer() an exported function
block: add @req to bio_{front|back}_merge tracepoints
block: add missing block_bio_complete() tracepoint
block: Remove should_sort judgement when flush blk_plug
block,elevator: use new hashtable implementation
cfq-iosched: add hierarchical cfq_group statistics
cfq-iosched: collect stats from dead cfqgs
cfq-iosched: separate out cfqg_stats_reset() from cfq_pd_reset_stats()
blkcg: make blkcg_print_blkgs() grab q locks instead of blkcg lock
block: RCU free request_queue
blkcg: implement blkg_[rw]stat_recursive_sum() and blkg_[rw]stat_merge()
...
the "punch hole" functionality for inodes that are not using extent
maps.
In the bug fix category, we fixed some races in the AIO and fstrim
code, and some potential NULL pointer dereferences and memory leaks in
error handling code paths.
In the optimization category, we fixed a performance regression in the
jbd2 layer introduced by commit d9b0193 (introduced in v3.0) which
shows up in the AIM7 benchmark. We also further optimized jbd2 by
minimizing the amount of time that transaction handles are held active.
This patch series also features some additional enhancement of the
extent status tree, which is now used to cache extent information in a
more efficient/compact form than what we use on-disk.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABCAAGBQJRLRs7AAoJENNvdpvBGATwNb8QAML+TjGtHlJ1coDUzGT2Cq9R
yREAzI1N/+Phiohy3O0JNx55uPvYEMx6+zi+JCNSs1/gnf/OWruESTXssRbBv3Yd
WxfOiCIaK8BbOEGZlMwGsFDCzVNKfvHxRrmyeHtcyUONKLFQUmBcE/woVPHcsvlE
ya/zGnD2e58NaGwS643bqfvTrVt/azH0U0osNCNwfZepZmboEXK8fzT9b3Auh+1Q
EI28m0GSRp0V0cgwOEN54EhTtocyS30GN8sbC1K5cFHK8tGLhyVwnvIonyFDI5/D
GOkEPeRb7v2FwGpAilQ/V0jT++E//7zzyMFwvIY1U6b1dzBFCaJUuLMO1R8xoaoa
c/Qd3AFIt1anS66qZAnW3m5rRyJgU2YA3VrKJj4q0jPKCh+k3+EqVfNTOB8BPLmC
oCI/4ApUyHeYDdcErFjW4VDJ5N0debPP4yjma3uUtdM7RvQvMdQECnkAjIDCcGKe
bMc7dtI9jdUYDCPGDeOjdrvk623QpE7J4Pf6iSQ5WxA4f2QmOQ8uIuGe8CPQSVtQ
bUYjkthtWX2cX2/kHVvSYx6FzAjkgwmxCpAaiCXtGploxJIDjlWkiTXibkRYPLp4
jBmQPK8ct8bl98k/i3mdybZnJU2TxWLA45hub0zBYs0aSgi8HzFyd+y8DiCKRS0S
2sANbrsKG6TCzZ6C6ods
=KSV1
-----END PGP SIGNATURE-----
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Theodore Ts'o:
"The one new feature added in this patch series is the ability to use
the "punch hole" functionality for inodes that are not using extent
maps.
In the bug fix category, we fixed some races in the AIO and fstrim
code, and some potential NULL pointer dereferences and memory leaks in
error handling code paths.
In the optimization category, we fixed a performance regression in the
jbd2 layer introduced by commit d9b01934d5 ("jbd: fix fsync() tid
wraparound bug", introduced in v3.0) which shows up in the AIM7
benchmark. We also further optimized jbd2 by minimizing the amount of
time that transaction handles are held active.
This patch series also features some additional enhancement of the
extent status tree, which is now used to cache extent information in a
more efficient/compact form than what we use on-disk."
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (65 commits)
ext4: fix free clusters calculation in bigalloc filesystem
ext4: no need to remove extent if len is 0 in ext4_es_remove_extent()
ext4: fix xattr block allocation/release with bigalloc
ext4: reclaim extents from extent status tree
ext4: adjust some functions for reclaiming extents from extent status tree
ext4: remove single extent cache
ext4: lookup block mapping in extent status tree
ext4: track all extent status in extent status tree
ext4: let ext4_ext_map_blocks return EXT4_MAP_UNWRITTEN flag
ext4: rename and improve ext4_es_find_extent()
ext4: add physical block and status member into extent status tree
ext4: refine extent status tree
ext4: use ERR_PTR() abstraction for ext4_append()
ext4: refactor code to read directory blocks into ext4_read_dirblock()
ext4: add debugging context for warning in ext4_da_update_reserve_space()
ext4: use KERN_WARNING for warning messages
jbd2: use module parameters instead of debugfs for jbd_debug
ext4: use module parameters instead of debugfs for mballoc_debug
ext4: start handle at the last possible moment when creating inodes
ext4: fix the number of credits needed for acl ops with inline data
...
- Rework of the ACPI namespace scanning code from Rafael J. Wysocki
with contributions from Bjorn Helgaas, Jiang Liu, Mika Westerberg,
Toshi Kani, and Yinghai Lu.
- ACPI power resources handling and ACPI device PM update from
Rafael J. Wysocki.
- ACPICA update to version 20130117 from Bob Moore and Lv Zheng
with contributions from Aaron Lu, Chao Guan, Jesper Juhl, and
Tim Gardner.
- Support for Intel Lynxpoint LPSS from Mika Westerberg.
- cpuidle update from Len Brown including Intel Haswell support, C1
state for intel_idle, removal of global pm_idle.
- cpuidle fixes and cleanups from Daniel Lezcano.
- cpufreq fixes and cleanups from Viresh Kumar and Fabio Baltieri
with contributions from Stratos Karafotis and Rickard Andersson.
- Intel P-states driver for Sandy Bridge processors from
Dirk Brandewie.
- cpufreq driver for Marvell Kirkwood SoCs from Andrew Lunn.
- cpufreq fixes related to ordering issues between acpi-cpufreq and
powernow-k8 from Borislav Petkov and Matthew Garrett.
- cpufreq support for Calxeda Highbank processors from Mark Langsdorf
and Rob Herring.
- cpufreq driver for the Freescale i.MX6Q SoC and cpufreq-cpu0 update
from Shawn Guo.
- cpufreq Exynos fixes and cleanups from Jonghwan Choi, Sachin Kamat,
and Inderpal Singh.
- Support for "lightweight suspend" from Zhang Rui.
- Removal of the deprecated power trace API from Paul Gortmaker.
- Assorted updates from Andreas Fleig, Colin Ian King,
Davidlohr Bueso, Joseph Salisbury, Kees Cook, Li Fei,
Nishanth Menon, ShuoX Liu, Srinivas Pandruvada, Tejun Heo,
Thomas Renninger, and Yasuaki Ishimatsu.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iQIcBAABAgAGBQJRIsArAAoJEKhOf7ml8uNsD6MP/j7C4NA+GTq6RdwoJt+Yki0K
9Ep8I4pEuRFoN/oskv24EyQhpGJIk6UxWcJ/DWFBc+1VhmKORta7k2Idv/wlJA77
s7AcDveA9xcDh+TVfbh87TeuiMSXiSdDZbiaQO+wMizWJAF3F84AnjiAqqqyQcSK
bA5/Siz/vWlt9PyYDaQtHTVE4lpvPuVcQdYewsdaH2PsmUjvIg/TUzg28CTrdyvv
eHOdBK9R0/OLQLhzRbL0VOGJ//wEl+HJRO0QEhTKPgdQ1e/VH/4Zu5WSzF8P/x4C
s2f8U4IKQqulDuDHXtpMpelFm7hRWgsOqZLkcyXLs+0dvSM9CTPO6P0ZaImxUctk
5daHWEsXUnCErDQawt1mcZP8l6qnxofMQIfLXyPVzvlSnHyToTmrtXa1v2u4AuL/
hOo4MYWsFNUmRdtGFFGlExGgEDZ4G5NwiYjRBl/6XJ3v4nhnnMbuzxP8scpoe5m1
8tjroJHZFUUs/mFU/H+oRbHzSzXPmp1sddNaTg4OpVmTn3DDh6ljnFhiItd1Ndw0
5ldVbSe6ETq5RoK0TbzvQOeVpa9F3JfqbrXLQPqfd2iz/No41LQYG1uShRYuXKuA
wfEcc+c9VMd3FILu05pGwBnU8VS9VbxTYMz7xDxg6b29Ywnb7u+Q1ycCk2gFYtkS
E2oZDuyewTJxaskzYsNr
=wijn
-----END PGP SIGNATURE-----
Merge tag 'pm+acpi-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and power management updates from Rafael Wysocki:
- Rework of the ACPI namespace scanning code from Rafael J. Wysocki
with contributions from Bjorn Helgaas, Jiang Liu, Mika Westerberg,
Toshi Kani, and Yinghai Lu.
- ACPI power resources handling and ACPI device PM update from Rafael
J Wysocki.
- ACPICA update to version 20130117 from Bob Moore and Lv Zheng with
contributions from Aaron Lu, Chao Guan, Jesper Juhl, and Tim Gardner.
- Support for Intel Lynxpoint LPSS from Mika Westerberg.
- cpuidle update from Len Brown including Intel Haswell support, C1
state for intel_idle, removal of global pm_idle.
- cpuidle fixes and cleanups from Daniel Lezcano.
- cpufreq fixes and cleanups from Viresh Kumar and Fabio Baltieri with
contributions from Stratos Karafotis and Rickard Andersson.
- Intel P-states driver for Sandy Bridge processors from Dirk
Brandewie.
- cpufreq driver for Marvell Kirkwood SoCs from Andrew Lunn.
- cpufreq fixes related to ordering issues between acpi-cpufreq and
powernow-k8 from Borislav Petkov and Matthew Garrett.
- cpufreq support for Calxeda Highbank processors from Mark Langsdorf
and Rob Herring.
- cpufreq driver for the Freescale i.MX6Q SoC and cpufreq-cpu0 update
from Shawn Guo.
- cpufreq Exynos fixes and cleanups from Jonghwan Choi, Sachin Kamat,
and Inderpal Singh.
- Support for "lightweight suspend" from Zhang Rui.
- Removal of the deprecated power trace API from Paul Gortmaker.
- Assorted updates from Andreas Fleig, Colin Ian King, Davidlohr Bueso,
Joseph Salisbury, Kees Cook, Li Fei, Nishanth Menon, ShuoX Liu,
Srinivas Pandruvada, Tejun Heo, Thomas Renninger, and Yasuaki
Ishimatsu.
* tag 'pm+acpi-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (267 commits)
PM idle: remove global declaration of pm_idle
unicore32 idle: delete stray pm_idle comment
openrisc idle: delete pm_idle
mn10300 idle: delete pm_idle
microblaze idle: delete pm_idle
m32r idle: delete pm_idle, and other dead idle code
ia64 idle: delete pm_idle
cris idle: delete idle and pm_idle
ARM64 idle: delete pm_idle
ARM idle: delete pm_idle
blackfin idle: delete pm_idle
sparc idle: rename pm_idle to sparc_idle
sh idle: rename global pm_idle to static sh_idle
x86 idle: rename global pm_idle to static x86_idle
APM idle: register apm_cpu_idle via cpuidle
cpufreq / intel_pstate: Add kernel command line option disable intel_pstate.
cpufreq / intel_pstate: Change to disallow module build
tools/power turbostat: display SMI count by default
intel_idle: export both C1 and C1E
ACPI / hotplug: Fix concurrency issues and memory leaks
...
Pull workqueue changes from Tejun Heo:
"A lot of reorganization is going on mostly to prepare for worker pools
with custom attributes so that workqueue can replace custom pool
implementations in places including writeback and btrfs and make CPU
assignment in crypto more flexible.
workqueue evolved from purely per-cpu design and implementation, so
there are a lot of assumptions regarding being bound to CPUs and even
unbound workqueues are implemented as an extension of the model -
workqueues running on the special unbound CPU. Bulk of changes this
round are about promoting worker_pools as the top level abstraction
replacing global_cwq (global cpu workqueue). At this point, I'm
fairly confident about getting custom worker pools working pretty soon
and ready for the next merge window.
Lai's patches replace the convoluted mb() dancing workqueue has been
doing with a much simpler mechanism which only depends on assignment
atomicity of long. For details, please read the commit message of
0b3dae68ac ("workqueue: simplify is-work-item-queued-here test").
While the change ends up adding one pointer to struct delayed_work,
the size increase is less than five percent, and it decouples
delayed_work logic much more cleanly from usual work handling, removes
the unusual memory barrier dancing, and allows for further
simplification, so I think the trade-off is acceptable.
There will be two more workqueue related pull requests and there are
some shared commits among them. I'll write further pull requests
assuming this pull request is pulled first."
* 'for-3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (37 commits)
workqueue: un-GPL function delayed_work_timer_fn()
workqueue: rename cpu_workqueue to pool_workqueue
workqueue: reimplement is_chained_work() using current_wq_worker()
workqueue: fix is_chained_work() regression
workqueue: pick cwq instead of pool in __queue_work()
workqueue: make get_work_pool_id() cheaper
workqueue: move nr_running into worker_pool
workqueue: cosmetic update in try_to_grab_pending()
workqueue: simplify is-work-item-queued-here test
workqueue: make work->data point to pool after try_to_grab_pending()
workqueue: add delayed_work->wq to simplify reentrancy handling
workqueue: make work_busy() test WORK_STRUCT_PENDING first
workqueue: replace WORK_CPU_NONE/LAST with WORK_CPU_END
workqueue: post global_cwq removal cleanups
workqueue: rename nr_running variables
workqueue: remove global_cwq
workqueue: remove worker_pool->gcwq
workqueue: replace for_each_worker_pool() with for_each_std_worker_pool()
workqueue: make freezing/thawing per-pool
workqueue: make hotplug processing per-pool
...
Pull perf changes from Ingo Molnar:
"There are lots of improvements, the biggest changes are:
Main kernel side changes:
- Improve uprobes performance by adding 'pre-filtering' support, by
Oleg Nesterov.
- Make some POWER7 events available in sysfs, equivalent to what was
done on x86, from Sukadev Bhattiprolu.
- tracing updates by Steve Rostedt - mostly misc fixes and smaller
improvements.
- Use perf/event tracing to report PCI Express advanced errors, by
Tony Luck.
- Enable northbridge performance counters on AMD family 15h, by Jacob
Shin.
- This tracing commit:
tracing: Remove the extra 4 bytes of padding in events
changes the ABI. All involved parties (PowerTop in particular)
seem to agree that it's safe to do now with the introduction of
libtraceevent, but the devil is in the details ...
Main tooling side changes:
- Add 'event group view', from Namhyung Kim:
To use it, 'perf record' should group events when recording. And
then perf report parses the saved group relation from file header
and prints them together if --group option is provided. You can
use the 'perf evlist' command to see event group information:
$ perf record -e '{ref-cycles,cycles}' noploop 1
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.385 MB perf.data (~16807 samples) ]
$ perf evlist --group
{ref-cycles,cycles}
With this example, default perf report will show you each event
separately.
You can use --group option to enable event group view:
$ perf report --group
...
# group: {ref-cycles,cycles}
# ========
# Samples: 7K of event 'anon group { ref-cycles, cycles }'
# Event count (approx.): 6876107743
#
#         Overhead  Command      Shared Object                      Symbol
# ................  .......  .................  ..........................
    99.84%  99.76%  noploop  noploop            [.] main
     0.07%   0.00%  noploop  ld-2.15.so         [.] strcmp
     0.03%   0.00%  noploop  [kernel.kallsyms]  [k] timerqueue_del
     0.03%   0.03%  noploop  [kernel.kallsyms]  [k] sched_clock_cpu
     0.02%   0.00%  noploop  [kernel.kallsyms]  [k] account_user_time
     0.01%   0.00%  noploop  [kernel.kallsyms]  [k] __alloc_pages_nodemask
     0.00%   0.00%  noploop  [kernel.kallsyms]  [k] native_write_msr_safe
     0.00%   0.11%  noploop  [kernel.kallsyms]  [k] _raw_spin_lock
     0.00%   0.06%  noploop  [kernel.kallsyms]  [k] find_get_page
     0.00%   0.02%  noploop  [kernel.kallsyms]  [k] rcu_check_callbacks
     0.00%   0.02%  noploop  [kernel.kallsyms]  [k] __current_kernel_time
As you can see, the Overhead column now contains both ref-cycles
and cycles, and the header line also shows the group information:
'anon group { ref-cycles, cycles }'. The output is sorted by the
period of the group leader first.
- Initial GTK+ annotate browser, from Namhyung Kim.
- Add option for runtime switching perf data file in perf report,
just press 's' and a menu with the valid files found in the current
directory will be presented, from Feng Tang.
- Add support to display whole group data for raw columns, from Jiri
Olsa.
- Add per processor socket count aggregation in perf stat, from
Stephane Eranian.
- Add interval printing in 'perf stat', from Stephane Eranian.
- 'perf test' improvements
- Add support for wildcards in tracepoint system name, from Jiri
Olsa.
- Add anonymous huge page recognition, from Joshua Zhu.
- perf build-id cache now can show DSOs present in a perf.data file
that are not in the cache, to integrate with build-id servers being
put in place by organizations such as Fedora.
- perf top now shares more of the evsel config/creation routines with
'record', paving the way for further integration like 'top'
snapshots, etc.
- perf top now supports DWARF callchains.
- Fix mmap limitations on 32-bit, fix from David Miller.
- 'perf bench numa mem' NUMA performance measurement suite
- ... and lots of fixes, performance improvements, cleanups and other
improvements I failed to list - see the shortlog and git log for
details."
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (270 commits)
perf/x86/amd: Enable northbridge performance counters on AMD family 15h
perf/hwbp: Fix cleanup in case of kzalloc failure
perf tools: Fix build with bison 2.3 and older.
perf tools: Limit unwind support to x86 archs
perf annotate: Make it to be able to skip unannotatable symbols
perf gtk/annotate: Fail early if it can't annotate
perf gtk/annotate: Show source lines with gray color
perf gtk/annotate: Support multiple event annotation
perf ui/gtk: Implement basic GTK2 annotation browser
perf annotate: Fix warning message on a missing vmlinux
perf buildid-cache: Add --update option
uprobes/perf: Avoid uprobe_apply() whenever possible
uprobes/perf: Teach trace_uprobe/perf code to use UPROBE_HANDLER_REMOVE
uprobes/perf: Teach trace_uprobe/perf code to pre-filter
uprobes/perf: Teach trace_uprobe/perf code to track the active perf_event's
uprobes: Introduce uprobe_apply()
perf: Introduce hw_perf_event->tp_target and ->tp_list
uprobes/perf: Always increment trace_uprobe->nhit
uprobes/tracing: Kill uprobe_trace_consumer, embed uprobe_consumer into trace_uprobe
uprobes/tracing: Introduce is_trace_uprobe_enabled()
...
Although extent status is loaded on demand, we also need to reclaim
extents from the tree under heavy memory pressure, because in some
cases a fragmented extent tree makes the status tree consume too much
memory.
Here we maintain an LRU list in super_block. When the extent status of
an inode is accessed and changed, the inode is moved to the tail of
the list. The inode is dropped from the list when it is cleared. In
the inode, a counter is added to count the number of cached objects in
the extent status tree. Only written/unwritten/hole extents are
counted, because delayed extents are not reclaimed: fiemap, bigalloc
and seek_data/hole need them. The counter is increased when a new
extent is allocated and decreased when an extent is freed.
In this commit we use the normal shrinker framework to reclaim memory
from the status tree. ext4_es_reclaim_extents_count() traverses the
LRU list to count the number of reclaimable extents. ext4_es_shrink()
tries to reclaim written/unwritten/hole extents from the extent status
tree. An inode that has been shrunk is moved to the tail of the LRU
list.
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>
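The LRU bookkeeping described in this commit message can be sketched in
user-space C. This is only a minimal sketch under stated assumptions: the
names struct inode_es, struct es_lru, lru_touch(), lru_count() and
lru_shrink() are hypothetical, a hand-rolled circular list stands in for
the kernel's list_head, and no locking or real shrinker registration is
shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-inode state: list linkage plus the counter of
 * reclaimable (written/unwritten/hole) cached extents. */
struct inode_es {
    struct inode_es *prev, *next;
    unsigned long nr_reclaimable;
};

/* Hypothetical per-superblock LRU with a sentinel head node. */
struct es_lru {
    struct inode_es head;
};

static void lru_init(struct es_lru *lru)
{
    lru->head.prev = lru->head.next = &lru->head;
}

/* Remove a node from the list if it is currently linked. */
static void lru_unlink(struct inode_es *n)
{
    if (n->next) {
        n->prev->next = n->next;
        n->next->prev = n->prev;
    }
    n->prev = n->next = NULL;
}

/* Move (or add) an inode to the tail, as on access or after shrinking. */
static void lru_touch(struct es_lru *lru, struct inode_es *n)
{
    assert(n != &lru->head);
    lru_unlink(n);
    n->prev = lru->head.prev;
    n->next = &lru->head;
    lru->head.prev->next = n;
    lru->head.prev = n;
}

/* Count step: walk the list and sum the reclaimable extents. */
static unsigned long lru_count(const struct es_lru *lru)
{
    unsigned long total = 0;

    for (const struct inode_es *n = lru->head.next; n != &lru->head; n = n->next)
        total += n->nr_reclaimable;
    return total;
}

/* Scan step: reclaim up to nr_to_scan extents starting at the head,
 * moving each visited inode to the tail; each inode is visited once. */
static unsigned long lru_shrink(struct es_lru *lru, unsigned long nr_to_scan)
{
    unsigned long reclaimed = 0;
    struct inode_es *last = lru->head.prev;

    while (reclaimed < nr_to_scan && lru->head.next != &lru->head) {
        struct inode_es *n = lru->head.next;
        unsigned long take = nr_to_scan - reclaimed;

        if (take > n->nr_reclaimable)
            take = n->nr_reclaimable;
        n->nr_reclaimable -= take;
        reclaimed += take;
        lru_touch(lru, n);
        if (n == last)
            break;
    }
    return reclaimed;
}
```

Nodes are expected to be zero-initialized before their first
lru_touch(). Keeping the count and scan steps separate mirrors the two
callbacks the message names, without claiming to match their real
signatures.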
After tracking all extent status, we already have an extent cache in
memory. Every time we want to look up a block mapping, we can first
try to look it up in the extent status tree to avoid a potential disk
I/O.
A new function called ext4_es_lookup_extent is defined to do this
work. When we try to look up a block mapping, we always call
ext4_map_blocks and/or ext4_da_map_blocks, so these functions first
try to look up the block mapping in the extent status tree.
A new flag EXT4_GET_BLOCKS_NO_PUT_HOLE is used in ext4_da_map_blocks
in order not to put a hole into the extent status tree, because this
hole will be converted to a delayed extent in the tree immediately.
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>
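The "consult the in-memory cache first, fall back to disk on a miss"
pattern from this commit message can be sketched as follows. This is an
illustrative sketch only: struct es_cache, es_lookup(),
read_extent_from_disk() and map_block() are hypothetical names, the
cache is a small array rather than a tree, and the "disk" is a stand-in
that invents a fixed mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct extent {
    uint32_t lblk;   /* first logical block */
    uint32_t len;    /* number of blocks */
    uint64_t pblk;   /* first physical block */
};

#define CACHE_SLOTS 8

struct es_cache {
    struct extent slot[CACHE_SLOTS];
    unsigned int used;
    unsigned int disk_reads;   /* counts slow-path lookups */
};

/* Fast path: return true and fill *pblk if lblk is covered by the cache. */
static bool es_lookup(const struct es_cache *c, uint32_t lblk, uint64_t *pblk)
{
    assert(pblk != NULL);
    for (unsigned int i = 0; i < c->used; i++) {
        const struct extent *e = &c->slot[i];

        if (lblk >= e->lblk && lblk - e->lblk < e->len) {
            *pblk = e->pblk + (lblk - e->lblk);
            return true;
        }
    }
    return false;
}

/* Stand-in for reading the on-disk extent tree: pretends every aligned
 * run of 16 blocks is one extent mapped at an offset of 1000. */
static struct extent read_extent_from_disk(struct es_cache *c, uint32_t lblk)
{
    struct extent e = { lblk & ~15u, 16, 1000 + (uint64_t)(lblk & ~15u) };

    c->disk_reads++;
    return e;
}

/* Mapping entry point: try the cache, then the slow path, then cache
 * the result so later lookups in the same extent avoid the slow path. */
static uint64_t map_block(struct es_cache *c, uint32_t lblk)
{
    uint64_t pblk;
    struct extent e;

    if (es_lookup(c, lblk, &pblk))
        return pblk;
    e = read_extent_from_disk(c, lblk);
    if (c->used < CACHE_SLOTS)
        c->slot[c->used++] = e;
    return e.pblk + (lblk - e.lblk);
}
```

In this sketch a second lookup inside an already cached extent leaves
disk_reads unchanged, which is the saving the message is after.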
This commit renames ext4_es_find_extent to ext4_es_find_delayed_extent
and improves the function. First, the input and output parameters are
split. Second, the function never returns the first block of the next
delayed extent after 'es'.
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>
This commit adds two members to the extent_status structure so that it
can record the physical block and the extent status. es_pblk is used to
record both of them: a physical block number needs only 48 bits, so the
extent status can be stashed in the remaining bits to save memory.
Written, unwritten, delayed and hole are now defined as statuses.
Because a new member is added to the extent status tree, all interfaces
need to be adjusted.
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
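The idea of stashing the extent status next to a 48-bit physical block
number in one 64-bit field can be sketched as below. This is a minimal
sketch, assuming the status flags sit in the bits above bit 47; the
macro, enum and function names are hypothetical and are not taken from
the ext4 source.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout for this sketch: low 48 bits hold the physical block,
 * the high 16 bits hold the status flags. */
#define ES_PBLK_BITS 48
#define ES_PBLK_MASK ((UINT64_C(1) << ES_PBLK_BITS) - 1)

enum es_flag {
    ES_WRITTEN   = 1 << 0,
    ES_UNWRITTEN = 1 << 1,
    ES_DELAYED   = 1 << 2,
    ES_HOLE      = 1 << 3,
};

/* Combine a physical block number and status flags into one word. */
static uint64_t es_pack(uint64_t pblk, unsigned int status)
{
    assert((pblk & ~ES_PBLK_MASK) == 0);   /* must fit in 48 bits */
    return ((uint64_t)status << ES_PBLK_BITS) | pblk;
}

/* Extract the physical block number from the packed word. */
static uint64_t es_get_pblock(uint64_t es_pblk)
{
    return es_pblk & ES_PBLK_MASK;
}

/* Extract the status flags from the packed word. */
static unsigned int es_get_status(uint64_t es_pblk)
{
    return (unsigned int)(es_pblk >> ES_PBLK_BITS);
}
```

Packing both values this way keeps the structure at one 64-bit member
for the mapping, at the cost of a mask and a shift on every access.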
This commit refines the extent status tree code.
1) A prefix 'es_' is added to the extent status tree structure
members.
2) Refactored es_remove_extent() so that __es_remove_extent() can be
used by es_insert_extent() to remove the old extent entry(-ies) before
inserting a new one.
3) Renamed extent_status_end() to ext4_es_end().
4) ext4_es_can_be_merged() is defined to check whether two extents can
be merged or not.
5) Updated and clarified comments.
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
workqueue has moved away from global_cwqs to worker_pools and with the
scheduled custom worker pools, workqueues will be associated with
pools which don't have anything to do with CPUs. The workqueue code
went through a significant amount of changes recently and mass renaming
isn't likely to hurt much additionally. Let's replace 'cpu' with
'pool' so that it reflects the current design.
* s/struct cpu_workqueue_struct/struct pool_workqueue/
* s/cpu_wq/pool_wq/
* s/cwq/pwq/
This patch is purely cosmetic.
Signed-off-by: Tejun Heo <tj@kernel.org>
Handles which stay open a long time are problematic when it comes time
to close down a transaction so it can be committed. These tracepoints
will help us determine which ones are the problematic ones, and to
validate whether changes make things better or worse.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Track the delay between when we first request that the commit begin
and when it actually begins, so we can see how much of a gap exists.
In theory, this should just be the remaining scheduling quantum of
the thread which requested the commit (assuming it was not a
synchronous operation which triggered the commit request) plus
scheduling overhead; however, it's possible that real time processes
might prevent the kjournald thread from executing.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
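The measurement described in this commit message (time from the first
commit request to the actual start of the commit) can be sketched as
below. This is only an illustration: struct txn_timing and both
functions are hypothetical names, and timestamps are passed in as plain
nanosecond counts rather than read from a kernel clock.

```c
#include <assert.h>
#include <stdint.h>

struct txn_timing {
    uint64_t requested_ns;       /* 0 until a commit is first requested */
    uint64_t request_delay_ns;   /* filled in when the commit begins */
};

/* Record the time of the first commit request; later requests for the
 * same transaction do not move the timestamp. */
static void txn_request_commit(struct txn_timing *t, uint64_t now_ns)
{
    if (t->requested_ns == 0)
        t->requested_ns = now_ns;
}

/* Called when the commit actually starts: compute the gap. A commit
 * that was never explicitly requested reports a delay of 0. */
static void txn_commit_begin(struct txn_timing *t, uint64_t now_ns)
{
    assert(now_ns >= t->requested_ns);
    t->request_delay_ns = t->requested_ns ? now_ns - t->requested_ns : 0;
}
```

Only the first request is recorded because the quantity of interest is
how long the earliest requester had to wait.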
doctorture.2013.01.11a: Changes to rcutorture and to RCU documentation.
fixes.2013.01.26a: Miscellaneous fixes.
tagcb.2013.01.24a: Tag RCU callbacks with grace-period number to
simplify callback advancement.
tiny.2013.01.29b: Enhancements to uniprocessor handling in tiny RCU.
The text in Documentation said it would be removed in 2.6.41;
the text in the Kconfig said removal in the 3.1 release. Either
way you look at it, we are well past both, so push it off a cliff.
Note that the POWER_CSTATE and the POWER_PSTATE are part of the
legacy tracing API. Remove all tracepoints which use these flags.
As can be seen from context, most already have a trace entry via
trace_cpu_idle anyways.
Also, the cpufreq/cpufreq.c PSTATE one is actually unpaired, as
compared to the CSTATE ones which all have a clear start/stop.
As part of this, the trace_power_frequency also becomes orphaned,
so it too is deleted.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Move gcwq->cpu to pool->cpu. This introduces a couple places where
gcwq->pools[0].cpu is used. These will soon go away as gcwq is
further reduced.
This is part of an effort to remove global_cwq and make worker_pool
the top level abstraction, which in turn will help implementing worker
pools with user-specified attributes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
This patch adds a tracepoint in ext4_punch_hole.
CC: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Add tracepoints for page dirtying, writeback_single_inode start, inode
dirtying and writeback. For the latter two inode events, a pair of
events are defined to denote start and end of the operations (the
starting one has _start suffix and the one w/o suffix happens after
the operation is complete). These inode ops are FS specific and can
be non-trivial and having enclosing tracepoints is useful for external
tracers.
This is part of tracepoint additions to improve visibility into
dirtying / writeback operations for io tracer and userland.
v2: writeback_dirty_inode[_start] TPs may be called for files on
pseudo FSes w/ unregistered bdi. Check whether bdi->dev is %NULL
before dereferencing.
v3: buffer dirtying moved to a block TP.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The former is triggered from touch_buffer() and the latter
mark_buffer_dirty().
This is part of tracepoint additions to improve visibility into
dirtying / writeback operations for io tracer and userland.
v2: Transformed writeback_dirty_buffer to block_dirty_buffer and made
it share TP definition with block_touch_buffer.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
bio_{front|back}_merge tracepoints report a bio merging into an
existing request but didn't specify which request the bio is being
merged into. Add @req to it. This makes it impossible to share the
event template with block_bio_queue - split it out.
@req isn't used or exported to userland at this point and there is no
userland visible behavior change. Later changes will make use of the
extra parameter.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
bio completion didn't kick block_bio_complete TP. Only dm was
explicitly triggering the TP on IO completion. This makes
block_bio_complete TP useless for tracers which want to know about
bios, and all other bio based drivers skip generating blktrace
completion events.
This patch makes all bio completions via bio_endio() generate
block_bio_complete TP.
* Explicit trace_block_bio_complete() invocation removed from dm and
the trace point is unexported.
* @rq dropped from trace_block_bio_complete(). bios may fly around
w/o queue associated. Verifying and accessing the associated queue
belongs to TP probes.
* blktrace now gets both request and bio completions. Make it ignore
bio completions if request completion path is happening.
This makes all bio based drivers generate blktrace completion events
properly and makes the block_bio_complete TP actually useful.
v2: With this change, block_bio_complete TP could be invoked on sg
commands which have bio's with %NULL bi_bdev. Update TP
assignment code to check whether bio->bi_bdev is %NULL before
dereferencing.
Signed-off-by: Tejun Heo <tj@kernel.org>
Original-patch-by: Namhyung Kim <namhyung@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: dm-devel@redhat.com
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This commit adds event tracing for callback acceleration to allow better
tracking of callbacks through the system.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
When the type of global variable blimit changed from int to long, the
type of the blimit argument of trace_rcu_batch_start() needed to
change as well. This commit fixes this issue.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Currently, rcutorture traces every read-side access. This can be
problematic because even a two-minute rcutorture run on a two-CPU system
can generate 28,853,363 reads. Normally, only a failing read is of
interest, so this commit adjusts rcutorture's tracing to only
trace failing reads. The resulting event tracing records the time
and the ->completed value captured at the beginning of the RCU read-side
critical section, allowing correlation with other event-tracing messages.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
[ paulmck: Add fix to build problem located by Randy Dunlap based on
diagnosis by Steven Rostedt. ]
Add a new capability, KVM_CAP_S390_CSS_SUPPORT, which will pass
intercepts for channel I/O instructions to userspace. Only I/O
instructions interacting with I/O interrupts need to be handled
in-kernel:
- TEST PENDING INTERRUPTION (tpi) dequeues and stores pending
interrupts entirely in-kernel.
- TEST SUBCHANNEL (tsch) dequeues pending interrupts in-kernel
and exits via KVM_EXIT_S390_TSCH to userspace for subchannel-
related processing.
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This header file defines a new trace event that is triggered when
an AER event occurs. The following data is provided to the trace
event.
char * dev_name - The name of the slot where the device resides
([domain:]bus:device.function).
u32 status - Either the correctable or uncorrectable register
indicating what error or errors have been seen.
u8 severity - error severity 0:NONFATAL 1:FATAL 2:CORRECTED
The trace event will also provide a trace string that may look like:
"0000:05:00.0 PCIe Bus Error:severity=Uncorrected (Non-Fatal), Poisoned
TLP"
Signed-off-by: Lance Ortiz <lance.ortiz@hp.com>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Acked-by: Boris Petkov <bp@alien8.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
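How the three fields listed in the AER commit message could map onto
the sample trace string can be sketched as below. This is a hedged
sketch, not the kernel's trace event definition: aer_severity_name()
and aer_format() are hypothetical, the wording for the FATAL and
CORRECTED severities is assumed, and the decoded error text is passed
in as a string instead of being derived from the status register bits.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Severity values as listed in the commit message. */
enum aer_severity { AER_NONFATAL = 0, AER_FATAL = 1, AER_CORRECTED = 2 };

static const char *aer_severity_name(uint8_t severity)
{
    switch (severity) {
    case AER_NONFATAL:  return "Uncorrected (Non-Fatal)"; /* from the sample string */
    case AER_FATAL:     return "Uncorrected (Fatal)";     /* assumed wording */
    case AER_CORRECTED: return "Corrected";               /* assumed wording */
    default:            return "Unknown";                 /* assumed wording */
    }
}

/* Build a line shaped like the sample string in the commit message.
 * Returns what snprintf() returns: the length the full line needs. */
static int aer_format(char *buf, size_t len, const char *dev_name,
                      uint8_t severity, const char *errors)
{
    assert(buf != NULL && dev_name != NULL && errors != NULL);
    return snprintf(buf, len, "%s PCIe Bus Error:severity=%s, %s",
                    dev_name, aer_severity_name(severity), errors);
}
```

Called with "0000:05:00.0", AER_NONFATAL and "Poisoned TLP", the sketch
reproduces the sample string quoted in the message.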
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 bug fixes from Ted Ts'o:
"Various bug fixes for ext4. Perhaps the most serious bug fixed is one
which could cause file system corruptions when performing file punch
operations."
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: avoid hang when mounting non-journal filesystems with orphan list
ext4: lock i_mutex when truncating orphan inodes
ext4: do not try to write superblock on ro remount w/o journal
ext4: include journal blocks in df overhead calcs
ext4: remove unaligned AIO warning printk
ext4: fix an incorrect comment about i_mutex
ext4: fix deadlock in journal_unmap_buffer()
ext4: split off ext4_journalled_invalidatepage()
jbd2: fix assertion failure in jbd2_journal_flush()
ext4: check dioread_nolock on remount
ext4: fix extent tree corruption caused by hole punch
In data=journal mode we don't need delalloc or DIO handling in invalidatepage
and similarly in other modes we don't need the journal handling. So split
invalidatepage implementations.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
This flag is used to indicate to the callees that this allocation is a
kernel allocation in process context, and should be accounted to current's
memcg.
Signed-off-by: Glauber Costa <glommer@parallels.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Suleiman Souhlal <suleiman@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Frederic Weisbecker <fweisbec@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: JoonSoo Kim <js1304@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull btrfs update from Chris Mason:
"A big set of fixes and features.
In terms of line count, most of the code comes from Stefan, who added
the ability to replace a single drive in place. This is different
from how btrfs normally replaces drives, and is much much much faster.
Josef is plowing through our synchronous write performance. This pull
request does not include the DIO_OWN_WAITING patch that was discussed
on the list, but it has a number of other improvements to cut down our
latencies and CPU time during fsync/O_DIRECT writes.
Miao Xie has a big series of fixes and is spreading out ordered
operations over more CPUs. This improves performance and reduces
contention.
I've put in fixes for error handling around hash collisions. These
are going back to individual stable kernels as I test against them.
Otherwise we have a lot of fixes and cleanups, thanks everyone!
raid5/6 is being rebased against the device replacement code. I'll
have it posted this Friday along with a nice series of benchmarks."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (115 commits)
Btrfs: fix a bug of per-file nocow
Btrfs: fix hash overflow handling
Btrfs: don't take inode delalloc mutex if we're a free space inode
Btrfs: fix autodefrag and umount lockup
Btrfs: fix permissions of empty files not affected by umask
Btrfs: put raid properties into global table
Btrfs: fix BUG() in scrub when first superblock reading gives EIO
Btrfs: do not call file_update_time in aio_write
Btrfs: only unlock and relock if we have to
Btrfs: use tokens where we can in the tree log
Btrfs: optimize leaf_space_used
Btrfs: don't memset new tokens
Btrfs: only clear dirty on the buffer if it is marked as dirty
Btrfs: move checks in set_page_dirty under DEBUG
Btrfs: log changed inodes based on the extent map tree
Btrfs: add path->really_keep_locks
Btrfs: do not mark ems as prealloc if we are writing to them
Btrfs: keep track of the extents original block length
Btrfs: inline csums if we're fsyncing
Btrfs: don't bother copying if we're only logging the inode
...
Value 0 is not a tree id, so besides an upper limit, a lower limit is
necessary as well while parsing root types of tracepoint.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
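The two-sided range check this commit message calls for can be sketched
as below. The bounds and the function name are assumptions made for
illustration; they are not copied from the btrfs tracepoint code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bounds for this sketch only; 0 is never a valid tree id. */
#define FIRST_TREE_ID UINT64_C(1)
#define LAST_TREE_ID  UINT64_C(7)

static_assert(FIRST_TREE_ID > 0, "tree id 0 must stay out of range");

/* True only when id lies inside [FIRST_TREE_ID, LAST_TREE_ID]. With an
 * upper limit alone, id 0 would wrongly be treated as a known tree. */
static bool is_named_tree_id(uint64_t id)
{
    return id >= FIRST_TREE_ID && id <= LAST_TREE_ID;
}
```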
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 update from Ted Ts'o:
"There are two major features for this merge window. The first is
inline data, which allows small files or directories to be stored in
the in-inode extended attribute area. (This requires that the file
system use inodes which are at least 256 bytes or larger; 128 byte
inodes do not have any room for in-inode xattrs.)
The second new feature is SEEK_HOLE/SEEK_DATA support. This is
enabled by the extent status tree patches, and this infrastructure
will be used to further optimize ext4 in the future.
Beyond that, we have the usual collection of code cleanups and bug
fixes."
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (63 commits)
ext4: zero out inline data using memset() instead of empty_zero_page
ext4: ensure Inode flags consistency are checked at build time
ext4: Remove CONFIG_EXT4_FS_XATTR
ext4: remove unused variable from ext4_ext_in_cache()
ext4: remove redundant initialization in ext4_fill_super()
ext4: remove redundant code in ext4_alloc_inode()
ext4: use sync_inode_metadata() when syncing inode metadata
ext4: enable ext4 inline support
ext4: let fallocate handle inline data correctly
ext4: let ext4_truncate handle inline data correctly
ext4: evict inline data out if we need to store xattr in inode
ext4: let fiemap work with inline data
ext4: let ext4_rename handle inline dir
ext4: let empty_dir handle inline dir
ext4: let ext4_delete_entry() handle inline data
ext4: make ext4_delete_entry generic
ext4: let ext4_find_entry handle inline data
ext4: create a new function search_dir
ext4: let ext4_readdir handle inline data
ext4: let add_dir_entry handle inline data properly
...
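From user space, the SEEK_HOLE/SEEK_DATA support mentioned in the ext4
pull message is reached through lseek(). The following is a minimal
sketch of walking a file's data segments; data_bytes() and
temp_file_with() are hypothetical helper names, and the fallback
definitions assume the Linux values of the two constants.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_DATA   /* Linux values, in case the libc headers hide them */
#define SEEK_DATA 3
#define SEEK_HOLE 4
#endif

/* Sum the sizes of all data segments in fd; returns -1 on error. */
static off_t data_bytes(int fd)
{
    off_t pos = 0, total = 0;

    assert(fd >= 0);
    for (;;) {
        off_t data = lseek(fd, pos, SEEK_DATA);
        off_t hole;

        if (data < 0)   /* ENXIO means no data at or after pos */
            return errno == ENXIO ? total : -1;
        hole = lseek(fd, data, SEEK_HOLE);   /* end of file counts as a hole */
        if (hole < 0)
            return -1;
        total += hole - data;
        pos = hole;
    }
}

/* Demo helper: create an already unlinked temporary file holding
 * len bytes of buf; returns the descriptor or -1 on failure. */
static int temp_file_with(const void *buf, size_t len)
{
    char path[] = "/tmp/seekdemoXXXXXX";
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);
    if (write(fd, buf, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }
    return fd;
}
```

On a file system without hole tracking the whole file is reported as
one data segment, so callers should treat the result as an upper bound
on allocated data rather than an exact figure.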
Merge tag 'balancenuma-v11' of git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux-balancenuma
Pull Automatic NUMA Balancing bare-bones from Mel Gorman:
"There are three implementations for NUMA balancing, this tree
(balancenuma), numacore which has been developed in tip/master and
autonuma which is in aa.git.
In almost all respects balancenuma is the dumbest of the three because
its main impact is on the VM side with no attempt to be smart about
scheduling. In the interest of getting the ball rolling, it would be
desirable to see this much merged for 3.8 with the view to building
scheduler smarts on top and adapting the VM where required for 3.9.
The most recent set of comparisons available from different people are
mel: https://lkml.org/lkml/2012/12/9/108
mingo: https://lkml.org/lkml/2012/12/7/331
tglx: https://lkml.org/lkml/2012/12/10/437
srikar: https://lkml.org/lkml/2012/12/10/397
The results are a mixed bag. In my own tests, balancenuma does
reasonably well. It's dumb as rocks and does not regress against
mainline. On the other hand, Ingo's tests show that balancenuma is
incapable of converging for the workloads driven by perf which is bad
but is potentially explained by the lack of scheduler smarts. Thomas'
results show balancenuma improves on mainline but falls far short of
numacore or autonuma. Srikar's results indicate we all suffer on a
large machine with imbalanced node sizes.
My own testing showed that recent numacore results have improved
dramatically, particularly in the last week but not universally.
We've butted heads heavily on system CPU usage and high levels of
migration even when it shows that overall performance is better.
There are also cases where it regresses. Of interest is that for
specjbb in some configurations it will regress for lower numbers of
warehouses and show gains for higher numbers which is not reported by
the tool by default and sometimes missed in reports. Recently I
reported for numacore that the JVM was crashing with
NullPointerExceptions but currently it's unclear what the source of
this problem is. Initially I thought it was in how numacore batch
handles PTEs but I no longer think this is the case. It's possible
numacore is just able to trigger it due to higher rates of migration.
These reports were quite late in the cycle so I/we would like to start
with this tree as it contains much of the code we can agree on and has
not changed significantly over the last 2-3 weeks."
* tag 'balancenuma-v11' of git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux-balancenuma: (50 commits)
mm/rmap, migration: Make rmap_walk_anon() and try_to_unmap_anon() more scalable
mm/rmap: Convert the struct anon_vma::mutex to an rwsem
mm: migrate: Account a transhuge page properly when rate limiting
mm: numa: Account for failed allocations and isolations as migration failures
mm: numa: Add THP migration for the NUMA working set scanning fault case build fix
mm: numa: Add THP migration for the NUMA working set scanning fault case.
mm: sched: numa: Delay PTE scanning until a task is scheduled on a new node
mm: sched: numa: Control enabling and disabling of NUMA balancing if !SCHED_DEBUG
mm: sched: numa: Control enabling and disabling of NUMA balancing
mm: sched: Adapt the scanning rate if a NUMA hinting fault does not migrate
mm: numa: Use a two-stage filter to restrict pages being migrated for unlikely task<->node relationships
mm: numa: migrate: Set last_nid on newly allocated page
mm: numa: split_huge_page: Transfer last_nid on tail page
mm: numa: Introduce last_nid to the page frame
sched: numa: Slowly increase the scanning period as NUMA faults are handled
mm: numa: Rate limit setting of pte_numa if node is saturated
mm: numa: Rate limit the amount of memory that is migrated between nodes
mm: numa: Structures for Migrate On Fault per NUMA migration rate limiting
mm: numa: Migrate pages handled during a pmd_numa hinting fault
mm: numa: Migrate on reference policy
...
Pull perf updates from Ingo Molnar:
"Lots of activity:
211 files changed, 8328 insertions(+), 4116 deletions(-)
most of it on the tooling side.
Main changes:
* ftrace enhancements and fixes from Steve Rostedt.
* uprobes fixes, cleanups and preparation for the ARM port from Oleg
Nesterov.
* UAPI fixes, from David Howells - prepares the arch/x86 UAPI
transition
* Separate perf tests into multiple objects, one per test, from Jiri
Olsa.
* Make hardware event translations available in sysfs, from Jiri
Olsa.
* Fixes to /proc/pid/maps parsing, preparatory to supporting data
maps, from Namhyung Kim
* Implement ui_progress for GTK, from Namhyung Kim
* Add framework for automated perf_event_attr tests, where tools with
different command line options will be run from a 'perf test', via
python glue, and the perf syscall will be intercepted to verify
that the perf_event_attr fields set by the tool are those expected,
from Jiri Olsa
* Add a 'link' method for hists, so that we can have the leader with
buckets for all the entries in all the hists. This new method is
now used in the default 'diff' output, making the sum of the
'baseline' column be 100%, eliminating blind spots.
* libtraceevent fixes for compiler warnings, trying to make perf
build on some distros, like fedora 14, 32-bit; some of the warnings
really pointed to real bugs.
* Add a browser for 'perf script' and make it available from the
report and annotate browsers. It does filtering to find the
scripts that handle events found in the perf.data file used. From
Feng Tang
* perf inject changes to allow showing where a task sleeps, from
Andrew Vagin.
* Makefile improvements from Namhyung Kim.
* Add --pre and --post command hooks in 'stat', from Peter Zijlstra.
* Don't stop synthesizing threads when one vanishes, this is for the
existing threads when we start a tool like trace.
* Use sched:sched_stat_runtime to provide a thread summary, this
produces the same output as the 'trace summary' subcommand of
tglx's original "trace" tool.
* Support interrupted syscalls in 'trace'
* Add an event duration column and filter in 'trace'.
* There are references to the man pages in some tools, so try to
build Documentation when installing, warning the user if that is
not possible, from Borislav Petkov.
* Give user better message if precise is not supported, from David
Ahern.
* Try to find cross-built objdump path by using the session
environment information in the perf.data file header, from Irina
Tirdea, original patch and idea by Namhyung Kim.
* Displays more output on features check for make V=1, so that one can
figure out what is happening by looking at gcc output, etc. From
Jiri Olsa.
* Add on_exit implementation for systems without one, e.g. Android,
from Bernhard Rosenkraenzer.
* Only process events for vcpus of interest, helps handling large
number of events, from David Ahern.
* Cross compilation fixes for Android, from Irina Tirdea.
* Add documentation on compiling for Android, from Irina Tirdea.
* perf diff improvements from Jiri Olsa.
* Target (task/user/cpu/syswide) handling improvements, from Namhyung
Kim.
* Add support in 'trace' for tracing workload given by command line,
from Namhyung Kim.
* ... and much more."
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (194 commits)
uprobes: Use percpu_rw_semaphore to fix register/unregister vs dup_mmap() race
perf evsel: Introduce is_group_member method
perf powerpc: Use uapi/unistd.h to fix build error
tools: Pass the target in descend
tools: Honour the O= flag when tool build called from a higher Makefile
tools: Define a Makefile function to do subdir processing
perf ui: Always compile browser setup code
perf ui: Add ui_progress__finish()
perf ui gtk: Implement ui_progress functions
perf ui: Introduce generic ui_progress helper
perf ui tui: Move progress.c under ui/tui directory
perf tools: Add basic event modifier sanity check
perf tools: Omit group members from perf_evlist__disable/enable
perf tools: Ensure single disable call per event in record comand
perf tools: Fix 'disabled' attribute config for record command
perf tools: Fix attributes for '{}' defined event groups
perf tools: Use sscanf for parsing /proc/pid/maps
perf tools: Add gtk.<command> config option for launching GTK browser
perf tools: Fix compile error on NO_NEWT=1 build
perf hists: Initialize all of he->stat with zeroes
...
Pull RCU update from Ingo Molnar:
"The major features of this tree are:
1. A first version of no-callbacks CPUs. This version prohibits
offlining CPU 0, but only when enabled via CONFIG_RCU_NOCB_CPU=y.
Relaxing this constraint is in progress, but not yet ready
for prime time. These commits were posted to LKML at
https://lkml.org/lkml/2012/10/30/724.
2. Changes to SRCU that allow statically initialized srcu_struct
structures. These commits were posted to LKML at
https://lkml.org/lkml/2012/10/30/296.
3. Restructuring of RCU's debugfs output. These commits were posted
to LKML at https://lkml.org/lkml/2012/10/30/341.
4. Additional CPU-hotplug/RCU improvements, posted to LKML at
https://lkml.org/lkml/2012/10/30/327.
Note that the commit eliminating __stop_machine() was judged to
be too high a risk, so is deferred to 3.9.
5. Changes to RCU's idle interface, most notably a new module
parameter that redirects normal grace-period operations to
their expedited equivalents. These were posted to LKML at
https://lkml.org/lkml/2012/10/30/739.
6. Additional diagnostics for RCU's CPU stall warning facility,
posted to LKML at https://lkml.org/lkml/2012/10/30/315.
The most notable change reduces the
default RCU CPU stall-warning time from 60 seconds to 21 seconds,
so that it once again happens sooner than the softlockup timeout.
7. Documentation updates, which were posted to LKML at
https://lkml.org/lkml/2012/10/30/280.
A couple of late-breaking changes were posted at
https://lkml.org/lkml/2012/11/16/634 and
https://lkml.org/lkml/2012/11/16/547.
8. Miscellaneous fixes, which were posted to LKML at
https://lkml.org/lkml/2012/10/30/309.
9. Finally, a fix for a lockdep-RCU splat was posted to LKML
at https://lkml.org/lkml/2012/11/7/486."
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (49 commits)
context_tracking: New context tracking susbsystem
sched: Mark RCU reader in sched_show_task()
rcu: Separate accounting of callbacks from callback-free CPUs
rcu: Add callback-free CPUs
rcu: Add documentation for the new rcuexp debugfs trace file
rcu: Update documentation for TREE_RCU debugfs tracing
rcu: Reduce default RCU CPU stall warning timeout
rcu: Fix TINY_RCU rcu_is_cpu_rrupt_from_idle check
rcu: Clarify memory-ordering properties of grace-period primitives
rcu: Add new rcutorture module parameters to start/end test messages
rcu: Remove list_for_each_continue_rcu()
rcu: Fix batch-limit size problem
rcu: Add tracing for synchronize_sched_expedited()
rcu: Remove old debugfs interfaces and also RCU flavor name
rcu: split 'rcuhier' to each flavor
rcu: split 'rcugp' to each flavor
rcu: split 'rcuboost' to each flavor
rcu: split 'rcubarrier' to each flavor
rcu: Fix tracing formatting
rcu: Remove the interface "rcudata.csv"
...
The maximum oom_score_adj is 1000 and the minimum oom_score_adj is -1000,
so this range can be represented by the signed short type with no
functional change. The extra space this frees up in struct signal_struct
will be used for per-thread oom kill flags in the next patch.
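The range argument above can be checked at compile time. The following is
a minimal userspace sketch, not the kernel code: the two bounds come from
the text above, while store_oom_score_adj() is a hypothetical helper
introduced only for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Bounds quoted in the commit message above. */
#define OOM_SCORE_ADJ_MIN (-1000)
#define OOM_SCORE_ADJ_MAX 1000

/* Fails the build if a short could not hold the whole range. */
static_assert(SHRT_MIN <= OOM_SCORE_ADJ_MIN && OOM_SCORE_ADJ_MAX <= SHRT_MAX,
              "oom_score_adj range must fit in a short");

/*
 * Hypothetical helper: store a requested value into a short-sized slot,
 * rejecting anything outside the documented range. The slot is left
 * untouched on failure.
 */
static bool store_oom_score_adj(short *slot, int requested)
{
    if (requested < OOM_SCORE_ADJ_MIN || requested > OOM_SCORE_ADJ_MAX)
        return false;
    *slot = (short)requested;
    return true;
}
```

Because every accepted value lies within the range a short is guaranteed
to hold, narrowing the field loses no information.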
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: Anton Vorontsov <anton.vorontsov@linaro.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The pgmigrate_success and pgmigrate_fail vmstat counters tell the user
about migration activity but not the type or the reason. This patch adds
a tracepoint to identify the type of page migration and why the page is
being migrated.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
This reverts commits a50915394f and
d7c3b937bd.
This is a revert of a revert of a revert. In addition, it reverts the
even older i915 change to stop using the __GFP_NO_KSWAPD flag due to the
original commits in linux-next.
It turns out that the original patch really was bogus, and that the
original revert was the correct thing to do after all. We thought we
had fixed the problem, and then reverted the revert, but the problem
really is fundamental: waking up kswapd simply isn't the right thing to
do, and direct reclaim sometimes simply _is_ the right thing to do.
When certain allocations fail, we simply should try some direct reclaim,
and if that fails, fail the allocation. That's the right thing to do
for THP allocations, which can easily fail, and the GPU allocations want
to do that too.
So starting kswapd is sometimes simply wrong, and removing the flag that
said "don't start kswapd" was a mistake. Let's hope we never revisit
this mistake again - and certainly not this many times ;)
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Conflicts:
arch/x86/kernel/ptrace.c
Pull the latest RCU tree from Paul E. McKenney:
" The major features of this series are:
1. A first version of no-callbacks CPUs. This version prohibits
offlining CPU 0, but only when enabled via CONFIG_RCU_NOCB_CPU=y.
Relaxing this constraint is in progress, but not yet ready
for prime time. These commits were posted to LKML at
https://lkml.org/lkml/2012/10/30/724, and are at branch rcu/nocb.
2. Changes to SRCU that allow statically initialized srcu_struct
structures. These commits were posted to LKML at
https://lkml.org/lkml/2012/10/30/296, and are at branch rcu/srcu.
3. Restructuring of RCU's debugfs output. These commits were posted
to LKML at https://lkml.org/lkml/2012/10/30/341, and are at
branch rcu/tracing.
4. Additional CPU-hotplug/RCU improvements, posted to LKML at
https://lkml.org/lkml/2012/10/30/327, and are at branch rcu/hotplug.
Note that the commit eliminating __stop_machine() was judged to
be too high a risk, so is deferred to 3.9.
5. Changes to RCU's idle interface, most notably a new module
parameter that redirects normal grace-period operations to
their expedited equivalents. These were posted to LKML at
https://lkml.org/lkml/2012/10/30/739, and are at branch rcu/idle.
6. Additional diagnostics for RCU's CPU stall warning facility,
posted to LKML at https://lkml.org/lkml/2012/10/30/315, and
are at branch rcu/stall. The most notable change reduces the
default RCU CPU stall-warning time from 60 seconds to 21 seconds,
so that it once again happens sooner than the softlockup timeout.
7. Documentation updates, which were posted to LKML at
https://lkml.org/lkml/2012/10/30/280, and are at branch rcu/doc.
A couple of late-breaking changes were posted at
https://lkml.org/lkml/2012/11/16/634 and
https://lkml.org/lkml/2012/11/16/547.
8. Miscellaneous fixes, which were posted to LKML at
https://lkml.org/lkml/2012/10/30/309, along with a late-breaking
change posted at Fri, 16 Nov 2012 11:26:25 -0800 with message-ID
<20121116192625.GA447@linux.vnet.ibm.com>, but which lkml.org
seems to have missed. These are at branch rcu/fixes.
9. Finally, a fix for a lockdep-RCU splat was posted to LKML
at https://lkml.org/lkml/2012/11/7/486. This is at rcu/next. "
Signed-off-by: Ingo Molnar <mingo@kernel.org>
It appears that this patch was innocent, and we hope that "mm: avoid
waking kswapd for THP allocations when compaction is deferred or
contended" will fix the final kswapd-spinning cause.
Cc: Zdenek Kabelac <zkabelac@redhat.com>
Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Jiri Slaby <jirislaby@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With "mm: vmscan: scale number of pages reclaimed by reclaim/compaction
based on failures" reverted, Zdenek Kabelac reported the following:
Hmm, so it's just took longer to hit the problem and observe
kswapd0 spinning on my CPU again - it's not as endless like before -
but still it easily eats minutes - it helps to turn off Firefox
or TB (memory hungry apps) so kswapd0 stops soon - and restart
those apps again. (And I still have like >1GB of cached memory)
kswapd0 R running task 0 30 2 0x00000000
Call Trace:
preempt_schedule+0x42/0x60
_raw_spin_unlock+0x55/0x60
put_super+0x31/0x40
drop_super+0x22/0x30
prune_super+0x149/0x1b0
shrink_slab+0xba/0x510
The sysrq+m indicates the system has no swap so it'll never reclaim
anonymous pages as part of reclaim/compaction. That is one part of the
problem but not the root cause as file-backed pages could also be
reclaimed.
The likely underlying problem is that kswapd is woken up or kept awake
for each THP allocation request in the page allocator slow path.
If compaction fails for the requesting process then compaction will be
deferred for a time and direct reclaim is avoided. However, if there
is a storm of THP requests that are simply rejected, it will still be
the case that kswapd is awake for a prolonged period of time as
pgdat->kswapd_max_order is updated each time. This is noticed by the
main kswapd() loop and it will not call kswapd_try_to_sleep(). Instead
it will loop, shrinking a small number of pages and calling
shrink_slab() on each iteration.
The temptation is to supply a patch that checks if kswapd was woken for
THP and if so ignore pgdat->kswapd_max_order but it'll be a hack and not
backed up by proper testing. As 3.7 is very close to release and this
is not a bug we should release with, a safer path is to revert "mm:
remove __GFP_NO_KSWAPD" for now and revisit it with the view to ironing
out the balance_pgdat() logic in general.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Zdenek Kabelac <zkabelac@redhat.com>
Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Jiri Slaby <jirislaby@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
RCU callback execution can add significant OS jitter and also can
degrade both scheduling latency and, in asymmetric multiprocessors,
energy efficiency. This commit therefore adds the ability for selected
CPUs ("rcu_nocbs=" boot parameter) to have their callbacks offloaded
to kthreads. If the "rcu_nocb_poll" boot parameter is also specified,
these kthreads will do polling, removing the need for the offloaded
CPUs to do wakeups. At least one CPU must be doing normal callback
processing: currently CPU 0 cannot be selected as a no-CBs CPU.
In addition, attempts to offline the last normal-CBs CPU will fail.
This feature was inspired by Jim Houston's and Joe Korty's JRCU, and
this commit includes fixes to problems located by Fengguang Wu's
kbuild test robot.
[ paulmck: Added gfp.h include file as suggested by Fengguang Wu. ]
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
While doing per-cpu helper optimization work, I found this code puzzling.
1. It is marked as comment text, so it may be a sample function for
guidance or a TODO item.
2. However, the sample code is odd because struct perf_trace_buf no longer exists.
Commit ce71b9 deleted the struct perf_trace_buf definition.
Author: Frederic Weisbecker <fweisbec@gmail.com>
Date: Sun Nov 22 05:26:55 2009 +0100
tracing: Use the perf recursion protection from trace event
Is it necessary to keep it there?
Compile-tested only.
Link: http://lkml.kernel.org/r/50949FC9.6050202@gmail.com
Signed-off-by: Shan Wei <davidshan@tencent.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
This patch adds some tracepoints in extent status tree.
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
When we use trace_ext4_ext/ind_map_blocks_exit, print the value of
map->m_flags in order that we can understand the extent's current
status.
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
In trace_ext4_ext_handle_uninitialized_extents we don't care about the
value of map->m_flags because this value is probably 0, and we prefer
to get the value of flags because we can know how to handle this
extent in this function.
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Have the ring buffer commit function use the irq_work infrastructure to
wake up any waiters waiting on the ring buffer for new data. The irq_work
was created for such a purpose, where doing the actual wake up at the
time of adding data is too dangerous, as an event or function trace may
be in the midst of the work queue locks and cause deadlocks. The irq_work
will either delay the action to the next timer interrupt, or trigger an IPI
to itself forcing an interrupt to do the work (in a safe location).
With irq_work, all ring buffer commits can safely do wakeups, removing
the need for the ring buffer commit "nowake" variants, which were used
by events and function tracing. All commits can now safely use the
normal commit, and the "nowake" variants can be removed.
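The deferral pattern described above can be illustrated with a toy
userspace model. This is only a sketch under stated assumptions: the
struct and function names are hypothetical and are not the kernel's
irq_work API. The commit path merely records that a wakeup is wanted, and
a later "safe context" (standing in for the timer interrupt or self-IPI)
performs it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: illustrative names, not the kernel's irq_work API. */
struct deferred_wakeup {
    bool pending;   /* set by the commit path, cleared by the handler */
    int  wakeups;   /* how many times waiters were actually woken */
};

/*
 * Commit path: may run while locks are held, so it never wakes anyone
 * directly. It only flags that a wakeup is needed.
 */
static void commit_event(struct deferred_wakeup *dw)
{
    assert(dw != NULL);
    dw->pending = true;
}

/*
 * Safe context: models the later interrupt. Several commits that happened
 * in between are coalesced into a single wakeup.
 */
static void run_deferred(struct deferred_wakeup *dw)
{
    assert(dw != NULL);
    if (!dw->pending)
        return;
    dw->pending = false;
    dw->wakeups++;
}
```

In this model every commit can request a wakeup unconditionally, which is
why separate "nowake" commit variants become unnecessary.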
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The functions defined in include/trace/syscalls.h are not used directly
since struct ftrace_event_class was introduced. Remove them from the
header file and rearrange the ftrace_event_class declarations in
trace_syscalls.c.
Link: http://lkml.kernel.org/r/1339112785-21806-2-git-send-email-vnagarnaik@google.com
Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Remove ftrace_format_syscall() declaration; it is neither defined nor
used. Also update a comment and formatting.
Link: http://lkml.kernel.org/r/1339112785-21806-1-git-send-email-vnagarnaik@google.com
Signed-off-by: David Sharp <dhsharp@google.com>
Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
As Mukesh explained it, the MMUEXT_TLB_FLUSH_ALL allows the
hypervisor to do a TLB flush on all active vCPUs. If instead
we were using the generic one (which ends up being xen_flush_tlb)
we end up making the MMUEXT_TLB_FLUSH_LOCAL hypercall. But
before we make that hypercall the kernel will IPI all of the
vCPUs (even those that were asleep from the hypervisor
perspective). The end result is that we needlessly wake them
up and do a TLB flush when we can just let the hypervisor
do it correctly.
This patch gives around 50% speed improvement when migrating
idle guests from one host to another.
Oracle-bug: 14630170
CC: stable@vger.kernel.org
Tested-by: Jingjie Jiang <jingjie.jiang@oracle.com>
Suggested-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Pull btrfs update from Chris Mason:
"This is a large pull, with the bulk of the updates coming from:
- Hole punching
- send/receive fixes
- fsync performance
- Disk format extension allowing more hardlinks inside a single
directory (btrfs-progs patch required to enable the compat bit for
this one)
I'm cooking more unrelated RAID code, but I wanted to make sure this
original batch makes it in. The largest updates here are relatively
old and have been in testing for some time."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (121 commits)
btrfs: init ref_index to zero in add_inode_ref
Btrfs: remove repeated eb->pages check in, disk-io.c/csum_dirty_buffer
Btrfs: fix page leakage
Btrfs: do not warn_on when we cannot alloc a page for an extent buffer
Btrfs: don't bug on enomem in readpage
Btrfs: cleanup pages properly when ENOMEM in compression
Btrfs: make filesystem read-only when submitting barrier fails
Btrfs: detect corrupted filesystem after write I/O errors
Btrfs: make compress and nodatacow mount options mutually exclusive
btrfs: fix message printing
Btrfs: don't bother committing delayed inode updates when fsyncing
btrfs: move inline function code to header file
Btrfs: remove unnecessary IS_ERR in bio_readpage_error()
btrfs: remove unused function btrfs_insert_some_items()
Btrfs: don't commit instead of overcommitting
Btrfs: confirmation of value is added before trace_btrfs_get_extent() is called
Btrfs: be smarter about dropping things from the tree log
Btrfs: don't lookup csums for prealloc extents
Btrfs: cache extent state when writing out dirty metadata pages
Btrfs: do not hold the file extent leaf locked when adding extent item
...
When transparent huge pages were introduced, memory compaction and swap
storms were an issue, and the kernel had to be careful to not make THP
allocations cause pageout or compaction.
Now that we have working compaction deferral, kswapd is smart enough to
invoke compaction and the quadratic behaviour around isolate_free_pages
has been fixed, it should be safe to remove __GFP_NO_KSWAPD.
[minchan@kernel.org: Comment fix]
[mgorman@suse.de: Avoid direct reclaim for deferred compaction]
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"The big new feature added this time is supporting online resizing
using the meta_bg feature. This allows us to resize file systems
which are greater than 16TB. In addition, the speed of online
resizing has been improved in general.
We also fix a number of races, some of which could lead to deadlocks,
in ext4's Asynchronous I/O and online defrag support, thanks to good
work by Dmitry Monakhov.
There are also a large number of more minor bug fixes and cleanups
from a number of other ext4 contributors, quite a few of whom have
submitted fixes for the first time."
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (69 commits)
ext4: fix ext4_flush_completed_IO wait semantics
ext4: fix mtime update in nodelalloc mode
ext4: fix ext_remove_space for punch_hole case
ext4: punch_hole should wait for DIO writers
ext4: serialize truncate with owerwrite DIO workers
ext4: endless truncate due to nonlocked dio readers
ext4: serialize unlocked dio reads with truncate
ext4: serialize dio nonlocked reads with defrag workers
ext4: completed_io locking cleanup
ext4: fix unwritten counter leakage
ext4: give i_aiodio_unwritten a more appropriate name
ext4: ext4_inode_info diet
ext4: convert to use leXX_add_cpu()
ext4: ext4_bread usage audit
fs: reserve fallocate flag codepoint
ext4: remove redundant offset check in mext_check_arguments()
ext4: don't clear orphan list on ro mount with errors
jbd2: fix assertion failure in commit code due to lacking transaction credits
ext4: release donor reference when EXT4_IOC_MOVE_EXT ioctl fails
ext4: enable FITRIM ioctl on bigalloc file system
...
Convert #include "..." to #include <path/...> in kernel system headers.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Dave Jones <davej@redhat.com>
Pull the trivial tree from Jiri Kosina:
"Tiny usual fixes all over the place"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (34 commits)
doc: fix old config name of kprobetrace
fs/fs-writeback.c: cleanup riteback_sb_inodes kerneldoc
btrfs: fix the commment for the action flags in delayed-ref.h
btrfs: fix trivial typo for the comment of BTRFS_FREE_INO_OBJECTID
vfs: fix kerneldoc for generic_fh_to_parent()
treewide: fix comment/printk/variable typos
ipr: fix small coding style issues
doc: fix broken utf8 encoding
nfs: comment fix
platform/x86: fix asus_laptop.wled_type module parameter
mfd: printk/comment fixes
doc: getdelays.c: remember to close() socket on error in create_nl_socket()
doc: aliasing-test: close fd on write error
mmc: fix comment typos
dma: fix comments
spi: fix comment/printk typos in spi
Coccinelle: fix typo in memdup_user.cocci
tmiofb: missing NULL pointer checks
tools: perf: Fix typo in tools/perf
tools/testing: fix comment / output typos
...
When memory allocation fails, page is NULL. page_to_pfn() will then
cause a kernel panic if we don't use sparsemem vmemmap.
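The failure mode above can be sketched in userspace. This is an
illustrative model under assumptions, not the actual patch: struct page,
mem_map, page_to_pfn_model() and trace_pfn() are stand-ins, and the
all-ones sentinel for "no page" is an assumption made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in types: not the kernel's struct page or memory map. */
struct page { int unused; };
static struct page mem_map[16];

/*
 * Flat-array model of page_to_pfn(). Like the real conversion, it is
 * only meaningful for a valid page pointer; the assert models the crash.
 */
static unsigned long page_to_pfn_model(const struct page *page)
{
    assert(page != NULL);
    return (unsigned long)(page - mem_map);
}

/*
 * Guarded variant for a tracepoint argument: a failed allocation is
 * reported with a sentinel instead of being converted.
 */
static unsigned long trace_pfn(const struct page *page)
{
    return page ? page_to_pfn_model(page) : -1UL;
}
```

Checking the pointer before the conversion keeps the tracepoint usable on
the allocation-failure path, whichever memory model is configured.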
Link: http://lkml.kernel.org/r/505AB1FF.8020104@cn.fujitsu.com
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable <stable@vger.kernel.org>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Most hardware architectures require that data (including struct fields)
be aligned in memory. To make that happen, the compiler inserts padding
between struct fields if they are not aligned correctly.
Reorder fields to remove padding and make structures denser. Making data
smaller saves memory, which is very important for trace events.
The tracing buffer has a limited size, and with smaller objects we can fit
more of them without overflowing the tracing buffer.
To find data struct holes I used 'pahole -H 1 -E -I vmlinux.o' from
'dwarves' package.
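The effect of field ordering can be seen in a small example. The two
structs below are hypothetical and are not taken from the ext4 trace
events; they only show how moving the widest fields first removes the
holes that pahole reports. On a typical x86-64 ABI the first layout is
usually 24 bytes and the second 16, but the exact numbers depend on the
target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record with fields in an order that forces padding. */
struct event_padded {
    uint8_t  flags;  /* followed by a hole so that ino is aligned */
    uint64_t ino;
    uint16_t mode;   /* followed by a hole so that len is aligned */
    uint32_t len;
};

/* Same fields, widest first: only tail padding can remain. */
struct event_dense {
    uint64_t ino;
    uint32_t len;
    uint16_t mode;
    uint8_t  flags;
};

/* The payload is identical; only the padding differs. */
static_assert(sizeof(struct event_dense) <= sizeof(struct event_padded),
              "reordering must not grow the struct");

/* Bytes saved per record by the reordering, on the current target. */
static size_t bytes_saved_per_event(void)
{
    return sizeof(struct event_padded) - sizeof(struct event_dense);
}
```

Multiplied by the number of events that fit in a fixed-size tracing
buffer, even a few bytes per record add up.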
Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Pull perf fixes from Ingo Molnar:
"Fix merge window fallout and fix sleep profiling (this was always
broken, so it's not a fix for the merge window - we can skip this one
from the head of the tree)."
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/trace: Add ability to set a target task for events
perf/x86: Fix USER/KERNEL tagging of samples properly
perf/x86/intel/uncore: Make UNCORE_PMU_HRTIMER_INTERVAL 64-bit
Merge Andrew's second set of patches:
- MM
- a few random fixes
- a couple of RTC leftovers
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (120 commits)
rtc/rtc-88pm80x: remove unneed devm_kfree
rtc/rtc-88pm80x: assign ret only when rtc_register_driver fails
mm: hugetlbfs: close race during teardown of hugetlbfs shared page tables
tmpfs: distribute interleave better across nodes
mm: remove redundant initialization
mm: warn if pg_data_t isn't initialized with zero
mips: zero out pg_data_t when it's allocated
memcg: gix memory accounting scalability in shrink_page_list
mm/sparse: remove index_init_lock
mm/sparse: more checks on mem_section number
mm/sparse: optimize sparse_index_alloc
memcg: add mem_cgroup_from_css() helper
memcg: further prevent OOM with too many dirty pages
memcg: prevent OOM with too many dirty pages
mm: mmu_notifier: fix freed page still mapped in secondary MMU
mm: memcg: only check anon swapin page charges for swap cache
mm: memcg: only check swap cache pages for repeated charging
mm: memcg: split swapin charge function into private and public part
mm: memcg: remove needless !mm fixup to init_mm when charging
mm: memcg: remove unneeded shmem charge type
...
Merge tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random
Pull random subsystem patches from Ted Ts'o:
"This patch series contains a major revamp of how we collect entropy
from interrupts for /dev/random and /dev/urandom.
The goal is to address weaknesses discussed in the paper "Mining
your Ps and Qs: Detection of Widespread Weak Keys in Network Devices",
by Nadia Heninger, Zakir Durumeric, Eric Wustrow, J. Alex Halderman,
which will be published in the Proceedings of the 21st Usenix Security
Symposium, August 2012. (See https://factorable.net for more
information and an extended version of the paper.)"
Fix up trivial conflicts due to nearby changes in
drivers/{mfd/ab3100-core.c, usb/gadget/omap_udc.c}
* tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random: (33 commits)
random: mix in architectural randomness in extract_buf()
dmi: Feed DMI table to /dev/random driver
random: Add comment to random_initialize()
random: final removal of IRQF_SAMPLE_RANDOM
um: remove IRQF_SAMPLE_RANDOM which is now a no-op
sparc/ldc: remove IRQF_SAMPLE_RANDOM which is now a no-op
[ARM] pxa: remove IRQF_SAMPLE_RANDOM which is now a no-op
board-palmz71: remove IRQF_SAMPLE_RANDOM which is now a no-op
isp1301_omap: remove IRQF_SAMPLE_RANDOM which is now a no-op
pxa25x_udc: remove IRQF_SAMPLE_RANDOM which is now a no-op
omap_udc: remove IRQF_SAMPLE_RANDOM which is now a no-op
goku_udc: remove IRQF_SAMPLE_RANDOM which was commented out
uartlite: remove IRQF_SAMPLE_RANDOM which is now a no-op
drivers: hv: remove IRQF_SAMPLE_RANDOM which is now a no-op
xen-blkfront: remove IRQF_SAMPLE_RANDOM which is now a no-op
n2_crypto: remove IRQF_SAMPLE_RANDOM which is now a no-op
pda_power: remove IRQF_SAMPLE_RANDOM which is now a no-op
i2c-pmcmsp: remove IRQF_SAMPLE_RANDOM which is now a no-op
input/serio/hp_sdc.c: remove IRQF_SAMPLE_RANDOM which is now a no-op
mfd: remove IRQF_SAMPLE_RANDOM which is now a no-op
...
__GFP_MEMALLOC will allow the allocation to disregard the watermarks, much
like PF_MEMALLOC. It allows one to pass along the memalloc state in
object related allocation flags as opposed to task related flags, such as
sk->sk_allocation. This removes the need for ALLOC_PFMEMALLOC as callers
using __GFP_MEMALLOC can get the ALLOC_NO_WATERMARK flag which is now
enough to identify allocations related to page reclaim.
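A much simplified model of the flag mapping described above, written in
Python purely for illustration. The bit values are made up, the function
name alloc_flags() is hypothetical, and the real allocator checks further
conditions that this message does not describe.

```python
# Illustrative bit values only; they do not match the kernel's definitions.
GFP_MEMALLOC = 1 << 0          # stands in for __GFP_MEMALLOC (per-allocation)
PF_MEMALLOC = 1 << 0           # stands in for the per-task PF_MEMALLOC flag
ALLOC_NO_WATERMARKS = 1 << 0   # allocator may ignore the zone watermarks


def alloc_flags(gfp_mask, task_flags):
    """Derive allocator flags from the request flags and the task flags.

    Either source of "memalloc" state is enough to disregard watermarks,
    so no separate ALLOC_PFMEMALLOC-style flag is needed in this model.
    """
    flags = 0
    if (gfp_mask & GFP_MEMALLOC) or (task_flags & PF_MEMALLOC):
        flags |= ALLOC_NO_WATERMARKS
    return flags


# An object-related allocation (for example one tagged through
# sk->sk_allocation) carries the state in its gfp mask, not in the task.
print(bool(alloc_flags(GFP_MEMALLOC, 0) & ALLOC_NO_WATERMARKS))
```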
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: David Miller <davem@davemloft.net>
Cc: Neil Brown <neilb@suse.de>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some events are of interest to tasks other than the current one.
For example, sched_stat_* events are interesting for the task
that wakes up. For this reason, it would be good if such
events were also delivered to a target task.
A target task can now be set by using __perf_task().
The original idea and a draft patch belong to Peter Zijlstra.
I need these events for profiling sleep times. sched_switch is used for
getting callchains and sched_stat_* is used for getting time periods.
These events are combined in user space, where they can then be analyzed
by perf tools.
Inspired-by: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arun Sharma <asharma@fb.com>
Signed-off-by: Andrew Vagin <avagin@openvz.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1342016098-213063-1-git-send-email-avagin@openvz.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86/mm changes from Peter Anvin:
"The big change here is the patchset by Alex Shi to use INVLPG to flush
only the affected pages when we only need to flush a small page range.
It also removes the special INVALIDATE_TLB_VECTOR interrupts (32
vectors!) and replaces them with an ordinary IPI function call."
Fix up trivial conflicts in arch/x86/include/asm/apic.h (added code next
to changed line)
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/tlb: Fix build warning and crash when building for !SMP
x86/tlb: do flush_tlb_kernel_range by 'invlpg'
x86/tlb: replace INVALIDATE_TLB_VECTOR by CALL_FUNCTION_VECTOR
x86/tlb: enable tlb flush range support for x86
mm/mmu_gather: enable tlb flush range in generic mmu_gather
x86/tlb: add tlb_flushall_shift knob into debugfs
x86/tlb: add tlb_flushall_shift for specific CPU
x86/tlb: fall back to flush all when meet a THP large page
x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range
x86/tlb_info: get last level TLB entry number of CPU
x86: Add read_mostly declaration/definition to variables from smp.h
x86: Define early read-mostly per-cpu macros
Pull workqueue changes from Tejun Heo:
"There are three major changes.
- WQ_HIGHPRI has been reimplemented so that high priority work items
are served by worker threads with -20 nice value from dedicated
highpri worker pools.
- CPU hotplug support has been reimplemented such that idle workers
are kept across CPU hotplug events. This makes CPU hotplug cheaper
(for PM) and makes the code simpler.
- flush_kthread_work() has been reimplemented so that a work item can
be freed while executing. This removes an annoying behavior
difference between kthread_worker and workqueue."
* 'for-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: fix spurious CPU locality WARN from process_one_work()
kthread_worker: reimplement flush_kthread_work() to allow freeing the work item being executed
kthread_worker: reorganize to prepare for flush_kthread_work() reimplementation
workqueue: simplify CPU hotplug code
workqueue: remove CPU offline trustee
workqueue: don't butcher idle workers on an offline CPU
workqueue: reimplement CPU online rebinding to handle idle workers
workqueue: drop @bind from create_worker()
workqueue: use mutex for global_cwq manager exclusion
workqueue: ROGUE workers are UNBOUND workers
workqueue: drop CPU_DYING notifier operation
workqueue: perform cpu down operations from low priority cpu_notifier()
workqueue: reimplement WQ_HIGHPRI using a separate worker_pool
workqueue: introduce NR_WORKER_POOLS and for_each_worker_pool()
workqueue: separate out worker_pool flags
workqueue: use @pool instead of @gcwq or @cpu where applicable
workqueue: factor out worker_pool from global_cwq
workqueue: don't use WQ_HIGHPRI for unbound workqueues
Merge tag 'kvm-3.6-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Avi Kivity:
"Highlights include
- full big real mode emulation on pre-Westmere Intel hosts (can be
disabled with emulate_invalid_guest_state=0)
- relatively small ppc and s390 updates
- PCID/INVPCID support in guests
- EOI avoidance (3.6 guests should perform better on 3.6 hosts on
interrupt intensive workloads)
- Lockless write faults during live migration
- EPT accessed/dirty bits support for new Intel processors"
Fix up conflicts in:
- Documentation/virtual/kvm/api.txt:
Stupid subchapter numbering, added next to each other.
- arch/powerpc/kvm/booke_interrupts.S:
PPC asm changes clashing with the KVM fixes
- arch/s390/include/asm/sigp.h, arch/s390/kvm/sigp.c:
Duplicated commits through the kvm tree and the s390 tree, with
subsequent edits in the KVM tree.
* tag 'kvm-3.6-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (93 commits)
KVM: fix race with level interrupts
x86, hyper: fix build with !CONFIG_KVM_GUEST
Revert "apic: fix kvm build on UP without IOAPIC"
KVM guest: switch to apic_set_eoi_write, apic_write
apic: add apic_set_eoi_write for PV use
KVM: VMX: Implement PCID/INVPCID for guests with EPT
KVM: Add x86_hyper_kvm to complete detect_hypervisor_platform check
KVM: PPC: Critical interrupt emulation support
KVM: PPC: e500mc: Fix tlbilx emulation for 64-bit guests
KVM: PPC64: booke: Set interrupt computation mode for 64-bit host
KVM: PPC: bookehv: Add ESR flag to Data Storage Interrupt
KVM: PPC: bookehv64: Add support for std/ld emulation.
booke: Added crit/mc exception handler for e500v2
booke/bookehv: Add host crit-watchdog exception support
KVM: MMU: document mmu-lock and fast page fault
KVM: MMU: fix kvm_mmu_pagetable_walk tracepoint
KVM: MMU: trace fast page fault
KVM: MMU: fast path of handling guest page fault
KVM: MMU: introduce SPTE_MMU_WRITEABLE bit
KVM: MMU: fold tlb flush judgement into mmu_spte_update
...
Pull perf events changes from Ingo Molnar:
"- kernel side:
- Intel uncore PMU support for Nehalem and Sandy Bridge CPUs, we
support both the events available via the MSR and via the PCI
access space.
- various uprobes cleanups and restructurings
- PMU driver quirks by microcode version and required x86 microcode
loader cleanups/robustization
- various tracing robustness updates
- static keys: remove obsolete static_branch()
- tooling side:
- GTK browser improvements
- perf report browser: support screenshots to file
- more automated tests
- perf kvm improvements
- perf bench refinements
- build environment improvements
- pipe mode improvements
- libtraceevent updates, we have now hopefully merged most bits with
the out of tree forked code base
... and many other goodies."
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (138 commits)
tracing: Check for allocation failure in __tracing_open()
perf/x86: Fix intel_perfmon_event_map formatting
jump label: Remove static_branch()
tracepoint: Use static_key_false(), since static_branch() is deprecated
perf/x86: Uncore filter support for SandyBridge-EP
perf/x86: Detect number of instances of uncore CBox
perf/x86: Fix event constraint for SandyBridge-EP C-Box
perf/x86: Use 0xff as pseudo code for fixed uncore event
perf/x86: Save a few bytes in 'struct x86_pmu'
perf/x86: Add a microcode revision check for SNB-PEBS
perf/x86: Improve debug output in check_hw_exists()
perf/x86/amd: Unify AMD's generic and family 15h pmus
perf/x86: Move Intel specific code to intel_pmu_init()
perf/x86: Rename Intel specific macros
perf/x86: Fix USER/KERNEL tagging of samples
perf tools: Split event symbols arrays to hw and sw parts
perf tools: Split out PE_VALUE_SYM parsing token to SW and HW tokens
perf tools: Add empty rule for new line in event syntax parsing
perf test: Use ARRAY_SIZE in parse events tests
tools lib traceevent: Cleanup realloc use
...
Move worklist and all worker management fields from global_cwq into
the new struct worker_pool. worker_pool points back to the containing
gcwq. worker and cpu_workqueue_struct are updated to point to
worker_pool instead of gcwq too.
This change is mechanical and doesn't introduce any functional
difference other than rearranging of fields and an added level of
indirection in some places. This is to prepare for multiple pools per
gcwq.
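The resulting ownership can be sketched as follows. This is a toy Python
model, not the kernel's C structures: the class names mirror the names in
the text, but the individual fields (worklist, idle_list, nr_workers) and
the queue_work() helper are illustrative assumptions.

```python
class WorkerPool:
    """Holds the worklist and worker management state moved out of the gcwq."""

    def __init__(self, gcwq):
        self.gcwq = gcwq        # back-pointer to the containing gcwq
        self.worklist = []      # pending work items
        self.idle_list = []     # idle workers
        self.nr_workers = 0


class GlobalCwq:
    """Per-CPU structure; for now it contains exactly one pool."""

    def __init__(self, cpu):
        self.cpu = cpu
        self.pool = WorkerPool(self)


class Worker:
    """A worker now points to its pool instead of to the gcwq."""

    def __init__(self, pool):
        self.pool = pool
        pool.nr_workers += 1


def queue_work(gcwq, item):
    # The added level of indirection: gcwq -> pool -> worklist.
    gcwq.pool.worklist.append(item)


gcwq = GlobalCwq(cpu=0)
worker = Worker(gcwq.pool)
queue_work(gcwq, "work item")
print(worker.pool.gcwq is gcwq, gcwq.pool.worklist)
```

Keeping the back-pointer means code that only holds a pool can still reach
the gcwq, which is what lets a later change add more pools per gcwq.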
v2: Comment typo fixes as suggested by Namhyung.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
This commit adds event tracing for _rcu_barrier() execution. This
is defined only if RCU_TRACE=y.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
x86 has no instruction-level support for flush_tlb_range. Currently
flush_tlb_range is implemented by flushing the entire TLB. That is not
the best solution for all scenarios. In fact, if we use 'invlpg' to
flush only a few TLB entries, later accesses can still benefit from the
entries that remain in the TLB.
But the 'invlpg' instruction is expensive. Its execution time is
comparable to a cr3 rewrite, and even slightly higher on SNB CPUs.
So, on a CPU with 512 4KB TLB entries, the balance point is at:
(512 - X) * 100ns(assumed TLB refill cost) =
X(TLB flush entries) * 100ns(assumed invlpg cost)
Here, X is 256, that is, 1/2 of the 512 entries.
But with the mysterious CPU prefetcher and page miss handler unit, the
assumed TLB refill cost is far lower than 100ns for sequential access. And
2 HT siblings in one core make memory access faster if they are
accessing the same memory. So, in this patch, I only use 'invlpg' when
the number of target entries is less than 1/16 of all active TLB entries.
Actually, I have no data to support the fraction '1/16', so any
suggestions are welcome.
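The balance-point arithmetic and the 1/16 cut-off can be written out as a
small Python sketch. The function names are hypothetical. The 'shift'
parameter is named after the tlb_flushall_shift knob listed in this
series, but its exact semantics there are not described in this message,
so treat that mapping as an assumption.

```python
def balance_point(tlb_entries, refill_cost_ns, invlpg_cost_ns):
    """Solve (tlb_entries - x) * refill_cost = x * invlpg_cost for x.

    x is the number of flushed entries at which per-page invlpg costs as
    much as refilling the rest of the TLB after a full flush.
    """
    return tlb_entries * refill_cost_ns / (refill_cost_ns + invlpg_cost_ns)


def use_invlpg(pages_to_flush, active_tlb_entries, shift=4):
    """Flush page by page only below 1/2**shift of the active entries.

    shift=4 gives the 1/16 fraction chosen in the text.
    """
    return pages_to_flush < (active_tlb_entries >> shift)


print(balance_point(512, 100, 100))   # the 512-entry case from the text
print(use_invlpg(8, 512), use_invlpg(64, 512))
```

With equal assumed costs the balance point is half of the TLB. The much
lower 1/16 cut-off reflects the argument that the real refill cost is well
under the assumed 100ns.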
As for hugetlb, I guess that due to the smaller page table and fewer active
TLB entries, I didn't see any benefit in my benchmark, so it is not
optimized for now.
My micro benchmark shows that in ideal scenarios, read performance improves
by 70 percent. In the worst scenario, read/write performance is similar to
the unpatched 3.4-rc4 kernel.
Here is the read data on my 2P * 4 cores * HT NHM EP machine, with THP
'always':
Multi-thread testing; the '-t' parameter is the thread number:
                     with patch   unpatched 3.4-rc4
./mprotect -t 1      14ns         24ns
./mprotect -t 2      13ns         22ns
./mprotect -t 4      12ns         19ns
./mprotect -t 8      14ns         16ns
./mprotect -t 16     28ns         26ns
./mprotect -t 32     54ns         51ns
./mprotect -t 128    200ns        199ns
Single process with sequential flushing and memory accessing:
                     with patch   unpatched 3.4-rc4
./mprotect           7ns          11ns
./mprotect -p 4096 -l 8 -n 10240
                     21ns         21ns
[ hpa: http://lkml.kernel.org/r/1B4B44D9196EFF41AE41FDA404FC0A100BFF94@SHSMSX101.ccr.corp.intel.com
has additional performance numbers. ]
Signed-off-by: Alex Shi <alex.shi@intel.com>
Link: http://lkml.kernel.org/r/1340845344-27557-3-git-send-email-alex.shi@intel.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
This is a preparatory patch for the KVM/ARM implementation. KVM/ARM will use
the KVM_IRQ_LINE ioctl, which is currently conditional on
__KVM_HAVE_IOAPIC, but ARM obviously doesn't have any IOAPIC support and we
need a separate define.
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
The list of exit reasons for the kvm_userspace_exit event was
missing recent additions; bring it into sync again.
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
In the current code, a short dyntick-idle interval (where there is
at least one non-lazy callback on the CPU) and a long dyntick-idle
interval (where there are only lazy callbacks on the CPU) are traced
identically, which can be less than helpful. This commit therefore
emits different event traces in these two cases.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Tested-by: Pascal Chapperon <pascal.chapperon@wanadoo.fr>
There is little motivation for reclaim_mode_t once RECLAIM_MODE_[A]SYNC
and lumpy reclaim have been removed. This patch gets rid of
reclaim_mode_t as well and improves the documentation about what
reclaim/compaction is and when it is triggered.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ying Han <yinghan@google.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch stops reclaim/compaction entering sync reclaim, as this was
only intended for lumpy reclaim and its use here was an oversight. Page migration has
its own logic for stalling on writeback pages if necessary and memory
compaction is already using it.
Waiting on page writeback is bad for a number of reasons but the primary
one is that waiting on writeback to a slow device like USB can take a
considerable length of time. Page reclaim instead uses
wait_iff_congested() to throttle if too many dirty pages are being
scanned.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ying Han <yinghan@google.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This series removes lumpy reclaim and some stalling logic that was
unintentionally being used by memory compaction. The end result is that
stalling on dirty pages during page reclaim now depends on
wait_iff_congested().
Four kernels were compared
3.3.0 vanilla
3.4.0-rc2 vanilla
3.4.0-rc2 lumpyremove-v2 is patch one from this series
3.4.0-rc2 nosync-v2r3 is the full series
Removing lumpy reclaim saves almost 900 bytes of text whereas the full
series removes 1200 bytes.
   text    data     bss      dec    hex filename
6740375 1927944 2260992 10929311 a6c49f vmlinux-3.4.0-rc2-vanilla
6739479 1927944 2260992 10928415 a6c11f vmlinux-3.4.0-rc2-lumpyremove-v2
6739159 1927944 2260992 10928095 a6bfdf vmlinux-3.4.0-rc2-nosync-v2
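The savings quoted above follow directly from the text column. A short
Python check, using only the numbers in the table (the dictionary and
helper names are just for this sketch):

```python
# (text, data, bss, dec, hex) as reported by size(1) in the table above.
SIZES = {
    "vanilla":        (6740375, 1927944, 2260992, 10929311, "a6c49f"),
    "lumpyremove-v2": (6739479, 1927944, 2260992, 10928415, "a6c11f"),
    "nosync-v2":      (6739159, 1927944, 2260992, 10928095, "a6bfdf"),
}


def text_saved(kernel):
    """Bytes of text removed relative to the vanilla 3.4.0-rc2 build."""
    return SIZES["vanilla"][0] - SIZES[kernel][0]


def row_is_consistent(kernel):
    """Check that text + data + bss equals dec, and that hex matches dec."""
    text, data, bss, dec, hex_total = SIZES[kernel]
    return text + data + bss == dec and int(hex_total, 16) == dec


print(text_saved("lumpyremove-v2"))   # lumpy reclaim removal alone
print(text_saved("nosync-v2"))        # the full series
```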
There are behaviour changes in the series and so tests were run with
monitoring of ftrace events. This disrupts results so the performance
results are distorted but the new behaviour should be clearer.
fs-mark running in a threaded configuration showed little of interest as
it did not push reclaim aggressively
FS-Mark Multi Threaded
                 3.3.0-vanilla          rc2-vanilla            lumpyremove-v2r3       nosync-v2r3
Files/s min            3.20 ( 0.00%)          3.20 ( 0.00%)          3.20 ( 0.00%)          3.20 ( 0.00%)
Files/s mean           3.20 ( 0.00%)          3.20 ( 0.00%)          3.20 ( 0.00%)          3.20 ( 0.00%)
Files/s stddev         0.00 ( 0.00%)          0.00 ( 0.00%)          0.00 ( 0.00%)          0.00 ( 0.00%)
Files/s max            3.20 ( 0.00%)          3.20 ( 0.00%)          3.20 ( 0.00%)          3.20 ( 0.00%)
Overhead min      508667.00 ( 0.00%)     521350.00 (-2.49%)     544292.00 (-7.00%)     547168.00 (-7.57%)
Overhead mean     551185.00 ( 0.00%)     652690.73 (-18.42%)    991208.40 (-79.83%)    570130.53 (-3.44%)
Overhead stddev    18200.69 ( 0.00%)     331958.29 (-1723.88%) 1579579.43 (-8578.68%)    9576.81 (47.38%)
Overhead max      576775.00 ( 0.00%)    1846634.00 (-220.17%)  6901055.00 (-1096.49%)  585675.00 (-1.54%)
MMTests Statistics: duration
Sys Time Running Test (seconds) 309.90 300.95 307.33 298.95
User+Sys Time Running Test (seconds) 319.32 309.67 315.69 307.51
Total Elapsed Time (seconds) 1187.85 1193.09 1191.98 1193.73
MMTests Statistics: vmstat
Page Ins 80532 82212 81420 79480
Page Outs 111434984 111456240 111437376 111582628
Swap Ins 0 0 0 0
Swap Outs 0 0 0 0
Direct pages scanned 44881 27889 27453 34843
Kswapd pages scanned 25841428 25860774 25861233 25843212
Kswapd pages reclaimed 25841393 25860741 25861199 25843179
Direct pages reclaimed 44881 27889 27453 34843
Kswapd efficiency 99% 99% 99% 99%
Kswapd velocity 21754.791 21675.460 21696.029 21649.127
Direct efficiency 100% 100% 100% 100%
Direct velocity 37.783 23.375 23.031 29.188
Percentage direct scans 0% 0% 0% 0%
ftrace showed that there was no stalling on writeback or pages submitted
for IO from reclaim context.
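The derived rows in the vmstat tables ("efficiency" and "velocity") are
not defined in this message. The reported figures are consistent with the
definitions sketched below in Python, but this is an inference from the
numbers, not a description of how MMTests computes them.

```python
def efficiency_pct(reclaimed, scanned):
    """Pages reclaimed as a whole-number percentage of pages scanned.

    Truncated rather than rounded; reported as 100 when nothing was
    scanned, which matches the "Direct efficiency 100%" rows that have
    zero direct scans.
    """
    if scanned == 0:
        return 100
    return reclaimed * 100 // scanned


def velocity(scanned, elapsed_seconds):
    """Pages scanned per second of total elapsed test time."""
    return scanned / elapsed_seconds


# 3.3.0-vanilla column of the FS-Mark run above.
print(efficiency_pct(25841393, 25841428))        # kswapd efficiency
print(round(velocity(25841428, 1187.85), 3))     # kswapd velocity
print(round(velocity(44881, 1187.85), 3))        # direct velocity
```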
postmark was similar and while it was more interesting, it also did not
push reclaim heavily.
POSTMARK
                                   3.3.0-vanilla      rc2-vanilla        lumpyremove-v2r3   nosync-v2r3
Transactions per second:             16.00 ( 0.00%)     20.00 (25.00%)     18.00 (12.50%)     17.00 ( 6.25%)
Data megabytes read per second:      18.80 ( 0.00%)     24.27 (29.10%)     22.26 (18.40%)     20.54 ( 9.26%)
Data megabytes written per second:   35.83 ( 0.00%)     46.25 (29.08%)     42.42 (18.39%)     39.14 ( 9.24%)
Files created alone per second:      28.00 ( 0.00%)     38.00 (35.71%)     34.00 (21.43%)     30.00 ( 7.14%)
Files create/transact per second:     8.00 ( 0.00%)     10.00 (25.00%)      9.00 (12.50%)      8.00 ( 0.00%)
Files deleted alone per second:     556.00 ( 0.00%)   1224.00 (120.14%)  3062.00 (450.72%)  6124.00 (1001.44%)
Files delete/transact per second:     8.00 ( 0.00%)     10.00 (25.00%)      9.00 (12.50%)      8.00 ( 0.00%)
MMTests Statistics: duration
Sys Time Running Test (seconds) 113.34 107.99 109.73 108.72
User+Sys Time Running Test (seconds) 145.51 139.81 143.32 143.55
Total Elapsed Time (seconds) 1159.16 899.23 980.17 1062.27
MMTests Statistics: vmstat
Page Ins 13710192 13729032 13727944 13760136
Page Outs 43071140 42987228 42733684 42931624
Swap Ins 0 0 0 0
Swap Outs 0 0 0 0
Direct pages scanned 0 0 0 0
Kswapd pages scanned 9941613 9937443 9939085 9929154
Kswapd pages reclaimed 9940926 9936751 9938397 9928465
Direct pages reclaimed 0 0 0 0
Kswapd efficiency 99% 99% 99% 99%
Kswapd velocity 8576.567 11051.058 10140.164 9347.109
Direct efficiency 100% 100% 100% 100%
Direct velocity 0.000 0.000 0.000 0.000
It looks here as if the full series regresses performance, but as
ftrace showed no usage of wait_iff_congested() or sync reclaim I am
assuming it's a disruption due to monitoring. Other data such as memory
usage, page IO, swap IO all looked similar.
Running a benchmark with a plain DD showed nothing very interesting.
The full series stalled in wait_iff_congested() slightly less but stall
times on vanilla kernels were marginal.
Running a benchmark that hammered on file-backed mappings showed stalls
due to congestion but not in sync writebacks
MICRO
3.3.0-vanilla rc2-vanilla lumpyremove-v2r3 nosync-v2r3
MMTests Statistics: duration
Sys Time Running Test (seconds) 308.13 294.50 298.75 299.53
User+Sys Time Running Test (seconds) 330.45 316.28 318.93 320.79
Total Elapsed Time (seconds) 1814.90 1833.88 1821.14 1832.91
MMTests Statistics: vmstat
Page Ins 108712 120708 97224 110344
Page Outs 155514576 156017404 155813676 156193256
Swap Ins 0 0 0 0
Swap Outs 0 0 0 0
Direct pages scanned 2599253 1550480 2512822 2414760
Kswapd pages scanned 69742364 71150694 68839041 69692533
Kswapd pages reclaimed 34824488 34773341 34796602 34799396
Direct pages reclaimed 53693 94750 61792 75205
Kswapd efficiency 49% 48% 50% 49%
Kswapd velocity 38427.662 38797.901 37799.972 38022.889
Direct efficiency 2% 6% 2% 3%
Direct velocity 1432.174 845.464 1379.807 1317.446
Percentage direct scans 3% 2% 3% 3%
Page writes by reclaim 0 0 0 0
Page writes file 0 0 0 0
Page writes anon 0 0 0 0
Page reclaim immediate 0 0 0 1218
Page rescued immediate 0 0 0 0
Slabs scanned 15360 16384 13312 16384
Direct inode steals 0 0 0 0
Kswapd inode steals 4340 4327 1630 4323
FTrace Reclaim Statistics: congestion_wait
Direct number congest waited 0 0 0 0
Direct time congest waited 0ms 0ms 0ms 0ms
Direct full congest waited 0 0 0 0
Direct number conditional waited 900 870 754 789
Direct time conditional waited 0ms 0ms 0ms 20ms
Direct full conditional waited 0 0 0 0
KSwapd number congest waited 2106 2308 2116 1915
KSwapd time congest waited 139924ms 157832ms 125652ms 132516ms
KSwapd full congest waited 1346 1530 1202 1278
KSwapd number conditional waited 12922 16320 10943 14670
KSwapd time conditional waited 0ms 0ms 0ms 0ms
KSwapd full conditional waited 0 0 0 0
Reclaim statistics are not radically changed. The stall times in kswapd
are massive but it is clear that it is due to calls to congestion_wait()
and that is almost certainly the call in balance_pgdat(). Otherwise
stalls due to dirty pages are non-existent.
I ran a benchmark that stressed high-order allocation. This is a very
artificial load but was used in the past to evaluate lumpy reclaim and
compaction. Generally I look at allocation success rates and latency
figures.
STRESS-HIGHALLOC
              3.3.0-vanilla     rc2-vanilla       lumpyremove-v2r3  nosync-v2r3
Pass 1         81.00 ( 0.00%)    28.00 (-53.00%)   24.00 (-57.00%)   28.00 (-53.00%)
Pass 2         82.00 ( 0.00%)    39.00 (-43.00%)   38.00 (-44.00%)   43.00 (-39.00%)
while Rested   88.00 ( 0.00%)    87.00 (-1.00%)    88.00 ( 0.00%)    88.00 ( 0.00%)
MMTests Statistics: duration
Sys Time Running Test (seconds) 740.93 681.42 685.14 684.87
User+Sys Time Running Test (seconds) 2922.65 3269.52 3281.35 3279.44
Total Elapsed Time (seconds) 1161.73 1152.49 1159.55 1161.44
MMTests Statistics: vmstat
Page Ins 4486020 2807256 2855944 2876244
Page Outs 7261600 7973688 7975320 7986120
Swap Ins 31694 0 0 0
Swap Outs 98179 0 0 0
Direct pages scanned 53494 57731 34406 113015
Kswapd pages scanned 6271173 1287481 1278174 1219095
Kswapd pages reclaimed 2029240 1281025 1260708 1201583
Direct pages reclaimed 1468 14564 16649 92456
Kswapd efficiency 32% 99% 98% 98%
Kswapd velocity 5398.133 1117.130 1102.302 1049.641
Direct efficiency 2% 25% 48% 81%
Direct velocity 46.047 50.092 29.672 97.306
Percentage direct scans 0% 4% 2% 8%
Page writes by reclaim 1616049 0 0 0
Page writes file 1517870 0 0 0
Page writes anon 98179 0 0 0
Page reclaim immediate 103778 27339 9796 17831
Page rescued immediate 0 0 0 0
Slabs scanned 1096704 986112 980992 998400
Direct inode steals 223 215040 216736 247881
Kswapd inode steals 175331 61548 68444 63066
Kswapd skipped wait 21991 0 1 0
THP fault alloc 1 135 125 134
THP collapse alloc 393 311 228 236
THP splits 25 13 7 8
THP fault fallback 0 0 0 0
THP collapse fail 3 5 7 7
Compaction stalls 865 1270 1422 1518
Compaction success 370 401 353 383
Compaction failures 495 869 1069 1135
Compaction pages moved 870155 3828868 4036106 4423626
Compaction move failure 26429 23865 29742 27514
Success rates are completely hosed for 3.4-rc2 which is almost certainly
due to commit fe2c2a1066 ("vmscan: reclaim at order 0 when compaction
is enabled"). I expected this would happen for kswapd and impair
allocation success rates (https://lkml.org/lkml/2012/1/25/166) but I did
not anticipate this much of a difference: 80% less scanning, 37% less
reclaim by kswapd.
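The two percentages can be reproduced, to within rounding, from the
kswapd rows of the STRESS-HIGHALLOC vmstat table by comparing the
3.3.0-vanilla and rc2-vanilla columns. A short Python check (the helper
name is just for this sketch):

```python
def percent_less(before, after):
    """Relative reduction from 'before' to 'after', in percent."""
    return (before - after) / before * 100


# Kswapd pages scanned: 3.3.0-vanilla vs 3.4.0-rc2 vanilla.
scan_drop = percent_less(6271173, 1287481)
# Kswapd pages reclaimed: same two kernels.
reclaim_drop = percent_less(2029240, 1281025)

print(f"scanning: {scan_drop:.1f}% less, reclaim: {reclaim_drop:.1f}% less")
```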
In comparison, reclaim/compaction is not aggressive and gives up easily
which is the intended behaviour. hugetlbfs uses __GFP_REPEAT and would
be much more aggressive about reclaim/compaction than THP allocations
are. The stress test above is allocating like neither THP nor hugetlbfs
but is much closer to THP.
Mainline is now impaired in terms of high order allocation under heavy
load although I do not know to what degree as I did not test with
__GFP_REPEAT. Keep this in mind for bugs related to hugepage pool
resizing, THP allocation and high order atomic allocation failures from
network devices.
In terms of congestion throttling, I see the following for this test
FTrace Reclaim Statistics: congestion_wait
Direct number congest waited 3 0 0 0
Direct time congest waited 0ms 0ms 0ms 0ms
Direct full congest waited 0 0 0 0
Direct number conditional waited 957 512 1081 1075
Direct time conditional waited 0ms 0ms 0ms 0ms
Direct full conditional waited 0 0 0 0
KSwapd number congest waited 36 4 3 5
KSwapd time congest waited 3148ms 400ms 300ms 500ms
KSwapd full congest waited 30 4 3 5
KSwapd number conditional waited 88514 197 332 542
KSwapd time conditional waited 4980ms 0ms 0ms 0ms
KSwapd full conditional waited 49 0 0 0
The "conditional waited" times are the most interesting as this is
directly impacted by the number of dirty pages encountered during scan.
As lumpy reclaim is no longer scanning contiguous ranges, it is finding
fewer dirty pages. This brings wait times from about 5 seconds to 0.
kswapd itself is still calling congestion_wait() so it'll still stall but
it's a lot less.
In terms of the type of IO we were doing, I see this
FTrace Reclaim Statistics: mm_vmscan_writepage
Direct writes anon sync 0 0 0 0
Direct writes anon async 0 0 0 0
Direct writes file sync 0 0 0 0
Direct writes file async 0 0 0 0
Direct writes mixed sync 0 0 0 0
Direct writes mixed async 0 0 0 0
KSwapd writes anon sync 0 0 0 0
KSwapd writes anon async 91682 0 0 0
KSwapd writes file sync 0 0 0 0
KSwapd writes file async 822629 0 0 0
KSwapd writes mixed sync 0 0 0 0
KSwapd writes mixed async 0 0 0 0
In 3.2, kswapd was doing a bunch of async writes of pages but
reclaim/compaction was never reaching a point where it was doing sync
IO. This does not guarantee that reclaim/compaction was not calling
wait_on_page_writeback() but I would consider it unlikely. It indicates
that merging patches 2 and 3 to stop reclaim/compaction calling
wait_on_page_writeback() should be safe.
This patch:
Lumpy reclaim had a purpose but in the mind of some, it was to kick the
system so hard it thrashed. For others the purpose was to complicate
vmscan.c. Over time it was given softer shoes and a nicer attitude but
memory compaction needs to step up and replace it so this patch sends
lumpy reclaim to the farm.
The tracepoint format changes for isolating LRU pages with this patch
applied. Furthermore reclaim/compaction can no longer queue dirty pages
in pageout() if the underlying BDI is congested. Lumpy reclaim used
this logic and reclaim/compaction was using it in error.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ying Han <yinghan@google.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The swap token code no longer fits in with the current VM model. It
does not play well with cgroups or the better NUMA placement code in
development, since we have only one swap token globally.
It also has the potential to mess with scalability of the system, by
increasing the number of non-reclaimable pages on the active and
inactive anon LRU lists.
Last but not least, the swap token code has been broken for a year
without complaints, as reported by Konstantin Khlebnikov. This suggests
we no longer have much use for it.
The days of sub-1G memory systems with heavy use of swap are over. If
we ever need thrashing reducing code in the future, we will have to
implement something that does scale.
Signed-off-by: Rik van Riel <riel@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Hugh Dickins <hughd@google.com>
Acked-by: Bob Picco <bpicco@meloft.net>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux
Pull writeback tree from Wu Fengguang:
"Mainly from Jan Kara to avoid iput() in the flusher threads."
* tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
writeback: Avoid iput() from flusher thread
vfs: Rename end_writeback() to clear_inode()
vfs: Move waiting for inode writeback from end_writeback() to evict_inode()
writeback: Refactor writeback_single_inode()
writeback: Remove wb->list_lock from writeback_single_inode()
writeback: Separate inode requeueing after writeback
writeback: Move I_DIRTY_PAGES handling
writeback: Move requeueing when I_SYNC set to writeback_sb_inodes()
writeback: Move clearing of I_SYNC into inode_sync_complete()
writeback: initialize global_dirty_limit
fs: remove 8 bytes of padding from struct writeback_control on 64 bit builds
mm: page-writeback.c: local functions should not be exposed globally
Pull ext2, ext3 and quota fixes from Jan Kara:
"Interesting bits are:
- removal of a special i_mutex locking subclass (I_MUTEX_QUOTA) since
quota code does not need i_mutex anymore in any unusual way.
- backport (from ext4) of a fix of a checkpointing bug (missing cache
flush) that could lead to fs corruption on power failure
The rest are just random small fixes & cleanups."
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
ext2: trivial fix to comment for ext2_free_blocks
ext2: remove the redundant comment for ext2_export_ops
ext3: return 32/64-bit dir name hash according to usage type
quota: Get rid of nested I_MUTEX_QUOTA locking subclass
quota: Use precomputed value of sb_dqopt in dquot_quota_sync
ext2: Remove i_mutex use from ext2_quota_write()
reiserfs: Remove i_mutex use from reiserfs_quota_write()
ext4: Remove i_mutex use from ext4_quota_write()
ext3: Remove i_mutex use from ext3_quota_write()
quota: Fix double lock in add_dquot_ref() with CONFIG_QUOTA_DEBUG
jbd: Write journal superblock with WRITE_FUA after checkpointing
jbd: protect all log tail updates with j_checkpoint_mutex
jbd: Split updating of journal superblock and marking journal empty
ext2: do not register write_super within VFS
ext2: Remove s_dirt handling
ext2: write superblock only once on unmount
ext3: update documentation with barrier=1 default
ext3: remove max_debt in find_group_orlov()
jbd: Refine commit writeout logic
Pull user namespace enhancements from Eric Biederman:
"This is a course correction for the user namespace, so that we can
reach an inexpensive, maintainable, and reasonably complete
implementation.
Highlights:
- Config guards make it impossible to enable the user namespace and
code that has not been converted to be user namespace safe.
- Use of the new kuid_t type ensures that if you somehow get past the
config guards, the kernel will encounter type errors if you enable
user namespaces and attempt to compile in code whose permission
checks have not been updated to be user namespace safe.
- All uids from child user namespaces are mapped into the initial
user namespace before they are processed, removing the need to add
an additional check to see if the user namespace of the compared
uids remains the same.
- With the user namespaces compiled out the performance is as good or
better than it is today.
- For most operations absolutely nothing changes, in performance or
operation, with the user namespace enabled.
- The worst case performance I could come up with was timing 1
billion cache cold stat operations with the user namespace code
enabled. This went from 156s to 164s on my laptop (or 156ns to
164ns per stat operation).
- (uid_t)-1 and (gid_t)-1 are reserved as an internal error value.
Most uid/gid setting system calls treat these values specially
anyway so attempting to use -1 as a uid would likely cause
entertaining failures in userspace.
- If setuid is called with a uid that can not be mapped, setuid fails.
I have looked at sendmail, login, ssh and every other program I
could think of that would call setuid and they all check for and
handle the case where setuid fails.
- If stat or a similar system call is called from a context in which
we can not map a uid we lie and return overflowuid. The LFS
experience suggests not lying and returning an error code might be
better, but the historical precedent with uids is different and I
can not think of anything that would break by lying about a uid we
can't map.
- Capabilities are localized to the current user namespace making it
safe to give the initial user in a user namespace all capabilities.
My git tree covers all of the modifications needed to convert the core
kernel and enough changes to make a system bootable to runlevel 1."
Fix up trivial conflicts due to nearby independent changes in fs/stat.c
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (46 commits)
userns: Silence silly gcc warning.
cred: use correct cred accessor with regards to rcu read lock
userns: Convert the move_pages, and migrate_pages permission checks to use uid_eq
userns: Convert cgroup permission checks to use uid_eq
userns: Convert tmpfs to use kuid and kgid where appropriate
userns: Convert sysfs to use kgid/kuid where appropriate
userns: Convert sysctl permission checks to use kuid and kgids.
userns: Convert proc to use kuid/kgid where appropriate
userns: Convert ext4 to user kuid/kgid where appropriate
userns: Convert ext3 to use kuid/kgid where appropriate
userns: Convert ext2 to use kuid/kgid where appropriate.
userns: Convert devpts to use kuid/kgid where appropriate
userns: Convert binary formats to use kuid/kgid where appropriate
userns: Add negative depends on entries to avoid building code that is userns unsafe
userns: signal remove unnecessary map_cred_ns
userns: Teach inode_capable to understand inodes whose uids map to other namespaces.
userns: Fail exec for suid and sgid binaries with ids outside our user namespace.
userns: Convert stat to return values mapped from kuids and kgids
userns: Convert user specfied uids and gids in chown into kuids and kgid
userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs
...
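The uid handling described in the highlights above can be modelled in a few lines. This is a hedged sketch in Python, not kernel code: the extent layout (first, lower_first, count), the Python bodies, and the overflow value 65534 are assumptions made for illustration. The function names echo kernel helpers such as uid_eq, but only the behaviour comes from the text: ids are mapped into the initial namespace before use, (uid_t)-1 is the reserved error value, setuid fails for an unmappable uid, and stat-like calls report an overflow uid.

```python
from dataclasses import dataclass
from typing import List

INVALID_UID = 0xFFFFFFFF   # (uid_t)-1, reserved as the internal error value
OVERFLOW_UID = 65534       # assumed overflowuid value, for illustration only


@dataclass(frozen=True)
class Extent:
    """Ids [first, first+count) in a namespace correspond to ids
    [lower_first, lower_first+count) in the initial namespace."""
    first: int
    lower_first: int
    count: int


@dataclass(frozen=True)
class KUid:
    """Stand-in for kuid_t: a uid already mapped into the initial namespace.
    Keeping it distinct from a plain int mimics the type check."""
    val: int


def make_kuid(extents: List[Extent], uid: int) -> KUid:
    """Map a namespace-local uid into the initial namespace."""
    if uid != INVALID_UID:
        for e in extents:
            if e.first <= uid < e.first + e.count:
                return KUid(e.lower_first + (uid - e.first))
    return KUid(INVALID_UID)


def uid_valid(kuid: KUid) -> bool:
    return kuid.val != INVALID_UID


def uid_eq(a: KUid, b: KUid) -> bool:
    """Compare two ids that are both expressed in the initial namespace."""
    return a.val == b.val


def from_kuid_munged(extents: List[Extent], kuid: KUid) -> int:
    """Map back for reporting (e.g. stat); unmappable ids become OVERFLOW_UID."""
    if uid_valid(kuid):
        for e in extents:
            if e.lower_first <= kuid.val < e.lower_first + e.count:
                return e.first + (kuid.val - e.lower_first)
    return OVERFLOW_UID


def setuid(extents: List[Extent], uid: int) -> bool:
    """Model of the rule that setuid fails for an unmappable uid."""
    return uid_valid(make_kuid(extents, uid))
```

With a hypothetical child namespace mapping ids 0..999 onto 100000..100999, `setuid(child, 5)` succeeds, `setuid(child, 5000)` fails, and a file owned by initial-namespace uid 0 is reported as the overflow uid inside the child.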
Merge tag 'pm-for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
- Implementation of opportunistic suspend (autosleep) and user space
interface for manipulating wakeup sources.
- Hibernate updates from Bojan Smojver and Minho Ban.
- Updates of the runtime PM core and generic PM domains framework
related to PM QoS.
- Assorted fixes.
* tag 'pm-for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (25 commits)
epoll: Fix user space breakage related to EPOLLWAKEUP
PM / Domains: Make it possible to add devices to inactive domains
PM / Hibernate: Use get_gendisk to verify partition if resume_file is integer format
PM / Domains: Fix computation of maximum domain off time
PM / Domains: Fix link checking when add subdomain
PM / Sleep: User space wakeup sources garbage collector Kconfig option
PM / Sleep: Make the limit of user space wakeup sources configurable
PM / Documentation: suspend-and-cpuhotplug.txt: Fix typo
PM / Domains: Cache device stop and domain power off governor results, v3
PM / Domains: Make device removal more straightforward
PM / Sleep: Fix a mistake in a conditional in autosleep_store()
epoll: Add a flag, EPOLLWAKEUP, to prevent suspend while epoll events are ready
PM / QoS: Create device constraints objects on notifier registration
PM / Runtime: Remove device fields related to suspend time, v2
PM / Domains: Rework default domain power off governor function, v2
PM / Domains: Rework default device stop governor function, v2
PM / Sleep: Add user space interface for manipulating wakeup sources, v3
PM / Sleep: Add "prevent autosleep time" statistics to wakeup sources
PM / Sleep: Implement opportunistic sleep, v2
PM / Sleep: Add wakeup_source_activate and wakeup_source_deactivate tracepoints
...
Merge tag 'sound-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound updates from Takashi Iwai:
"This is the first big chunk for 3.5 merges of sound stuff.
There are a few big changes in different areas. First off, the
streaming logic of USB-audio endpoints has been largely rewritten for
better support of "implicit feedback". If anything about USB got
broken, this change is the first thing to check.
For HD-audio, the resume procedure was changed; instead of delaying
the resume of the hardware until the first use, the hardware is now
woken up immediately at resume. This works around buggy BIOSes.
For ASoC, dynamic PCM support and the improved support for digital
links between off-SoC devices are major framework changes.
Some highlights are below:
* HD-audio
- Avoid accesses of invalid pin-control bits that may stall the codec
- V-ref setup cleanups
- Fix the races in power-saving code
- Fix the races in codec cache hashes and connection lists
- Split some common code for the BIOS auto-parser out to hda_auto_parser.c
- Changed the PM resume code to wake up immediately for buggy BIOS
- Creative SoundCore3D support
- Add Conexant CX20751/2/3/4 codec support
* ASoC
- Dynamic PCM support, allowing support for SoCs with internal
routing through components with tight sequencing and formatting
constraints within their internal paths or where there are multiple
components connected with CPU managed DMA controllers inside the
SoC.
- Greatly improved support for direct digital links between off-SoC
devices, providing a much simpler way of connecting things like
digital basebands to CODECs.
- Much more fine grained and robust locking, cleaning up some of the
confusion that crept in with multi-component.
- CPU support for nVidia Tegra 30 I2S and audio hub controllers and
ST-Ericsson MSP I2S controllers
- New CODEC drivers for Cirrus CS42L52, LAPIS Semiconductor ML26124,
Texas Instruments LM49453.
- Some regmap changes needed by the Tegra I2S driver.
- mc13783 audio support.
* Misc
- Rewrite with module_pci_driver()
- Xonar DGX support for snd-oxygen
- Improvement of packet handling in snd-firewire driver
- New USB-endpoint streaming logic
- Enhanced M-audio FTU quirks and relevant cleanups
- Increase the number of supported OSS devices to 256
- snd-aloop accuracy improvement
There are a few more pending changes for 3.5, but they will be sent
slightly later as they partly depend on the DRM changes."
Fix up conflicts in regmap (due to duplicate patches, with some further
updates then having already come in from the regmap tree). Also some
fairly trivial context conflicts in the imx and mcx soc drivers.
* tag 'sound-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (280 commits)
ALSA: snd-usb: fix stream info output in /proc
ALSA: pcm - Add proper state checks to snd_pcm_drain()
ALSA: sh: Fix up namespace collision in sh_dac_audio.
ALSA: hda/realtek - Fix unused variable compile warning
ASoC: sh: fsi: enable chip specific data transfer mode
ASoC: sh: fsi: call fsi_hw_startup/shutdown from fsi_dai_trigger()
ASoC: sh: fsi: use same format for IN/OUT
ASoC: sh: fsi: add fsi_version() and removed meaningless version check
ASoC: sh: fsi: use register field macro name on IN/OUT_DMAC
ASoC: tegra: Add machine driver for WM8753 codec
ALSA: hda - Fix possible races of accesses to connection list array
ASoC: OMAP: HDMI: Introduce codec
ARM: mx31_3ds: Add sound support
ASoC: imx-mc13783 cleanup
mx31moboard: Add sound support
ASoC: mc13783 codec cleanups
ASoC: add imx-mc13783 sound support
ASoC: Add mc13783 codec
mfd: mc13xxx: add codec platform data
ASoC: don't flip master of DT-instantiated DAI links
...
Pull trivial updates from Jiri Kosina:
"As usual, it's mostly typo fixes, redundant code elimination and some
documentation updates."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (57 commits)
edac, mips: don't change code that has been removed in edac/mips tree
xtensa: Change mail addresses of Hannes Weiner and Oskar Schirmer
lib: Change mail address of Oskar Schirmer
net: Change mail address of Oskar Schirmer
arm/m68k: Change mail address of Sebastian Hess
i2c: Change mail address of Oskar Schirmer
net: Fix tcp_build_and_update_options comment in struct tcp_sock
atomic64_32.h: fix parameter naming mismatch
Kconfig: replace "--- help ---" with "---help---"
c2port: fix bogus Kconfig "default no"
edac: Fix spelling errors.
qla1280: Remove redundant NULL check before release_firmware() call
remoteproc: remove redundant NULL check before release_firmware()
qla2xxx: Remove redundant NULL check before release_firmware() call.
aic94xx: Get rid of redundant NULL check before release_firmware() call
tehuti: delete redundant NULL check before release_firmware()
qlogic: get rid of a redundant test for NULL before call to release_firmware()
bna: remove redundant NULL test before release_firmware()
tg3: remove redundant NULL test before release_firmware() call
typhoon: get rid of redundant conditional before all to release_firmware()
...
If the journal superblock is written only to the disk's caches and another
transaction starts reusing the space of the transaction cleaned from the log,
blocks of the new transaction can reach the disk before the journal superblock.
When a power failure happens in such a case, the subsequent journal replay would
still try to replay the old transaction, but some of its blocks may already be
overwritten by the new transaction. For this reason we must use WRITE_FUA when
updating the log tail, and we must first write the new log tail to disk and
update the in-memory information only after that.
Signed-off-by: Jan Kara <jack@suse.cz>
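The ordering problem described in this commit message can be reproduced with a toy model. The sketch below is an assumption-laden illustration in Python, not jbd code: `CachedDisk`, `destage` and `crash_after_log_reuse` are hypothetical names, and the disk is reduced to a dictionary with a volatile write cache. It shows only the FUA half of the fix, namely that a log tail update left in the cache can be lost while a block of the new transaction has already reached the media.

```python
class CachedDisk:
    """Toy disk with a volatile write cache (a model, not a real block layer)."""

    def __init__(self, media):
        self.media = dict(media)   # contents that survive a power failure
        self.cache = {}            # contents that are lost on power failure

    def write(self, block, data, fua=False):
        if fua:
            # Forced unit access: the data is on stable media on completion.
            self.cache.pop(block, None)
            self.media[block] = data
        else:
            self.cache[block] = data

    def destage(self, block):
        """The drive may move cached blocks to media in any order it likes."""
        if block in self.cache:
            self.media[block] = self.cache.pop(block)

    def power_fail(self):
        """Drop the cache and return what is left on the media."""
        self.cache.clear()
        return dict(self.media)


def crash_after_log_reuse(tail_fua):
    # Log block 1 holds old transaction T1; the superblock tail points at it.
    disk = CachedDisk({"tail": 1, 1: "T1", 2: "empty"})
    # T1 has been checkpointed, so the log tail moves past it.
    disk.write("tail", 2, fua=tail_fua)
    # A new transaction reuses block 1 and happens to reach the media first.
    disk.write(1, "T2")
    disk.destage(1)
    media = disk.power_fail()
    # Return what replay would see: the tail pointer and the block it used to cover.
    return media["tail"], media[1]
```

Calling `crash_after_log_reuse(tail_fua=False)` leaves the tail pointing at block 1 although that block now holds data from the new transaction, which is the corruption scenario above. With `tail_fua=True` the tail on the media already points past the reused block.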
There are three cases of updating the journal superblock. In the first case, we
want to mark the journal as empty (setting s_sequence to 0), in the second case
we want to update the log tail, and in the third case we want to update s_errno.
Split these cases into separate functions. It makes the code slightly more
straightforward and later patches will make the distinction even more important.
Signed-off-by: Jan Kara <jack@suse.cz>
The current RCU_FAST_NO_HZ assumes that timers do not migrate unless a
CPU goes offline, in which case it assumes that the CPU will have to come
out of dyntick-idle mode (cancelling the timer) in order to go offline.
This is important because when RCU_FAST_NO_HZ permits a CPU to enter
dyntick-idle mode despite having RCU callbacks pending, it posts a timer
on that CPU to force a wakeup on that CPU. This wakeup ensures that the
CPU will eventually handle the end of the grace period, including invoking
its RCU callbacks.
However, Pascal Chapperon's test setup shows that the timer handler
rcu_idle_gp_timer_func() really does get invoked in some cases. This is
problematic because this can cause the CPU that entered dyntick-idle
mode despite still having RCU callbacks pending to remain in
dyntick-idle mode indefinitely, which means that its RCU callbacks might
never be invoked. This situation can result in grace-period delays or
even system hangs, which matches Pascal's observations of slow boot-up
and shutdown (https://lkml.org/lkml/2012/4/5/142). See also the bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=806548
This commit therefore causes the "should never be invoked" timer handler
rcu_idle_gp_timer_func() to use smp_call_function_single() to wake up
the CPU for which the timer was intended, allowing that CPU to invoke
its RCU callbacks in a timely manner.
Reported-by: Pascal Chapperon <pascal.chapperon@wanadoo.fr>
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
When writeback_single_inode() is called on an inode which has I_SYNC already
set while doing WB_SYNC_NONE, the inode is moved to the b_more_io list. However
this makes sense only if the caller is the flusher thread. For other callers of
writeback_single_inode() it doesn't really make sense and may even be wrong
- the flusher thread may be doing WB_SYNC_ALL writeback in parallel.
So we move requeueing from writeback_single_inode() to writeback_sb_inodes().
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Add tracepoints to wakeup_source_activate and wakeup_source_deactivate.
Useful for checking that specific wakeup sources overlap as expected.
Signed-off-by: Arve Hjønnevåg <arve@android.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Traces of rcu_prep_idle events can be confusing because
rcu_cleanup_after_idle() does no tracing. This commit therefore adds
this tracing.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Fixes the following build warning on x86_64.
In file included from include/trace/ftrace.h:567:0,
from include/trace/define_trace.h:86,
from include/trace/events/asoc.h:410,
from sound/soc/soc-core.c:45:
include/trace/events/asoc.h: In function 'ftrace_raw_event_snd_soc_dapm_output_path':
include/trace/events/asoc.h:246:1: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
include/trace/events/asoc.h: In function 'ftrace_raw_event_snd_soc_dapm_input_path':
include/trace/events/asoc.h:275:1: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
Signed-off-by: Liam Girdwood <lrg@ti.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
In preparation for ASoC DSP support.
Add a DAPM API call to determine whether a DAPM audio path is valid between
source and sink widgets. This also takes into account all kcontrol mux and mixer
settings in between the source and sink widgets to validate the audio path.
This will be used by the DSP core to determine the runtime DAI mappings
between FE and BE DAIs in order to run PCM operations.
Signed-off-by: Liam Girdwood <lrg@ti.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Currently we write out all journal buffers in WRITE_SYNC mode. This improves
performance for fsync-heavy workloads but hinders performance when writes
are mostly asynchronous; most noticeably, it slows down readers, and users
complain about slow desktop response etc.
So submit writes as asynchronous in the normal case and only submit writes as
WRITE_SYNC if we detect someone is waiting for current transaction commit.
I've gathered some numbers to back this change. The first is the read latency
test. It measures the time to read 1 MB after several seconds of sleeping in
the presence of streaming writes.
Top 10 times (out of 90) in us:
Before After
2131586 697473
1709932 557487
1564598 535642
1480462 347573
1478579 323153
1408496 222181
1388960 181273
1329565 181070
1252486 172832
1223265 172278
Average:
619377 82180
So the improvement in both maximum and average latency is massive.
I've measured fsync throughput by:
fs_mark -n 100 -t 1 -s 16384 -d /mnt/fsync/ -S 1 -L 4
in the presence of a streaming reader. The numbers (fsyncs/s) are:
Before After
9.9 6.3
6.8 6.0
6.3 6.2
5.8 6.1
So fsync performance seems unharmed by this change.
Signed-off-by: Jan Kara <jack@suse.cz>
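The size of the improvement can be read off the tables above with a little arithmetic. The snippet below is a sketch that only re-uses the numbers quoted in this commit message; the variable and function names are made up for illustration, and the averages over all 90 samples are taken as reported rather than recomputed.

```python
from statistics import mean

# Top 10 read latencies (out of 90) in microseconds, copied from the table.
before_us = [2131586, 1709932, 1564598, 1480462, 1478579,
             1408496, 1388960, 1329565, 1252486, 1223265]
after_us = [697473, 557487, 535642, 347573, 323153,
            222181, 181273, 181070, 172832, 172278]

# Averages over all 90 samples, as reported in the message.
reported_avg_before_us = 619377
reported_avg_after_us = 82180


def speedup(before, after):
    """How many times smaller the 'after' latency is."""
    return before / after


worst_case_speedup = speedup(max(before_us), max(after_us))
average_speedup = speedup(reported_avg_before_us, reported_avg_after_us)

# fsync throughput in fsyncs/s, copied from the second table.
fsync_before = [9.9, 6.8, 6.3, 5.8]
fsync_after = [6.3, 6.0, 6.2, 6.1]
fsync_mean_before = mean(fsync_before)
fsync_mean_after = mean(fsync_after)

print(f"worst-case read latency: {worst_case_speedup:.2f}x lower")
print(f"average read latency:    {average_speedup:.2f}x lower")
print(f"mean fsyncs/s: {fsync_mean_before:.2f} before, {fsync_mean_after:.2f} after")
```

The worst-case latency drops by a factor of about 3.1 and the reported average by about 7.5. The fsync means come out at 7.2 and 6.15 fsyncs/s, and most of that difference comes from the first run (9.9 versus 6.3).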
workqueue_execute_end() is called after the callback function,
not before.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
1. TRACE_EVENT(sched_process_exec) forgets to actually use the
old pid argument; it sets ->old_pid = p->pid.
2. search_binary_handler() uses the wrong pid number. The tracepoint
needs the global pid_t from the root namespace, while old_pid
is the virtual pid number as it is seen by the tracer/parent.
With this patch we have two pid_t's in search_binary_handler(),
which is not really nice. Perhaps we should switch to "struct pid*", but
in this case it would be better to clean up the current code
first and move the "depth == 0" code outside.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: David Smith <dsmith@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Link: http://lkml.kernel.org/r/20120330162636.GA4857@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull btrfs fixes and features from Chris Mason:
"We've merged in the error handling patches from SuSE. These are
already shipping in the sles kernel, and they give btrfs the ability
to abort transactions and go readonly on errors. It involves a lot of
churn as they clarify BUG_ONs, and remove the ones we now properly
deal with.
Josef reworked the way our metadata interacts with the page cache.
page->private now points to the btrfs extent_buffer object, which
makes everything faster. He changed it so we write a whole extent
buffer at a time instead of allowing individual pages to go down,
which will be important for the raid5/6 code (for the 3.5 merge
window ;)
Josef also made us more aggressive about dropping pages for metadata
blocks that were freed due to COW. Overall, our metadata caching is
much faster now.
We've integrated my patch for metadata bigger than the page size.
This allows metadata blocks up to 64KB in size. In practice 16K and
32K seem to work best. For workloads with lots of metadata, this cuts
down the size of the extent allocation tree dramatically and fragments
much less.
Scrub was updated to support the larger block sizes, which ended up
being a fairly large change (thanks Stefan Behrens).
We also have an assortment of fixes and updates, especially to the
balancing code (Ilya Dryomov), the back ref walker (Jan Schmidt) and
the defragging code (Liu Bo)."
Fixed up trivial conflicts in fs/btrfs/scrub.c that were just due to
removal of the second argument to k[un]map_atomic() in commit
7ac687d9e0.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (75 commits)
Btrfs: update the checks for mixed block groups with big metadata blocks
Btrfs: update to the right index of defragment
Btrfs: do not bother to defrag an extent if it is a big real extent
Btrfs: add a check to decide if we should defrag the range
Btrfs: fix recursive defragment with autodefrag option
Btrfs: fix the mismatch of page->mapping
Btrfs: fix race between direct io and autodefrag
Btrfs: fix deadlock during allocating chunks
Btrfs: show useful info in space reservation tracepoint
Btrfs: don't use crc items bigger than 4KB
Btrfs: flush out and clean up any block device pages during mount
btrfs: disallow unequal data/metadata blocksize for mixed block groups
Btrfs: enhance superblock sanity checks
Btrfs: change scrub to support big blocks
Btrfs: minor cleanup in scrub
Btrfs: introduce common define for max number of mirrors
Btrfs: fix infinite loop in btrfs_shrink_device()
Btrfs: fix memory leak in resolver code
Btrfs: allow dup for data chunks in mixed mode
Btrfs: validate target profiles only if we are going to use them
...
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates for 3.4 from Ted Ts'o:
"Ext4 commits for 3.3 merge window; mostly cleanups and bug fixes
The changes to export dirty_writeback_interval are from Artem's s_dirt
cleanup patch series. The same is true of the change to remove the
s_dirt helper functions which never got used by anyone in-tree. I've
run these changes by Al Viro, and am carrying them so that Artem can
more easily fix up the rest of the file systems during the next merge
window. (Originally we had hoped to remove the use of s_dirt from
ext4 during this merge window, but his patches had some bugs, so I
ultimately ended up dropping them from the ext4 tree.)"
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (66 commits)
vfs: remove unused superblock helpers
mm: export dirty_writeback_interval
ext4: remove useless s_dirt assignment
ext4: write superblock only once on unmount
ext4: do not mark superblock as dirty unnecessarily
ext4: correct ext4_punch_hole return codes
ext4: remove restrictive checks for EOFBLOCKS_FL
ext4: always set then trimmed blocks count into len
ext4: fix trimmed block count accunting
ext4: fix start and len arguments handling in ext4_trim_fs()
ext4: update s_free_{inodes,blocks}_count during online resize
ext4: change some printk() calls to use ext4_msg() instead
ext4: avoid output message interleaving in ext4_error_<foo>()
ext4: remove trailing newlines from ext4_msg() and ext4_error() messages
ext4: add no_printk argument validation, fix fallout
ext4: remove redundant "EXT4-fs: " from uses of ext4_msg
ext4: give more helpful error message in ext4_ext_rm_leaf()
ext4: remove unused code from ext4_ext_map_blocks()
ext4: rewrite punch hole to use ext4_ext_remove_space()
jbd2: cleanup journal tail after transaction commit
...
"[RFC PATCH 0/2] audit of linux/device.h users in include/*"
https://lkml.org/lkml/2012/3/4/159
Merge tag 'device-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux
Pull <linux/device.h> avoidance patches from Paul Gortmaker:
"Nearly every subsystem has some kind of header with a proto like:
void foo(struct device *dev);
and yet there is no reason for most of these guys to care about the
sub fields within the device struct. This allows us to significantly
reduce the scope of headers including headers. For this instance, a
reduction of about 40% is achieved by replacing the include with the
simple fact that the device is some kind of a struct.
Unlike the much larger module.h cleanup, this one is simply two
commits. One to fix the implicit <linux/device.h> users, and then one
to delete the device.h includes from the linux/include/ dir wherever
possible."
* tag 'device-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux:
device.h: audit and cleanup users in main include dir
device.h: cleanup users outside of linux/include (C files)
Merge tag 'nfs-for-3.4-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates for Linux 3.4 from Trond Myklebust:
"New features include:
- Add NFS client support for containers.
This should enable most of the necessary functionality, including
lockd support, and support for rpc.statd, NFSv4 idmapper and
RPCSEC_GSS upcalls into the correct network namespace from which
the mount system call was issued.
- NFSv4 idmapper scalability improvements
Base the idmapper cache on the keyring interface to allow
concurrent access to idmapper entries. Start the process of
migrating users from the single-threaded daemon-based approach to
the multi-threaded request-key based approach.
- NFSv4.1 implementation id.
Allows the NFSv4.1 client and server to mutually identify each
other for logging and debugging purposes.
- Support the 'vers=4.1' mount option for mounting NFSv4.1 instead of
having to use the more counterintuitive 'vers=4,minorversion=1'.
- SUNRPC tracepoints.
Start the process of adding tracepoints in order to improve
debugging of the RPC layer.
- pNFS object layout support for autologin.
Important bugfixes include:
- Fix a bug in rpc_wake_up/rpc_wake_up_status that caused them to
fail to wake up all tasks when applied to priority waitqueues.
- Ensure that we handle read delegations correctly, when we try to
truncate a file.
- A number of fixes for NFSv4 state manager loops (mostly to do with
delegation recovery)."
* tag 'nfs-for-3.4-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (224 commits)
NFS: fix sb->s_id in nfs debug prints
xprtrdma: Remove assumption that each segment is <= PAGE_SIZE
xprtrdma: The transport should not bug-check when a dup reply is received
pnfs-obj: autologin: Add support for protocol autologin
NFS: Remove nfs4_setup_sequence from generic rename code
NFS: Remove nfs4_setup_sequence from generic unlink code
NFS: Remove nfs4_setup_sequence from generic read code
NFS: Remove nfs4_setup_sequence from generic write code
NFS: Fix more NFS debug related build warnings
SUNRPC/LOCKD: Fix build warnings when CONFIG_SUNRPC_DEBUG is undefined
nfs: non void functions must return a value
SUNRPC: Kill compiler warning when RPC_DEBUG is unset
SUNRPC/NFS: Add Kbuild dependencies for NFS_DEBUG/RPC_DEBUG
NFS: Use cond_resched_lock() to reduce latencies in the commit scans
NFSv4: It is not safe to dereference lsp->ls_state in release_lockowner
NFS: ncommit count is being double decremented
SUNRPC: We must not use list_for_each_entry_safe() in rpc_wake_up()
Try using machine credentials for RENEW calls
NFSv4.1: Fix a few issues in filelayout_commit_pagelist
NFSv4.1: Clean ups and bugfixes for the pNFS read/writeback/commit code
...
Merge tag 'regmap-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
Pull regmap updates from Mark Brown:
"Things are really quieting down with the regmap API, while we're still
seeing a trickle of new features coming in they're getting much
smaller than they were. It's also nice to have some features which
support other subsystems building infrastructure on top of regmap.
Highlights include:
- Support for padding between the register and the value when
interacting with the device, sometimes needed for fast interfaces.
- Support for applying register updates to the device when restoring
the register state. This is intended to be used to apply updates
supplied by manufacturers for tuning the performance of the device
(many of which are to undocumented registers which aren't otherwise
covered).
- Support for multi-register operations on cached registers.
- Support for syncing only part of the register cache.
- Stubs and parameter query functions intended to make it easier for
other subsystems to build infrastructure on top of the regmap API.
plus a few driver updates making use of the new features which it was
easier to merge via this tree."
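The register/value padding mentioned in the highlights can be pictured with a small user-space sketch. It assumes an 8-bit register address, a 16-bit big-endian value and padding counted in whole bytes; format_padded_write() is a hypothetical helper written for illustration and is not part of the regmap API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Lay out one write as [register][padding...][value hi][value lo].
 * Returns the number of bytes used, or 0 if the buffer is too small.
 * (Hypothetical helper; the real regmap code is configurable per device.)
 */
static size_t format_padded_write(uint8_t *buf, size_t buf_len,
                                  uint8_t reg, size_t pad_bytes, uint16_t val)
{
    size_t total = 1 + pad_bytes + 2;

    assert(buf != NULL);
    if (buf_len < total)
        return 0;

    buf[0] = reg;
    memset(buf + 1, 0, pad_bytes);              /* padding between reg and value */
    buf[1 + pad_bytes] = (uint8_t)(val >> 8);   /* big-endian value, high byte */
    buf[2 + pad_bytes] = (uint8_t)(val & 0xff); /* low byte */
    return total;
}
```

With one padding byte, register 0x12 and value 0xBEEF, the buffer would hold the four bytes 12 00 BE EF.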
* tag 'regmap-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap: (41 commits)
regmap: Fix future missing prototype of devres_alloc() and friends
regmap: Rejig struct declarations for stubbed API
regmap: Fix rbtree block base in sync
regcache: Make sure we sync register 0 in an rbtree cache
regmap: delete unused module.h from drivers/base/regmap files
regmap: Add stub for regcache_sync_region()
mfd: Improve performance of later WM1811 revisions
regmap: Fix x86_64 breakage
regmap: Allow drivers to sync only part of the register cache
regmap: Supply ranges to the sync operations
regmap: Add tracepoints for cache only and cache bypass
regmap: Mark the cache as clean after a successful sync
regmap: Remove default cache sync implementation
regmap: Skip hardware defaults for LZO caches
regmap: Expose the driver name in debugfs
mfd: wm8400: Convert to devm_regmap_init_i2c()
mfd: wm831x: Convert to devm_regmap_init()
mfd: wm8994: Convert to devm_regmap_init()
mfd/ASoC: Convert WM8994 driver to use regmap patches
mfd: Add __devinit and __devexit annotations in wm8994
...
Pull trivial tree from Jiri Kosina:
"It's indeed trivial -- mostly documentation updates and a bunch of
typo fixes from Masanari.
There are also several linux/version.h include removals from Jesper."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (101 commits)
kcore: fix spelling in read_kcore() comment
constify struct pci_dev * in obvious cases
Revert "char: Fix typo in viotape.c"
init: fix wording error in mm_init comment
usb: gadget: Kconfig: fix typo for 'different'
Revert "power, max8998: Include linux/module.h just once in drivers/power/max8998_charger.c"
writeback: fix fn name in writeback_inodes_sb_nr_if_idle() comment header
writeback: fix typo in the writeback_control comment
Documentation: Fix multiple typo in Documentation
tpm_tis: fix tis_lock with respect to RCU
Revert "media: Fix typo in mixer_drv.c and hdmi_drv.c"
Doc: Update numastat.txt
qla4xxx: Add missing spaces to error messages
compiler.h: Fix typo
security: struct security_operations kerneldoc fix
Documentation: broken URL in libata.tmpl
Documentation: broken URL in filesystems.tmpl
mtd: simplify return logic in do_map_probe()
mm: fix comment typo of truncate_inode_pages_range
power: bq27x00: Fix typos in comment
...
Pull perf events changes for v3.4 from Ingo Molnar:
- New "hardware based branch profiling" feature both on the kernel and
the tooling side, on CPUs that support it. (modern x86 Intel CPUs
with the 'LBR' hardware feature currently.)
This new feature is basically a sophisticated 'magnifying glass' for
branch execution - something that is pretty difficult to extract from
regular, function histogram centric profiles.
The simplest mode is activated via 'perf record -b', and the result
looks like this in perf report:
$ perf record -b any_call,u -e cycles:u branchy
$ perf report -b --sort=symbol
52.34% [.] main [.] f1
24.04% [.] f1 [.] f3
23.60% [.] f1 [.] f2
0.01% [k] _IO_new_file_xsputn [k] _IO_file_overflow
0.01% [k] _IO_vfprintf_internal [k] _IO_new_file_xsputn
0.01% [k] _IO_vfprintf_internal [k] strchrnul
0.01% [k] __printf [k] _IO_vfprintf_internal
0.01% [k] main [k] __printf
This output shows from/to branch columns and shows the highest
percentage (from,to) jump combinations - i.e. the most likely taken
branches in the system. "branches" can also include function calls
and any other synchronous and asynchronous transitions of the
instruction pointer that are not 'next instruction' - such as system
calls, traps, interrupts, etc.
This feature comes with (hopefully intuitive) flat ascii and TUI
support in perf report.
- Various 'perf annotate' visual improvements for us assembly junkies.
It will now recognize function calls in the TUI and by hitting enter
you can follow the call (recursively) and back, amongst other
improvements.
- Multiple threads/processes recording support in perf record, perf
stat, perf top - which is activated via a comma-list of PIDs:
perf top -p 21483,21485
perf stat -p 21483,21485 -ddd
perf record -p 21483,21485
- Support for per UID views, via the --uid parameter to perf top, perf
report, etc. For example 'perf top --uid mingo' will only show the
tasks that I am running, excluding other users, root, etc.
- Jump label restructurings and improvements - this includes the
factoring out of the (hopefully much clearer) include/linux/static_key.h
generic facility:
struct static_key key = STATIC_KEY_INIT_FALSE;
...
if (static_key_false(&key))
    do unlikely code
else
    do likely code
...
static_key_slow_inc(&key);
...
static_key_slow_dec(&key);
...
The static_key_false() branch will be generated into the code with as
little impact to the likely code path as possible. The
static_key_slow_*() APIs flip the branch via live kernel code patching.
This facility can now be used more widely within the kernel to
micro-optimize hot branches whose likelihood matches the static-key
usage and fast/slow cost patterns.
- SW function tracer improvements: perf support and filtering support.
- Various hardenings of the perf.data ABI, to make older perf.data's
smoother on newer tool versions, to make new features integrate more
smoothly, to support cross-endian recording/analyzing workflows
better, etc.
- Restructuring of the kprobes code, the splitting out of 'optprobes',
and a corner case bugfix.
- Allow the tracing of kernel console output (printk).
- Improvements/fixes to user-space RDPMC support, allowing user-space
self-profiling code to extract PMU counts without performing any
system calls, while playing nice with the kernel side.
- 'perf bench' improvements
- ... and lots of internal restructurings, cleanups and fixes that made
these features possible. And, as usual, this list is incomplete as
there were also lots of other improvements.
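The static-key usage pattern from the jump label item above can be modeled in user space as a plain use count. This is only a sketch of the enable/disable semantics: the static_key_model names are hypothetical, and the real facility patches kernel code at the branch sites instead of testing a variable.

```c
#include <assert.h>
#include <stdbool.h>

/* User-space stand-in for the kernel's struct static_key: just a use count. */
struct static_key_model {
    int enabled;
};

#define STATIC_KEY_MODEL_INIT_FALSE { 0 }

/* True only while at least one user has enabled the key (the unlikely path). */
static bool static_key_model_false(const struct static_key_model *key)
{
    return key->enabled > 0;
}

/* In the kernel this is the slow step that flips the patched branch. */
static void static_key_model_slow_inc(struct static_key_model *key)
{
    key->enabled++;
}

/* Drops one user; the branch flips back once the count reaches zero. */
static void static_key_model_slow_dec(struct static_key_model *key)
{
    assert(key->enabled > 0);   /* an unbalanced dec would be a bug */
    key->enabled--;
}
```

In this model two increments need two decrements before the unlikely path is switched off again, which is why the inc and dec calls are expected to be paired.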
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (120 commits)
perf report: Fix annotate double quit issue in branch view mode
perf report: Remove duplicate annotate choice in branch view mode
perf/x86: Prettify pmu config literals
perf report: Enable TUI in branch view mode
perf report: Auto-detect branch stack sampling mode
perf record: Add HEADER_BRANCH_STACK tag
perf record: Provide default branch stack sampling mode option
perf tools: Make perf able to read files from older ABIs
perf tools: Fix ABI compatibility bug in print_event_desc()
perf tools: Enable reading of perf.data files from different ABI rev
perf: Add ABI reference sizes
perf report: Add support for taken branch sampling
perf record: Add support for sampling taken branch
perf tools: Add code to support PERF_SAMPLE_BRANCH_STACK
x86/kprobes: Split out optprobe related code to kprobes-opt.c
x86/kprobes: Fix a bug which can modify kernel code permanently
x86/kprobes: Fix instruction recovery on optimized path
perf: Add callback to flush branch_stack on context switch
perf: Disable PERF_SAMPLE_BRANCH_* when not supported
perf/x86: Add LBR software filter support for Intel CPUs
...
The <linux/device.h> header includes a lot of stuff, and
it in turn gets a lot of use just for the basic "struct device"
which appears so often.
Clean up the users as follows:
1) For those headers only needing "struct device" as a pointer
in fcn args, replace the include with exactly that.
2) For headers not really using anything from device.h, simply
delete the include altogether.
3) For headers relying on getting device.h implicitly before
being included themselves, now explicitly include device.h
4) For files in which doing #1 or #2 uncovers an implicit
dependency on some other header, fix by explicitly adding
the required header(s).
Any C files that were implicitly relying on device.h to be
present have already been dealt with in advance.
Total removals from #1 and #2: 51. Total additions coming
from #3: 9. Total other implicit dependencies from #4: 7.
As of 3.3-rc1, there were 110, so a net removal of 42 gives
about a 38% reduction in device.h presence in include/*
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
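Case 1 of the device.h cleanup above (a header that only passes "struct device" around as a pointer) can be sketched in plain C as follows. This is a user-space illustration, not kernel code: foo_attach() and the one-field struct body are hypothetical stand-ins, and the full definition here plays the role that <linux/device.h> plays in the kernel.

```c
#include <assert.h>
#include <stddef.h>

/*
 * What the slimmed-down header needs: an incomplete type is enough to
 * declare functions that take or return pointers to it.
 */
struct device;

/* Hypothetical prototype of the kind found in many subsystem headers. */
int foo_attach(struct device *dev);

/*
 * Only code that touches the fields needs the full definition.
 * The single field below is a stand-in for illustration.
 */
struct device {
    const char *init_name;
};

int foo_attach(struct device *dev)
{
    /* Dereferencing is legal here because the type is now complete. */
    return (dev != NULL && dev->init_name != NULL) ? 0 : -1;
}
```

Everything above the full struct definition compiles without it, which is the property the cleanup relies on.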
When we reach jbd2_cleanup_journal_tail(), there is no guarantee that
checkpointed buffers are on stable storage - especially if buffers were
written out by jbd2_log_do_checkpoint(), they are likely to be only in the disk's
caches. Thus when we update the journal superblock, effectively removing the old
transaction from the journal, this write of the superblock can get to stable storage
before those checkpointed buffers, which can result in filesystem corruption
after a crash. Thus we must unconditionally issue a cache flush before we
update the journal superblock in these cases.
A similar problem can also occur if the journal superblock is written only to the
disk's caches, another transaction starts reusing space of the transaction
cleaned from the log, and a power failure happens. Subsequent journal replay would
still try to replay the old transaction but some of its blocks may already be
overwritten by the new transaction. For this reason we must use WRITE_FUA when
updating the log tail, and we must first write the new log tail to disk and update
in-memory information only after that.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
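The ordering required by the jbd2 change above (flush, then a durable superblock write, then the in-memory update) can be sketched with a toy model. All types and function names below are hypothetical; the model only tracks whether data sits in a volatile cache and does not perform any I/O.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy disk: checkpointed buffers may sit in a volatile write cache. */
struct disk_model {
    bool buffers_in_cache;     /* written, but not yet on stable storage */
    bool buffers_stable;       /* checkpointed buffers are durable */
    unsigned int sb_log_tail;  /* log tail in the durable journal superblock */
};

struct journal_model {
    struct disk_model *disk;
    unsigned int tail;         /* in-memory log tail */
};

static void disk_flush_cache(struct disk_model *disk)
{
    if (disk->buffers_in_cache) {
        disk->buffers_in_cache = false;
        disk->buffers_stable = true;
    }
}

/* Stand-in for a WRITE_FUA superblock write: durable when it returns. */
static void disk_write_sb_fua(struct disk_model *disk, unsigned int tail)
{
    disk->sb_log_tail = tail;
}

static void journal_update_log_tail(struct journal_model *journal,
                                    unsigned int new_tail)
{
    /* 1: make the checkpointed buffers durable first */
    disk_flush_cache(journal->disk);
    assert(!journal->disk->buffers_in_cache);

    /* 2: only then write the new tail to the superblock, durably */
    disk_write_sb_fua(journal->disk, new_tail);

    /* 3: update in-memory state last, so it never runs ahead of the disk */
    journal->tail = new_tail;
}
```

Swapping steps 1 and 2 in this model would reproduce the first failure described above: a durable superblock that no longer covers buffers still sitting in the cache.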
There are three cases of updating the journal superblock. In the first case, we want
to mark the journal as empty (setting s_sequence to 0), in the second case we want
to update the log tail, in the third case we want to update s_errno. Split these
cases into separate functions. It makes the code slightly more straightforward
and later patches will make the distinction even more important.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The major features of this series are:
- making RCU more aggressive about entering dyntick-idle mode in order to
improve energy efficiency
- converting a few more call_rcu()s to kfree_rcu()s
- applying a number of rcutree fixes and cleanups to rcutiny
- removing CONFIG_SMP #ifdefs from treercu
- allowing RCU CPU stall times to be set via sysfs
- adding CPU-stall capability to rcutorture
- adding more RCU-abuse diagnostics
- updating documentation
- fixing yet more issues located by the still-ongoing top-to-bottom
inspection of RCU, this time with a special focus on the
CPU-hotplug code path.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Added a minimal exec tracepoint. Exec is an important major event
in the life of a task, like fork(), clone() or exit(), all of
which we already trace.
[ We also do scheduling re-balancing during exec() - so it's useful
from a scheduler instrumentation POV as well. ]
If you want to watch a task start up, when it gets exec'ed is a good place
to start. With the addition of this tracepoint, execs can be monitored
and a better picture of general system activity can be obtained. This
tracepoint will also enable better process life tracking, allowing you to
answer questions like "what process keeps starting up binary X?".
This tracepoint can also be useful in ftrace filtering and trigger
conditions: i.e. starting or stopping filtering when exec is called.
Signed-off-by: David Smith <dsmith@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/4F314D19.7030504@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Commit 1ac9bc69 ("sched/tracing: Add a new tracepoint for sleeptime")
added a new sched:sched_stat_sleeptime tracepoint.
It's broken: the first sample we get on a task might be bad because
of a stale sleep_start value that wasn't reset at the last task switch
because the tracepoint was not active.
It also breaks the existing schedstat samples due to the side
effects of:
- se->statistics.sleep_start = 0;
...
- se->statistics.block_start = 0;
Nor do I see a means to fix it without adding overhead to the scheduler
fast path, which I'm not willing to do for the sake of redundant
instrumentation.
Most importantly, sleep time information can already be constructed
by tracing context switches and wakeups, and taking the timestamp
difference between the schedule-out, the wakeup and the schedule-in.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Vagin <avagin@openvz.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-pc4c9qhl8q6vg3bs4j6k0rbd@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
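The reconstruction described in the revert above (sleep time from the schedule-out, wakeup and schedule-in timestamps) comes down to two subtractions. A minimal sketch, assuming the three timestamps for one sleep of one task have already been extracted from the trace; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Timestamps (ns) for one sleep of one task, taken from three trace events. */
struct sleep_events {
    uint64_t sched_out_ns;  /* task switched out (context-switch event) */
    uint64_t wakeup_ns;     /* task woken up (wakeup event) */
    uint64_t sched_in_ns;   /* task switched back in (context-switch event) */
};

/* Time spent asleep: from being switched out until the wakeup. */
static uint64_t sleep_time_ns(const struct sleep_events *ev)
{
    assert(ev->wakeup_ns >= ev->sched_out_ns);
    return ev->wakeup_ns - ev->sched_out_ns;
}

/* Time spent runnable but waiting for a CPU after the wakeup. */
static uint64_t runqueue_wait_ns(const struct sleep_events *ev)
{
    assert(ev->sched_in_ns >= ev->wakeup_ns);
    return ev->sched_in_ns - ev->wakeup_ns;
}
```

For a task switched out at 1000 ns, woken at 4000 ns and switched in at 4500 ns, this gives 3000 ns asleep and 500 ns waiting to run.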
When CONFIG_RCU_FAST_NO_HZ is enabled, RCU will allow a given CPU to
enter dyntick-idle mode even if it still has RCU callbacks queued.
RCU avoids system hangs in this case by scheduling a timer for several
jiffies in the future. However, if all of the callbacks on that CPU
are from kfree_rcu(), there is no reason to wake the CPU up, as it is
not a problem to defer freeing of memory.
This commit therefore tracks the number of callbacks on a given CPU
that are from kfree_rcu(), and avoids scheduling the timer if all of
a given CPU's callbacks are from kfree_rcu().
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
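The bookkeeping described in the RCU commit above can be sketched as two counters per CPU: all queued callbacks, and the subset that only frees memory. This is a user-space model with hypothetical names, not the rcutree data structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Per-CPU callback counts (model only). */
struct cpu_callbacks {
    long qlen;       /* all queued callbacks */
    long qlen_lazy;  /* subset that only frees memory, kfree_rcu()-style */
};

static void enqueue_callback(struct cpu_callbacks *cpu, bool lazy)
{
    cpu->qlen++;
    if (lazy)
        cpu->qlen_lazy++;
    assert(cpu->qlen_lazy <= cpu->qlen);
}

/*
 * A wakeup timer is only needed when some callback is not lazy:
 * deferring a pure memory free does not hurt, so an idle CPU holding
 * only lazy callbacks can stay asleep.
 */
static bool needs_wakeup_timer(const struct cpu_callbacks *cpu)
{
    return cpu->qlen != cpu->qlen_lazy;
}
```

An idle CPU with an empty queue, or with only lazy callbacks, gets no timer in this model; a single non-lazy callback is enough to require one.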
This patch adds trace_jbd2_drop_transaction and
trace_jbd2_update_superblock_end because there are similar tracepoints
in jbd and they are needed in jbd2 as well.
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Add a printk.console trace point to record any printk
messages into the trace, regardless of the current
console loglevel. This can help correlate (existing)
printk debugging with other tracing.
Link: http://lkml.kernel.org/r/1322161388.5366.54.camel@jlt3.sipsolutions.net
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The power and cpuidle tracepoints are called within a rcu_idle_exit()
section, and must be denoted with the _rcuidle() version of the tracepoint.
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
This patch adds three trace points to the status routines
in the sunrpc state machine.
The goal of these trace points is to give an Admin
the ability to check on binding status or connection
status to see if there is a potential problem.
Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The reporting of the RPC queue name needs to use the __string()
event interface.
Reported-by: Neil Horman <nhorman@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
When an SD card is hot removed without umount, del_gendisk() will call
bdi_unregister() without destroying/freeing it. This leaves the bdi in
the bdi->dev = NULL, bdi->wb.task = NULL, bdi->bdi_list removed state.
When sync(2) gets the bdi before bdi_unregister() and calls
bdi_queue_work() after the unregister, trace_writeback_queue will be
dereferencing the NULL bdi->dev. Fix it with a simple test for NULL.
LKML-reference: http://lkml.org/lkml/2012/1/18/346
Cc: stable@kernel.org
Reported-by: Rabin Vincent <rabin@rab.in>
Tested-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
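The "simple test for NULL" in the writeback fix above can be pictured with a user-space sketch. The struct names, the 32-byte name buffer and the "(unknown)" placeholder are assumptions made for illustration; only the idea of checking bdi->dev before using it comes from the commit message.

```c
#include <assert.h>
#include <string.h>

#define TRACE_NAME_LEN 32

/* Minimal stand-ins for the device and the backing-dev-info structures. */
struct device_model {
    const char *name;
};

struct bdi_model {
    struct device_model *dev;   /* NULL once the bdi has been unregistered */
};

/* Fill the trace entry's name field without dereferencing a NULL dev. */
static void fill_trace_name(char out[TRACE_NAME_LEN], const struct bdi_model *bdi)
{
    const char *name;

    assert(bdi != NULL);
    name = bdi->dev ? bdi->dev->name : "(unknown)";
    strncpy(out, name, TRACE_NAME_LEN - 1);
    out[TRACE_NAME_LEN - 1] = '\0';
}
```

A bdi whose device has already gone away then shows up in the trace under the placeholder name instead of crashing the tracepoint.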
The code is not going to be removed, so remove the comment stating
that it will be.
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
bdi_prune_sb() resets sb->s_bdi to default_backing_dev_info when
tearing down the original bdi. Fix trace_writeback_single_inode to
use sb->s_bdi=default_backing_dev_info rather than bdi->dev=NULL for a
torn-down bdi.
Cc: <stable@kernel.org>
Reported-by: Rabin Vincent <rabin@rab.in>
Tested-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (62 commits)
Btrfs: use larger system chunks
Btrfs: add a delalloc mutex to inodes for delalloc reservations
Btrfs: space leak tracepoints
Btrfs: protect orphan block rsv with spin_lock
Btrfs: add allocator tracepoints
Btrfs: don't call btrfs_throttle in file write
Btrfs: release space on error in page_mkwrite
Btrfs: fix btrfsck error 400 when truncating a compressed
Btrfs: do not use btrfs_end_transaction_throttle everywhere
Btrfs: add balance progress reporting
Btrfs: allow for resuming restriper after it was paused
Btrfs: allow for canceling restriper
Btrfs: allow for pausing restriper
Btrfs: add skip_balance mount option
Btrfs: recover balance on mount
Btrfs: save balance parameters to disk
Btrfs: soft profile changing mode (aka soft convert)
Btrfs: implement online profile changing
Btrfs: do not reduce profile in do_chunk_alloc()
Btrfs: virtual address space subset filter
...
Fix up trivial conflict in fs/btrfs/ioctl.c due to the use of the new
mnt_drop_write_file() helper.
This, in addition to a script in my btrfs-tracing tree, will help track down space
leaks when we're getting space left over in block groups on umount. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
I used these tracepoints when figuring out what the cluster stuff was doing, so
add them to mainline in case we need to profile this stuff again. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
__send_signal()->trace_signal_generate() doesn't report enough info.
Users want to know whether the signal was actually delivered, and
they also need the shared/private info.
The patch moves trace_signal_generate() to the end of __send_signal()
and adds the 2 additional arguments.
This also allows us to kill trace_signal_overflow_fail/lose_info, we
can simply add the appropriate TRACE_SIGNAL_ "result" codes.
Reported-by: Seiji Aguchi <saguchi@redhat.com>
Reviewed-by: Seiji Aguchi <seiji.aguchi@hds.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
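The shape of the signal tracing change above (one trace call at the end of the send path, carrying a shared/private flag and a result code) can be sketched as follows. Only the TRACE_SIGNAL_ prefix comes from the commit message; the individual enumerators, the send_state fields and the decision logic are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed result codes; only the TRACE_SIGNAL_ prefix is from the message. */
enum trace_signal_result {
    TRACE_SIGNAL_DELIVERED,
    TRACE_SIGNAL_IGNORED,
    TRACE_SIGNAL_ALREADY_PENDING,
    TRACE_SIGNAL_OVERFLOW_FAIL,
    TRACE_SIGNAL_LOSE_INFO,
};

/* Simplified inputs to the send decision (hypothetical). */
struct send_state {
    bool ignored;
    bool already_pending;
    bool queue_full;
    bool must_queue;   /* signal whose info may not be dropped */
};

/* Stand-in for the trace buffer: remembers the last event. */
struct trace_record {
    int calls;
    bool group;        /* shared (process-wide) vs private (per-thread) */
    enum trace_signal_result result;
};

static void trace_generate(struct trace_record *rec, bool group,
                           enum trace_signal_result result)
{
    assert(rec != NULL);
    rec->calls++;
    rec->group = group;
    rec->result = result;
}

static int model_send_signal(const struct send_state *st, bool group,
                             struct trace_record *rec)
{
    enum trace_signal_result result = TRACE_SIGNAL_DELIVERED;
    int ret = 0;

    if (st->ignored) {
        result = TRACE_SIGNAL_IGNORED;
    } else if (st->already_pending) {
        result = TRACE_SIGNAL_ALREADY_PENDING;
    } else if (st->queue_full && st->must_queue) {
        result = TRACE_SIGNAL_OVERFLOW_FAIL;
        ret = -1;
    } else if (st->queue_full) {
        result = TRACE_SIGNAL_LOSE_INFO;
    }

    /* Single trace point at the end: every outcome is reported once. */
    trace_generate(rec, group, result);
    return ret;
}
```

Because every exit path funnels through the one trace call, separate overflow and lose-info events become unnecessary in this model.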
In trace_mm_vmscan_lru_isolate(), we don't output 'file' information to
the trace event, which makes it inconvenient for the user to get the
real information (like the sample pasted below):
mm_vmscan_lru_isolate: isolate_mode=2 order=0 nr_requested=32 nr_scanned=32 nr_taken=32 contig_taken=0 contig_dirty=0 contig_failed=0
'active' can be obtained by analyzing the mode (thanks go to Minchan and
Mel), so this patch adds 'file' to the trace event, which now looks
like:
mm_vmscan_lru_isolate: isolate_mode=2 order=0 nr_requested=32 nr_scanned=32 nr_taken=32 contig_taken=0 contig_dirty=0 contig_failed=0 file=0
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew elucidates:
- First installment of MM. We have a HUGE number of MM patches this
time. It's crazy.
- MAINTAINERS updates
- backlight updates
- leds
- checkpatch updates
- misc ELF stuff
- rtc updates
- reiserfs
- procfs
- some misc other bits
* akpm: (124 commits)
user namespace: make signal.c respect user namespaces
workqueue: make alloc_workqueue() take printf fmt and args for name
procfs: add hidepid= and gid= mount options
procfs: parse mount options
procfs: introduce the /proc/<pid>/map_files/ directory
procfs: make proc_get_link to use dentry instead of inode
signal: add block_sigmask() for adding sigmask to current->blocked
sparc: make SA_NOMASK a synonym of SA_NODEFER
reiserfs: don't lock root inode searching
reiserfs: don't lock journal_init()
reiserfs: delay reiserfs lock until journal initialization
reiserfs: delete comments referring to the BKL
drivers/rtc/interface.c: fix alarm rollover when day or month is out-of-range
drivers/rtc/rtc-twl.c: add DT support for RTC inside twl4030/twl6030
drivers/rtc/: remove redundant spi driver bus initialization
drivers/rtc/rtc-jz4740.c: make jz4740_rtc_driver static
drivers/rtc/rtc-mc13xxx.c: make mc13xxx_rtc_idtable static
rtc: convert drivers/rtc/* to use module_platform_driver()
drivers/rtc/rtc-wm831x.c: convert to devm_kzalloc()
drivers/rtc/rtc-wm831x.c: remove unused period IRQ handler
...
oom_score_adj is used for guarding processes from the OOM killer. One
problem is that it's inherited at fork(). When a daemon sets oom_score_adj
and creates children, it's hard to know where the value was set.
This patch adds 3 tracepoints useful for debugging:
- creating new task
- renaming a task (exec)
- set oom_score_adj
To debug, users need to enable some tracepoints. Filtering may be useful, e.g.:
# EVENT=/sys/kernel/debug/tracing/events/task/
# echo "oom_score_adj != 0" > $EVENT/task_newtask/filter
# echo "oom_score_adj != 0" > $EVENT/task_rename/filter
# echo 1 > $EVENT/enable
# EVENT=/sys/kernel/debug/tracing/events/oom/
# echo 1 > $EVENT/enable
The output will look like this:
# grep oom /sys/kernel/debug/tracing/trace
bash-7699 [007] d..3 5140.744510: oom_score_adj_update: pid=7699 comm=bash oom_score_adj=-1000
bash-7699 [007] ...1 5151.818022: task_newtask: pid=7729 comm=bash clone_flags=1200011 oom_score_adj=-1000
ls-7729 [003] ...2 5151.818504: task_rename: pid=7729 oldcomm=bash newcomm=ls oom_score_adj=-1000
bash-7699 [002] ...1 5175.701468: task_newtask: pid=7730 comm=bash clone_flags=1200011 oom_score_adj=-1000
grep-7730 [007] ...2 5175.701993: task_rename: pid=7730 oldcomm=bash newcomm=grep oom_score_adj=-1000
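As a rough illustration of how this output can be used, the sketch below parses the sample lines shown above and reports which process created each child and which oom_score_adj the child inherited. The regular expression and the helper names (parse_trace, inheritance_chain) are assumptions made for this example. They only handle the line layout of the sample and are not part of the patch.

```python
import re

# Sample lines copied from the trace output above.
TRACE = """\
bash-7699 [007] d..3 5140.744510: oom_score_adj_update: pid=7699 comm=bash oom_score_adj=-1000
bash-7699 [007] ...1 5151.818022: task_newtask: pid=7729 comm=bash clone_flags=1200011 oom_score_adj=-1000
ls-7729 [003] ...2 5151.818504: task_rename: pid=7729 oldcomm=bash newcomm=ls oom_score_adj=-1000
bash-7699 [002] ...1 5175.701468: task_newtask: pid=7730 comm=bash clone_flags=1200011 oom_score_adj=-1000
grep-7730 [007] ...2 5175.701993: task_rename: pid=7730 oldcomm=bash newcomm=grep oom_score_adj=-1000
"""

# Assumed layout: "<comm>-<pid> [cpu] flags timestamp: event: key=value ..."
LINE_RE = re.compile(
    r"^\S+-(?P<cur>\d+)\s+\[\d+\]\s+\S+\s+[\d.]+:\s+"
    r"(?P<event>\w+):\s+(?P<fields>.*)$"
)


def parse_trace(text):
    """Return a list of (current_pid, event_name, fields_dict) tuples."""
    events = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that does not look like an event line
        fields = dict(kv.split("=", 1) for kv in m.group("fields").split())
        events.append((int(m.group("cur")), m.group("event"), fields))
    return events


def inheritance_chain(events):
    """Map each new pid to its creator, latest comm and inherited value."""
    children = {}
    for cur_pid, event, fields in events:
        pid = int(fields["pid"])
        if event == "task_newtask":
            # The task that logs task_newtask is the one doing the fork.
            children[pid] = {
                "parent": cur_pid,
                "comm": fields["comm"],
                "oom_score_adj": int(fields["oom_score_adj"]),
            }
        elif event == "task_rename" and pid in children:
            children[pid]["comm"] = fields["newcomm"]
    return children


if __name__ == "__main__":
    for pid, info in inheritance_chain(parse_trace(TRACE)).items():
        print(pid, info)
```

Feeding the real contents of /sys/kernel/debug/tracing/trace into parse_trace() instead of the embedded sample should work the same way, provided the lines follow the layout assumed here.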
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rename mm_page_free_direct into mm_page_free and mm_pagevec_free into
mm_page_free_batched
Since v2.6.33-5426-gc475dab the kernel triggers mm_page_free_direct for
all freed pages, not only for directly freed ones. So, let's name it properly.
For pages freed via a page list we also trigger the mm_page_free_batched event.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap: (36 commits)
mfd: Clearing events requires event registers to be writable for da9052-core
mfd: Fix annotations in da9052-core
gpiolib: Mark da9052 driver broken
mfd: Declare da9052_regmap_config for the bus drivers
MFD: DA9052/53 MFD core module add SPI support v2
MFD: DA9052/53 MFD core module
regmap: Add irq_base accessor to regmap_irq
regmap: Allow drivers to reinitialise the register cache at runtime
regmap: Add trace event for successful cache reads
regmap: Allow regmap_update_bits() users to detect changes
regmap: Report if we actually handled an interrupt in regmap-irq
regmap: Fix rbtreee build when not using debugfs
regmap: Provide debugfs dump of the rbtree cache data
regmap: Do debugfs init before cache init
regmap: Suppress noop writes in regmap_update_bits()
regmap: Remove indexed cache type
regmap: Drop check whether a register is readable in regcache_read
regmap: Properly round cache_word_size
regmap: Add support for 10/14 register formating
regmap: Try cached read before checking if a hardware read is possible
...
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
sched/tracing: Add a new tracepoint for sleeptime
sched: Disable scheduler warnings during oopses
sched: Fix cgroup movement of waking process
sched: Fix cgroup movement of newly created process
sched: Fix cgroup movement of forking process
sched: Remove cfs bandwidth period check in tg_set_cfs_period()
sched: Fix load-balance lock-breaking
sched: Replace all_pinned with a generic flags field
sched: Only queue remote wakeups when crossing cache boundaries
sched: Add missing rcu_dereference() around ->real_parent usage
[S390] fix cputime overflow in uptime_proc_show
[S390] cputime: add sparse checking and cleanup
sched: Mark parent and real_parent as __rcu
sched, nohz: Fix missing RCU read lock
sched, nohz: Set the NOHZ_BALANCE_KICK flag for idle load balancer
sched, nohz: Fix the idle cpu check in nohz_idle_balance
sched: Use jump_labels for sched_feat
sched/accounting: Fix parameter passing in task_group_account_field
sched/accounting: Fix user/system tick double accounting
sched/accounting: Re-use scheduler statistics for the root cgroup
...
Fix up conflicts in
- arch/ia64/include/asm/cputime.h, include/asm-generic/cputime.h
usecs_to_cputime64() vs the sparse cleanups
- kernel/sched/fair.c, kernel/time/tick-sched.c
scheduler changes in multiple branches
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits)
cpu: Export cpu_up()
rcu: Apply ACCESS_ONCE() to rcu_boost() return value
Revert "rcu: Permit rt_mutex_unlock() with irqs disabled"
docs: Additional LWN links to RCU API
rcu: Augment rcu_batch_end tracing for idle and callback state
rcu: Add rcutorture tests for srcu_read_lock_raw()
rcu: Make rcutorture test for hotpluggability before offlining CPUs
driver-core/cpu: Expose hotpluggability to the rest of the kernel
rcu: Remove redundant rcu_cpu_stall_suppress declaration
rcu: Adaptive dyntick-idle preparation
rcu: Keep invoking callbacks if CPU otherwise idle
rcu: Irq nesting is always 0 on rcu_enter_idle_common
rcu: Don't check irq nesting from rcu idle entry/exit
rcu: Permit dyntick-idle with callbacks pending
rcu: Document same-context read-side constraints
rcu: Identify dyntick-idle CPUs on first force_quiescent_state() pass
rcu: Remove dynticks false positives and RCU failures
rcu: Reduce latency of rcu_prepare_for_idle()
rcu: Eliminate RCU_FAST_NO_HZ grace-period hang
rcu: Avoid needlessly IPIing CPUs at GP end
...
If CONFIG_SCHEDSTATS is defined, the kernel maintains
information about how long the task was sleeping or
in the case of iowait, blocking in the kernel before
getting woken up.
This will be useful for sleep time profiling.
Note: this information is only provided for sched_fair.
Other scheduling classes may choose to provide this in
the future.
Note: the delay includes the time spent on the runqueue
as well.
Signed-off-by: Arun Sharma <asharma@fb.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Andrew Vagin <avagin@openvz.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/1324512940-32060-2-git-send-email-asharma@fb.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
pa_inode in group_pa is set to NULL in ext4_mb_new_group_pa, so
pa_inode should not be referenced.
Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Compensate the task's think time when computing the final pause time,
so that ->dirty_ratelimit can be executed accurately.
think time := time spent outside of balance_dirty_pages()
In the rare case that the task slept longer than the 200ms period time
(resulting in a negative pause time), the sleep time will be compensated in
the following periods, too, if it's less than 1 second.
Accumulated errors are carefully avoided as long as the max pause area
is not hit.
Pseudo code:
period = pages_dirtied / task_ratelimit;
think = jiffies - dirty_paused_when;
pause = period - think;
1) normal case: period > think
pause = period - think
dirty_paused_when = jiffies + pause
nr_dirtied = 0
                 period time
      |===============================>|
          think time      pause time
      |===============>|==============>|
------|----------------|---------------|------------------------
dirty_paused_when   jiffies
2) no pause case: period <= think
don't pause; reduce future pause time by:
dirty_paused_when += period
nr_dirtied = 0
                 period time
      |===============================>|
                            think time
      |===================================================>|
------|--------------------------------+-------------------|----
dirty_paused_when                                       jiffies
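The two cases above can be modelled in a few lines. This is only a sketch of the pseudo code, not the kernel implementation: the function name balance_pause is made up, task_ratelimit is assumed to be in pages per jiffy to keep the units simple, and the 1 second bound on compensation and the max pause limit are left out.

```python
def balance_pause(pages_dirtied, task_ratelimit, jiffies, dirty_paused_when):
    """Return (pause, new_dirty_paused_when), both in jiffies.

    The caller is expected to reset nr_dirtied to 0 in both cases.
    """
    period = pages_dirtied / task_ratelimit
    think = jiffies - dirty_paused_when
    pause = period - think

    if pause > 0:
        # 1) normal case: period > think.
        # Sleep for the remainder of the period; the next period is
        # measured from the moment the sleep ends.
        return pause, jiffies + pause

    # 2) no pause case: period <= think.
    # Do not sleep. Advancing dirty_paused_when by only one period keeps
    # the excess think time on the books, so it shortens future pauses.
    return 0, dirty_paused_when + period


if __name__ == "__main__":
    print(balance_pause(32, 4, 105, 100))  # think time shorter than period
    print(balance_pause(32, 4, 120, 100))  # think time longer than period
```

In the second call the task has been away for 20 jiffies against a period of 8, so it is not paused and dirty_paused_when only moves forward by the period.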
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
This makes the binary trace understandable by trace-cmd.
CC: Dave Chinner <david@fromorbit.com>
CC: Curt Wohlgemuth <curtw@google.com>
CC: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
The current rcu_batch_end event trace records only the name of the RCU
flavor and the total number of callbacks that remain queued on the
current CPU. This is insufficient for testing and tuning the new
dyntick-idle RCU_FAST_NO_HZ code, so this commit adds idle state along
with whether or not any of the callbacks that were ready to invoke
at the beginning of rcu_do_batch() are still queued.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The current implementation of RCU_FAST_NO_HZ prevents CPUs from entering
dyntick-idle state if they have RCU callbacks pending. Unfortunately,
this has the side-effect of often preventing them from entering this
state, especially if at least one other CPU is not in dyntick-idle state.
However, the resulting per-tick wakeup is wasteful in many cases: if the
CPU has already fully responded to the current RCU grace period, there
will be nothing for it to do until this grace period ends, which will
frequently take several jiffies.
This commit therefore permits a CPU that has done everything that the
current grace period has asked of it (rcu_pending() == 0) to enter
dyntick-idle state even if it still has RCU callbacks pending.
However, such a CPU posts a timer to
wake it up several jiffies later (6 jiffies, based on experience with
grace-period lengths). This wakeup is required to handle situations
that can result in all CPUs being in dyntick-idle mode, thus failing
to ever complete the current grace period. If a CPU wakes up before
the timer goes off, then it cancels that timer, thus avoiding spurious
wakeups.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
With the new implementation of RCU_FAST_NO_HZ, it was possible to hang
RCU grace periods as follows:
o CPU 0 attempts to go idle, cycles several times through the
rcu_prepare_for_idle() loop, then goes dyntick-idle when
RCU needs nothing more from it, while still having at least
one RCU callback pending.
o CPU 1 goes idle with no callbacks.
Both CPUs can then stay in dyntick-idle mode indefinitely, preventing
the RCU grace period from ever completing, possibly hanging the system.
This commit therefore prevents CPUs that have RCU callbacks from entering
dyntick-idle mode. This approach also eliminates the need for the
end-of-grace-period IPIs used previously.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit adds trace_rcu_prep_idle(), which is invoked from
rcu_prepare_for_idle() and rcu_wake_cpu() to trace attempts on
the part of RCU to force CPUs into dyntick-idle mode.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit updates the trace_rcu_dyntick() header comment to reflect
events added by commit 4b4f421.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The trace_rcu_dyntick() trace event did not print both the old and
the new value of the nesting level, and furthermore printed only
the low-order 32 bits of it. This could result in some confusion
when interpreting trace-event dumps, so this commit prints both
the old and the new value, prints the full 64 bits, and also selects
the process-entry/exit increment to print nicely in hexadecimal.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Trace the rcutorture RCU accesses and dump the trace buffer when the
first failure is detected.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Earlier versions of RCU used the scheduling-clock tick to detect idleness
by checking for the idle task, but handled idleness differently for
CONFIG_NO_HZ=y. But there are now a number of uses of RCU read-side
critical sections in the idle task, for example, for tracing. A more
fine-grained detection of idleness is therefore required.
This commit presses the old dyntick-idle code into full-time service,
so that rcu_idle_enter(), previously known as rcu_enter_nohz(), is
always invoked at the beginning of an idle loop iteration. Similarly,
rcu_idle_exit(), previously known as rcu_exit_nohz(), is always invoked
at the end of an idle-loop iteration. This allows the idle task to
use RCU everywhere except between consecutive rcu_idle_enter() and
rcu_idle_exit() calls, in turn allowing architecture maintainers to
specify exactly where in the idle loop RCU may be used.
Because some of the userspace upcall uses can result in what looks
to RCU like half of an interrupt, it is not possible to expect that
the irq_enter() and irq_exit() hooks will give exact counts. This
patch therefore expands the ->dynticks_nesting counter to 64 bits
and uses two separate bitfields to count process/idle transitions
and interrupt entry/exit transitions. It is presumed that userspace
upcalls do not happen in the idle loop or from usermode execution
(though usermode might do a system call that results in an upcall).
The counter is hard-reset on each process/idle transition, which
prevents the interrupt entry/exit error from accumulating. Overflow
is avoided by the 64-bitness of the ->dynticks_nesting counter.
This commit also adds warnings if a non-idle task asks RCU to enter
idle state (and these checks will need some adjustment before applying
Frederic's OS-jitter patches (http://lkml.org/lkml/2011/10/7/246)).
In addition, validation of ->dynticks and ->dynticks_nesting is added.
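To make the hard-reset idea concrete, here is a small model of a 64-bit nesting counter. It is a sketch under stated assumptions, not the kernel code: the class name, the TASK_NEST constant and the choice of bit 56 for the process-level field are all hypothetical. The only properties taken from the description above are that process/idle transitions and interrupt entry/exit are counted in separate parts of one 64-bit value, and that a process/idle transition overwrites the counter rather than adjusting it.

```python
MASK64 = (1 << 64) - 1

# Hypothetical layout: the process/idle field sits in the high bits,
# interrupt nesting is counted in the low bits.
TASK_NEST = 1 << 56


class DynticksNesting:
    """Toy model of a per-CPU ->dynticks_nesting counter."""

    def __init__(self):
        self.value = TASK_NEST  # CPU starts out running a task

    def idle_enter(self):
        # Hard reset: any miscounted interrupt nesting is discarded.
        self.value = 0

    def idle_exit(self):
        # Hard reset back to the "in a process" value.
        self.value = TASK_NEST

    def irq_enter(self):
        self.value = (self.value + 1) & MASK64

    def irq_exit(self):
        self.value = (self.value - 1) & MASK64

    def is_idle(self):
        return self.value == 0


if __name__ == "__main__":
    d = DynticksNesting()
    d.irq_enter()        # "half an interrupt": entry with no matching exit
    d.idle_enter()       # the stray count does not survive the reset
    print(d.is_idle())
```

Because idle_enter() and idle_exit() assign a fixed value instead of incrementing or decrementing, an unbalanced irq_enter() cannot leave the CPU looking non-idle forever.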
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
This tracepoint shows how long a task is sleeping in uninterruptible state.
E.g. it may show how long and where a mutex is waited for.
Signed-off-by: Andrew Vagin <avagin@openvz.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1322471015-107825-8-git-send-email-avagin@openvz.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Currently we only trace physical reads; there is no instrumentation if
the read is satisfied from the cache.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
Revert "tracing: Include module.h in define_trace.h"
irq: don't put module.h into irq.h for tracking irqgen modules.
bluetooth: macroize two small inlines to avoid module.h
ip_vs.h: fix implicit use of module_get/module_put from module.h
nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
include: replace linux/module.h with "struct module" wherever possible
include: convert various register fcns to macros to avoid include chaining
crypto.h: remove unused crypto_tfm_alg_modname() inline
uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
pm_runtime.h: explicitly requires notifier.h
linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
miscdevice.h: fix up implicit use of lists and types
stop_machine.h: fix implicit use of smp.h for smp_processor_id
of: fix implicit use of errno.h in include/linux/of.h
of_platform.h: delete needless include <linux/module.h>
acpi: remove module.h include from platform/aclinux.h
miscdevice.h: delete unnecessary inclusion of module.h
device_cgroup.h: delete needless include <linux/module.h>
net: sch_generic remove redundant use of <linux/module.h>
net: inet_timewait_sock doesnt need <linux/module.h>
...
Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in
- drivers/media/dvb/frontends/dibx000_common.c
- drivers/media/video/{mt9m111.c,ov6650.c}
- drivers/mfd/ab3550-core.c
- include/linux/dmaengine.h
* 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
writeback: Add a 'reason' to wb_writeback_work
writeback: send work item to queue_io, move_expired_inodes
writeback: trace event balance_dirty_pages
writeback: trace event bdi_dirty_ratelimit
writeback: fix ppc compile warnings on do_div(long long, unsigned long)
writeback: per-bdi background threshold
writeback: dirty position control - bdi reserve area
writeback: control dirty pause time
writeback: limit max dirty pause time
writeback: IO-less balance_dirty_pages()
writeback: per task dirty rate limit
writeback: stabilize bdi->dirty_ratelimit
writeback: dirty rate control
writeback: add bg_threshold parameter to __bdi_update_bandwidth()
writeback: dirty position control
writeback: account per-bdi accumulated dirtied pages
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (97 commits)
jbd2: Unify log messages in jbd2 code
jbd/jbd2: validate sb->s_first in journal_get_superblock()
ext4: let ext4_ext_rm_leaf work with EXT_DEBUG defined
ext4: fix a syntax error in ext4_ext_insert_extent when debugging enabled
ext4: fix a typo in struct ext4_allocation_context
ext4: Don't normalize an falloc request if it can fit in 1 extent.
ext4: remove comments about extent mount option in ext4_new_inode()
ext4: let ext4_discard_partial_buffers handle unaligned range correctly
ext4: return ENOMEM if find_or_create_pages fails
ext4: move vars to local scope in ext4_discard_partial_page_buffers_no_lock()
ext4: Create helper function for EXT4_IO_END_UNWRITTEN and i_aiodio_unwritten
ext4: optimize locking for end_io extent conversion
ext4: remove unnecessary call to waitqueue_active()
ext4: Use correct locking for ext4_end_io_nolock()
ext4: fix race in xattr block allocation path
ext4: trace punch_hole correctly in ext4_ext_map_blocks
ext4: clean up AGGRESSIVE_TEST code
ext4: move variables to their scope
ext4: fix quota accounting during migration
ext4: migrate cleanup
...
Replace the ISOLATE_XXX macros with a bitwise isolate_mode_t type. Macros
are normally not recommended because they are type-unsafe and make debugging
harder, as the symbols cannot be passed through to the debugger.
Quote from Johannes
" Hmm, it would probably be cleaner to fully convert the isolation mode
into independent flags. INACTIVE, ACTIVE, BOTH is currently a
tri-state among flags, which is a bit ugly."
This patch moves isolate mode from swap.h to mmzone.h by memcontrol.h
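The difference between a tri-state and independent flags, as suggested in the quote, can be sketched as follows. The names IsolateMode, BOTH and may_isolate are illustrative only and are not the kernel's identifiers. The point is that "both" needs no value of its own once the two modes are separate bits.

```python
from enum import IntFlag


class IsolateMode(IntFlag):
    """Illustrative stand-in for a bitwise isolate mode type."""
    INACTIVE = 0x1  # isolate inactive pages
    ACTIVE = 0x2    # isolate active pages


# "Both" is just the OR of the two independent flags.
BOTH = IsolateMode.INACTIVE | IsolateMode.ACTIVE


def may_isolate(page_is_active, mode):
    """Return True if a page in the given state matches the requested mode."""
    wanted = IsolateMode.ACTIVE if page_is_active else IsolateMode.INACTIVE
    return bool(mode & wanted)


if __name__ == "__main__":
    print(may_isolate(True, IsolateMode.ACTIVE))
    print(may_isolate(False, IsolateMode.ACTIVE))
    print(may_isolate(False, BOTH))
```

A dedicated flag type also means a stray integer cannot be passed silently where a mode is expected, which is the type-safety argument made above.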
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit 3a9f987b31.
With all the files that are real modules now having module.h
explicitly called out for inclusion, and no reliance on any
implicit presence of module.h assumed, we should no longer
need this workaround.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
The <linux/module.h> pretty much brings in the kitchen sink along
with it, so it should be avoided wherever reasonably possible in
terms of being included from other commonly used <linux/something.h>
files, as it results in a measurable increase in compile times.
The worst culprit was probably device.h since it is used everywhere.
This file also had an implicit dependency/usage of mutex.h which was
masked by module.h, and is also fixed here at the same time.
There are over a dozen other headers that simply declare the
struct instead of pulling in the whole file, so follow their lead
and simply make it a few more.
Most of the implicit dependencies on module.h being present by
these headers pulling it in have been now weeded out, so we can
finally make this change with hopefully minimal breakage.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
This creates a new 'reason' field in a wb_writeback_work
structure, which unambiguously identifies who initiates
writeback activity. A 'wb_reason' enumeration has been
added to writeback.h, to enumerate the possible reasons.
The 'writeback_work_class' tracepoint event class and the
'writeback_queue_io' tracepoint are updated to include the
symbolic 'reason' in all trace events.
The 'writeback_inodes_sbXXX' family of routines has also had
a 'reason' parameter added, so callers can specify
why writeback is being started.
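The shape of the change can be pictured with a small model: a work item carries an enumerated reason, and the trace output prints the symbolic name instead of a number. Everything below is a hypothetical sketch. The member names of WbReason, the WritebackWork fields and the trace_line format are invented for illustration and do not reproduce the contents of writeback.h.

```python
from dataclasses import dataclass
from enum import Enum


class WbReason(Enum):
    """Hypothetical subset of reasons for starting writeback."""
    BACKGROUND = "background"
    SYNC = "sync"
    PERIODIC = "periodic"


@dataclass
class WritebackWork:
    """Toy stand-in for a writeback work item."""
    nr_pages: int
    reason: WbReason


def trace_line(work):
    """Format a work item the way a trace event might, with a symbolic reason."""
    return f"nr_pages={work.nr_pages} reason={work.reason.value}"


if __name__ == "__main__":
    print(trace_line(WritebackWork(nr_pages=1024, reason=WbReason.SYNC)))
```

Carrying the reason inside the work item means every tracepoint that sees the work can report it without the caller passing extra arguments.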
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Instead of sending ->older_than_this to queue_io() and
move_expired_inodes(), send the entire wb_writeback_work
structure. There are other fields of a work item that are
useful in these routines and in tracepoints.
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Useful for analyzing the dynamics of the throttling algorithms and
debugging user reported problems.
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (549 commits)
ALSA: hda - Fix ADC input-amp handling for Cx20549 codec
ALSA: hda - Keep EAPD turned on for old Conexant chips
ALSA: hda/realtek - Fix missing volume controls with ALC260
ASoC: wm8940: Properly set codec->dapm.bias_level
ALSA: hda - Fix pin-config for ASUS W90V
ALSA: hda - Fix surround/CLFE headphone and speaker pins order
ALSA: hda - Fix typo
ALSA: Update the sound git tree URL
ALSA: HDA: Add new revision for ALC662
ASoC: max98095: Convert codec->hw_write to snd_soc_write
ASoC: keep pointer to resource so it can be freed
ASoC: sgtl5000: Fix wrong mask in some snd_soc_update_bits calls
ASoC: wm8996: Fix wrong mask for setting WM8996_AIF_CLOCKING_2
ASoC: da7210: Add support for line out and DAC
ASoC: da7210: Add support for DAPM
ALSA: hda/realtek - Fix DAC assignments of multiple speakers
ASoC: Use SGTL5000_LINREG_VDDD_MASK instead of hardcoded mask value
ASoC: Set sgtl5000->ldo in ldo_regulator_register
ASoC: wm8996: Use SND_SOC_DAPM_AIF_OUT for AIF2 Capture
ASoC: wm8994: Use SND_SOC_DAPM_AIF_OUT for AIF3 Capture
...
This patch introduces a fast path in ext4_ext_convert_to_initialized()
for the case when the conversion can be performed by transferring
the newly initialized blocks from the uninitialized extent into
an adjacent initialized extent. Doing so removes the expensive
invocations of memmove() which occur during extent insertion and
the subsequent merge.
In practice this should be the common case for clients performing
append writes into files pre-allocated via
fallocate(FALLOC_FL_KEEP_SIZE). In such a workload performed via
direct IO and when using a suboptimal implementation of memmove()
(x86_64 prior to the 2.6.39 rewrite), this patch reduces kernel CPU
consumption by 32%.
Two new trace points are added to ext4_ext_convert_to_initialized()
to offer visibility into its operations. No exit trace point has
been added due to the multiplicity of return points. This can be
revisited once the upstream cleanup is backported.
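The fast path described above can be illustrated with a simplified model. This is a sketch, not the ext4 code: the Extent class and convert_fast_path are hypothetical, only logical block adjacency is checked, and the conditions (the write starts at the first block of the uninitialized extent and covers only part of it) are assumptions chosen to keep the example small.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    start: int         # first logical block
    length: int        # number of blocks
    initialized: bool


def convert_fast_path(extents, idx, write_len):
    """Try to initialize the first write_len blocks of extents[idx].

    Returns True if the blocks were handed over to the preceding
    initialized extent, False if the slow path (split, insert, merge)
    would be needed instead.
    """
    if idx == 0:
        return False
    prev, cur = extents[idx - 1], extents[idx]
    if cur.initialized or not prev.initialized:
        return False
    if prev.start + prev.length != cur.start:
        return False  # not adjacent
    if write_len <= 0 or write_len >= cur.length:
        return False  # assumed: only partial conversions take this path

    # Move the boundary between the two extents. No entry is inserted
    # or removed, so nothing in the extent array has to be shifted.
    prev.length += write_len
    cur.start += write_len
    cur.length -= write_len
    return True


if __name__ == "__main__":
    exts = [Extent(0, 8, True), Extent(8, 16, False)]
    print(convert_fast_path(exts, 1, 4), exts)
```

For an append workload on a preallocated file, each write lands exactly at the boundary between the initialized and uninitialized extents, which is why this case is expected to be common.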
Signed-off-by: Eric Gouriou <egouriou@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits)
llist: Add back llist_add_batch() and llist_del_first() prototypes
sched: Don't use tasklist_lock for debug prints
sched: Warn on rt throttling
sched: Unify the ->cpus_allowed mask copy
sched: Wrap scheduler p->cpus_allowed access
sched: Request for idle balance during nohz idle load balance
sched: Use resched IPI to kick off the nohz idle balance
sched: Fix idle_cpu()
llist: Remove cpu_relax() usage in cmpxchg loops
sched: Convert to struct llist
llist: Add llist_next()
irq_work: Use llist in the struct irq_work logic
llist: Return whether list is empty before adding in llist_add()
llist: Move cpu_relax() to after the cmpxchg()
llist: Remove the platform-dependent NMI checks
llist: Make some llist functions inline
sched, tracing: Show PREEMPT_ACTIVE state in trace_sched_switch
sched: Remove redundant test in check_preempt_tick()
sched: Add documentation for bandwidth control
sched: Return unused runtime on group dequeue
...
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (121 commits)
perf symbols: Increase symbol KSYM_NAME_LEN size
perf hists browser: Refuse 'a' hotkey on non symbolic views
perf ui browser: Use libslang to read keys
perf tools: Fix tracing info recording
perf hists browser: Elide DSO column when it is set to just one DSO, ditto for threads
perf hists: Don't consider filtered entries when calculating column widths
perf hists: Don't decay total_period for filtered entries
perf hists browser: Honour symbol_conf.show_{nr_samples,total_period}
perf hists browser: Do not exit on tab key with single event
perf annotate browser: Don't change selection line when returning from callq
perf tools: handle endianness of feature bitmap
perf tools: Add prelink suggestion to dso update message
perf script: Fix unknown feature comment
perf hists browser: Apply the dso and thread filters when merging new batches
perf hists: Move the dso and thread filters from hist_browser
perf ui browser: Honour the xterm colors
perf top tui: Give color hints just on the percentage, like on --stdio
perf ui browser: Make the colors configurable and change the defaults
perf tui: Remove unneeded call to newtCls on startup
perf hists: Don't format the percentage on hist_entry__snprintf
...
Fix up conflicts in arch/x86/kernel/kprobes.c manually.
Ingo's tree did the insane "add volatile to const array", which just
doesn't make sense ("volatile const"?). But we could remove the const
*and* make the array volatile to make doubly sure that gcc doesn't
optimize it away..
Also fix up kernel/trace/ring_buffer.c non-data-conflicts manually: the
reader_lock has been turned into a raw lock by the core locking merge,
and there was a new user of it introduced in this perf core merge. Make
sure that new use also uses the raw accessor functions.
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits)
rcu: Move propagation of ->completed from rcu_start_gp() to rcu_report_qs_rsp()
rcu: Remove rcu_needs_cpu_flush() to avoid false quiescent states
rcu: Wire up RCU_BOOST_PRIO for rcutree
rcu: Make rcu_torture_boost() exit loops at end of test
rcu: Make rcu_torture_fqs() exit loops at end of test
rcu: Permit rt_mutex_unlock() with irqs disabled
rcu: Avoid having just-onlined CPU resched itself when RCU is idle
rcu: Suppress NMI backtraces when stall ends before dump
rcu: Prohibit grace periods during early boot
rcu: Simplify unboosting checks
rcu: Prevent early boot set_need_resched() from __rcu_pending()
rcu: Dump local stack if cannot dump all CPUs' stacks
rcu: Move __rcu_read_unlock()'s barrier() within if-statement
rcu: Improve rcu_assign_pointer() and RCU_INIT_POINTER() documentation
rcu: Make rcu_assign_pointer() unconditionally insert a memory barrier
rcu: Make rcu_implicit_dynticks_qs() locals be correct size
rcu: Eliminate in_irq() checks in rcu_enter_nohz()
nohz: Remove nohz_cpu_mask
rcu: Document interpretation of RCU-lockdep splats
rcu: Allow rcutorture's stat_interval parameter to be changed at runtime
...
* 'for-linus' of git://github.com/ericvh/linux:
9p: fix 9p.txt to advertise msize instead of maxdata
net/9p: Convert net/9p protocol dumps to tracepoints
fs/9p: change an int to unsigned int
fs/9p: Cleanup option parsing in 9p
9p: move dereference after NULL check
fs/9p: inode file operation is properly initialized init_special_inode
fs/9p: Update zero-copy implementation in 9p
* 'pm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (63 commits)
PM / Clocks: Remove redundant NULL checks before kfree()
PM / Documentation: Update docs about suspend and CPU hotplug
ACPI / PM: Add Sony VGN-FW21E to nonvs blacklist.
ARM: mach-shmobile: sh7372 A4R support (v4)
ARM: mach-shmobile: sh7372 A3SP support (v4)
PM / Sleep: Mark devices involved in wakeup signaling during suspend
PM / Hibernate: Improve performance of LZO/plain hibernation, checksum image
PM / Hibernate: Do not initialize static and extern variables to 0
PM / Freezer: Make fake_signal_wake_up() wake TASK_KILLABLE tasks too
PM / Hibernate: Add resumedelay kernel param in addition to resumewait
MAINTAINERS: Update linux-pm list address
PM / ACPI: Blacklist Vaio VGN-FW520F machine known to require acpi_sleep=nonvs
PM / ACPI: Blacklist Sony Vaio known to require acpi_sleep=nonvs
PM / Hibernate: Add resumewait param to support MMC-like devices as resume file
PM / Hibernate: Fix typo in a kerneldoc comment
PM / Hibernate: Freeze kernel threads after preallocating memory
PM: Update the policy on default wakeup settings
PM / VT: Cleanup #if defined uglyness and fix compile error
PM / Suspend: Off by one in pm_suspend()
PM / Hibernate: Include storage keys in hibernation image on s390
...
* 'for-linus' of git://opensource.wolfsonmicro.com/regmap: (62 commits)
mfd: Enable rbtree cache for wm831x devices
regmap: Support some block operations on cached devices
regmap: Allow caches for devices with no defaults
regmap: Ensure rbtree syncs registers set to zero properly
regmap: Allow rbtree to cache zero default values
regmap: Warn on raw I/O as well as bulk reads that bypass cache
regmap: Return a sensible error code if we fail to read the cache
regmap: Use bsearch() to search the register defaults
regmap: Fix doc comment
regmap: Optimize the lookup path to use binary search
regmap: Ensure we scream if we enable cache bypass/only at the same time
regmap: Implement regcache_cache_bypass helper function
regmap: Save/restore the bypass state upon syncing
regmap: Lock the sync path, ensure we use the lockless _regmap_write()
regmap: Fix apostrophe usage
regmap: Make _regmap_write() global
regmap: Fix lock used for regcache_cache_only()
regmap: Grab the lock in regcache_cache_only()
regmap: Modify map->cache_bypass directly
regmap: Fix regcache_sync generic implementation
...