acpi_dev_found checks that there is a matching ACPI node, but it
may be disabled (_STA method returns 0) in which case the
soc_button_array driver will not bind to it and axp20x-pek should
handle the power-button.
This commit switches from acpi_dev_found to acpi_dev_present to
avoid not registering an input-dev for the powerbutton when there
is a disabled PNP0C40 device.
The ACPI-6.0 standard defines a standard gpio button device using
the ACPI0011 HID replacing the custom PNP0C40 gpio device, many
newer devices define both PNP0C40 and ACPI0011 devices enabling one
or the other depending on whether the BIOS thinks it is going to boot
Android or Windows.
This commit adds a check for the ACPI0011 device, so that if
either device is present *and* enabled we don't register an input-dev
for the powerbutton.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Commit 9b13a4ca8d ("Input: axp20x-pek - do not register input device
on some systems") added a check for the INTCFD9 ACPI device which also
handles the powerbutton as on some systems the powerbutton is connected
to both the PMIC, handled by axp20x-pek, and to a gpio on the SoC, handled
by soc_button_array which attaches itself to the INTCFD9 ACPI device.
Testing + comparing DSDTs has shown that this only happens on Cherry
Trail devices with an AXP288 PMIC, the AXP288 PMIC is also used on
Bay Trail devices but there the power button is only connected to
the PMIC and not handled by soc_button_array.
This means that the INTCFD9 check has caused a regression on Bay Trail
devices, causing power-button presses to no longer be seen.
This commit fixes this by limiting the check to devices where the ACPI
node for the AXP288 contains a _HRV (hardware revision) attribute with
a value of 3 which indicates we are dealing with a Cherry Trail platform.
Fixes: 9b13a4ca8d ("Input: axp20x-pek - do not register input ...")
Reported-by: Сергей Трусов <t.rus76@ya.ru>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJZK2lrAAoJEHm+PkMAQRiGm3AH/13F1DlIk05aSXHoDr/idIpR
GMHmk3YF+EuFjsL463Sh6s/SSWmz0Lda8euaoB4wCWvQFX2ZjTE+aOd79XlRiZJQ
OTtLkV9I41eXIJUpEOHia7xZiCsbw+usqcHrm1aBoSh5KKV2iQmEOrnJdibqJVOF
eXUMphNK/zFtAd2bKtQSxkaBnOOqsQUgVQSkr2K9rSg25l0KokFC6c5K5IjLn4x9
QgDY4wmMvHrDz0CtpoqlNM4XqbsDJVrFeZGfg6hlMqSRDeXeg4h3Ol0VfIT496RP
QBdrDb6hWO+HKt9B0M+7Q+8a/Fsw+5dtpqv1W/Wlr0i4CS6euU8NChAmrpkrqGo=
=m5ba
-----END PGP SIGNATURE-----
Merge tag 'v4.12-rc3' into for-linus
Merge with mainline to get acpi_dev_present() needed by patches to
axp20x-pek driver.
- Revert one more commit related to the ACPI-based handling of
laptop lids that changed the default behavior on laptops that
booted with closed lids and introduced a regression there
(Benjamin Tissoires).
- Add a missing acpi_put_table() to the code implementing the
/sys/firmware/acpi/tables interface to prevent a counter in
the ACPICA core from overflowing (Dan Williams).
- Drop error messages printed by ACPICA on acpi_get_table()
reference counting mismatches as they need not indicate real
errors at this point (Lv Zheng).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJZMeW9AAoJEILEb/54YlRx4B0P/R4z8QpaYtHRCJmjl4wULRJU
GQ3YYar4GSS5dROIsGj9KF2hD9JikeQxmf/E3qJSZrwVVlDOJzFAsi6hWSBBRDv4
u0FhgNRUV6x84TMOTk1KauH8Ucb9oJReDpjZcbrIXgGX3CfE0K9YrXX9gfXn4ITp
gIF9wW3OjXNy5E4YqHB9MoPJ8TSD4qBJUJUYS23igZOsT1fxV8KkcSLmrUk2znMB
J7uOVA9rzmih+nilnH1PyEVm/p3C9nflqhjZPc2oKAdFWj+3WLHdAoCRgmIkOFk4
RVZm4iRsv0a9ONwFvi7ALsUJWS8oAfcIq1W4qC4D1xgSIIU0vXVKtKpDeYPWrL/W
gXuduT3yAvLAePqw24PFABkKbhSgxyd7Y90uTlOHAvSW3deFlOYLPXsCo7uRp9X1
MrZzy3gzZRIZeYSYM31OpawnPeTkyPX5vG//+rAlWBa7kW07emSob7DYUKb0At/5
9uFVQVmRjTtA6cVPVz5oaZknTGJkF3k6OmyG38zy7FMBMhqdiRFD4myQXuarnuVs
/3gl7LfNVHXuNiIg2oAoKujJ9hABWvtIYT1dCoW1KY6/LQyz6BmJVOLOQ483YUAB
SWuNfKVnRn8zYp4isYaJinLqPF7OORwB1zBPKpoPgZV+n/30blxLpGj5kYx492mt
Oi8JrydZeAvK3FAz34oO
=5CiR
-----END PGP SIGNATURE-----
Merge tag 'acpi-4.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These revert one more problematic commit related to the ACPI-based
handling of laptop lids and make some unuseful error messages coming
from ACPICA go away.
Specifics:
- Revert one more commit related to the ACPI-based handling of laptop
lids that changed the default behavior on laptops that booted with
closed lids and introduced a regression there (Benjamin Tissoires).
- Add a missing acpi_put_table() to the code implementing the
/sys/firmware/acpi/tables interface to prevent a counter in the
ACPICA core from overflowing (Dan Williams).
- Drop error messages printed by ACPICA on acpi_get_table() reference
counting mismatches as they need not indicate real errors at this
point (Lv Zheng)"
* tag 'acpi-4.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPICA: Tables: Fix regression introduced by a too early mechanism enabling
Revert "ACPI / button: Change default behavior to lid_init_state=open"
ACPI / sysfs: fix acpi_get_table() leak / acpi-sysfs denial of service
- Make cpufreq_register_driver() return an error if the ->init()
calls fail for all CPUs to prevent non-functional drivers from
hanging around for no reason (David Arcari).
- Make kirkwood-cpufreq check the return value of clk_prepare_enable()
(which may fail) as appropriate (Arvind Yadav).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJZMeVLAAoJEILEb/54YlRxm50QALGQv5EqJExNL9CwRSkcHabU
VJMxx0Q9hhTuGb9hrQaFCsmE5svcbUpk8w/80nAGvtdRT5w2vB1ssTpJMwE7y407
8k8DbqEG+Hzk6cjIsMUN2w6n8mWfhWFkk1bzk7xH4xVOcAsJBnfkxlW1cU9viU9G
FnsEwVmFN5gpZZGwAIqi1Y4GoCpr5xr8mhvTa8D1ofHWfQ/DvroD6uyxd6p3T1cy
ffXhLxCZAXWkg/RjkVzy+fksLF1A/H/7ylTMjLJzYbzdlf1R73tQ+vLFOJu9RYvT
xHLU76zozx85LPMxe0MOvuP8yu4Kfy2DOZIBiH3/upBB+I7UzVzuQL0jk1byIGWv
k5kwvThjJcnF07H0e56tHxq989tDE0RaGg3Mt3JqJ9+ddOKApPa2rDTTv9LtOZOX
4mjDBdDzQg+N+uec8mGpuH+/Nz5LTJtopPyK7LG2r+KwS1cGMyhoEpolwUpVB9wp
t/WcKfewm89ROfCrUjRe0+eRiF8DJz3xMsMbYVYHOxVSfWE/M/llQsMxhmrxW1Bw
jMYzemHziOnYKcSKQtFzgo4/JtDnvx+/2ktlApgcRy9F89I/vjvZfcc3UGCfcd6o
p46Y26b0cTHxfnBGDZOyHQzrlWFiv6j5wSScFncAJGNzZbNfweGRI+lyoS+IPsEf
ueBWL3SzN/Tja4FeQ9BO
=dmsR
-----END PGP SIGNATURE-----
Merge tag 'pm-4.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix two bugs in error code paths in the cpufreq core and in the
kirkwood-cpufreq driver.
Specifics:
- Make cpufreq_register_driver() return an error if the ->init()
calls fail for all CPUs to prevent non-functional drivers from
hanging around for no reason (David Arcari).
- Make kirkwood-cpufreq check the return value of
clk_prepare_enable() (which may fail) as appropriate (Arvind
Yadav)"
* tag 'pm-4.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: kirkwood-cpufreq:- Handle return value of clk_prepare_enable()
cpufreq: cpufreq_register_driver() should return -ENODEV if init fails
which can causes crashes in drivers/char/random.c:get_reg().
-----BEGIN PGP SIGNATURE-----
iQEyBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAlkxzw8ACgkQ8vlZVpUN
gaMFowf2KbhElx9OO7Nc62adSF8Ni9vJgk9kigOgrHFXCk4AV5L2QtyLBfk9gIOb
QxG62fzs2mnnL26SgnvN+5qbuZch9G0GjZDOQTpAcxvT9EL67oZ5EhM3AJAHckOW
LxSYCJKYBZqvd4dTHZI36gGNfnpYY7O58pRydE1BPPYSftzaQSS0UtMdcBkjCOoc
Kix35I8PpIhXBubgTE2I38FlGHneS442FAGctnMa/8QlqbkCjSqd1V2rdqxvlNvi
lQ4AG+6xLu2r/yCXqwHf18sKjt6XuluqrJTQGdUzarFw20ikURjSxwlu5Rh2PwJk
hyFHMzBlQ7o68XBDk4LzNkNPgBSb
=4W3q
-----END PGP SIGNATURE-----
Merge tag 'random_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random
Pull /dev/random bug fix from Ted Ts'o:
"Fix a race on architectures with prioritized interrupts (such as m68k)
which can causes crashes in drivers/char/random.c:get_reg()"
* tag 'random_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
fix race in drivers/char/random.c:get_reg()
Merge misc fixes from Andrew Morton:
"15 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
scripts/gdb: make lx-dmesg command work (reliably)
mm: consider memblock reservations for deferred memory initialization sizing
mm/hugetlb: report -EHWPOISON not -EFAULT when FOLL_HWPOISON is specified
mlock: fix mlock count can not decrease in race condition
mm/migrate: fix refcount handling when !hugepage_migration_supported()
dax: fix race between colliding PMD & PTE entries
mm: avoid spurious 'bad pmd' warning messages
mm/page_alloc.c: make sure OOM victim can try allocations with no watermarks once
pcmcia: remove left-over %Z format
slub/memcg: cure the brainless abuse of sysfs attributes
initramfs: fix disabling of initramfs (and its compression)
mm: clarify why we want kmalloc before falling backto vmallock
frv: declare jiffies to be located in the .data section
include/linux/gfp.h: fix ___GFP_NOLOCKDEP value
ksm: prevent crash after write_protect_page fails
lx-dmesg needs access to the log_buf symbol from printk.c.
Unfortunately, the symbol log_buf also exists in BPF's verifier.c and
hence gdb can pick one or the other. If it happens to pick BPF's
log_buf, lx-dmesg doesn't work:
(gdb) lx-dmesg
Python Exception <class 'gdb.MemoryError'> Cannot access memory at address 0x0:
Error occurred in Python command: Cannot access memory at address 0x0
(gdb) p log_buf
$15 = 0x0
Luckily, GDB has a way to deal with this, see
https://sourceware.org/gdb/onlinedocs/gdb/Symbols.html
(gdb) info variables ^log_buf$
All variables matching regular expression "^log_buf$":
File <linux.git>/kernel/bpf/verifier.c:
static char *log_buf;
File <linux.git>/kernel/printk/printk.c:
static char *log_buf;
(gdb) p 'verifier.c'::log_buf
$1 = 0x0
(gdb) p 'printk.c'::log_buf
$2 = 0x811a6aa0 <__log_buf> ""
(gdb) p &log_buf
$3 = (char **) 0x8120fe40 <log_buf>
(gdb) p &'verifier.c'::log_buf
$4 = (char **) 0x8120fe40 <log_buf>
(gdb) p &'printk.c'::log_buf
$5 = (char **) 0x8048b7d0 <log_buf>
By being explicit about the location of the symbol, we can make lx-dmesg
work again. While at it, do the same for the other symbols we need from
printk.c
Link: http://lkml.kernel.org/r/20170526112222.3414-1-git@andred.net
Signed-off-by: André Draszik <git@andred.net>
Tested-by: Kieran Bingham <kieran@bingham.xyz>
Acked-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We have seen an early OOM killer invocation on ppc64 systems with
crashkernel=4096M:
kthreadd invoked oom-killer: gfp_mask=0x16040c0(GFP_KERNEL|__GFP_COMP|__GFP_NOTRACK), nodemask=7, order=0, oom_score_adj=0
kthreadd cpuset=/ mems_allowed=7
CPU: 0 PID: 2 Comm: kthreadd Not tainted 4.4.68-1.gd7fe927-default #1
Call Trace:
dump_stack+0xb0/0xf0 (unreliable)
dump_header+0xb0/0x258
out_of_memory+0x5f0/0x640
__alloc_pages_nodemask+0xa8c/0xc80
kmem_getpages+0x84/0x1a0
fallback_alloc+0x2a4/0x320
kmem_cache_alloc_node+0xc0/0x2e0
copy_process.isra.25+0x260/0x1b30
_do_fork+0x94/0x470
kernel_thread+0x48/0x60
kthreadd+0x264/0x330
ret_from_kernel_thread+0x5c/0xa4
Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0 unstable:0
slab_reclaimable:5 slab_unreclaimable:73
mapped:0 shmem:0 pagetables:0 bounce:0
free:0 free_pcp:0 free_cma:0
Node 7 DMA free:0kB min:0kB low:0kB high:0kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:52428800kB managed:110016kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:320kB slab_unreclaimable:4672kB kernel_stack:1152kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0 0
Node 7 DMA: 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 0kB
0 total pagecache pages
0 pages in swap cache
Swap cache stats: add 0, delete 0, find 0/0
Free swap = 0kB
Total swap = 0kB
819200 pages RAM
0 pages HighMem/MovableOnly
817481 pages reserved
0 pages cma reserved
0 pages hwpoisoned
the reason is that the managed memory is too low (only 110MB) while the
rest of the the 50GB is still waiting for the deferred intialization to
be done. update_defer_init estimates the initial memoty to initialize
to 2GB at least but it doesn't consider any memory allocated in that
range. In this particular case we've had
Reserving 4096MB of memory at 128MB for crashkernel (System RAM: 51200MB)
so the low 2GB is mostly depleted.
Fix this by considering memblock allocations in the initial static
initialization estimation. Move the max_initialise to
reset_deferred_meminit and implement a simple memblock_reserved_memory
helper which iterates all reserved blocks and sums the size of all that
start below the given address. The cumulative size is than added on top
of the initial estimation. This is still not ideal because
reset_deferred_meminit doesn't consider holes and so reservation might
be above the initial estimation whihch we ignore but let's make the
logic simpler until we really need to handle more complicated cases.
Fixes: 3a80a7fa79 ("mm: meminit: initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set")
Link: http://lkml.kernel.org/r/20170531104010.GI27783@dhcp22.suse.cz
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Tested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org> [4.2+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KVM uses get_user_pages() to resolve its stage2 faults. KVM sets the
FOLL_HWPOISON flag causing faultin_page() to return -EHWPOISON when it
finds a VM_FAULT_HWPOISON. KVM handles these hwpoison pages as a
special case. (check_user_page_hwpoison())
When huge pages are involved, this doesn't work so well.
get_user_pages() calls follow_hugetlb_page(), which stops early if it
receives VM_FAULT_HWPOISON from hugetlb_fault(), eventually returning
-EFAULT to the caller. The step to map this to -EHWPOISON based on the
FOLL_ flags is missing. The hwpoison special case is skipped, and
-EFAULT is returned to user-space, causing Qemu or kvmtool to exit.
Instead, move this VM_FAULT_ to errno mapping code into a header file
and use it from faultin_page() and follow_hugetlb_page().
With this, KVM works as expected.
This isn't a problem for arm64 today as we haven't enabled
MEMORY_FAILURE, but I can't see any reason this doesn't happen on x86
too, so I think this should be a fix. This doesn't apply earlier than
stable's v4.11.1 due to all sorts of cleanup.
[james.morse@arm.com: add vm_fault_to_errno() call to faultin_page()]
suggested.
Link: http://lkml.kernel.org/r/20170525171035.16359-1-james.morse@arm.com
[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/20170524160900.28786-1-james.morse@arm.com
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Punit Agrawal <punit.agrawal@arm.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: <stable@vger.kernel.org> [4.11.1+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kefeng reported that when running the follow test, the mlock count in
meminfo will increase permanently:
[1] testcase
linux:~ # cat test_mlockal
grep Mlocked /proc/meminfo
for j in `seq 0 10`
do
for i in `seq 4 15`
do
./p_mlockall >> log &
done
sleep 0.2
done
# wait some time to let mlock counter decrease and 5s may not enough
sleep 5
grep Mlocked /proc/meminfo
linux:~ # cat p_mlockall.c
#include <sys/mman.h>
#include <stdlib.h>
#include <stdio.h>
#define SPACE_LEN 4096
int main(int argc, char ** argv)
{
int ret;
void *adr = malloc(SPACE_LEN);
if (!adr)
return -1;
ret = mlockall(MCL_CURRENT | MCL_FUTURE);
printf("mlcokall ret = %d\n", ret);
ret = munlockall();
printf("munlcokall ret = %d\n", ret);
free(adr);
return 0;
}
In __munlock_pagevec() we should decrement NR_MLOCK for each page where
we clear the PageMlocked flag. Commit 1ebb7cc6a5 ("mm: munlock: batch
NR_MLOCK zone state updates") has introduced a bug where we don't
decrement NR_MLOCK for pages where we clear the flag, but fail to
isolate them from the lru list (e.g. when the pages are on some other
cpu's percpu pagevec). Since PageMlocked stays cleared, the NR_MLOCK
accounting gets permanently disrupted by this.
Fix it by counting the number of page whose PageMlock flag is cleared.
Fixes: 1ebb7cc6a5 (" mm: munlock: batch NR_MLOCK zone state updates")
Link: http://lkml.kernel.org/r/1495678405-54569-1-git-send-email-xieyisheng1@huawei.com
Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
Reported-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Tested-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Joern Engel <joern@logfs.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michel Lespinasse <walken@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: zhongjiang <zhongjiang@huawei.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On failing to migrate a page, soft_offline_huge_page() performs the
necessary update to the hugepage ref-count.
But when !hugepage_migration_supported() , unmap_and_move_hugepage()
also decrements the page ref-count for the hugepage. The combined
behaviour leaves the ref-count in an inconsistent state.
This leads to soft lockups when running the overcommitted hugepage test
from mce-tests suite.
Soft offlining pfn 0x83ed600 at process virtual address 0x400000000000
soft offline: 0x83ed600: migration failed 1, type 1fffc00000008008 (uptodate|head)
INFO: rcu_preempt detected stalls on CPUs/tasks:
Tasks blocked on level-0 rcu_node (CPUs 0-7): P2715
(detected by 7, t=5254 jiffies, g=963, c=962, q=321)
thugetlb_overco R running task 0 2715 2685 0x00000008
Call trace:
dump_backtrace+0x0/0x268
show_stack+0x24/0x30
sched_show_task+0x134/0x180
rcu_print_detail_task_stall_rnp+0x54/0x7c
rcu_check_callbacks+0xa74/0xb08
update_process_times+0x34/0x60
tick_sched_handle.isra.7+0x38/0x70
tick_sched_timer+0x4c/0x98
__hrtimer_run_queues+0xc0/0x300
hrtimer_interrupt+0xac/0x228
arch_timer_handler_phys+0x3c/0x50
handle_percpu_devid_irq+0x8c/0x290
generic_handle_irq+0x34/0x50
__handle_domain_irq+0x68/0xc0
gic_handle_irq+0x5c/0xb0
Address this by changing the putback_active_hugepage() in
soft_offline_huge_page() to putback_movable_pages().
This only triggers on systems that enable memory failure handling
(ARCH_SUPPORTS_MEMORY_FAILURE) but not hugepage migration
(!ARCH_ENABLE_HUGEPAGE_MIGRATION).
I imagine this wasn't triggered as there aren't many systems running
this configuration.
[akpm@linux-foundation.org: remove dead comment, per Naoya]
Link: http://lkml.kernel.org/r/20170525135146.32011-1-punit.agrawal@arm.com
Reported-by: Manoj Iyer <manoj.iyer@canonical.com>
Tested-by: Manoj Iyer <manoj.iyer@canonical.com>
Suggested-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: <stable@vger.kernel.org> [3.14+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We currently have two related PMD vs PTE races in the DAX code. These
can both be easily triggered by having two threads reading and writing
simultaneously to the same private mapping, with the key being that
private mapping reads can be handled with PMDs but private mapping
writes are always handled with PTEs so that we can COW.
Here is the first race:
CPU 0 CPU 1
(private mapping write)
__handle_mm_fault()
create_huge_pmd() - FALLBACK
handle_pte_fault()
passes check for pmd_devmap()
(private mapping read)
__handle_mm_fault()
create_huge_pmd()
dax_iomap_pmd_fault() inserts PMD
dax_iomap_pte_fault() does a PTE fault, but we already have a DAX PMD
installed in our page tables at this spot.
Here's the second race:
CPU 0 CPU 1
(private mapping read)
__handle_mm_fault()
passes check for pmd_none()
create_huge_pmd()
dax_iomap_pmd_fault() inserts PMD
(private mapping write)
__handle_mm_fault()
create_huge_pmd() - FALLBACK
(private mapping read)
__handle_mm_fault()
passes check for pmd_none()
create_huge_pmd()
handle_pte_fault()
dax_iomap_pte_fault() inserts PTE
dax_iomap_pmd_fault() inserts PMD,
but we already have a PTE at
this spot.
The core of the issue is that while there is isolation between faults to
the same range in the DAX fault handlers via our DAX entry locking,
there is no isolation between faults in the code in mm/memory.c. This
means for instance that this code in __handle_mm_fault() can run:
if (pmd_none(*vmf.pmd) && transparent_hugepage_enabled(vma)) {
ret = create_huge_pmd(&vmf);
But by the time we actually get to run the fault handler called by
create_huge_pmd(), the PMD is no longer pmd_none() because a racing PTE
fault has installed a normal PMD here as a parent. This is the cause of
the 2nd race. The first race is similar - there is the following check
in handle_pte_fault():
} else {
/* See comment in pte_alloc_one_map() */
if (pmd_devmap(*vmf->pmd) || pmd_trans_unstable(vmf->pmd))
return 0;
So if a pmd_devmap() PMD (a DAX PMD) has been installed at vmf->pmd, we
will bail and retry the fault. This is correct, but there is nothing
preventing the PMD from being installed after this check but before we
actually get to the DAX PTE fault handlers.
In my testing these races result in the following types of errors:
BUG: Bad rss-counter state mm:ffff8800a817d280 idx:1 val:1
BUG: non-zero nr_ptes on freeing mm: 15
Fix this issue by having the DAX fault handlers verify that it is safe
to continue their fault after they have taken an entry lock to block
other racing faults.
[ross.zwisler@linux.intel.com: improve fix for colliding PMD & PTE entries]
Link: http://lkml.kernel.org/r/20170526195932.32178-1-ross.zwisler@linux.intel.com
Link: http://lkml.kernel.org/r/20170522215749.23516-2-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reported-by: Pawel Lebioda <pawel.lebioda@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Pawel Lebioda <pawel.lebioda@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Xiong Zhou <xzhou@redhat.com>
Cc: Eryu Guan <eguan@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When the pmd_devmap() checks were added by 5c7fb56e5e ("mm, dax:
dax-pmd vs thp-pmd vs hugetlbfs-pmd") to add better support for DAX huge
pages, they were all added to the end of if() statements after existing
pmd_trans_huge() checks. So, things like:
- if (pmd_trans_huge(*pmd))
+ if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd))
When further checks were added after pmd_trans_unstable() checks by
commit 7267ec008b ("mm: postpone page table allocation until we have
page to map") they were also added at the end of the conditional:
+ if (pmd_trans_unstable(fe->pmd) || pmd_devmap(*fe->pmd))
This ordering is fine for pmd_trans_huge(), but doesn't work for
pmd_trans_unstable(). This is because DAX huge pages trip the bad_pmd()
check inside of pmd_none_or_trans_huge_or_clear_bad() (called by
pmd_trans_unstable()), which prints out a warning and returns 1. So, we
do end up doing the right thing, but only after spamming dmesg with
suspicious looking messages:
mm/pgtable-generic.c:39: bad pmd ffff8808daa49b88(84000001006000a5)
Reorder these checks in a helper so that pmd_devmap() is checked first,
avoiding the error messages, and add a comment explaining why the
ordering is important.
Fixes: commit 7267ec008b ("mm: postpone page table allocation until we have page to map")
Link: http://lkml.kernel.org/r/20170522215749.23516-1-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Pawel Lebioda <pawel.lebioda@intel.com>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Xiong Zhou <xzhou@redhat.com>
Cc: Eryu Guan <eguan@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Roman Gushchin has reported that the OOM killer can trivially selects
next OOM victim when a thread doing memory allocation from page fault
path was selected as first OOM victim.
allocate invoked oom-killer: gfp_mask=0x14280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=(null), order=0, oom_score_adj=0
allocate cpuset=/ mems_allowed=0
CPU: 1 PID: 492 Comm: allocate Not tainted 4.12.0-rc1-mm1+ #181
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
Call Trace:
oom_kill_process+0x219/0x3e0
out_of_memory+0x11d/0x480
__alloc_pages_slowpath+0xc84/0xd40
__alloc_pages_nodemask+0x245/0x260
alloc_pages_vma+0xa2/0x270
__handle_mm_fault+0xca9/0x10c0
handle_mm_fault+0xf3/0x210
__do_page_fault+0x240/0x4e0
trace_do_page_fault+0x37/0xe0
do_async_page_fault+0x19/0x70
async_page_fault+0x28/0x30
...
Out of memory: Kill process 492 (allocate) score 899 or sacrifice child
Killed process 492 (allocate) total-vm:2052368kB, anon-rss:1894576kB, file-rss:4kB, shmem-rss:0kB
allocate: page allocation failure: order:0, mode:0x14280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=(null)
allocate cpuset=/ mems_allowed=0
CPU: 1 PID: 492 Comm: allocate Not tainted 4.12.0-rc1-mm1+ #181
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
Call Trace:
__alloc_pages_slowpath+0xd32/0xd40
__alloc_pages_nodemask+0x245/0x260
alloc_pages_vma+0xa2/0x270
__handle_mm_fault+0xca9/0x10c0
handle_mm_fault+0xf3/0x210
__do_page_fault+0x240/0x4e0
trace_do_page_fault+0x37/0xe0
do_async_page_fault+0x19/0x70
async_page_fault+0x28/0x30
...
oom_reaper: reaped process 492 (allocate), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
...
allocate invoked oom-killer: gfp_mask=0x0(), nodemask=(null), order=0, oom_score_adj=0
allocate cpuset=/ mems_allowed=0
CPU: 1 PID: 492 Comm: allocate Not tainted 4.12.0-rc1-mm1+ #181
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
Call Trace:
oom_kill_process+0x219/0x3e0
out_of_memory+0x11d/0x480
pagefault_out_of_memory+0x68/0x80
mm_fault_error+0x8f/0x190
? handle_mm_fault+0xf3/0x210
__do_page_fault+0x4b2/0x4e0
trace_do_page_fault+0x37/0xe0
do_async_page_fault+0x19/0x70
async_page_fault+0x28/0x30
...
Out of memory: Kill process 233 (firewalld) score 10 or sacrifice child
Killed process 233 (firewalld) total-vm:246076kB, anon-rss:20956kB, file-rss:0kB, shmem-rss:0kB
There is a race window that the OOM reaper completes reclaiming the
first victim's memory while nothing but mutex_trylock() prevents the
first victim from calling out_of_memory() from pagefault_out_of_memory()
after memory allocation for page fault path failed due to being selected
as an OOM victim.
This is a side effect of commit 9a67f6488e ("mm: consolidate
GFP_NOFAIL checks in the allocator slowpath") because that commit
silently changed the behavior from
/* Avoid allocations with no watermarks from looping endlessly */
to
/*
* Give up allocations without trying memory reserves if selected
* as an OOM victim
*/
in __alloc_pages_slowpath() by moving the location to check TIF_MEMDIE
flag. I have noticed this change but I didn't post a patch because I
thought it is an acceptable change other than noise by warn_alloc()
because !__GFP_NOFAIL allocations are allowed to fail. But we
overlooked that failing memory allocation from page fault path makes
difference due to the race window explained above.
While it might be possible to add a check to pagefault_out_of_memory()
that prevents the first victim from calling out_of_memory() or remove
out_of_memory() from pagefault_out_of_memory(), changing
pagefault_out_of_memory() does not suppress noise by warn_alloc() when
allocating thread was selected as an OOM victim. There is little point
with printing similar backtraces and memory information from both
out_of_memory() and warn_alloc().
Instead, if we guarantee that current thread can try allocations with no
watermarks once when current thread looping inside
__alloc_pages_slowpath() was selected as an OOM victim, we can follow "who
can use memory reserves" rules and suppress noise by warn_alloc() and
prevent memory allocations from page fault path from calling
pagefault_out_of_memory().
If we take the comment literally, this patch would do
- if (test_thread_flag(TIF_MEMDIE))
- goto nopage;
+ if (alloc_flags == ALLOC_NO_WATERMARKS || (gfp_mask & __GFP_NOMEMALLOC))
+ goto nopage;
because gfp_pfmemalloc_allowed() returns false if __GFP_NOMEMALLOC is
given. But if I recall correctly (I couldn't find the message), the
condition is meant to apply to only OOM victims despite the comment.
Therefore, this patch preserves TIF_MEMDIE check.
Fixes: 9a67f6488e ("mm: consolidate GFP_NOFAIL checks in the allocator slowpath")
Link: http://lkml.kernel.org/r/201705192112.IAF69238.OQOHSJLFOFFMtV@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: Roman Gushchin <guro@fb.com>
Tested-by: Roman Gushchin <guro@fb.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: <stable@vger.kernel.org> [4.11]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 5b5e0928f7 ("lib/vsprintf.c: remove %Z support") removed some
usages of format %Z but forgot "%.2Zx". This makes clang 4.0 reports a
-Wformat-extra-args warning because it does not know about %Z.
Replace %Z with %z.
Link: http://lkml.kernel.org/r/20170520090946.22562-1-nicolas.iooss_linux@m4x.org
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Cc: Harald Welte <laforge@gnumonks.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: <stable@vger.kernel.org> [4.11+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
memcg_propagate_slab_attrs() abuses the sysfs attribute file functions
to propagate settings from the root kmem_cache to a newly created
kmem_cache. It does that with:
attr->show(root, buf);
attr->store(new, buf, strlen(bug);
Aside of being a lazy and absurd hackery this is broken because it does
not check the return value of the show() function.
Some of the show() functions return 0 w/o touching the buffer. That
means in such a case the store function is called with the stale content
of the previous show(). That causes nonsense like invoking
kmem_cache_shrink() on a newly created kmem_cache. In the worst case it
would cause handing in an uninitialized buffer.
This should be rewritten proper by adding a propagate() callback to
those slub_attributes which must be propagated and avoid that insane
conversion to and from ASCII, but that's too large for a hot fix.
Check at least the return value of the show() function, so calling
store() with stale content is prevented.
Steven said:
"It can cause a deadlock with get_online_cpus() that has been uncovered
by recent cpu hotplug and lockdep changes that Thomas and Peter have
been doing.
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(cpu_hotplug.lock);
lock(slab_mutex);
lock(cpu_hotplug.lock);
lock(slab_mutex);
*** DEADLOCK ***"
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1705201244540.2255@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit db2aa7fd15 ("initramfs: allow again choice of the embedded
initram compression algorithm") introduced the possibility to select the
initramfs compression algorithm from Kconfig and while this is a nice
feature it broke the use case described below.
Here is what my build system does:
- kernel is initially configured not to have an initramfs included
- build the user space root file system
- re-configure the kernel to have an initramfs included
(CONFIG_INITRAMFS_SOURCE="/path/to/romfs") and set relevant
CONFIG_INITRAMFS options, in my case, no compression option
(CONFIG_INITRAMFS_COMPRESSION_NONE)
- kernel is re-built with these options -> kernel+initramfs image is
copied
- kernel is re-built again without these options -> kernel image is
copied
Building a kernel without an initramfs means setting this option:
CONFIG_INITRAMFS_SOURCE="" (and this one only)
whereas building a kernel with an initramfs means setting these options:
CONFIG_INITRAMFS_SOURCE="/home/fainelli/work/uclinux-rootfs/romfs /home/fainelli/work/uclinux-rootfs/misc/initramfs.dev"
CONFIG_INITRAMFS_ROOT_UID=1000
CONFIG_INITRAMFS_ROOT_GID=1000
CONFIG_INITRAMFS_COMPRESSION_NONE=y
CONFIG_INITRAMFS_COMPRESSION=""
Commit db2aa7fd15 ("initramfs: allow again choice of the embedded
initram compression algorithm") is problematic because
CONFIG_INITRAMFS_COMPRESSION which is used to determine the
initramfs_data.cpio extension/compression is a string, and due to how
Kconfig works it will evaluate in order, how to assign it.
Setting CONFIG_INITRAMFS_COMPRESSION_NONE with CONFIG_INITRAMFS_SOURCE=""
cannot possibly work (because of the depends on INITRAMFS_SOURCE!=""
imposed on CONFIG_INITRAMFS_COMPRESSION ) yet we still get
CONFIG_INITRAMFS_COMPRESSION assigned to ".gz" because CONFIG_RD_GZIP=y
is set in my kernel, even when there is no initramfs being built.
So we basically end-up generating two initramfs_data.cpio* files, one
without extension, and one with .gz. This causes usr/Makefile to track
usr/initramfs_data.cpio.gz, and not usr/initramfs_data.cpio anymore,
that is also largely problematic after 9e3596b0c6 ("kbuild:
initramfs cleanup, set target from Kconfig") because we used to track
all possible initramfs_data files in the $(targets) variable before that
commit.
The end result is that the kernel with an initramfs clearly does not
contain what we expect it to, it has a stale initramfs_data.cpio file
built into it, and we keep re-generating an initramfs_data.cpio.gz file
which is not the one that we want to include in the kernel image proper.
The fix consists in hiding CONFIG_INITRAMFS_COMPRESSION when
CONFIG_INITRAMFS_SOURCE="". This puts us back in a state to the
pre-4.10 behavior where we can properly disable and re-enable initramfs
within the same kernel .config file, and be in control of what
CONFIG_INITRAMFS_COMPRESSION is set to.
Fixes: db2aa7fd15 ("initramfs: allow again choice of the embedded initram compression algorithm")
Fixes: 9e3596b0c6 ("kbuild: initramfs cleanup, set target from Kconfig")
Link: http://lkml.kernel.org/r/20170521033337.6197-1-f.fainelli@gmail.com
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Nicholas Piggin <npiggin@gmail.com>
Cc: P J P <ppandit@redhat.com>
Cc: Paul Bolle <pebolle@tiscali.nl>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While converting drm_[cm]alloc* helpers to kvmalloc* variants Chris
Wilson has wondered why we want to try kmalloc before vmalloc fallback
even for larger allocations requests. Let's clarify that one larger
physically contiguous block is less likely to fragment memory than many
scattered pages which can prevent more large blocks from being created.
[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/20170517080932.21423-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 7c30f352c8 ("jiffies.h: declare jiffies and jiffies_64 with
____cacheline_aligned_in_smp") removed a section specification from the
jiffies declaration that caused conflicts on some platforms.
Unfortunately this change broke the build for frv:
kernel/built-in.o: In function `__do_softirq': (.text+0x6460): relocation truncated to fit: R_FRV_GPREL12 against symbol
`jiffies' defined in *ABS* section in .tmp_vmlinux1
kernel/built-in.o: In function `__do_softirq': (.text+0x6574): relocation truncated to fit: R_FRV_GPREL12 against symbol
`jiffies' defined in *ABS* section in .tmp_vmlinux1
kernel/built-in.o: In function `pwq_activate_delayed_work': workqueue.c:(.text+0x15b9c): relocation truncated to fit: R_FRV_GPREL12 against
symbol `jiffies' defined in *ABS* section in .tmp_vmlinux1
...
Add __jiffy_arch_data to the declaration of jiffies and use it on frv to
include the section specification. For all other platforms
__jiffy_arch_data (currently) has no effect.
Fixes: 7c30f352c8 ("jiffies.h: declare jiffies and jiffies_64 with ____cacheline_aligned_in_smp")
Link: http://lkml.kernel.org/r/20170516221333.177280-1-mka@chromium.org
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: David Howells <dhowells@redhat.com>
Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Igor Stoppa has noticed that __GFP_NOLOCKDEP can use a lower bit. At
the time commit 7e7844226f ("lockdep: allow to disable reclaim lockup
detection") was written we still had __GFP_OTHER_NODE but I have removed
it in commit 41b6167e8f ("mm: get rid of __GFP_OTHER_NODE") and forgot
to lower the bit value.
The current value is outside of __GFP_BITS_SHIFT so it cannot be used
actually.
Fixes: 7e7844226f ("lockdep: allow to disable reclaim lockup detection")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Igor Stoppa <igor.stoppa@nokia.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
"err" needs to be left set to -EFAULT if split_huge_page succeeds.
Otherwise if "err" gets clobbered with zero and write_protect_page
fails, try_to_merge_one_page() will succeed instead of returning -EFAULT
and then try_to_merge_with_ksm_page() will continue thinking kpage is a
PageKsm when in fact it's still an anonymous page. Eventually it'll
crash in page_add_anon_rmap.
This has been reproduced on Fedora25 kernel but I can reproduce with
upstream too.
The bug was introduced in commit f765f54059 ("ksm: prepare to new THP
semantics") introduced in v4.5.
page:fffff67546ce1cc0 count:4 mapcount:2 mapping:ffffa094551e36e1 index:0x7f0f46673
flags: 0x2ffffc0004007c(referenced|uptodate|dirty|lru|active|swapbacked)
page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
page->mem_cgroup:ffffa09674bf0000
------------[ cut here ]------------
kernel BUG at mm/rmap.c:1222!
CPU: 1 PID: 76 Comm: ksmd Not tainted 4.9.3-200.fc25.x86_64 #1
RIP: do_page_add_anon_rmap+0x1c4/0x240
Call Trace:
page_add_anon_rmap+0x18/0x20
try_to_merge_with_ksm_page+0x50b/0x780
ksm_scan_thread+0x1211/0x1410
? prepare_to_wait_event+0x100/0x100
? try_to_merge_with_ksm_page+0x780/0x780
kthread+0xd9/0xf0
? kthread_park+0x60/0x60
ret_from_fork+0x25/0x30
Fixes: f765f54059 ("ksm: prepare to new THP semantics")
Link: http://lkml.kernel.org/r/20170513131040.21732-1-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Federico Simoncelli <fsimonce@redhat.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* pm-cpufreq:
cpufreq: kirkwood-cpufreq:- Handle return value of clk_prepare_enable()
cpufreq: cpufreq_register_driver() should return -ENODEV if init fails
- Fix an unmount hang due to a race in io buffer accounting.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABCgAGBQJZMKVEAAoJEPh/dxk0SrTrBYcQAKSpzE8C9wDBw6cyxP3kwrTr
FSQiSr7flnGBHwy2U0UC/SIFIwYxvW4BTnXJWADyqtnvLWP1+TC7UY1oNpkTsbkK
KLsWgz3aOcT/8sb346PzFDAuxof2lkv3xFPRBFaoeSkybxWqLz6BWsbmaJNH/wqy
W3k3H241mAftEiv1i9IUlAZMXE31qywIKzzUJvkOglXS8OdVFfMPQvUz6epU2LWA
I2tBip936Sl45vLu6ubqoRpk8dWNuPPX+f4YXl8dVeqRKTYhviMwgYD4rlljb6Ti
kIRG9HYg1GVZo5z/5unAjyEaKzYoRrXnO5Lg+i09NIhezlDhB2HJ+k71NljoeHoe
YCwqumQIGgnxdFu+FP10tKh2EWvDp80SQxgzIvr+FCCKJdsdNYyftRh4CtsCPJSG
xWHT1jgovygHsBEEmG2LS9mCXKkyWgMkHNMBu3Yy/F/4HGzrPjcU3F+x90OmOo7J
S26kEwsAoo+Q5Is8QkmqrnD+CQ7jwXEv9Mw3UqRwQ7UagRdR2nI8CIGEC7W+42Mm
Gd3TtAyJCbhZWXNq7pLeTnGu7JY3/dhR/8VSW+mIKtvFg7v9O1wZBYId8vTwZN1+
8jgnW0h6myE10YKU5bc1TZeYYAkWA+JLRKxoexL3QD8jWeffyZgMNWPM2rb+4Jjp
2wwCHMPvHE8X7a2urTW3
=wRbJ
-----END PGP SIGNATURE-----
Merge tag 'xfs-4.12-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull XFS fix from Darrick Wong:
"I've one more bugfix for you for 4.12-rc4: Fix an unmount hang due to
a race in io buffer accounting"
* tag 'xfs-4.12-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: use ->b_state to fix buffer I/O accounting release race
introduced in -rc1.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABCAAGBQJZMYN+AAoJEEp/3jgCEfOL6u0H/Rp2NdbW4QPoJWZmvL0vD77B
4N3Y+QCG7CSG3TBHWFlIUL7MsHY2wS/3D6rcIPlLhXVb3PV+IhBw5Aqu/OwzF1j6
YuEAoJZWAS4nP3eTDVz1eFSJnk/M4jrIqgPFFdKdh+pumxTnEwl6vXeKD8SSX4xx
2mlxf+EavqeNYppTyBdIWYlVjF03klWPp42G0CRpvjficO+5tvkP4cOrI8cRkgoK
fYFZDTp7Zz0Y1+FQfxntfRtFgksvNplvdnenDnjsPxty+YPUGETpRfo72R8DMU8n
A16g9hkHZNL3CQWcKwkwQmHX/IJxpKzC0xNMNaZO+8dJqAWOdB9h9FS+roAO1p0=
=WpjE
-----END PGP SIGNATURE-----
Merge tag 'ceph-for-4.12-rc4' of git://github.com/ceph/ceph-client
Pull ceph fix from Ilya Dryomov:
"A small fix for rbd FALLOC_FL_ZERO_RANGE/PUNCH_HOLE handling breakage
introduced in -rc1"
* tag 'ceph-for-4.12-rc4' of git://github.com/ceph/ceph-client:
rbd: implement REQ_OP_WRITE_ZEROES
- a fix to DM to account for the possibility that PREFLUSH or FUA are
used without the SYNC flag if the underlying storage doesn't have a
volatile write-cache
- a DM ioctl memory allocation flag fix to use __GFP_HIGH to allow
emergency forward progress (by using memory reserves as last resort)
- a small DM integrity cleanup to use kvmalloc() instead of duplicating
the same
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJZMZqnAAoJEMUj8QotnQNaB3UIAKDWS1bAAuPR6iiIdlffmZzf
DxbCkpgCIHcmuihNKm+57P+Vwa7a5OFb7LSZhU7p2ormncrzEmLRP4D5twtyxTNt
f1zT85gB3FXZVRDbcOP7yx2JuZdimu41kYBc9PaQuyTaNUl8lisnQhTVuJ9m0VO3
jdgQpSD0jQPhSFd/rA4hi75l2MeKwOJA2cXsfBgSxIW+rflYFi7SoNfG7wQattpU
ArdDEgU++lRuk9FaGVHOIgTduhutk8NtRQFmxXZlE10KEf05SbBLVr1EGgKLVnyC
72+NI8OKeqic9R4phyWNZhErWnteXy9bbQtvBOpEIs8byFY176byzBkGBXjHebE=
=MLRg
-----END PGP SIGNATURE-----
Merge tag 'for-4.12/dm-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- a DM verity fix for a mode when no salt is used
- a fix to DM to account for the possibility that PREFLUSH or FUA are
used without the SYNC flag if the underlying storage doesn't have a
volatile write-cache
- a DM ioctl memory allocation flag fix to use __GFP_HIGH to allow
emergency forward progress (by using memory reserves as last resort)
- a small DM integrity cleanup to use kvmalloc() instead of duplicating
the same
* tag 'for-4.12/dm-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm: make flush bios explicitly sync
dm ioctl: restore __GFP_HIGH in copy_params()
dm integrity: use kvmalloc() instead of dm_integrity_kvmalloc()
dm verity: fix no salt use case
Pull MD fixes from Shaohua Li:
"Several patches for MD. One notable is making flush bios sync, others
fix small issues"
* tag 'md/4.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
md: Make flush bios explicitely sync
md: report sector of stripes with check mismatches
md: uuid debug statement now in processor byte order.
md-cluster: fix potential lock issue in add_new_disk
Pull block fixes from Jens Axboe:
"A set of fixes that should go into the next -rc. This contains:
- A use-after-free in the request_list exit for the legacy IO path,
from Bart.
- A fix for CFQ, fixing a recent regression with the conversion to
higher resolution timing for iops mode. From Hou Tao.
- A single fix for nbd, split in two patches, fixing a leak of a data
structure.
- A regression fix from Keith, ensuring that callers of
blk_mq_update_nr_hw_queues() hold the right lock"
* 'for-linus' of git://git.kernel.dk/linux-block:
block: Avoid that blk_exit_rl() triggers a use-after-free
cfq-iosched: fix the delay of cfq_group's vdisktime under iops mode
blk-mq: Take tagset lock when updating hw queues
nbd: don't leak nbd_config
nbd: nbd_reset() call in nbd_dev_add() is redundant
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJZMOEFAAoJEAx081l5xIa+2EMP/iI6uzoEvhfM8niZLdJ4t/zu
HjZkfHE3MoBgTFc47Vy2o0b1umQlNCZg+Zn+0OWs5wKqqISxTQ7bUygXsu64Ua5o
Pp1wt5seuJeTs/W0uaWviDZeiwZW41FJnrTci8/YDTjWG9dgfdHS4PTNyfLa1GOe
kHTdsVEXfhKUE0W0pAi9wos1c+/MvWmStWTB581IFR6CO7T3g3X/ZrsnUuC7Cp5F
imcVh2WhoN52PGW63y84bOg6GONEfuDpPquRAKyqunUUbBDJOO95idNo3AoeXGIs
2ZgglzSphWx/Htx5CtvtOLNDJBTOwwBZd5R5/KL29liJK/AIEKDXl6DAPrUISrs3
mI+wD+iACC6p8RhQrg8dyd9Hw0rv0HZD4SUNLqVOgiE3MNcGOG4Iwdtp3DQaAum/
PxMhICO/gZGCG+Q2XEC5O5i2PA6vIGQkYumE27Wi24YPVc+3qQ0Slpbc7hqSanrn
RYePKowjx8+q/fnDErzcRInciltfB0aL5uxH07lhta82jSzBj2H6wWkUPKv3zC/3
8fCH4kFeFepLDjFB+HsnqoBrNX/DbKo3Z84XCD05/RMAtshWy9rgLE+KZKF4Y0sF
Te8Q6ZBW5SxhuC17j3Jq/hvrTET/dh6MegqVDc6CId0990LED0rhg7yS7xzIBKaY
JtuVg73p69Rq+cn5JP41
=RiLR
-----END PGP SIGNATURE-----
Merge tag 'drm-dp-quirk-for-v4.12-rc4' of git://people.freedesktop.org/~airlied/linux
Pull drm displayport quirk support:
"DP quirk for usb c dongles.
As mentioned I have a separate request for fixing a regression, but
also keeping the broken hw working, for certain USB-C DP adapters they
require a minimised n/m parameters, but an attempt to do this
generically has failed, we need to quirk these specific adapters.
However doing it generically regressed some eDP panels.
This pull adds the infrastructure and a quirk for the adapter"
* tag 'drm-dp-quirk-for-v4.12-rc4' of git://people.freedesktop.org/~airlied/linux:
drm/i915: Detect USB-C specific dongles before reducing M and N
drm/dp: start a DPCD based DP sink/branch device quirk database
drm/i915: use drm DP helper to read DPCD desc
drm/dp: add helper for reading DP sink/branch device desc from DPCD
This contains the fixes for a few reported regression for HD-audio and
USB-audio. All small, trivial, and boring.
-----BEGIN PGP SIGNATURE-----
iQJCBAABCAAsFiEECxfAB4MH3rD5mfB6bDGAVD0pKaQFAlkxbHgOHHRpd2FpQHN1
c2UuZGUACgkQbDGAVD0pKaSwzg//ZudLpexdkyOCwMZRf6Hft48bMMTH8xBYa/8r
ORQn+zRCKq6lcbrtJ55IyFSO1cXtLAOzK/D/WSOdA48XXXcgb3h4uHLSmlJJCt02
UdTETQs711BlZBC1ey7K4ZLwOOLFuHjnEaTsoodS5JvNdhZdxdX8WpSYijvCKgcg
8cKn1r7sfLz6BYs6UVEQ45Wo5BxodKphBVvyikZZmdW8swDz+Kqqhi3HShMPATeR
BUxuMF3+zdyIHYZ7FHPxzV5mEQg0peX0UgFCiMpAmi99nVxTgzafLCI12lFc5JVW
jUIqqOHwd9vbSew/u8VFpY3KU4ENtc4UwzsTHWiTfXZWghMcxuBmPNTjuiz5+1qI
WLpZzPJdx+xDMvgSShy2EK2EqpTYQqF76uCynm7KUZv5MumRLZ9yqAksrR+IniiE
xNmOonqcNjcJw1iWrC0OHdRyiY4p1IW1QKyi7H4ge681j1P6Ur4mO5jeTYv4BCd6
x8RZhAqKvhxtdDhV7xJZmGZSY/E/ZBkGCD6RO1h3+ZebKM7+TgJWYCizjGMrxn1y
663UA5my8drHR/1RjIr/yGyhZ164mpZgLZewB4wv6CXL0ckQfnn9b55VN8/htnsf
DclrYXCD4bOXxfB5F9qzIkL3AB/gKL8CSY7hMc8+C5a5CLFBgewBL6HnJHE917NC
0RfsFpo=
=8xnN
-----END PGP SIGNATURE-----
Merge tag 'sound-4.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"This contains the fixes for a few reported regression for HD-audio and
USB-audio. All small, trivial, and boring"
* tag 'sound-4.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda - Fix applying MSI dual-codec mobo quirk
ALSA: usb: Avoid VLA in mixer_us16x08.c
ALSA: usb: Fix a typo in Tascam US-16x08 mixer element
Revert "ALSA: usb-audio: purge needless variable length array"
A bit largish fixes for dmaengine driver's fixing
- mv_xor_v2 driver for handling descriptors, tx_submit implementation,
removing interrupt coalescing and setting DMA mask properly
- usb-dmac DMAOR AE bit definition
- ep93xx start buffer from BASE0 and not drain the transfers in
terminate_all
- rcar-dmac use right descriptor pointer for residue calculation
- pl330 fix warn for irq freeup
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJZMScCAAoJEHwUBw8lI4NHDuAP/1fhckRJUC8P9vP0uf1Id7sy
l56ddvXZPbIEZNEN+ZYZjHngF1qbOR6H1NvK6dgn+7fEoAbvxzB0jbJDMSltv/CK
f27LKd25R/T/a3AapkL5y/8OnfJW5EwP4BkHOFZpFd5pFs58v18XahwO+JTC2yKc
H7gKPDO4IDWcLl7+YsQvDcMreJR6ReRZ7ZHN8IXs7pTAlS5W0All8DKXLPMt3jKg
oSluWxSivWoBDd9odjucyzfylNPCdU1eqY3XPjeAyW7KFL80c10SnwB9VqVB00Jg
fXSDsA/YsdyOEvYeDAKh6olTPkdU/nIfhRvrywA+nQQt3tuGQC/JnrLU7LFPIkWg
LEuOiYPo58u5Zjk96A4nkXlwwbYuvbQjqD+UI2ciizXdEPfeseYv1Tt7Sv52iPij
uJMs+caVTqKFMpBdaMVDxKZRQLPul0Tg5pvOVEWa64GIItqyHR3rZSInLhXcqNfm
Cue4mCL0VXkIzzLy2ZvhyWAu9eyTkFn/qbKOfjgwczfyLuqLg/KxknwQxny9ubqv
z3p8gcdvdiNi3nU/jezyaqQvDtMflxG0EjHukhy6q+87G5mJwuTu67jDIcB4Qmm7
erlHh3MsuWIuEAZHfyZmQZysGmhBWi6Nb4/8UJfJoGRziz0AKpd0eZ/rXiQu6DUu
uw4Bz4UUHGG9P60GuFB2
=sGFQ
-----END PGP SIGNATURE-----
Merge tag 'dmaengine-fix-4.12-rc4' of git://git.infradead.org/users/vkoul/slave-dma
Pull dmaengine fixes from Vinod Koul:
"Here is the dmaengine fixes request for 4.12. Fixes bunch of issues in
the driver, npthing exciting though..
- mv_xor_v2 driver fixes for handling descriptors, tx_submit
implementation, removing interrupt coalescing and setting DMA mask
properly
- fix usb-dmac DMAOR AE bit definition
- fix ep93xx start buffer from BASE0 and not drain the transfers in
terminate_all
- fix rcar-dmac to use right descriptor pointer for residue
calculation
- pl330 fix warn for irq freeup"
* tag 'dmaengine-fix-4.12-rc4' of git://git.infradead.org/users/vkoul/slave-dma:
dmaengine: pl330: fix warning in pl330_remove
rcar-dmac: fixup descriptor pointer for descriptor mode
dmaengine: ep93xx: Don't drain the transfers in terminate_all()
dmaengine: ep93xx: Always start from BASE0
dmaengine: usb-dmac: Fix DMAOR AE bit definition
dmaengine: mv_xor_v2: set DMA mask to 40 bits
dmaengine: mv_xor_v2: remove interrupt coalescing
dmaengine: mv_xor_v2: fix tx_submit() implementation
dmaengine: mv_xor_v2: enable XOR engine after its configuration
dmaengine: mv_xor_v2: do not use descriptors not acked by async_tx
dmaengine: mv_xor_v2: properly handle wrapping in the array of HW descriptors
dmaengine: mv_xor_v2: handle mv_xor_v2_prep_sw_desc() error properly
Pull HID fixes from Jiri Kosina:
- corner-case oops fixes for Asus and Wacom drivers from Carlo Caione
and Jason Gerecke
- power management fix (reported on SIS0817 touchscreen) for i2c-hid
devices from Hans de Goede
- device-id-specific fixes and quirks from Hans de Goede, Diego Elio
Pettenò and Che-Liang Chiou
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
HID: asus: Stop underlying hardware on remove
HID: i2c: Call acpi_device_fix_up_power for ACPI-enumerated devices
HID: asus: Add support for T100 keyboard
HID: elecom: extend to fix the descriptor for DEFT trackballs
HID: magicmouse: Set multi-touch keybits for Magic Mouse
HID: wacom: Have wacom_tpc_irq guard against possible NULL dereference
Pull livepatching fix from Jiri Kosina:
"Kconfig dependency fix for livepatching infrastructure from Miroslav
Benes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
livepatch: Make livepatch dependent on !TRIM_UNUSED_KSYMS
Pull x86 fixes from Ingo Molnar:
"Misc fixes:
- revert a broken PAT commit that broke a number of systems
- fix two preemptability warnings/bugs that can trigger under certain
circumstances, in the debug code and in the microcode loader"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Revert "x86/PAT: Fix Xorg regression on CPUs that don't support PAT"
x86/debug/32: Convert a smp_processor_id() call to raw to avoid DEBUG_PREEMPT warning
x86/microcode/AMD: Change load_microcode_amd()'s param to bool to fix preemptibility bug
Pull EFI fixes from Ingo Molnar:
"Misc fixes:
- three boot crash fixes for uncommon configurations
- silence a boot warning under virtualization
- plus a GCC 7 related (harmless) build warning fix"
* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/bgrt: Skip efi_bgrt_init() in case of non-EFI boot
x86/efi: Correct EFI identity mapping under 'efi=old_map' when KASLR is enabled
x86/efi: Disable runtime services on kexec kernel if booted with efi=old_map
efi: Remove duplicate 'const' specifiers
efi: Don't issue error message when booted under Xen
The BAD_MADT_GICC_ENTRY() macro checks if a GICC MADT entry passes
muster from an ACPI specification standpoint. Current macro detects the
MADT GICC entry length through ACPI firmware version (it changed from 76
to 80 bytes in the transition from ACPI 5.1 to ACPI 6.0 specification)
but always uses (erroneously) the ACPICA (latest) struct (ie struct
acpi_madt_generic_interrupt - that is 80-bytes long) length to check if
the current GICC entry memory record exceeds the MADT table end in
memory as defined by the MADT table header itself, which may result in
false negatives depending on the ACPI firmware version and how the MADT
entries are laid out in memory (ie on ACPI 5.1 firmware MADT GICC
entries are 76 bytes long, so by adding 80 to a GICC entry start address
in memory the resulting address may well be past the actual MADT end,
triggering a false negative).
Fix the BAD_MADT_GICC_ENTRY() macro by reshuffling the condition checks
and update them to always use the firmware version specific MADT GICC
entry length in order to carry out boundary checks.
Fixes: b6cfb27737 ("ACPI / ARM64: add BAD_MADT_GICC_ENTRY() macro")
Reported-by: Julien Grall <julien.grall@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Hanjun Guo <hanjun.guo@linaro.org>
Cc: Al Stone <ahs3@redhat.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
We are missing a call to hid_hw_stop() on the remove hook.
Among other things this is causing an Oops when (re-)starting GNOME /
upowerd / ... after the module has been already rmmod-ed.
Signed-off-by: Carlo Caione <carlo@endlessm.com>
Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
When removing a device with less than 9 IRQs (AMBA_NR_IRQS), we'll get a
big WARN_ON from devres.c because pl330_remove calls devm_free_irqs for
unallocated irqs. Similarly to pl330_probe, check that IRQ number is
present before calling devm_free_irq.
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Commit 4e552c8cb5 ("leds: add LED_ON brightness as boolean value")
has introduced the LED_ON enumeration value that can be used
instead of LED_FULL which has more of a linear value.
Because the tm2-touchscreen doesn't have brightness levels, but
it's a simple on/off led, use LED_ON instead of LED_FULL.
Signed-off-by: Andi Shyti <andi.shyti@samsung.com>
Reviewed-by: Jaechul Lee <jcsing.lee@samsung.com>
Tested-by: Jaechul Lee <jcsing.lee@samsung.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
DP sink specific quirks
* tag 'topic/dp-quirks-2017-05-31' of git://anongit.freedesktop.org/git/drm-intel:
drm/i915: Detect USB-C specific dongles before reducing M and N
drm/dp: start a DPCD based DP sink/branch device quirk database
drm/i915: use drm DP helper to read DPCD desc
drm/dp: add helper for reading DP sink/branch device desc from DPCD
fix a crash that was likely a result of buggy client behavior.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJZMHDGAAoJECebzXlCjuG+wVAP/RC2THsrHEfWQSrc+/wkKron
7PUZo6VRhoasjBInSJB/tdy+Yb82NbfLoXfJ71ddAwRUZlte74aI762HHuMdWtHY
8mCum5ea1AfRX5N/L/isO6lh4utO0vQEJ8r+P095d3EDwl0DnLYC3JVlKd/1r2VS
ELy8DZkyaVHZO9xiT+mnRgsq4aMjxG3F7DTHpcKDDFzG5Ts00zBQIXDu/rKmw3fD
WEuQjjrit1gFrUIUzJbSqwSokDCcf7v9HtGTI5+t+pIZ4Q2SyuKuTZvjtg+hb7Qa
K+F2SNNQsqfTW65zllhVR3gYpCykoqYPAJDw9MlqLN5tCmXFZLYhFHEUFx5kuobx
7+Dc3z1o5BgOiXcnKBVe+uONxXxcMYXLbU0e5Gac39GYW5xWzrU1+O6mMi0Q01YS
QsGRZEqHE2/3j1TAl0Q2SqT8gtG+A7piU4s5VavIHKIzI3/WubZ1GjLQ+RfXjuNa
DvkcAvSYfHyxzdWlyxjkzM09edt6SN3yEYdIRv9hiJEbUO3itVm9ycXTHLJUQUL0
sfVeXkm49e8gZZxHn+XuJubkT8HYlDGLQVSzK1zWFgt+zxd9LiP9iY+zs+vL9ryJ
DM9VmlJxZvNx9T7zSradW7gbIwOgxmBfRHFD05oODS1Tymb029akuU0YACb0sVnQ
LzDaZejUmURp7vlUffFp
=wznG
-----END PGP SIGNATURE-----
Merge tag 'nfsd-4.12-1' of git://linux-nfs.org/~bfields/linux
Pull nfsd fixes from Bruce Fields:
"Revert patch accidentally included in the merge window pull request,
and fix a crash that was likely a result of buggy client behavior"
* tag 'nfsd-4.12-1' of git://linux-nfs.org/~bfields/linux:
nfsd4: fix null dereference on replay
nfsd: Revert "nfsd: check for oversized NFSv2/v3 arguments"
Use ERR_CAST() to avoid cross-structure cast in ocf2, ntfs, and NFS.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: Kees Cook <kees@outflux.net>
iQIcBAABCgAGBQJZMHWdAAoJEIly9N/cbcAmWOYP/i45fa6JG7Aw9N59Uz4sqeUQ
ZUlvAUek6GkaGijCPtDYjy0cVj2Cc3QZLSRq9dDw/rU66Mc0ybYWHtIIwJy4ZjVe
D4w2Cs7K1oSOnhJnPTjQSKuMD81PF75NLChf3XSfLvtOWVIqW33EzLIu5lJ1rc1x
wh1fEAsJXGA9xklmW+m8Vn1FoS1a1j+9zuCEmGpveOkk6UKhhp73Ke8PP4uK9ld+
saApe/iH0JdTP6I7030A8hXwz7ZCYbMicw1kVpnsn4rM24p+k3Y2/OrFT2tY6/Y6
fzkTuVL7omQmUWph9zX6SYPg2GACEBTLb5V1YJ6zDUUzucu7vjfsvsTHXZb1gq2j
i8hZ6XsNOMWYJiOkOOSKM0rpjG6WSvF/sGc78ap7NJ4QPZ2/h3BTOXfk/ye/xQmL
WidEESJ4srInpi5ju8JTWHe27aydwiUUF91Y+gFv4G6CGU6/5vjUzOsgeiMxt0JN
lPaTjjL4lBHI2yohx2Wqy88yYWulK3LB0Hzt9XcSGMBA58H9d0CV0ZTkH3dJJkpC
QCM+Kt1DPy5A2RPC2APrPPCJsQycX9PSDeRaWkTxHnNLftpq65h1pAKjMcqsUPgb
HEEMLIBGqm871dr3+aPJPfG3Qil9ANBscDRbHXugCFTseFQO6M26KAxWGN+6LIQp
6Z0GUaPgJEua9ejodq4m
=R3qn
-----END PGP SIGNATURE-----
Merge tag 'gcc-plugins-v4.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull gcc-plugin prepwork from Kees Cook:
"Use designated initializers for mtk-vcodec, powerplay, amdgpu, and
sgi-xp. Use ERR_CAST() to avoid cross-structure cast in ocf2, ntfs,
and NFS.
Christoph Hellwig recommended that I send these fixes now, rather than
waiting for the v4.13 merge window. These are all initializer and cast
fixes needed for the future randstruct plugin that haven't been picked
up by the respective maintainers"
* tag 'gcc-plugins-v4.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
mtk-vcodec: Use designated initializers
drm/amd/powerplay: Use designated initializers
drm/amdgpu: Use designated initializers
sgi-xp: Use designated initializers
ocfs2: Use ERR_CAST() to avoid cross-structure cast
ntfs: Use ERR_CAST() to avoid cross-structure cast
NFS: Use ERR_CAST() to avoid cross-structure cast
Commit 9fdca4da4d (IB/SA: Split struct sa_path_rec based on IB and
ROCE specific fields) moved the service_id to be specific attribute
for IB and OPA SA Path Record, and thus wasn't assigned for RoCE.
This caused to the following kernel panic in the CMA request handler flow:
[ 27.074594] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[ 27.074731] IP: __radix_tree_lookup+0x1d/0xe0
...
[ 27.075356] Workqueue: ib_cm cm_work_handler [ib_cm]
[ 27.075401] task: ffff88022e3b8000 task.stack: ffffc90001298000
[ 27.075449] RIP: 0010:__radix_tree_lookup+0x1d/0xe0
...
[ 27.075979] Call Trace:
[ 27.076015] radix_tree_lookup+0xd/0x10
[ 27.076055] cma_ps_find+0x59/0x70 [rdma_cm]
[ 27.076097] cma_id_from_event+0xd2/0x470 [rdma_cm]
[ 27.076144] ? ib_init_ah_from_path+0x39a/0x590 [ib_core]
[ 27.076193] cma_req_handler+0x25/0x480 [rdma_cm]
[ 27.076237] cm_process_work+0x25/0x120 [ib_cm]
[ 27.076280] ? cm_get_bth_pkey.isra.62+0x3c/0xa0 [ib_cm]
[ 27.076350] cm_req_handler+0xb03/0xd40 [ib_cm]
[ 27.076430] ? sched_clock_cpu+0x11/0xb0
[ 27.076478] cm_work_handler+0x194/0x1588 [ib_cm]
[ 27.076525] process_one_work+0x160/0x410
[ 27.076565] worker_thread+0x137/0x4a0
[ 27.076614] kthread+0x112/0x150
[ 27.076684] ? max_active_store+0x60/0x60
[ 27.077642] ? kthread_park+0x90/0x90
[ 27.078530] ret_from_fork+0x2c/0x40
This patch moves it back to the common SA Path Record structure
and removes the redundant setter and getter.
Tested on Connect-IB and Connect-X4 in Infiniband and RoCE respectively.
Fixes: 9fdca4da4d (IB/SA: Split struct sa_path_rec based on IB ands
ROCE specific fields)
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This change will optimize kernel memory deregistration operations.
__ib_umem_release() used to call set_page_dirty_lock() against every
writable page in its memory region. Its purpose is to keep data
synced between CPU and DMA device when swapping happens after mem
deregistration ops. Now we choose not to set page dirty bit if it's
already set by kernel prior to calling __ib_umem_release(). This
reduces memory deregistration time by half or even more when we ran
application simulation test program.
Signed-off-by: Qing Huang <qing.huang@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Commit 5752075144 ("IB/SA: Add OPA path record type") introduced
new local function __ib_copy_path_rec_to_user, but didn't limit its
scope. This produces the following sparse warning:
drivers/infiniband/core/uverbs_marshall.c:99:6: warning:
symbol '__ib_copy_path_rec_to_user' was not declared. Should it be
static?
In addition, it used sizeof ... notations instead of sizeof(...), which
is correct in C, but a little bit misleading. Let's change it too.
Fixes: 5752075144 ("IB/SA: Add OPA path record type")
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
RDMA netlink is part of ib_core, hence ibnl_chk_listeners(),
ibnl_init() and ibnl_cleanup() don't need to be published
in public header file.
Let's remove EXPORT_SYMBOL from ibnl_chk_listeners() and move all these
functions to private header file.
CC: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
If srp_init_qp() fails at srp_create_ch_ib() then ch->send_cq
may be NULL.
Calling directly to ib_destroy_qp() is sufficient because
no work requests were posted on the created qp.
Fixes: 9294000d6d ("IB/srp: Drain the send queue before destroying a QP")
Cc: <stable@vger.kernel.org>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Bart van Assche <bart.vanassche@sandisk.com>--
Signed-off-by: Doug Ledford <dledford@redhat.com>