I'm getting some garbage in bytes 8 and 9 when doing conversion
from sockaddr_in to sockaddr_in6 (leftover from AF_INET?). Let's
explicitly clear the higher bytes.
Fixes: 0ab5539f85 ("selftests/bpf: Tests for BPF_SK_LOOKUP attach point")
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20200807223846.4190917-1-sdf@google.com
test_progs reports the segmentation fault as below:
$ sudo ./test_progs -t mmap --verbose
test_mmap:PASS:skel_open_and_load 0 nsec
[...]
test_mmap:PASS:adv_mmap1 0 nsec
test_mmap:PASS:adv_mmap2 0 nsec
test_mmap:PASS:adv_mmap3 0 nsec
test_mmap:PASS:adv_mmap4 0 nsec
Segmentation fault
This issue was triggered because mmap() and munmap() used inconsistent
length parameters; mmap() creates a new mapping of 3 * page_size, but the
length parameter set in the subsequent re-map and munmap() functions is
4 * page_size; this leads to the destruction of the process space.
To fix this issue, first create 4 pages of anonymous mapping, then do all
the mmap() with MAP_FIXED.
Another issue is that when unmap the second page fails, the length
parameter to delete tmp1 mappings should be 4 * page_size.
Signed-off-by: Jianlin Lv <Jianlin.Lv@arm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200810153940.125508-1-Jianlin.Lv@arm.com
- Have config-bisect save the good/bad configs at each step.
- Show log file location even on success
- Add PRE_TEST_DIE to kill test if the PRE_TEST fails
- Add a NOT operator for conditionals in config file
- Add the log output of the last test when emailing on failure.
- Other minor clean ups and small fixes.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXzH4tBQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qiTVAQCmZzxANHxg58CI4gKCMDmUb9PBoPru
9vIHnQzgr8YiMQEA9+UIuxQxSVT79ONABut56tlTksPqWYelpdkn+nrJAAE=
=ilWu
-----END PGP SIGNATURE-----
Merge tag 'ktest-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest
Pull ktest updates from Steven Rostedt:
- Have config-bisect save the good/bad configs at each step.
- Show log file location even on success
- Add PRE_TEST_DIE to kill test if the PRE_TEST fails
- Add a NOT operator for conditionals in config file
- Add the log output of the last test when emailing on failure.
- Other minor clean ups and small fixes.
* tag 'ktest-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest:
ktest.pl: Fix spelling mistake "Cant" -> "Can't"
ktest.pl: Change the logic to control the size of the log file emailed
ktest.pl: Add MAIL_MAX_SIZE to limit the amount of log emailed
ktest.pl: Add the log of last test in email on failure
ktest.pl: Turn off buffering to the log file
ktest.pl: Just open up the log file once
ktest.pl: Add a NOT operator
ktest.pl: Define PRE_TEST_DIE to kill the test if the PRE_TEST fails
ktest.pl: Always show log file location if defined even on success
ktest.pl: Have config-bisect save each config used in the bisect
If the log file for a given test is larger than the max size given then use
set the seek from the end of the log file instead of from the start of the
test.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Daniel Borkmann says:
====================
pull-request: bpf 2020-08-08
The following pull-request contains BPF updates for your *net* tree.
We've added 11 non-merge commits during the last 2 day(s) which contain
a total of 24 files changed, 216 insertions(+), 135 deletions(-).
The main changes are:
1) Fix UAPI for BPF map iterator before it gets frozen to allow for more
extensions/customization in future, from Yonghong Song.
2) Fix selftests build to undo verbose build output, from Andrii Nakryiko.
3) Fix inlining compilation error on bpf_do_trace_printk() due to variable
argument lists, from Stanislav Fomichev.
4) Fix an uninitialized pointer warning at btf__parse_raw() in libbpf,
from Daniel T. Lee.
5) Fix several compilation warnings in selftests with regards to ignoring
return value, from Jianlin Lv.
6) Fix interruptions by switching off timeout for BPF tests, from Jiri Benc.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
During diag self-tests we introduce long wait in the mptcp test
program to give the script enough time to access the sockets
dump.
Such wait is introduced after shutting down one sockets end. Since
commit 43b54c6ee3 ("mptcp: Use full MPTCP-level disconnect state
machine") if both sides shutdown the socket is correctly transitioned
into CLOSED status.
As a side effect some sockets are not dumped via the diag interface,
because the socket state (CLOSED) does not match the default filter, and
this cause self-tests instability.
Address the issue moving the above mentioned wait before shutting
down the socket.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/68
Fixes: df62f2ec3d ("selftests/mptcp: add diag interface tests")
Tested-and-acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit df62f2ec3d ("selftests/mptcp: add diag interface tests")
the MPTCP selftests relies on the MPTCP diag interface which is
enabled by a specific kconfig knob: be sure to include it.
Fixes: df62f2ec3d ("selftests/mptcp: add diag interface tests")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAl8tsE4WHGtlZXNjb29r
QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJhw+D/9nB8+KxD2yYp2ntoLrhu8cUP6V
LF8C7eQwFI/SV/Z/5ZpQPpBbndJAPz1ob/kZ8v5N4+EGfr3eRyI76RWnshl/CpA1
X/sYCSHezer52giAC59RGt0Nc/S6/sUrVU6/b28tzhoTYxJ6SoDl4WgC2pGGTPdY
ei/KeMPtH2lpy3NazCmLwIAElgnXBDrJZYtuaaIOe/WPDbJ+cbRJzsJ9VGItXqNc
h9n8vpExgHd7ThkM1xlJ5q7Q5KFltKUxGZJoOciLPNJshJ1o0NTMeo/7i8TF3aZZ
aVglnYVI/SKbrEa2JhboM4M7ytfAL606xYPsHr57ojBqxdhUk5zhFOi5uKyaM6Gm
t6wX9o5jfFCg3AZhyd+IP3q7Zc9z1IWMGjwFrNznchwvz2eCcSytOxOkIMuo9o2T
cs79++kmczAit9z9LmMGpHfHWFBOX3gvzfkMqBZMD4+6EeZ33U1CCnkMZuqmajqf
MYZzLzVibrcb6cUuZZm+lmhVgoBrr/HPy6BNf5s8n39PJGMbwkAqHACZI7+78VHu
vVcezubF0IyswRFJGcS19HVWOVJ2lNux8FUnEIOEtxIaUYsSYbwQZnWyFiwxOHJ9
+wZpcgMVLpEXCtOyhvgecn9GfJTvNdoGjVqjXbaH3KkaWm/QRH0mh+17yynajt75
+HK1Us+sy+7N9zinHQ==
=MRuJ
-----END PGP SIGNATURE-----
Merge tag 'kallsyms_show_value-fix-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull sysfs module section fix from Kees Cook:
"Fix sysfs module section output overflow.
About a month after my kallsyms_show_value() refactoring landed, 0day
noticed that there was a path through the kernfs binattr read handlers
that did not have PAGE_SIZEd buffers, and the module "sections" read
handler made a bad assumption about this, resulting in it stomping on
memory when reached through small-sized splice() calls.
I've added a set of tests to find these kinds of regressions more
quickly in the future as well"
Sefltests-acked-by: Shuah Khan <skhan@linuxfoundation.org>
* tag 'kallsyms_show_value-fix-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
selftests: splice: Check behavior of full and short splices
module: Correctly truncate sysfs sections output
Merge misc updates from Andrew Morton:
- a few MM hotfixes
- kthread, tools, scripts, ntfs and ocfs2
- some of MM
Subsystems affected by this patch series: kthread, tools, scripts, ntfs,
ocfs2 and mm (hofixes, pagealloc, slab-generic, slab, slub, kcsan,
debug, pagecache, gup, swap, shmem, memcg, pagemap, mremap, mincore,
sparsemem, vmalloc, kasan, pagealloc, hugetlb and vmscan).
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (162 commits)
mm: vmscan: consistent update to pgrefill
mm/vmscan.c: fix typo
khugepaged: khugepaged_test_exit() check mmget_still_valid()
khugepaged: retract_page_tables() remember to test exit
khugepaged: collapse_pte_mapped_thp() protect the pmd lock
khugepaged: collapse_pte_mapped_thp() flush the right range
mm/hugetlb: fix calculation of adjust_range_if_pmd_sharing_possible
mm: thp: replace HTTP links with HTTPS ones
mm/page_alloc: fix memalloc_nocma_{save/restore} APIs
mm/page_alloc.c: skip setting nodemask when we are in interrupt
mm/page_alloc: fallbacks at most has 3 elements
mm/page_alloc: silence a KASAN false positive
mm/page_alloc.c: remove unnecessary end_bitidx for [set|get]_pfnblock_flags_mask()
mm/page_alloc.c: simplify pageblock bitmap access
mm/page_alloc.c: extract the common part in pfn_to_bitidx()
mm/page_alloc.c: replace the definition of NR_MIGRATETYPE_BITS with PB_migratetype_bits
mm/shuffle: remove dynamic reconfiguration
mm/memory_hotplug: document why shuffle_zone() is relevant
mm/page_alloc: remove nr_free_pagecache_pages()
mm: remove vm_total_pages
...
Add a test suite for the mincore() syscall. It tests most of its use
cases as well as its interface.
Tests implemented:
- basic interface test
- behavior on anonymous mappings
- behavior on anonymous mappings with huge tlb pages
- file-backed mapping with a regular file
- file-backed mapping with a tmpfs file
Signed-off-by: Ricardo Cañuelo <ricardo.canuelo@collabora.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20200728100450.4065-1-ricardo.canuelo@collabora.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add some tests to cover the kernel memory accounting functionality. These
are covering some issues (and changes) we had recently.
1) A test which allocates a lot of negative dentries, checks memcg slab
statistics, creates memory pressure by setting memory.max to some low
value and checks that some number of slabs was reclaimed.
2) A test which covers side effects of memcg destruction: it creates
and destroys a large number of sub-cgroups, each containing a
multi-threaded workload which allocates and releases some kernel
memory. Then it checks that the charge ans memory.stats do add up on
the parent level.
3) A test which reads /proc/kpagecgroup and implicitly checks that it
doesn't crash the system.
4) A test which spawns a large number of threads and checks that the
kernel stacks accounting works as expected.
5) A test which checks that living charged slab objects are not
preventing the memory cgroup from being released after being deleted by
a user.
Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Link: http://lkml.kernel.org/r/20200623174037.3951353-19-guro@fb.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Haven't reproduced this issue. This PR is does a minor code cleanup.
Signed-off-by: Gaurav Singh <gaurav1086@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Michal Koutn <mkoutny@suse.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Chris Down <chris@chrisdown.name>
Link: http://lkml.kernel.org/r/20200726013808.22242-1-gaurav1086@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rationale:
Reduces attack surface on kernel devs opening the links for MITM
as HTTPS traffic is much harder to manipulate.
Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20200726120752.16768-1-grandmaster@al2klimov.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In order to help catch regressions in splice vs read behavior in certain
special files, test a few with various different kinds of internal
kernel helpers.
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
- Add support for (optionally) using queued spinlocks & rwlocks.
- Support for a new faster system call ABI using the scv instruction on Power9
or later.
- Drop support for the PROT_SAO mmap/mprotect flag as it will be unsupported on
Power10 and future processors, leaving us with no way to implement the
functionality it requests. This risks breaking userspace, though we believe
it is unused in practice.
- A bug fix for, and then the removal of, our custom stack expansion checking.
We now allow stack expansion up to the rlimit, like other architectures.
- Remove the remnants of our (previously disabled) topology update code, which
tried to react to NUMA layout changes on virtualised systems, but was prone
to crashes and other problems.
- Add PMU support for Power10 CPUs.
- A change to our signal trampoline so that we don't unbalance the link stack
(branch return predictor) in the signal delivery path.
- Lots of other cleanups, refactorings, smaller features and so on as usual.
Thanks to:
Abhishek Goel, Alastair D'Silva, Alexander A. Klimov, Alexey Kardashevskiy,
Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Anton
Blanchard, Arnd Bergmann, Athira Rajeev, Balamuruhan S, Bharata B Rao, Bill
Wendling, Bin Meng, Cédric Le Goater, Chris Packham, Christophe Leroy,
Christoph Hellwig, Daniel Axtens, Dan Williams, David Lamparter, Desnes A.
Nunes do Rosario, Erhard F., Finn Thain, Frederic Barrat, Ganesh Goudar,
Gautham R. Shenoy, Geoff Levand, Greg Kurz, Gustavo A. R. Silva, Hari Bathini,
Harish, Imre Kaloz, Joel Stanley, Joe Perches, John Crispin, Jordan Niethe,
Kajol Jain, Kamalesh Babulal, Kees Cook, Laurent Dufour, Leonardo Bras, Li
RongQing, Madhavan Srinivasan, Mahesh Salgaonkar, Mark Cave-Ayland, Michal
Suchanek, Milton Miller, Mimi Zohar, Murilo Opsfelder Araujo, Nathan
Chancellor, Nathan Lynch, Naveen N. Rao, Nayna Jain, Nicholas Piggin, Oliver
O'Halloran, Palmer Dabbelt, Pedro Miraglia Franco de Carvalho, Philippe
Bergheaud, Pingfan Liu, Pratik Rajesh Sampat, Qian Cai, Qinglang Miao, Randy
Dunlap, Ravi Bangoria, Sachin Sant, Sam Bobroff, Sandipan Das, Santosh
Sivaraj, Satheesh Rajendran, Shirisha Ganta, Sourabh Jain, Srikar Dronamraju,
Stan Johnson, Stephen Rothwell, Thadeu Lima de Souza Cascardo, Thiago Jung
Bauermann, Tom Lane, Vaibhav Jain, Vladis Dronov, Wei Yongjun, Wen Xiong,
YueHaibing.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl8tOxATHG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgDQfEAClXHWf6hnxB84bEu39D51NkVotL1IG
BRWFvyix+xHuUkHIouBPAAMl6ngY5X6wkYd+Z+CY9zHNtdSDoVlJE30YXdMQA/dE
L/rYxR1884yGR/uU/3wusboO68ReXwcKQPmKOymUfh0zH7ujyJsSWLpXFK1YDC5d
2TVVTi0Q+P5ucMHDh0L+AHirIxZvtZSp43+J7xLtywsj+XAxJWCTGo5WCJbdgbCA
Qbv3aOkVyUa3EgsbdM/STPpv82ebqT+PHxeSIO4Jw6ZODtKRH0R5YsWCApuY9eZ+
ebY9RLmgv9ZAhJqB2fv9A5NDcMoGpZNmjM7HrWpXwULKQpkBGHCzJ9FcSdHVMOx8
nbVMFjt4uzLwV1w8lFYslQ2tNH/uH2o9BlryV1RLpiiKokDAJO/NOsWN9y0u/I4J
EmAM5DSX2LgVvvas96IlGK8KX4xkOkf8FLX/H5UDvvAfloH8J4CZXk/CWCab/nqY
KEHPnMmYvQZ1w9SzyZg9sO/1p6Bl1Gmm75Jv2F1lBiRW/42VcGBI/qLsJ4lC59Fc
KbwufYNYYG38wbxDLW1HAPJhRonxIcaZj3EEqk7aTiLZ55nNbu8e2k32CpNXTGqt
npOhzJHimcq7L6+878ZW+xpbZwogIEUdRSsmwb6aT8za3ShnYwSA2Q3LYxh9xyGH
j3GifvPq6Efp3Q==
=QMY1
-----END PGP SIGNATURE-----
Merge tag 'powerpc-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Add support for (optionally) using queued spinlocks & rwlocks.
- Support for a new faster system call ABI using the scv instruction on
Power9 or later.
- Drop support for the PROT_SAO mmap/mprotect flag as it will be
unsupported on Power10 and future processors, leaving us with no way
to implement the functionality it requests. This risks breaking
userspace, though we believe it is unused in practice.
- A bug fix for, and then the removal of, our custom stack expansion
checking. We now allow stack expansion up to the rlimit, like other
architectures.
- Remove the remnants of our (previously disabled) topology update
code, which tried to react to NUMA layout changes on virtualised
systems, but was prone to crashes and other problems.
- Add PMU support for Power10 CPUs.
- A change to our signal trampoline so that we don't unbalance the link
stack (branch return predictor) in the signal delivery path.
- Lots of other cleanups, refactorings, smaller features and so on as
usual.
Thanks to: Abhishek Goel, Alastair D'Silva, Alexander A. Klimov, Alexey
Kardashevskiy, Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anju
T Sudhakar, Anton Blanchard, Arnd Bergmann, Athira Rajeev, Balamuruhan
S, Bharata B Rao, Bill Wendling, Bin Meng, Cédric Le Goater, Chris
Packham, Christophe Leroy, Christoph Hellwig, Daniel Axtens, Dan
Williams, David Lamparter, Desnes A. Nunes do Rosario, Erhard F., Finn
Thain, Frederic Barrat, Ganesh Goudar, Gautham R. Shenoy, Geoff Levand,
Greg Kurz, Gustavo A. R. Silva, Hari Bathini, Harish, Imre Kaloz, Joel
Stanley, Joe Perches, John Crispin, Jordan Niethe, Kajol Jain, Kamalesh
Babulal, Kees Cook, Laurent Dufour, Leonardo Bras, Li RongQing, Madhavan
Srinivasan, Mahesh Salgaonkar, Mark Cave-Ayland, Michal Suchanek, Milton
Miller, Mimi Zohar, Murilo Opsfelder Araujo, Nathan Chancellor, Nathan
Lynch, Naveen N. Rao, Nayna Jain, Nicholas Piggin, Oliver O'Halloran,
Palmer Dabbelt, Pedro Miraglia Franco de Carvalho, Philippe Bergheaud,
Pingfan Liu, Pratik Rajesh Sampat, Qian Cai, Qinglang Miao, Randy
Dunlap, Ravi Bangoria, Sachin Sant, Sam Bobroff, Sandipan Das, Santosh
Sivaraj, Satheesh Rajendran, Shirisha Ganta, Sourabh Jain, Srikar
Dronamraju, Stan Johnson, Stephen Rothwell, Thadeu Lima de Souza
Cascardo, Thiago Jung Bauermann, Tom Lane, Vaibhav Jain, Vladis Dronov,
Wei Yongjun, Wen Xiong, YueHaibing.
* tag 'powerpc-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (337 commits)
selftests/powerpc: Fix pkey syscall redefinitions
powerpc: Fix circular dependency between percpu.h and mmu.h
powerpc/powernv/sriov: Fix use of uninitialised variable
selftests/powerpc: Skip vmx/vsx/tar/etc tests on older CPUs
powerpc/40x: Fix assembler warning about r0
powerpc/papr_scm: Add support for fetching nvdimm 'fuel-gauge' metric
powerpc/papr_scm: Fetch nvdimm performance stats from PHYP
cpuidle: pseries: Fixup exit latency for CEDE(0)
cpuidle: pseries: Add function to parse extended CEDE records
cpuidle: pseries: Set the latency-hint before entering CEDE
selftests/powerpc: Fix online CPU selection
powerpc/perf: Consolidate perf_callchain_user_[64|32]()
powerpc/pseries/hotplug-cpu: Remove double free in error path
powerpc/pseries/mobility: Add pr_debug() for device tree changes
powerpc/pseries/mobility: Set pr_fmt()
powerpc/cacheinfo: Warn if cache object chain becomes unordered
powerpc/cacheinfo: Improve diagnostics about malformed cache lists
powerpc/cacheinfo: Use name@unit instead of full DT path in debug messages
powerpc/cacheinfo: Set pr_fmt()
powerpc: fix function annotations to avoid section mismatch warnings with gcc-10
...
99aacebecb ("selftests: do not use .ONESHELL") removed .ONESHELL, which
changes how Makefile "silences" multi-command target recipes. selftests/bpf's
Makefile relied (a somewhat unknowingly) on .ONESHELL behavior of silencing
all commands within the recipe if the first command contains @ symbol.
Removing .ONESHELL exposed this hack.
This patch fixes the issue by explicitly silencing each command with $(Q).
Also explicitly define fallback rule for building *.o from *.c, instead of
relying on non-silent inherited rule. This was causing a non-silent output for
bench.o object file.
Fixes: 92f7440ecc ("selftests/bpf: More succinct Makefile output")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200807033058.848677-1-andriin@fb.com
Clang compiler version: 12.0.0
The following warning appears during the selftests/bpf compilation:
prog_tests/send_signal.c:51:3: warning: ignoring return value of ‘write’,
declared with attribute warn_unused_result [-Wunused-result]
51 | write(pipe_c2p[1], buf, 1);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
prog_tests/send_signal.c:54:3: warning: ignoring return value of ‘read’,
declared with attribute warn_unused_result [-Wunused-result]
54 | read(pipe_p2c[0], buf, 1);
| ^~~~~~~~~~~~~~~~~~~~~~~~~
......
prog_tests/stacktrace_build_id_nmi.c:13:2: warning: ignoring return value
of ‘fscanf’,declared with attribute warn_unused_result [-Wunused-resul]
13 | fscanf(f, "%llu", &sample_freq);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
test_tcpnotify_user.c:133:2: warning:ignoring return value of ‘system’,
declared with attribute warn_unused_result [-Wunused-result]
133 | system(test_script);
| ^~~~~~~~~~~~~~~~~~~
test_tcpnotify_user.c:138:2: warning:ignoring return value of ‘system’,
declared with attribute warn_unused_result [-Wunused-result]
138 | system(test_script);
| ^~~~~~~~~~~~~~~~~~~
test_tcpnotify_user.c:143:2: warning:ignoring return value of ‘system’,
declared with attribute warn_unused_result [-Wunused-result]
143 | system(test_script);
| ^~~~~~~~~~~~~~~~~~~
Add code that fix compilation warning about ignoring return value and
handles any errors; Check return value of library`s API make the code
more secure.
Signed-off-by: Jianlin Lv <Jianlin.Lv@arm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200806104224.95306-1-Jianlin.Lv@arm.com
Several bpf tests are interrupted by the default timeout of 45 seconds added
by commit 852c8cbf34 ("selftests/kselftest/runner.sh: Add 45 second
timeout per test"). In my case it was test_progs, test_tunnel.sh,
test_lwt_ip_encap.sh and test_xdping.sh.
There's not much value in having a timeout for bpf tests, switch it off.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/7a9198ed10917f4ecab4a3dd74bcda1200791efd.1596739059.git.jbenc@redhat.com
Previous commit adjusted kernel uapi for map
element bpf iterator. This patch adjusted libbpf API
due to uapi change. bpftool and bpf_iter selftests
are also changed accordingly.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200805055058.1457623-1-yhs@fb.com
runqslower's Makefile is building/installing bpftool into
$(OUTPUT)/sbin/bpftool, which coincides with $(DEFAULT_BPFTOOL). In practice
this means that often when building selftests from scratch (after `make
clean`), selftests are racing with runqslower to simultaneously build bpftool
and one of the two processes fail due to file being busy. Prevent this race by
explicitly order-depending on $(BPFTOOL_DEFAULT).
Fixes: a2c9652f75 ("selftests: Refactor build to remove tools/lib/bpf from include path")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200805004757.2960750-1-andriin@fb.com
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAl8ruxsACgkQUqAMR0iA
lPJyeQ/+IjRLKqM8QFv/vZDjhIHFOMR64wfNso3Jh6BhMCL38Ow1LISUqNSoBLQP
eHJF42KlZG4EWwUdXM1ZnRvp/BXl9urkcQUG1Q26tlNsa58UMbntXIIjzk9YucOw
Q6HBzURZ09KLvVHxddOG1V4wEHWpIGrS3siwBQTrvhhkgLfSteX632TAGC+Gizwj
dIleRfFiB1xYhMZg7Ej8b+yImlHj1FO+L3IHq/JgBSjDP5hKkAX16phmmflo8WKZ
ozSU6AqJVdBAYdR1I6xhkFUGwcYjcjPb1MrAeFMDv0YrgbWZNv30k6EpIW/Az5QI
tlw9gZ4EgdnYZ0iG5h8B1MfnRvsr7fYthnLj7wDTuBs1wMkm8kH+YOKGHS+ihjLO
Gg8cC5V4iN07gA/KjKngdoYUOD1IjvWTLj6mB70qWVnYRIz5UveaER3J+kWNC5gU
Jpw9TlcW05Fa9XQf6ChLG9cBj0QyWMSlhl6vQp45b0Z0wgQcpddIZC6TRRlwmgak
DpESQWF3P4FUAFjR/lGizpCsTB1VTp0ghmQ7geT0y1LbF0WDXV1WvaKGR5RmClv4
KVTs9DBCji0zBc5GfdV/0s8A0Ub9g3wdNy1WYRZAm7LIXn/zwNI8I5vH/U140Bd1
qkL4FYxA8lh8+Wy6qk2rM8/5KX5EsxysN8lyebrY7xNR5+ObSYQ=
=oQEB
-----END PGP SIGNATURE-----
Merge tag 'livepatching-for-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching
Pull livepatching updates from Petr Mladek:
"Improvements and cleanups of livepatching selftests"
* tag 'livepatching-for-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching:
selftests/livepatch: adopt to newer sysctl error format
selftests/livepatch: Use "comm" instead of "diff" for dmesg
selftests/livepatch: add test delimiter to dmesg
selftests/livepatch: refine dmesg 'taints' in dmesg comparison
selftests/livepatch: Don't clear dmesg when running tests
selftests/livepatch: fix mem leaks in test-klp-shadow-vars
selftests/livepatch: more verification in test-klp-shadow-vars
selftests/livepatch: rework test-klp-shadow-vars
selftests/livepatch: simplify test-klp-callbacks busy target tests
- add syscall audit support
- add seccomp filter support
- clean up make rules under arch/xtensa/boot
- fix state management for exclusive access opcodes
- fix build with PMU enabled
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEK2eFS5jlMn3N6xfYUfnMkfg/oEQFAl8q2YsTHGpjbXZia2Jj
QGdtYWlsLmNvbQAKCRBR+cyR+D+gRPbID/9G6Ck59fsl62MGyqNeEmZV3Wro+tx7
7OlqN7KF20MgZMpure+NY2gTG+n3DwDdmEiiJ4aBzSY0Bg/R6SGubvAn6d+cBZG8
Wfd/bFTW390/FVaPNtveef8cJ9qlfqnGTQgTJ97LVoopnbIWm+aDHfyyb+2Td/c8
eIbhBmKOY1mZD8prnLZoVfXt7kuRrDDumBrRUwpIG/6O5sa+Q5xCj6KxNDlYqMMq
/gi7BEVnDKz6cjXswmJYVkoPFdpJQ6dYEdfqkp+uoEb3i66qOcqB8JKppLdhjZy0
MayL4t7xT+0PxDRQ7eU+TONVHdZxIgu9BKDpREC+xhKLBx2q0U0i/KMWOHnRdJry
AWJtDgiQmPzYuNEAlSDndxPmpDQptFIExJ6aKu0vWafv2XwTw5ukcksDh9bP6r8e
XnxQasiDooAcnW+ByILXyi8a2kOUGTyaM1JMKNtevLVmp4h36I7K9F++Xr9a/R/R
W+as2D4Tp0XX2yutDh5BvjSs5+BokGKj2CdlKpVA1CsrDeTXkjncNgyL84LXId/l
v7hm2mjsNwrtOvr8SiMiV7I/1k+5MhYfxxNrqMsUpXvvzR2TGJZzN4dLdW/IbvY4
mkBoVcGeaa7KODRIXYbnh9sjAx2fJDgkQHjbo9S4RB2csxXdWcgbeXKt6ijkMOwz
YzLhnJ/Bb7UqlQ==
=nN46
-----END PGP SIGNATURE-----
Merge tag 'xtensa-20200805' of git://github.com/jcmvbkbc/linux-xtensa
Pull Xtensa updates from Max Filippov:
- add syscall audit support
- add seccomp filter support
- clean up make rules under arch/xtensa/boot
- fix state management for exclusive access opcodes
- fix build with PMU enabled
* tag 'xtensa-20200805' of git://github.com/jcmvbkbc/linux-xtensa:
xtensa: add missing exclusive access state management
xtensa: fix xtensa_pmu_setup prototype
xtensa: add boot subdirectories build artifacts to 'targets'
xtensa: add uImage and xipImage to targets
xtensa: move vmlinux.bin[.gz] to boot subdirectory
xtensa: initialize_mmu.h: fix a duplicated word
selftests/seccomp: add xtensa support
xtensa: add seccomp support
xtensa: expose syscall through user_pt_regs
xtensa: add audit support
Pull networking updates from David Miller:
1) Support 6Ghz band in ath11k driver, from Rajkumar Manoharan.
2) Support UDP segmentation in code TSO code, from Eric Dumazet.
3) Allow flashing different flash images in cxgb4 driver, from Vishal
Kulkarni.
4) Add drop frames counter and flow status to tc flower offloading,
from Po Liu.
5) Support n-tuple filters in cxgb4, from Vishal Kulkarni.
6) Various new indirect call avoidance, from Eric Dumazet and Brian
Vazquez.
7) Fix BPF verifier failures on 32-bit pointer arithmetic, from
Yonghong Song.
8) Support querying and setting hardware address of a port function via
devlink, use this in mlx5, from Parav Pandit.
9) Support hw ipsec offload on bonding slaves, from Jarod Wilson.
10) Switch qca8k driver over to phylink, from Jonathan McDowell.
11) In bpftool, show list of processes holding BPF FD references to
maps, programs, links, and btf objects. From Andrii Nakryiko.
12) Several conversions over to generic power management, from Vaibhav
Gupta.
13) Add support for SO_KEEPALIVE et al. to bpf_setsockopt(), from Dmitry
Yakunin.
14) Various https url conversions, from Alexander A. Klimov.
15) Timestamping and PHC support for mscc PHY driver, from Antoine
Tenart.
16) Support bpf iterating over tcp and udp sockets, from Yonghong Song.
17) Support 5GBASE-T i40e NICs, from Aleksandr Loktionov.
18) Add kTLS RX HW offload support to mlx5e, from Tariq Toukan.
19) Fix the ->ndo_start_xmit() return type to be netdev_tx_t in several
drivers. From Luc Van Oostenryck.
20) XDP support for xen-netfront, from Denis Kirjanov.
21) Support receive buffer autotuning in MPTCP, from Florian Westphal.
22) Support EF100 chip in sfc driver, from Edward Cree.
23) Add XDP support to mvpp2 driver, from Matteo Croce.
24) Support MPTCP in sock_diag, from Paolo Abeni.
25) Commonize UDP tunnel offloading code by creating udp_tunnel_nic
infrastructure, from Jakub Kicinski.
26) Several pci_ --> dma_ API conversions, from Christophe JAILLET.
27) Add FLOW_ACTION_POLICE support to mlxsw, from Ido Schimmel.
28) Add SK_LOOKUP bpf program type, from Jakub Sitnicki.
29) Refactor a lot of networking socket option handling code in order to
avoid set_fs() calls, from Christoph Hellwig.
30) Add rfc4884 support to icmp code, from Willem de Bruijn.
31) Support TBF offload in dpaa2-eth driver, from Ioana Ciornei.
32) Support XDP_REDIRECT in qede driver, from Alexander Lobakin.
33) Support PCI relaxed ordering in mlx5 driver, from Aya Levin.
34) Support TCP syncookies in MPTCP, from Flowian Westphal.
35) Fix several tricky cases of PMTU handling wrt. briding, from Stefano
Brivio.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2056 commits)
net: thunderx: initialize VF's mailbox mutex before first usage
usb: hso: remove bogus check for EINPROGRESS
usb: hso: no complaint about kmalloc failure
hso: fix bailout in error case of probe
ip_tunnel_core: Fix build for archs without _HAVE_ARCH_IPV6_CSUM
selftests/net: relax cpu affinity requirement in msg_zerocopy test
mptcp: be careful on subflow creation
selftests: rtnetlink: make kci_test_encap() return sub-test result
selftests: rtnetlink: correct the final return value for the test
net: dsa: sja1105: use detected device id instead of DT one on mismatch
tipc: set ub->ifindex for local ipv6 address
ipv6: add ipv6_dev_find()
net: openvswitch: silence suspicious RCU usage warning
Revert "vxlan: fix tos value before xmit"
ptp: only allow phase values lower than 1 period
farsync: switch from 'pci_' to 'dma_' API
wan: wanxl: switch from 'pci_' to 'dma_' API
hv_netvsc: do not use VF device if link is down
dpaa2-eth: Fix passing zero to 'PTR_ERR' warning
net: macb: Properly handle phylink on at91sam9x
...
This series adds reporting of the page table order from hmm_range_fault()
and some optimization of migrate_vma():
- Report the size of the page table mapping out of hmm_range_fault(). This
makes it easier to establish a large/huge/etc mapping in the device's
page table.
- Allow devices to ignore the invalidations during migration in cases
where the migration is not going to change pages. For instance migrating
pages to a device does not require the device to invalidate pages
already in the device.
- Update nouveau and hmm_tests to use the above
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl8oocYACgkQOG33FX4g
mxqd3Q/+OClUADmrI+EGJAPI7VD3EYfyZdnMCcp39AYNfySQPN9+fCMF5hVD5U7x
KZVflR/zKUIZJVvdD8yAdrynZ1sHBG/HEzDyoaKcGzfCKq5LEAEnP5FG3xsiDjkO
QX7w6qIGDz59gaeanQKNzqaR3DMpBwO/0D5/80DWXv+WgmxsAphanJYlo4eWyq4D
EGq8EndCxairkTLpPlDHvFottL5kAKDXEinSAwWGQeZJkRY93vj+HZAQaeltmB1K
SDdZr7lsEg2RhtRjzT7CkA2bkCERKL3xEc4VWaCAZw+qm8aeswADVOSo5E5F7DMI
NUsB/p4GZ2CvIog/y3g/aSGluevdYJHTH8ip1BnNr2qCcXSEqHKsmyKpVNZztSUl
uljyT17ZzTsdR4xj50tM27fzgDaavWrwFZTsJxUifuvAO9rHvGDVpaN8ZIU9iZei
PTsGQvfoHDmWBWKX1dkIUGq+UoGwEAYRGk+XU0OYZCK97xmjRnGVoH0FTOk4DNQs
+A0250oTOrvdSGiv0fNT5qpWpFsQ/84h8Lz6ubAD3okVo1bk9cFMe2argQl+E2qI
TGM9ZHS8rphJNWwiPm8xrgf9eQ9bNp3ilCsIzBBpqZq8elwaL6a3ySieDPE734Ar
FZEeEYTvj5Z/gXtyo/gxVKhltCc4U8kPqye9uexTInz4zBUUZOM=
=omAU
-----END PGP SIGNATURE-----
Merge tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull hmm updates from Jason Gunthorpe:
"Ralph has been working on nouveau's use of hmm_range_fault() and
migrate_vma() which resulted in this small series. It adds reporting
of the page table order from hmm_range_fault() and some optimization
of migrate_vma():
- Report the size of the page table mapping out of hmm_range_fault().
This makes it easier to establish a large/huge/etc mapping in the
device's page table.
- Allow devices to ignore the invalidations during migration in cases
where the migration is not going to change pages.
For instance migrating pages to a device does not require the
device to invalidate pages already in the device.
- Update nouveau and hmm_tests to use the above"
* tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
mm/hmm/test: use the new migration invalidation
nouveau/svm: use the new migration invalidation
mm/notifier: add migration invalidation type
mm/migrate: add a flags parameter to migrate_vma
nouveau: fix storing invalid ptes
nouveau/hmm: support mapping large sysmem pages
nouveau: fix mapping 2MB sysmem pages
nouveau/hmm: fault one page at a time
mm/hmm: add tests for hmm_pfn_to_map_order()
mm/hmm: provide the page mapping order in hmm_range_fault()
The msg_zerocopy test pins the sender and receiver threads to separate
cores to reduce variance between runs.
But it hardcodes the cores and skips core 0, so it fails on machines
with the selected cores offline, or simply fewer cores.
The test mainly gives code coverage in automated runs. The throughput
of zerocopy ('-z') and non-zerocopy runs is logged for manual
inspection.
Continue even when sched_setaffinity fails. Just log to warn anyone
interpreting the data.
Fixes: 07b65c5b31 ("test: add msg_zerocopy test")
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
kci_test_encap() is actually composed by two different sub-tests,
kci_test_encap_vxlan() and kci_test_encap_fou()
Therefore we should check the test result of these two in
kci_test_encap() to let the script be aware of the pass / fail status.
Otherwise it will generate false-negative result like below:
$ sudo ./test.sh
PASS: policy routing
PASS: route get
PASS: preferred_lft addresses have expired
PASS: promote_secondaries complete
PASS: tc htb hierarchy
PASS: gre tunnel endpoint
PASS: gretap
PASS: ip6gretap
PASS: erspan
PASS: ip6erspan
PASS: bridge setup
PASS: ipv6 addrlabel
PASS: set ifalias 5b193daf-0a08-46d7-af2c-e7aadd422ded for test-dummy0
PASS: vrf
PASS: vxlan
FAIL: can't add fou port 7777, skipping test
PASS: macsec
PASS: bridge fdb get
PASS: neigh get
$ echo $?
0
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The return value "ret" will be reset to 0 from the beginning of each
sub-test in rtnetlink.sh, therefore this test will always pass if the
last sub-test has passed:
$ sudo ./rtnetlink.sh
PASS: policy routing
PASS: route get
PASS: preferred_lft addresses have expired
PASS: promote_secondaries complete
PASS: tc htb hierarchy
PASS: gre tunnel endpoint
PASS: gretap
PASS: ip6gretap
PASS: erspan
PASS: ip6erspan
PASS: bridge setup
PASS: ipv6 addrlabel
PASS: set ifalias a39ee707-e36b-41d3-802f-63179ed4d580 for test-dummy0
PASS: vrf
PASS: vxlan
FAIL: can't add fou port 7777, skipping test
PASS: macsec
PASS: ipsec
3,7c3,7
< sa[0] spi=0x00000009 proto=0x32 salt=0x64636261 crypt=1
< sa[0] key=0x31323334 35363738 39303132 33343536
< sa[1] rx ipaddr=0x00000000 00000000 00000000 c0a87b03
< sa[1] spi=0x00000009 proto=0x32 salt=0x64636261 crypt=1
< sa[1] key=0x31323334 35363738 39303132 33343536
---
> sa[0] spi=0x00000009 proto=0x32 salt=0x61626364 crypt=1
> sa[0] key=0x34333231 38373635 32313039 36353433
> sa[1] rx ipaddr=0x00000000 00000000 00000000 037ba8c0
> sa[1] spi=0x00000009 proto=0x32 salt=0x61626364 crypt=1
> sa[1] key=0x34333231 38373635 32313039 36353433
FAIL: ipsec_offload incorrect driver data
FAIL: ipsec_offload
PASS: bridge fdb get
PASS: neigh get
$ echo $?
0
Make "ret" become a local variable for all sub-tests.
Also, check the sub-test results in kci_test_rtnl() and return the
final result for this test.
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Here is the "big" set of changes to the driver core, and some drivers
using the changes, for 5.9-rc1.
"Biggest" thing in here is the device link exposure in sysfs, to help
to tame the madness that is SoC device tree representations and driver
interactions with it.
Other stuff in here that is interesting is:
- device probe log helper so that drivers can report problems in
a unified way easier.
- devres functions added
- DEVICE_ATTR_ADMIN_* macro added to make it harder to write
incorrect sysfs file permissions
- documentation cleanups
- ability for debugfs to be present in the kernel, yet not
exposed to userspace. Needed for systems that want it
enabled, but do not trust users, so they can still use some
kernel functions that were otherwise disabled.
- other minor fixes and cleanups
The patches outside of drivers/base/ all have acks from the respective
subsystem maintainers to go through this tree instead of theirs.
All of these have been in linux-next with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXylhOQ8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ylGdACeKqxm8IIDZycj0QjLUlPiEwVIROgAnjpf5jAB
mb4jMvgEGsB6/FwxypPG
=RUss
-----END PGP SIGNATURE-----
Merge tag 'driver-core-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the "big" set of changes to the driver core, and some drivers
using the changes, for 5.9-rc1.
"Biggest" thing in here is the device link exposure in sysfs, to help
to tame the madness that is SoC device tree representations and driver
interactions with it.
Other stuff in here that is interesting is:
- device probe log helper so that drivers can report problems in a
unified way easier.
- devres functions added
- DEVICE_ATTR_ADMIN_* macro added to make it harder to write
incorrect sysfs file permissions
- documentation cleanups
- ability for debugfs to be present in the kernel, yet not exposed to
userspace. Needed for systems that want it enabled, but do not
trust users, so they can still use some kernel functions that were
otherwise disabled.
- other minor fixes and cleanups
The patches outside of drivers/base/ all have acks from the respective
subsystem maintainers to go through this tree instead of theirs.
All of these have been in linux-next with no reported issues"
* tag 'driver-core-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (39 commits)
drm/bridge: lvds-codec: simplify error handling
drm/bridge/sii8620: fix resource acquisition error handling
driver core: add deferring probe reason to devices_deferred property
driver core: add device probe log helper
driver core: Avoid binding drivers to dead devices
Revert "test_firmware: Test platform fw loading on non-EFI systems"
firmware_loader: EFI firmware loader must handle pre-allocated buffer
selftest/firmware: Add selftest timeout in settings
test_firmware: Test platform fw loading on non-EFI systems
driver core: Change delimiter in devlink device's name to "--"
debugfs: Add access restriction option
tracefs: Remove unnecessary debug_fs checks.
driver core: Fix probe_count imbalance in really_probe()
kobject: remove unused KOBJ_MAX action
driver core: Fix sleeping in invalid context during device link deletion
driver core: Add waiting_for_supplier sysfs file for devices
driver core: Add state_synced sysfs file for devices that support it
driver core: Expose device link details in sysfs
driver core: Drop mention of obsolete bus rwsem from kernel-doc
debugfs: file: Remove unnecessary cast in kfree()
...
Here is the large set of char and misc and other driver subsystem
patches for 5.9-rc1. Lots of new driver submissions in here, and
cleanups and features for existing drivers.
Highlights are:
- habanalabs driver updates
- coresight driver updates
- nvmem driver updates
- huge number of "W=1" build warning cleanups from Lee Jones
- dyndbg updates
- virtbox driver fixes and updates
- soundwire driver updates
- mei driver updates
- phy driver updates
- fpga driver updates
- lots of smaller individual misc/char driver cleanups and fixes
Full details are in the shortlog.
All of these have been in linux-next with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXylccQ8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ymofgCfZ1CxNWd0ZVM0YIn8cY9gO6ON7MsAnRq48hvn
Vjf4rKM73GC11bVF4Gyy
=Xq1R
-----END PGP SIGNATURE-----
Merge tag 'char-misc-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver updates from Greg KH:
"Here is the large set of char and misc and other driver subsystem
patches for 5.9-rc1. Lots of new driver submissions in here, and
cleanups and features for existing drivers.
Highlights are:
- habanalabs driver updates
- coresight driver updates
- nvmem driver updates
- huge number of "W=1" build warning cleanups from Lee Jones
- dyndbg updates
- virtbox driver fixes and updates
- soundwire driver updates
- mei driver updates
- phy driver updates
- fpga driver updates
- lots of smaller individual misc/char driver cleanups and fixes
Full details are in the shortlog.
All of these have been in linux-next with no reported issues"
* tag 'char-misc-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (322 commits)
habanalabs: remove unused but set variable 'ctx_asid'
nvmem: qcom-spmi-sdam: Enable multiple devices
dt-bindings: nvmem: SID: add binding for A100's SID controller
nvmem: update Kconfig description
nvmem: qfprom: Add fuse blowing support
dt-bindings: nvmem: Add properties needed for blowing fuses
dt-bindings: nvmem: qfprom: Convert to yaml
nvmem: qfprom: use NVMEM_DEVID_AUTO for multiple instances
nvmem: core: add support to auto devid
nvmem: core: Add nvmem_cell_read_u8()
nvmem: core: Grammar fixes for help text
nvmem: sc27xx: add sc2730 efuse support
nvmem: Enforce nvmem stride in the sysfs interface
MAINTAINERS: Add git tree for NVMEM FRAMEWORK
nvmem: sprd: Fix return value of sprd_efuse_probe()
drivers: android: Fix the SPDX comment style
drivers: android: Fix a variable declaration coding style issue
drivers: android: Remove braces for a single statement if-else block
drivers: android: Remove the use of else after return
drivers: android: Fix a variable declaration coding style issue
...
This Kselftest update for Linux 5.9-rc1 consists of
- TAP output reporting related fixes from Paolo Bonzini and Kees Cook.
These fixes make it skip reporting consistent with TAP format.
- Cleanup fixes to framework run_tests from Yauheni Kaliuta
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAl8p1akACgkQCwJExA0N
QxwqKg/9HzbnpWeb736HAjeA3v0LFuPte8TELSerjKqfas+g9xQxhf+ReHaZXz9i
KnhBPYyOb57DjnT7Mi7c5qGYzLKCvBF30OR0M1P5lRWBX3SC0VvkgdHzLpksIsHx
1wIht3gClvdOnHeHWWRG374iCbw86po8Xa5V9KOJOofbEKjctYjiShnd3OXXNLdJ
h1/Ro+LffdsqO7VWoJruSuBplCXAVIr8IpUnOhtw/JTGlK7csNHBdHb7KTBm+zKU
i0+f6H5uxRM3BA793OHen9D6kHAVLzhtPc7O0O1IhNRKnDgzY/UIS0qSGpxzD6KG
+ZZ4FpvyfN2EPqgGegjdTNhQjPrjXpPos46FTNOM/qiQNYvCuvUFjRCAzVuN9cqU
QzXdNPUOI7YLpOYpqLNqeefTXhZUxCWPF33oHPhU28W59qYpqQDRRDoOu7l9e7KQ
DMwIKCaSw5qCB7S6x5LqOiQcc4/3xIbJVjO/CQU7G4cuREkAHqUvc2rfwCHcn60e
rt2h/v2rnHzie4kTEukWKPuyHL64CL0jYTdHtFb/PbavnfbEPe3t6GiGu2DJN680
mem6+9Q12KTtY4VY6enBE4YHVlxQPksbp/o5G91evNuFul1ZsAFgWdR5t+iiMrCU
d6u2m5H6dMShZYqTw98MU/PKAh31sQCNZqhruhZMgUOJEV7lP/s=
=4uja
-----END PGP SIGNATURE-----
Merge tag 'linux-kselftest-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest updates form Shuah Khan:
- TAP output reporting related fixes from Paolo Bonzini and Kees Cook.
These fixes make it skip reporting consistent with TAP format.
- Cleanup fixes to framework run_tests from Yauheni Kaliuta
* tag 'linux-kselftest-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (23 commits)
selftests/harness: Limit step counter reporting
selftests/seccomp: Check ENOSYS under tracing
selftests/seccomp: Refactor to use fixture variants
selftests/harness: Clean up kern-doc for fixtures
selftests: kmod: Add module address visibility test
Replace HTTP links with HTTPS ones: KMOD KERNEL MODULE LOADER - USERMODE HELPER
selftests: fix condition in run_tests
selftests: do not use .ONESHELL
selftests: pidfd: skip test if unshare fails with EPERM
selftests: pidfd: do not use ksft_exit_skip after ksft_set_plan
selftests/harness: Report skip reason
selftests/harness: Display signed values correctly
selftests/harness: Refactor XFAIL into SKIP
selftests/harness: Switch to TAP output
selftests: Add header documentation and helpers
selftests/binderfs: Fix harness API usage
selftests: Remove unneeded selftest API headers
selftests/clone3: Reorder reporting output
selftests: sync_test: do not use ksft_exit_skip after ksft_set_plan
selftests: sigaltstack: do not use ksft_exit_skip after ksft_set_plan
...
This Kunit update for Linux 5.9-rc1 consists of:
- Adds a generic kunit_resource API extending it to support
resources that are passed in to kunit in addition kunit
allocated resources. In addition, KUnit resources are now
refcounted to avoid passed in resources being released while
in use by kunit.
- Add support for named resources.
- Important bug fixes from Brendan Higgins and Will Chen
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAl8pyI4ACgkQCwJExA0N
QxwQmQ//YXvS1IMt4yLQ7iTrhSK2CavzPi/zX+1bgXj7RfEPrdolF12tcHf7b/Ef
UwhG/PTEE6VkBWFuprleGk0fvVW9h13K+2RkN6wkx8rmTMunMhqmMIRj4neGQWQH
LV9bYcc5HBH36jbSnJYMFDJdsATq3mmPMeknzkJf/qZgWxczBLo3pUHPNRuhZWq7
MaRUutpo1f3oSroxhcY8gouRjjPq+yUCip9EeQfdZc/LV9tqjwhjfo+zRKAY3jGf
0KL00wv4NMTNBSgLkxXFrTcyP28Y84MZB+wE3ikyE5a5wzzvMXuEtZhJk9a9q2ZJ
tZI+ICCjnTa+E3pN93F2w7cwyrQysxNH1tpK0qv1hGXLRuxOGCJ993AeUtIWjnyV
CYh/OGK9o92pRuTmoJnMNk9em1WF45YTtvhzpadPiBGb84nbxYSAlrDmm/4xC89p
7PPpkkVxFReMiu1fIyXOV++y33YFMDtoWe/Qn3pLBRpDDs/XWdK0eT4dVWx1/ttl
BdtbAxXafApVI5lGk1qPoGC1MO9XvaAidQP6PULD/rhw2Ww6rtlvwGVPffx/omkn
zgzPTxH1rBehzKPcEaorQqqR6yGXxb4v+gJi1Tg8qRE51Lw619dfyZlJO7axp2h6
7RO3BOVrIkwLP9XT2NPlL6t+RqbQXezGoZjigLmBRlCSvs1lbLs=
=kIIk
-----END PGP SIGNATURE-----
Merge tag 'linux-kselftest-kunit-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kunit updates from Shuah Khan:
- Add a generic kunit_resource API extending it to support resources
that are passed in to kunit in addition kunit allocated resources. In
addition, KUnit resources are now refcounted to avoid passed in
resources being released while in use by kunit.
- Add support for named resources.
- Important bug fixes from Brendan Higgins and Will Chen
* tag 'linux-kselftest-kunit-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: tool: fix improper treatment of file location
kunit: tool: fix broken default args in unit tests
kunit: capture stderr on all make subprocess calls
Documentation: kunit: Remove references to --defconfig
kunit: add support for named resources
kunit: generalize kunit_resource API beyond allocated resources
while to come. Changes include:
- Some new Chinese translations
- Progress on the battle against double words words and non-HTTPS URLs
- Some block-mq documentation
- More RST conversions from Mauro. At this point, that task is
essentially complete, so we shouldn't see this kind of churn again for a
while. Unless we decide to switch to asciidoc or something...:)
- Lots of typo fixes, warning fixes, and more.
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAl8oVkwPHGNvcmJldEBs
d24ubmV0AAoJEBdDWhNsDH5YoW8H/jJ/xnXFn7tkgVPQAlL3k5HCnK7A5nDP9RVR
cg1pTx1cEFdjzxPlJyExU6/v+AImOvtweHXC+JDK7YcJ6XFUNYXJI3LxL5KwUXbY
BL/xRFszDSXH2C7SJF5GECcFYp01e/FWSLN3yWAh+g+XwsKiTJ8q9+CoIDkHfPGO
7oQsHKFu6s36Af0LfSgxk4sVB7EJbo8e4psuPsP5SUrl+oXRO43Put0rXkR4yJoH
9oOaB51Do5fZp8I4JVAqGXvpXoExyLMO4yw0mASm6YSZ3KyjR8Fae+HD9Cq4ZuwY
0uzb9K+9NEhqbfwtyBsi99S64/6Zo/MonwKwevZuhtsDTK4l4iU=
=JQLZ
-----END PGP SIGNATURE-----
Merge tag 'docs-5.9' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
"It's been a busy cycle for documentation - hopefully the busiest for a
while to come. Changes include:
- Some new Chinese translations
- Progress on the battle against double words words and non-HTTPS
URLs
- Some block-mq documentation
- More RST conversions from Mauro. At this point, that task is
essentially complete, so we shouldn't see this kind of churn again
for a while. Unless we decide to switch to asciidoc or
something...:)
- Lots of typo fixes, warning fixes, and more"
* tag 'docs-5.9' of git://git.lwn.net/linux: (195 commits)
scripts/kernel-doc: optionally treat warnings as errors
docs: ia64: correct typo
mailmap: add entry for <alobakin@marvell.com>
doc/zh_CN: add cpu-load Chinese version
Documentation/admin-guide: tainted-kernels: fix spelling mistake
MAINTAINERS: adjust kprobes.rst entry to new location
devices.txt: document rfkill allocation
PCI: correct flag name
docs: filesystems: vfs: correct flag name
docs: filesystems: vfs: correct sync_mode flag names
docs: path-lookup: markup fixes for emphasis
docs: path-lookup: more markup fixes
docs: path-lookup: fix HTML entity mojibake
CREDITS: Replace HTTP links with HTTPS ones
docs: process: Add an example for creating a fixes tag
doc/zh_CN: add Chinese translation prefer section
doc/zh_CN: add clearing-warn-once Chinese version
doc/zh_CN: add admin-guide index
doc:it_IT: process: coding-style.rst: Correct __maybe_unused compiler label
futex: MAINTAINERS: Re-add selftests directory
...
this has been brought into a shape which is maintainable and actually
works.
This final version was done by Sasha Levin who took it up after Intel
dropped the ball. Sasha discovered that the SGX (sic!) offerings out there
ship rogue kernel modules enabling FSGSBASE behind the kernels back which
opens an instantanious unpriviledged root hole.
The FSGSBASE instructions provide a considerable speedup of the context
switch path and enable user space to write GSBASE without kernel
interaction. This enablement requires careful handling of the exception
entries which go through the paranoid entry path as they cannot longer rely
on the assumption that user GSBASE is positive (as enforced via prctl() on
non FSGSBASE enabled systemn). All other entries (syscalls, interrupts and
exceptions) can still just utilize SWAPGS unconditionally when the entry
comes from user space. Converting these entries to use FSGSBASE has no
benefit as SWAPGS is only marginally slower than WRGSBASE and locating and
retrieving the kernel GSBASE value is not a free operation either. The real
benefit of RD/WRGSBASE is the avoidance of the MSR reads and writes.
The changes come with appropriate selftests and have held up in field
testing against the (sanitized) Graphene-SGX driver.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl8pGnoTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoTYJD/9873GkwvGcc/Vq/dJH1szGTgFftPyZ
c/Y9gzx7EGBPLo25BS820L+ZlynzXHDxExKfCEaD10TZfe5XIc1vYNR0J74M2NmK
IBgEDstJeW93ai+rHCFRXIevhpzU4GgGYJ1MeeOgbVMN3aGU1g6HfzMvtF0fPn8Y
n6fsLZa43wgnoTdjwjjikpDTrzoZbaL1mbODBzBVPAaTbim7IKKTge6r/iCKrOjz
Uixvm3g9lVzx52zidJ9kWa8esmbOM1j0EPe7/hy3qH9DFo87KxEzjHNH3T6gY5t6
NJhRAIfY+YyTHpPCUCshj6IkRudE6w/qjEAmKP9kWZxoJrvPCTWOhCzelwsFS9b9
gxEYfsnaKhsfNhB6fi0PtWlMzPINmEA7SuPza33u5WtQUK7s1iNlgHfvMbjstbwg
MSETn4SG2/ZyzUrSC06lVwV8kh0RgM3cENc/jpFfIHD0vKGI3qfka/1RY94kcOCG
AeJd0YRSU2RqL7lmxhHyG8tdb8eexns41IzbPCLXX2sF00eKNkVvMRYT2mKfKLFF
q8v1x7yuwmODdXfFR6NdCkGm9IU7wtL6wuQ8Nhu9UraFmcXo6X6FLJC18FqcvSb9
jvcRP4XY/8pNjjf44JB8yWfah0xGQsaMIKQGP4yLv4j6Xk1xAQKH1MqcC7l1D2HN
5Z24GibFqSK/vA==
=QaAN
-----END PGP SIGNATURE-----
Merge tag 'x86-fsgsbase-2020-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fsgsbase from Thomas Gleixner:
"Support for FSGSBASE. Almost 5 years after the first RFC to support
it, this has been brought into a shape which is maintainable and
actually works.
This final version was done by Sasha Levin who took it up after Intel
dropped the ball. Sasha discovered that the SGX (sic!) offerings out
there ship rogue kernel modules enabling FSGSBASE behind the kernels
back which opens an instantanious unpriviledged root hole.
The FSGSBASE instructions provide a considerable speedup of the
context switch path and enable user space to write GSBASE without
kernel interaction. This enablement requires careful handling of the
exception entries which go through the paranoid entry path as they
can no longer rely on the assumption that user GSBASE is positive (as
enforced via prctl() on non FSGSBASE enabled systemn).
All other entries (syscalls, interrupts and exceptions) can still just
utilize SWAPGS unconditionally when the entry comes from user space.
Converting these entries to use FSGSBASE has no benefit as SWAPGS is
only marginally slower than WRGSBASE and locating and retrieving the
kernel GSBASE value is not a free operation either. The real benefit
of RD/WRGSBASE is the avoidance of the MSR reads and writes.
The changes come with appropriate selftests and have held up in field
testing against the (sanitized) Graphene-SGX driver"
* tag 'x86-fsgsbase-2020-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
x86/fsgsbase: Fix Xen PV support
x86/ptrace: Fix 32-bit PTRACE_SETREGS vs fsbase and gsbase
selftests/x86/fsgsbase: Add a missing memory constraint
selftests/x86/fsgsbase: Fix a comment in the ptrace_write_gsbase test
selftests/x86: Add a syscall_arg_fault_64 test for negative GSBASE
selftests/x86/fsgsbase: Test ptracer-induced GS base write with FSGSBASE
selftests/x86/fsgsbase: Test GS selector on ptracer-induced GS base write
Documentation/x86/64: Add documentation for GS/FS addressing mode
x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2
x86/cpu: Enable FSGSBASE on 64bit by default and add a chicken bit
x86/entry/64: Handle FSGSBASE enabled paranoid entry/exit
x86/entry/64: Introduce the FIND_PERCPU_BASE macro
x86/entry/64: Switch CR3 before SWAPGS in paranoid entry
x86/speculation/swapgs: Check FSGSBASE in enabling SWAPGS mitigation
x86/process/64: Use FSGSBASE instructions on thread copy and ptrace
x86/process/64: Use FSBSBASE in switch_to() if available
x86/process/64: Make save_fsgs_for_kvm() ready for FSGSBASE
x86/fsgsbase/64: Enable FSGSBASE instructions in helper functions
x86/fsgsbase/64: Add intrinsics for FSGSBASE instructions
x86/cpu: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE
...
On distros using older glibc versions, the pkey tests encounter build
failures due to redefinition of the pkey syscall numbers.
For compatibility, commit 743f3544ff added a wrapper for the
gettid() syscall and included syscall.h if the version of glibc used
is older than 2.30. This leads to different definitions of SYS_pkey_*
as the ones in the pkey test header set numeric constants where as the
ones from syscall.h reuse __NR_pkey_*. The compiler complains about
redefinitions since they are different.
This replaces SYS_pkey_* definitions with __NR_pkey_* such that the
definitions in both syscall.h and pkeys.h are alike. This way, if
syscall.h has to be included for compatibility reasons, builds will
still succeed.
Fixes: 743f3544ff ("selftests/powerpc: Add wrapper for gettid")
Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Suggested-by: David Laight <david.laight@aculab.com>
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a4956d838bf59b0a71a2553c5ca81131ea8b49b9.1596561758.git.sandipan@linux.ibm.com
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCXygcpgAKCRCRxhvAZXjc
ogPeAQDv1ncqtNroFAC4pJ4tQhH7JSjW0OltiMk/AocY/J2SdQD9GJ15luYJ0/om
697q/Z68sndRynhdoZlMuf3oYuBlHQw=
=3ZhE
-----END PGP SIGNATURE-----
Merge tag 'close-range-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull close_range() implementation from Christian Brauner:
"This adds the close_range() syscall. It allows to efficiently close a
range of file descriptors up to all file descriptors of a calling
task.
This is coordinated with the FreeBSD folks which have copied our
version of this syscall and in the meantime have already merged it in
April 2019:
https://reviews.freebsd.org/D21627https://svnweb.freebsd.org/base?view=revision&revision=359836
The syscall originally came up in a discussion around the new mount
API and making new file descriptor types cloexec by default. During
this discussion, Al suggested the close_range() syscall.
First, it helps to close all file descriptors of an exec()ing task.
This can be done safely via (quoting Al's example from [1] verbatim):
/* that exec is sensitive */
unshare(CLONE_FILES);
/* we don't want anything past stderr here */
close_range(3, ~0U);
execve(....);
The code snippet above is one way of working around the problem that
file descriptors are not cloexec by default. This is aggravated by the
fact that we can't just switch them over without massively regressing
userspace. For a whole class of programs having an in-kernel method of
closing all file descriptors is very helpful (e.g. demons, service
managers, programming language standard libraries, container managers
etc.).
Second, it allows userspace to avoid implementing closing all file
descriptors by parsing through /proc/<pid>/fd/* and calling close() on
each file descriptor and other hacks. From looking at various
large(ish) userspace code bases this or similar patterns are very
common in service managers, container runtimes, and programming
language runtimes/standard libraries such as Python or Rust.
In addition, the syscall will also work for tasks that do not have
procfs mounted and on kernels that do not have procfs support compiled
in. In such situations the only way to make sure that all file
descriptors are closed is to call close() on each file descriptor up
to UINT_MAX or RLIMIT_NOFILE, OPEN_MAX trickery.
Based on Linus' suggestion close_range() also comes with a new flag
CLOSE_RANGE_UNSHARE to more elegantly handle file descriptor dropping
right before exec. This would usually be expressed in the sequence:
unshare(CLONE_FILES);
close_range(3, ~0U);
as pointed out by Linus it might be desirable to have this be a part
of close_range() itself under a new flag CLOSE_RANGE_UNSHARE which
gets especially handy when we're closing all file descriptors above a
certain threshold.
Test-suite as always included"
* tag 'close-range-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
tests: add CLOSE_RANGE_UNSHARE tests
close_range: add CLOSE_RANGE_UNSHARE
tests: add close_range() tests
arch: wire-up close_range()
open: add close_range()
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCXygegQAKCRCRxhvAZXjc
olWZAQCMPbhI/20LA3OYJ6s+BgBEnm89PymvlHcym6Z4AvTungD+KqZonIYuxWgi
6Ttlv/fzgFFbXgJgbuass5mwFVoN5wM=
=oK7d
-----END PGP SIGNATURE-----
Merge tag 'cap-checkpoint-restore-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull checkpoint-restore updates from Christian Brauner:
"This enables unprivileged checkpoint/restore of processes.
Given that this work has been going on for quite some time the first
sentence in this summary is hopefully more exciting than the actual
final code changes required. Unprivileged checkpoint/restore has seen
a frequent increase in interest over the last two years and has thus
been one of the main topics for the combined containers &
checkpoint/restore microconference since at least 2018 (cf. [1]).
Here are just the three most frequent use-cases that were brought forward:
- The JVM developers are integrating checkpoint/restore into a Java
VM to significantly decrease the startup time.
- In high-performance computing environment a resource manager will
typically be distributing jobs where users are always running as
non-root. Long-running and "large" processes with significant
startup times are supposed to be checkpointed and restored with
CRIU.
- Container migration as a non-root user.
In all of these scenarios it is either desirable or required to run
without CAP_SYS_ADMIN. The userspace implementation of
checkpoint/restore CRIU already has the pull request for supporting
unprivileged checkpoint/restore up (cf. [2]).
To enable unprivileged checkpoint/restore a new dedicated capability
CAP_CHECKPOINT_RESTORE is introduced. This solution has last been
discussed in 2019 in a talk by Google at Linux Plumbers (cf. [1]
"Update on Task Migration at Google Using CRIU") with Adrian and
Nicolas providing the implementation now over the last months. In
essence, this allows the CRIU binary to be installed with the
CAP_CHECKPOINT_RESTORE vfs capability set thereby enabling
unprivileged users to restore processes.
To make this possible the following permissions are altered:
- Selecting a specific PID via clone3() set_tid relaxed from userns
CAP_SYS_ADMIN to CAP_CHECKPOINT_RESTORE.
- Selecting a specific PID via /proc/sys/kernel/ns_last_pid relaxed
from userns CAP_SYS_ADMIN to CAP_CHECKPOINT_RESTORE.
- Accessing /proc/pid/map_files relaxed from init userns
CAP_SYS_ADMIN to init userns CAP_CHECKPOINT_RESTORE.
- Changing /proc/self/exe from userns CAP_SYS_ADMIN to userns
CAP_CHECKPOINT_RESTORE.
Of these four changes the /proc/self/exe change deserves a few words
because the reasoning behind even restricting /proc/self/exe changes
in the first place is just full of historical quirks and tracking this
down was a questionable version of fun that I'd like to spare others.
In short, it is trivial to change /proc/self/exe as an unprivileged
user, i.e. without userns CAP_SYS_ADMIN right now. Either via ptrace()
or by simply intercepting the elf loader in userspace during exec.
Nicolas was nice enough to even provide a POC for the latter (cf. [3])
to illustrate this fact.
The original patchset which introduced PR_SET_MM_MAP had no
permissions around changing the exe link. They too argued that it is
trivial to spoof the exe link already which is true. The argument
brought up against this was that the Tomoyo LSM uses the exe link in
tomoyo_manager() to detect whether the calling process is a policy
manager. This caused changing the exe links to be guarded by userns
CAP_SYS_ADMIN.
All in all this rather seems like a "better guard it with something
rather than nothing" argument which imho doesn't qualify as a great
security policy. Again, because spoofing the exe link is possible for
the calling process so even if this were security relevant it was
broken back then and would be broken today. So technically, dropping
all permissions around changing the exe link would probably be
possible and would send a clearer message to any userspace that relies
on /proc/self/exe for security reasons that they should stop doing
this but for now we're only relaxing the exe link permissions from
userns CAP_SYS_ADMIN to userns CAP_CHECKPOINT_RESTORE.
There's a final uapi change in here. Changing the exe link used to
accidently return EINVAL when the caller lacked the necessary
permissions instead of the more correct EPERM. This pr contains a
commit fixing this. I assume that userspace won't notice or care and
if they do I will revert this commit. But since we are changing the
permissions anyway it seems like a good opportunity to try this fix.
With these changes merged unprivileged checkpoint/restore will be
possible and has already been tested by various users"
[1] LPC 2018
1. "Task Migration at Google Using CRIU"
https://www.youtube.com/watch?v=yI_1cuhoDgA&t=12095
2. "Securely Migrating Untrusted Workloads with CRIU"
https://www.youtube.com/watch?v=yI_1cuhoDgA&t=14400
LPC 2019
1. "CRIU and the PID dance"
https://www.youtube.com/watch?v=LN2CUgp8deo&list=PLVsQ_xZBEyN30ZA3Pc9MZMFzdjwyz26dO&index=9&t=2m48s
2. "Update on Task Migration at Google Using CRIU"
https://www.youtube.com/watch?v=LN2CUgp8deo&list=PLVsQ_xZBEyN30ZA3Pc9MZMFzdjwyz26dO&index=9&t=1h2m8s
[2] https://github.com/checkpoint-restore/criu/pull/1155
[3] https://github.com/nviennot/run_as_exe
* tag 'cap-checkpoint-restore-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
selftests: add clone3() CAP_CHECKPOINT_RESTORE test
prctl: exe link permission error changed from -EINVAL to -EPERM
prctl: Allow local CAP_CHECKPOINT_RESTORE to change /proc/self/exe
proc: allow access in init userns for map_files with CAP_CHECKPOINT_RESTORE
pid_namespace: use checkpoint_restore_ns_capable() for ns_last_pid
pid: use checkpoint_restore_ns_capable() for set_tid
capabilities: Introduce CAP_CHECKPOINT_RESTORE
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCXygcLwAKCRCRxhvAZXjc
ohajAP4n5E3BmN0jpIviXT4eNhP62jzxJtxlVXtgGT3D8b1mpQEA5n8NSOlQLoAh
yUGsjtwR9xDcHMcrhXD3yN6eYJSK0A8=
=tn4R
-----END PGP SIGNATURE-----
Merge tag 'threads-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull thread updates from Christian Brauner:
"This contains the changes to add the missing support for attaching to
time namespaces via pidfds.
Last cycle setns() was changed to support attaching to multiple
namespaces atomically. This requires all namespaces to have a point of
no return where they can't fail anymore.
Specifically, <namespace-type>_install() is allowed to perform
permission checks and install the namespace into the new struct nsset
that it has been given but it is not allowed to make visible changes
to the affected task. Once <namespace-type>_install() returns,
anything that the given namespace type additionally requires to be
setup needs to ideally be done in a function that can't fail or if it
fails the failure must be non-fatal.
For time namespaces the relevant functions that fell into this
category were timens_set_vvar_page() and vdso_join_timens(). The
latter could still fail although it didn't need to. This function is
only implemented for vdso_join_timens() in current mainline. As
discussed on-list (cf. [1]), in order to make setns() support time
namespaces when attaching to multiple namespaces at once properly we
changed vdso_join_timens() to always succeed. So vdso_join_timens()
replaces the mmap_write_lock_killable() with mmap_read_lock().
Please note that arm is about to grow vdso support for time namespaces
(possibly this merge window). We've synced on this change and arm64
also uses mmap_read_lock(), i.e. makes vdso_join_timens() a function
that can't fail. Once the changes here and the arm64 changes have
landed, vdso_join_timens() should be turned into a void function so
it's obvious to callers and implementers on other architectures that
the expectation is that it can't fail.
We didn't do this right away because it would've introduced
unnecessary merge conflicts between the two trees for no major gain.
As always, tests included"
[1]: https://lore.kernel.org/lkml/20200611110221.pgd3r5qkjrjmfqa2@wittgenstein
* tag 'threads-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
tests: add CLONE_NEWTIME setns tests
nsproxy: support CLONE_NEWTIME with setns()
timens: add timens_commit() helper
timens: make vdso_join_timens() always succeed
- Improved selftest coverage, timeouts, and reporting
- Add EPOLLHUP support for SECCOMP_RET_USER_NOTIF (Christian Brauner)
- Refactor __scm_install_fd() into __receive_fd() and fix buggy callers
- Introduce "addfd" command for SECCOMP_RET_USER_NOTIF (Sargun Dhillon)
-----BEGIN PGP SIGNATURE-----
iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAl8oZcQWHGtlZXNjb29r
QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJomDD/4x3j7eXREcXDsHOmlgEaHWGx4l
JldHFQhV5GjmD7gOkPcoZSG7NfG7F6VpwAJg7ZoR3qUkem7K8DFucxqgo1RldCot
nigleeLX6JeMS0Z+iwjAVZd+5t4xG4J/7GGDHIIMiG5qvwJ0Yf64o1bkjaB2Q/Bv
tluBg0WF32kFMG/ZwyY/V2QDbbue97CFPflybOh1o2nWbVzmUlFEEum3UUvZsxc8
smMsattJyuAV7kcEKzKrs8b010NdFZqwdbub5Np9W3XEXGBYMdIPoNsOQGmB9wby
j2ui0lzboXRG997jM7TCd1l/XZAv8aAwvPplw3FJRybzkOGs9NDyLMoz87yJpR1T
xp511vnMyMbyKIGdungkt7cIyzaictHwaYzznsmuNdCPEjTaIQJr1ctsa4GEgtqf
pnkktZ9YbMCcHU0CtZ8GlOVqA9wE+FUm0/u0zgikzJQsB+HcNItiARTTTHRyco7p
VJCqK8o4Zx4ELV7QNkSH4nhFkVgRopvrvBiPAGro/qwGOofBg8W8wM8O1+V/MDmp
zSU22v4SncT1Xb7dtmdJqDEeHfDikhaCAb4Je2hsGQWzbdAqwHGlpa7vpk9x3Q5r
L+XyP+Z+rPHlXYyypJwUvvOQhXOmP0zYxcEHxByqIBfXiwy+3dN4tDDfatWbccwl
uTlTDM8kmQn6QzSztA==
=yb55
-----END PGP SIGNATURE-----
Merge tag 'seccomp-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull seccomp updates from Kees Cook:
"There are a bunch of clean ups and selftest improvements along with
two major updates to the SECCOMP_RET_USER_NOTIF filter return:
EPOLLHUP support to more easily detect the death of a monitored
process, and being able to inject fds when intercepting syscalls that
expect an fd-opening side-effect (needed by both container folks and
Chrome). The latter continued the refactoring of __scm_install_fd()
started by Christoph, and in the process found and fixed a handful of
bugs in various callers.
- Improved selftest coverage, timeouts, and reporting
- Add EPOLLHUP support for SECCOMP_RET_USER_NOTIF (Christian Brauner)
- Refactor __scm_install_fd() into __receive_fd() and fix buggy
callers
- Introduce 'addfd' command for SECCOMP_RET_USER_NOTIF (Sargun
Dhillon)"
* tag 'seccomp-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (30 commits)
selftests/seccomp: Test SECCOMP_IOCTL_NOTIF_ADDFD
seccomp: Introduce addfd ioctl to seccomp user notifier
fs: Expand __receive_fd() to accept existing fd
pidfd: Replace open-coded receive_fd()
fs: Add receive_fd() wrapper for __receive_fd()
fs: Move __scm_install_fd() to __receive_fd()
net/scm: Regularize compat handling of scm_detach_fds()
pidfd: Add missing sock updates for pidfd_getfd()
net/compat: Add missing sock updates for SCM_RIGHTS
selftests/seccomp: Check ENOSYS under tracing
selftests/seccomp: Refactor to use fixture variants
selftests/harness: Clean up kern-doc for fixtures
seccomp: Use -1 marker for end of mode 1 syscall list
seccomp: Fix ioctl number for SECCOMP_IOCTL_NOTIF_ID_VALID
selftests/seccomp: Rename user_trap_syscall() to user_notif_syscall()
selftests/seccomp: Make kcmp() less required
seccomp: Use pr_fmt
selftests/seccomp: Improve calibration loop
selftests/seccomp: use 90s as timeout
selftests/seccomp: Expand benchmark to per-filter measurements
...
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Flush the cleanup xtables worker to make sure destructors
have completed, from Florian Westphal.
2) iifgroup is matching erroneously, also from Florian.
3) Add selftest for meta interface matching, from Florian Westphal.
4) Move nf_ct_offload_timeout() to header, from Roi Dayan.
5) Call nf_ct_offload_timeout() from flow_offload_add() to
make sure garbage collection does not evict offloaded flow,
from Roi Dayan.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The new tests check that IP and IPv6 packets exceeding the local PMTU
estimate, forwarded by an Open vSwitch instance from another node,
result in the correct route exceptions being created, and that
communication with end-to-end fragmentation, over GENEVE and VXLAN
Open vSwitch ports, is now possible as a result of PMTU discovery.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The new tests check that IP and IPv6 packets exceeding the local PMTU
estimate, both locally generated and forwarded by a bridge from
another node, result in the correct route exceptions being created,
and that communication with end-to-end fragmentation over VXLAN and
GENEVE tunnels is now possible as a result of PMTU discovery.
Part of the existing setup functions aren't generic enough to simply
add a namespace and a bridge to the existing routing setup. This
rework is in progress and we can easily shrink this once more generic
topology functions are available.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf-next 2020-08-04
The following pull-request contains BPF updates for your *net-next* tree.
We've added 73 non-merge commits during the last 9 day(s) which contain
a total of 135 files changed, 4603 insertions(+), 1013 deletions(-).
The main changes are:
1) Implement bpf_link support for XDP. Also add LINK_DETACH operation for the BPF
syscall allowing processes with BPF link FD to force-detach, from Andrii Nakryiko.
2) Add BPF iterator for map elements and to iterate all BPF programs for efficient
in-kernel inspection, from Yonghong Song and Alexei Starovoitov.
3) Separate bpf_get_{stack,stackid}() helpers for perf events in BPF to avoid
unwinder errors, from Song Liu.
4) Allow cgroup local storage map to be shared between programs on the same
cgroup. Also extend BPF selftests with coverage, from YiFei Zhu.
5) Add BPF exception tables to ARM64 JIT in order to be able to JIT BPF_PROBE_MEM
load instructions, from Jean-Philippe Brucker.
6) Follow-up fixes on BPF socket lookup in combination with reuseport group
handling. Also add related BPF selftests, from Jakub Sitnicki.
7) Allow to use socket storage in BPF_PROG_TYPE_CGROUP_SOCK-typed programs for
socket create/release as well as bind functions, from Stanislav Fomichev.
8) Fix an info leak in xsk_getsockopt() when retrieving XDP stats via old struct
xdp_statistics, from Peilin Ye.
9) Fix PT_REGS_RC{,_CORE}() macros in libbpf for MIPS arch, from Jerry Crunchtime.
10) Extend BPF kernel test infra with skb->family and skb->{local,remote}_ip{4,6}
fields and allow user space to specify skb->dev via ifindex, from Dmitry Yakunin.
11) Fix a bpftool segfault due to missing program type name and make it more robust
to prevent them in future gaps, from Quentin Monnet.
12) Consolidate cgroup helper functions across selftests and fix a v6 localhost
resolver issue, from John Fastabend.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a selftest for RED early_drop and mark qevents when a trap action is
attached at the associated block.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now skb->dev is unconditionally set to the loopback device in current net
namespace. But if we want to test bpf program which contains code branch
based on ifindex condition (eg filters out localhost packets) it is useful
to allow specifying of ifindex from userspace. This patch adds such option
through ctx_in (__sk_buff) parameter.
Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200803090545.82046-3-zeil@yandex-team.ru
Some of our tests use VSX or newer VMX instructions, so need to be
skipped on older CPUs to avoid SIGILL'ing.
Similarly TAR was added in v2.07, and the PMU event used in the stcx
fail test only works on Power8 or later.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200803020719.96114-1-mpe@ellerman.id.au
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl8mdWQUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroM48Qf/eUYXVdGQWErQ/BQR/pLPfTUkjkIj
1C/IGogWijbWPQrtIsnXVG53eULgHocbfzExZ8PzUSfjuZ/g9V8aHGbrGKbg/y6g
SVArpCTu+BDidmLMbjsAKh3f5SEbzZZuFKnoxoEEKiA2TeYqDQ05nxcv9+T4udrs
DWqXnk27y0vR9RtJkf5sqDnyH36cDT9TsbRFPaCt/hFE0UA65UTtWgl1ZaagzmuL
I70z5Ap+/6fsEMoKoys9AKSCpLRGUBu/42fQhZkEq9no0w5PBkzn/YMfapHwT9I+
1i9WUv3gn+FQOXpqAg0+aaPtWwIBIEONbm0qmw4yNNUI24Btq2QjGWxRqg==
=VQPu
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"Bugfixes and strengthening the validity checks on inputs from new
userspace APIs.
Now I know why I shouldn't prepare pull requests on the weekend, it's
hard to concentrate if your son is shouting about his latest Minecraft
builds in your ear. Fortunately all the patches were ready and I just
had to check the test results..."
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: SVM: Fix disable pause loop exit/pause filtering capability on SVM
KVM: LAPIC: Prevent setting the tscdeadline timer if the lapic is hw disabled
KVM: arm64: Don't inherit exec permission across page-table levels
KVM: arm64: Prevent vcpu_has_ptrauth from generating OOL functions
KVM: nVMX: check for invalid hdr.vmx.flags
KVM: nVMX: check for required but missing VMCS12 in KVM_SET_NESTED_STATE
selftests: kvm: do not set guest mode flag
core_retro selftest uses BPF program that's triggered on sys_enter
system-wide, but has no protection from some unrelated process doing syscall
while selftest is running. This leads to occasional test failures with
unexpected PIDs being returned. Fix that by filtering out all processes that
are not test_progs process.
Fixes: fcda189a51 ("selftests/bpf: Add test relying only on CO-RE and no recent kernel features")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200731204957.2047119-1-andriin@fb.com
Add bpf_link__detach() testing to selftests for cgroup, netns, and xdp
bpf_links.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200731182830.286260-4-andriin@fb.com
Nearly every user of cgroup helpers does the same sequence of API calls. So
push these into a single helper cgroup_setup_and_join. The cases that do
a bit of extra logic are test_progs which currently uses an env variable
to decide if it needs to setup the cgroup environment or can use an
existingi environment. And then tests that are doing cgroup tests
themselves. We skip these cases for now.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/159623335418.30208.15807461815525100199.stgit@john-XPS-13-9370
Daniel Borkmann says:
====================
pull-request: bpf 2020-07-31
The following pull-request contains BPF updates for your *net* tree.
We've added 5 non-merge commits during the last 21 day(s) which contain
a total of 5 files changed, 126 insertions(+), 18 deletions(-).
The main changes are:
1) Fix a map element leak in HASH_OF_MAPS map type, from Andrii Nakryiko.
2) Fix a NULL pointer dereference in __btf_resolve_helper_id() when no
btf_vmlinux is available, from Peilin Ye.
3) Init pos variable in __bpfilter_process_sockopt(), from Christoph Hellwig.
4) Fix a cgroup sockopt verifier test by specifying expected attach type,
from Jean-Philippe Brucker.
Note that when net gets merged into net-next later on, there is a small
merge conflict in kernel/bpf/btf.c between commit 5b801dfb7f ("bpf: Fix
NULL pointer dereference in __btf_resolve_helper_id()") from the bpf tree
and commit 138b9a0511 ("bpf: Remove btf_id helpers resolving") from the
net-next tree.
Resolve as follows: remove the old hunk with the __btf_resolve_helper_id()
function. Change the btf_resolve_helper_id() so it actually tests for a
NULL btf_vmlinux and bails out:
int btf_resolve_helper_id(struct bpf_verifier_log *log,
const struct bpf_func_proto *fn, int arg)
{
int id;
if (fn->arg_type[arg] != ARG_PTR_TO_BTF_ID || !btf_vmlinux)
return -EINVAL;
id = fn->btf_id[arg];
if (!id || id > btf_vmlinux->nr_types)
return -EINVAL;
return id;
}
Let me know if you run into any others issues (CC'ing Jiri Olsa so he's in
the loop with regards to merge conflict resolution).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Also add test cases with MP_JOIN when tcp_syncookies sysctl is 2 (i.e.,
syncookies are always-on).
While at it, also print the test number and add the test number
to the pcap files that can be generated optionally.
This makes it easier to match the pcap to the test case.
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
check we can establish connections also when syn cookies are in use.
Check that
MPTcpExtMPCapableSYNRX and MPTcpExtMPCapableACKRX increase for each
MPTCP test.
Check TcpExtSyncookiesSent and TcpExtSyncookiesRecv increase in netns2.
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
secure_computing() is called first in syscall_trace_enter() so that
a system call will be aborted quickly without doing succeeding syscall
tracing if seccomp rules want to deny that system call.
TODO:
- Update https://github.com/seccomp/libseccomp csky support
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Cc: Arnd Bergmann <arnd@arndb.de>
The txtimestamp selftest sets a fixed 500us tolerance. This value was
arrived at experimentally. Some platforms have higher variances. Make
this adjustable by adding the following flag:
-t N: tolerance (usec) for timestamp validation.
Signed-off-by: Jian Yang <jianyang@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When read netdevsim trap_flow_action_cookie, we need to init it first,
or we will get "Invalid argument" error.
Fixes: d3cbb907ae ("netdevsim: add ACL trap reporting cookie as a metadata")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Setting IFA_F_NODAD flag for IPv6 addresses to add to loopback is
unnecessary. Duplicate Address Detection does not happen on loopback
device.
Also, passing 'nodad' flag to 'ip address' breaks libbpf CI, which runs in
an environment with BusyBox implementation of 'ip' command, that doesn't
understand this flag.
Fixes: 0ab5539f85 ("selftests/bpf: Tests for BPF_SK_LOOKUP attach point")
Reported-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Andrii Nakryiko <andrii@fb.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200730125325.1869363-1-jakub@cloudflare.com
Check that link is NULL or proper pointer before invoking bpf_link__destroy().
Not doing this causes crash in test_progs, when cg_storage_multi selftest
fails.
Fixes: 3573f38401 ("selftests/bpf: Test CGROUP_STORAGE behavior on shared egress + ingress")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200729045056.3363921-1-andriin@fb.com
This patch add xdpdrv mode for test_xdp_redirect.sh since veth has support
native mode. After update here is the test result:
# ./test_xdp_redirect.sh
selftests: test_xdp_redirect xdpgeneric [PASS]
selftests: test_xdp_redirect xdpdrv [PASS]
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: William Tu <u9012063@gmail.com>
Link: https://lore.kernel.org/bpf/20200729085658.403794-1-liuhangbin@gmail.com
Augment udp_limit test to set and verify socket storage value.
That should be enough to exercise the changes from the previous
patch.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200729003104.1280813-2-sdf@google.com
Commit afbf21dce6 ("bpf: Support readonly/readwrite buffers
in verifier") added readonly/readwrite buffer support which
is currently used by bpf_iter tracing programs. It has
a bug with incorrect parameter ordering which later fixed
by Commit f6dfbe31e8 ("bpf: Fix swapped arguments in calls
to check_buffer_access").
This patch added a test case with a negative offset access
which will trigger the error path.
Without Commit f6dfbe31e8, running the test case in the patch,
the error message looks like:
R1_w=rdwr_buf(id=0,off=0,imm=0) R10=fp0
; value_sum += *(__u32 *)(value - 4);
2: (61) r1 = *(u32 *)(r1 -4)
R1 invalid (null) buffer access: off=-4, size=4
With the above commit, the error message looks like:
R1_w=rdwr_buf(id=0,off=0,imm=0) R10=fp0
; value_sum += *(__u32 *)(value - 4);
2: (61) r1 = *(u32 *)(r1 -4)
R1 invalid rdwr buffer access: off=-4, size=4
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200728221801.1090406-1-yhs@fb.com
The size of the CPU affinity mask must be large enough for
systems with a very large number of CPUs. Otherwise, tests
which try to determine the first online CPU by calling
sched_getaffinity() will fail. This makes sure that the size
of the allocated affinity mask is dependent on the number of
CPUs as reported by get_nprocs_conf().
Fixes: 3752e453f6 ("selftests/powerpc: Add tests of PMU EBBs")
Reported-by: Shirisha Ganta <shiganta@in.ibm.com>
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a408c4b8e9a23bb39b539417a21eb0ff47bb5127.1596084858.git.sandipan@linux.ibm.com
Add test validating that all inner maps are released properly after skeleton
is destroyed. To ensure determinism, trigger kernel-side synchronize_rcu()
before checking map existence by their IDs.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200729040913.2815687-2-andriin@fb.com
The test case check_highest_speed_is_chosen() configures $h1 to
advertise a subset of its supported speeds and checks that $h2 chooses
the highest speed from the subset.
To find the common advertised speeds between $h1 and $h2,
common_speeds_get() is called.
Currently, the first speed returned from common_speeds_get() is removed
claiming "h1 does not advertise this speed". The claim is wrong because
the function is called after $h1 already advertised a subset of speeds.
In case $h1 supports only two speeds, it will advertise a single speed
which will be later removed because of previously mentioned bug. This
results in the test needlessly failing. When more than two speeds are
supported this is not an issue because the first advertised speed
is the lowest one.
Fix this by not removing any speed from the list of commonly advertised
speeds.
Fixes: 64916b57c0 ("selftests: forwarding: Add speed and auto-negotiation test")
Reported-by: Danielle Ratson <danieller@mellanox.com>
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When running under older versions of qemu of under newer versions with
old machine types, some security features will not be reported to the
guest. This will lead the guest OS to consider itself Vulnerable to
spectre_v2.
So, spectre_v2 test fails in such cases when the host is mitigated and
miss predictions cannot be detected as expected by the test.
Make it return the skip code instead, for this particular case. We
don't want to miss the case when the test fails and the system reports
as mitigated or not affected. But it is not a problem to miss failures
when the system reports as Vulnerable.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200728155039.401445-1-cascardo@canonical.com
On systems with large number of cpus, test fails trying to set
affinity by calling sched_setaffinity() with smaller size for affinity
mask. This patch fixes it by making sure that the size of allocated
affinity mask is dependent on the number of CPUs as reported by
get_nprocs().
Fixes: 00b7ec5c9c ("selftests/powerpc: Import Anton's context_switch2 benchmark")
Reported-by: Shirisha Ganta <shiganta@in.ibm.com>
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Harish <harish@linux.ibm.com>
Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Reviewed-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200609081423.529664-1-harish@linux.ibm.com
We have custom stack expansion checks that it turns out are extremely
badly tested and contain bugs, surprise. So add some tests that
exercise the code and capture the current boundary conditions.
The signal test currently fails on 64-bit kernels because the 2048
byte allowance for the signal frame is too small, we will fix that in
a subsequent patch.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724092528.1578671-1-mpe@ellerman.id.au
For drivers that don't have the error handling callbacks we implement
recovery by removing the device and re-probing it. This causes the sysfs
directory for the PCI device to be removed which causes the following
spurious error to be printed when checking the PE state:
Breaking 0005:03:00.0...
./eeh-basic.sh: line 13: can't open /sys/bus/pci/devices/0005:03:00.0/eeh_pe_state: no such file
0005:03:00.0, waited 0/60
0005:03:00.0, waited 1/60
0005:03:00.0, waited 2/60
0005:03:00.0, waited 3/60
0005:03:00.0, waited 4/60
0005:03:00.0, waited 5/60
0005:03:00.0, waited 6/60
0005:03:00.0, waited 7/60
0005:03:00.0, Recovered after 8 seconds
We currently try to avoid this by checking if the PE state file exists
before reading from it. This is however inherently racy so re-work the
state checking so that we only read from the file once, and we squash any
errors that occur while reading.
Fixes: 85d86c8aa5 ("selftests/powerpc: Add basic EEH selftest")
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200727010127.23698-1-oohall@gmail.com
Commit c46241a370 ("powerpc/pkeys: Check vma before
returning key fault error to the user") fixes a bug which
causes the kernel to set the wrong pkey in siginfo when a
pkey fault occurs after two competing threads that have
allocated different pkeys, one fully permissive and the
other restrictive, attempt to protect a common page at the
same time. This adds a test to detect the bug.
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ce40b6ee270bda52e8f4088578ed2faf7d1d509a.1595821792.git.sandipan@linux.ibm.com
The gettid() syscall wrapper was first introduced in
glibc 2.30. This adds a wrapper for use in distros
running older versions.
Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/8ca3b0eeda989707815d1cf337cc33f090408965.1595821792.git.sandipan@linux.ibm.com
Commit 192b6a7805 ("powerpc/book3s64/pkeys: Fix
pkey_access_permitted() for execute disable pkey") fixed a
bug that caused repetitive faults for pkeys with no execute
rights alongside some combination of read and write rights.
This removes the last two cases of the test, which check
the behaviour of pkeys with read, write but no execute
rights and all the rights, in favour of checking all the
possible combinations of read, write and execute rights
to be able to detect bugs like the one mentioned above.
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/db467500f8af47727bba6b35796e8974a78b71e5.1595821792.git.sandipan@linux.ibm.com
Using localhost requires the host to have a /etc/hosts file with that
specific line in it. By default my dev box did not, they used
ip6-localhost, so the test was failing. To fix remove the need for any
/etc/hosts and use ::1.
I could just add the line, but this seems easier.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/159594714197.21431.10113693935099326445.stgit@john-Precision-5820-Tower
Use the new MMU_NOTIFY_MIGRATE event to skip MMU invalidations of device
private memory and handle the invalidation in the driver as part of
migrating device private memory.
Link: https://lore.kernel.org/r/20200723223004.9586-6-rcampbell@nvidia.com
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Add bpf_iter__bpf_map_elem and bpf_iter__bpf_sk_storage_map to bpf_iter.h.
Fixes: 3b1c420bd8 ("selftests/bpf: Add a test for bpf sk_storage_map iterator")
Fixes: 2a7c2fff7d ("selftests/bpf: Add test for bpf hash map iterators")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200727233345.1686358-1-andriin@fb.com
Xtensa syscall number can be obtained and changed through the
struct user_pt_regs. Syscall return value register is fixed relatively
to the current register window in the user_pt_regs, so it needs a bit of
special treatment.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
When size_t maps to unsigned int (e.g. on 32-bit powerpc), then the
comparison with 1<<35 is always true. Clang 9 threw:
warning: result of comparison of constant 34359738368 with \
expression of type 'size_t' (aka 'unsigned int') is always true \
[-Wtautological-constant-out-of-range-compare]
while (total < FILE_SZ) {
Tested: make -C tools/testing/selftests TARGETS="net" run_tests
Fixes: 192dc405f3 ("selftests: net: add tcp_mmap program")
Signed-off-by: Tanner Love <tannerlove@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On powerpcle, int64_t maps to long long. Clang 9 threw:
warning: absolute value function 'labs' given an argument of type \
'long long' but has parameter of type 'long' which may cause \
truncation of value [-Wabsolute-value]
if (labs(tstop - texpect) > cfg_variance_us)
Tested: make -C tools/testing/selftests TARGETS="net" run_tests
Fixes: af5136f950 ("selftests/net: SO_TXTIME with ETF and FQ")
Signed-off-by: Tanner Love <tannerlove@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Clang 9 threw:
warning: format specifies type 'unsigned short' but the argument has \
type 'int' [-Wformat]
typeflags, PORT_BASE, PORT_BASE + port_off);
Tested: make -C tools/testing/selftests TARGETS="net" run_tests
Fixes: 77f65ebdca ("packet: packet fanout rollover during socket overload")
Signed-off-by: Tanner Love <tannerlove@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The signedness of char is implementation-dependent. Some systems
(including PowerPC and ARM) use unsigned char. Clang 9 threw:
warning: result of comparison of constant -1 with expression of type \
'char' is always true [-Wtautological-constant-out-of-range-compare]
&arg_index)) != -1) {
Tested: make -C tools/testing/selftests TARGETS="net" run_tests
Fixes: 16e7812241 ("selftests/net: Add a test to validate behavior of rx timestamps")
Signed-off-by: Tanner Love <tannerlove@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
hdr.vmx.flags is meant for future extensions to the ABI, rejecting
invalid flags is necessary to avoid broken half-loads of the
nVMX state.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
A missing VMCS12 was not causing -EINVAL (it was just read with
copy_from_user, so it is not a security issue, but it is still
wrong). Test for VMCS12 validity and reject the nested state
if a VMCS12 is required but not present.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Setting KVM_STATE_NESTED_GUEST_MODE enables various consistency checks
on VMCS12 and therefore causes KVM_SET_NESTED_STATE to fail spuriously
with -EINVAL. Do not set the flag so that we're sure to cover the
conditions included by the test, and cover the case where VMCS12 is
set and KVM_SET_NESTED_STATE is called with invalid VMCS12 contents.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add selftest validating all the attachment logic around BPF XDP link. Test
also link updates and get_obj_info() APIs.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200722064603.3350758-9-andriin@fb.com
Sync UAPI header and add support for using bpf_link-based XDP attachment.
Make xdp/ prog type set expected attach type. Kernel didn't enforce
attach_type for XDP programs before, so there is no backwards compatiblity
issues there.
Also fix section_names selftest to recognize that xdp prog types now have
expected attach type.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200722064603.3350758-8-andriin@fb.com
This mirrors the original egress-only test. The cgroup_storage is
now extended to have two packet counters, one for egress and one
for ingress. We also extend to have two egress programs to test
that egress will always share with other egress origrams in the
same cgroup. The behavior of the counters are exactly the same as
the original egress-only test.
The test is split into two, one "isolated" test that when the key
type is struct bpf_cgroup_storage_key, which contains the attach
type, programs of different attach types will see different
storages. The other, "shared" test that when the key type is u64,
programs of different attach types will see the same storage if
they are attached to the same cgroup.
Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/c756f5f1521227b8e6e90a453299dda722d7324d.1595565795.git.zhuyifei@google.com
The current assumption is that the lifetime of a cgroup storage
is tied to the program's attachment. The storage is created in
cgroup_bpf_attach, and released upon cgroup_bpf_detach and
cgroup_bpf_release.
Because the current semantics is that each attachment gets a
completely independent cgroup storage, and you can have multiple
programs attached to the same (cgroup, attach type) pair, the key
of the CGROUP_STORAGE map, looking up the map with this pair could
yield multiple storages, and that is not permitted. Therefore,
the kernel verifier checks that two programs cannot share the same
CGROUP_STORAGE map, even if they have different expected attach
types, considering that the actual attach type does not always
have to be equal to the expected attach type.
The test creates a CGROUP_STORAGE map and make it shared across
two different programs, one cgroup_skb/egress and one /ingress.
It asserts that the two programs cannot be both loaded, due to
verifier failure from the above reason.
Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/30a6b0da67ae6b0296c4d511bfb19c5f3d035916.1595565795.git.zhuyifei@google.com
This test creates a parent cgroup, and a child of that cgroup.
It attaches a cgroup_skb/egress program that simply counts packets,
to a global variable (ARRAY map), and to a CGROUP_STORAGE map.
The program is first attached to the parent cgroup only, then to
parent and child.
The test cases sends a message within the child cgroup, and because
the program is inherited across parent / child cgroups, it will
trigger the egress program for both the parent and child, if they
exist. The program, when looking up a CGROUP_STORAGE map, uses the
cgroup and attach type of the attachment parameters; therefore,
both attaches uses different cgroup storages.
We assert that all packet counts returns what we expects.
Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/5a20206afa4606144691c7caa0d1b997cd60dec0.1595565795.git.zhuyifei@google.com
This test confirms that BPF program that calls bpf_get_stackid() cannot
attach to perf_event with precise_ip > 0 but not PERF_SAMPLE_CALLCHAIN;
and cannot attach if the perf_event has exclude_callchain_kernel.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200723180648.1429892-6-songliubraving@fb.com
This tests new helper function bpf_get_stackid_pe and bpf_get_stack_pe.
These two helpers have different implementation for perf_event with PEB
entries.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200723180648.1429892-5-songliubraving@fb.com
If the bpf program contains out of bound access w.r.t. a
particular map key/value size, the verification will be
still okay, e.g., it will be accepted by verifier. But
it will be rejected during link_create time. A test
is added here to ensure link_create failure did happen
if out of bound access happened.
$ ./test_progs -n 4
...
#4/23 rdonly-buf-out-of-bound:OK
...
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200723184124.591700-1-yhs@fb.com
Two subtests are added.
$ ./test_progs -n 4
...
#4/20 bpf_array_map:OK
#4/21 bpf_percpu_array_map:OK
...
The bpf_array_map subtest also tested bpf program
changing array element values and send key/value
to user space through bpf_seq_write() interface.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200723184121.591367-1-yhs@fb.com
Cover the case when BPF socket lookup returns a socket that belongs to a
reuseport group, and the reuseport group contains connected UDP sockets.
Ensure that the presence of connected UDP sockets in reuseport group does
not affect the socket lookup result. Socket selected by reuseport should
always be used as result in such case.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Link: https://lore.kernel.org/bpf/20200722161720.940831-3-jakub@cloudflare.com
Augment the existing firmware update emulation to track activations and
validate proper update vs activate sequencing.
The DIMM firmware activate capability has a concept of a maximum amount
of time platform firmware will quiesce the system relative to how many
DIMMs are being activated in parallel. Simulate that DIMM activation
happens serially, 1 second per-DIMM, and limit the max at 3 seconds. The
nfit_test0 bus emulates 5 DIMMs so it will take 2 activations to update
all DIMMs.
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Reported-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
In preparation for adding a mocked implementation of the
firmware-activate bus-info command, rework nfit_ctl_test() to operate on
a local command payload wrapped in a 'struct nd_cmd_pkg'.
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Arrange the for nfit_test_ctl() path to dump command payloads similarly
to the acpi_nfit_ctl() path. This is useful for comparing the
sequence of command events between an emulated ACPI-NFIT platform and a
real one.
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
The ND_CMD_CALL path only applies to the nfit_test0 emulated DIMMs.
Cleanup occurrences of (i - t->dcr_idx) since that offset fixup only
applies to cases where nfit_test1 needs a bus-local index.
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
DSMs are strictly an ACPI mechanism, evict the bus_dsm_mask concept from
the generic 'struct nvdimm_bus_descriptor' object.
As a side effect the test facility ->bus_nfit_cmd_force_en is no longer
necessary. The test infrastructure can communicate that information
directly in ->bus_dsm_mask.
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
The UDP reuseport conflict was a little bit tricky.
The net-next code, via bpf-next, extracted the reuseport handling
into a helper so that the BPF sk lookup code could invoke it.
At the same time, the logic for reuseport handling of unconnected
sockets changed via commit efc6b6f6c3
which changed the logic to carry on the reuseport result into the
rest of the lookup loop if we do not return immediately.
This requires moving the reuseport_has_conns() logic into the callers.
While we are here, get rid of inline directives as they do not belong
in foo.c files.
The other changes were cases of more straightforward overlapping
modifications.
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
1) Fix RCU locaking in iwlwifi, from Johannes Berg.
2) mt76 can access uninitialized NAPI struct, from Felix Fietkau.
3) Fix race in updating pause settings in bnxt_en, from Vasundhara
Volam.
4) Propagate error return properly during unbind failures in ax88172a,
from George Kennedy.
5) Fix memleak in adf7242_probe, from Liu Jian.
6) smc_drv_probe() can leak, from Wang Hai.
7) Don't muck with the carrier state if register_netdevice() fails in
the bonding driver, from Taehee Yoo.
8) Fix memleak in dpaa_eth_probe, from Liu Jian.
9) Need to check skb_put_padto() return value in hsr_fill_tag(), from
Murali Karicheri.
10) Don't lose ionic RSS hash settings across FW update, from Shannon
Nelson.
11) Fix clobbered SKB control block in act_ct, from Wen Xu.
12) Missing newlink in "tx_timeout" sysfs output, from Xiongfeng Wang.
13) IS_UDPLITE cleanup a long time ago, incorrectly handled
transformations involving UDPLITE_RECV_CC. From Miaohe Lin.
14) Unbalanced locking in netdevsim, from Taehee Yoo.
15) Suppress false-positive error messages in qed driver, from Alexander
Lobakin.
16) Out of bounds read in ax25_connect and ax25_sendmsg, from Peilin Ye.
17) Missing SKB release in cxgb4's uld_send(), from Navid Emamdoost.
18) Uninitialized value in geneve_changelink(), from Cong Wang.
19) Fix deadlock in xen-netfront, from Andera Righi.
19) flush_backlog() frees skbs with IRQs disabled, so should use
dev_kfree_skb_irq() instead of kfree_skb(). From Subash Abhinov
Kasiviswanathan.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (111 commits)
drivers/net/wan: lapb: Corrected the usage of skb_cow
dev: Defer free of skbs in flush_backlog
qrtr: orphan socket in qrtr_release()
xen-netfront: fix potential deadlock in xennet_remove()
flow_offload: Move rhashtable inclusion to the source file
geneve: fix an uninitialized value in geneve_changelink()
bonding: check return value of register_netdevice() in bond_newlink()
tcp: allow at most one TLP probe per flight
AX.25: Prevent integer overflows in connect and sendmsg
cxgb4: add missing release on skb in uld_send()
net: atlantic: fix PTP on AQC10X
AX.25: Prevent out-of-bounds read in ax25_sendmsg()
sctp: shrink stream outq when fails to do addstream reconf
sctp: shrink stream outq only when new outcnt < old outcnt
AX.25: Fix out-of-bounds read in ax25_connect()
enetc: Remove the mdio bus on PF probe bailout
net: ethernet: ti: add NETIF_F_HW_TC hw feature flag for taprio offload
net: ethernet: ave: Fix error returns in ave_init
drivers/net/wan/x25_asy: Fix to make it work
ipvs: fix the connection sync failed in some cases
...
The firmware tests would always time out for me. Add a correct timeout,
including details on how the value was reached. Additionally allow the
test harness to skip comments in settings files and report how long a
given timeout was.
Reviewed-by: SeongJae Park <sjpark@amazon.de>
Acked-by: Scott Branden <scott.branden@broadcom.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20200724213640.389191-3-keescook@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Randy reported compile failure when CONFIG_SYSCTL is not set/enabled:
ERROR: modpost: "sysctl_vals" [drivers/net/vrf.ko] undefined!
Fix by splitting out the sysctl init and cleanup into helpers that
can be set to do nothing when CONFIG_SYSCTL is disabled. In addition,
move vrf_strict_mode and vrf_strict_mode_change to above
vrf_shared_table_handler (code move only) and wrap all of it
in the ifdef CONFIG_SYSCTL.
Update the strict mode tests to check for the existence of the
/proc/sys entry.
Fixes: 33306f1aaf ("vrf: add sysctl parameter for strict mode")
Cc: Andrea Mayer <andrea.mayer@uniroma2.it>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Signed-off-by: David S. Miller <davem@davemloft.net>
Update our memcmp selftest, to test the case where we're comparing up
to the end of a page and the subsequent page is not mapped. We have to
make sure we don't read off the end of the page and cause a fault.
We had a bug there in the past, fixed in commit
d947075739 ("powerpc/64: Fix memcmp reading past the end of src/dest").
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200722055315.962391-1-mpe@ellerman.id.au
Alexei Starovoitov says:
====================
pull-request: bpf-next 2020-07-21
The following pull-request contains BPF updates for your *net-next* tree.
We've added 46 non-merge commits during the last 6 day(s) which contain
a total of 68 files changed, 4929 insertions(+), 526 deletions(-).
The main changes are:
1) Run BPF program on socket lookup, from Jakub.
2) Introduce cpumap, from Lorenzo.
3) s390 JIT fixes, from Ilya.
4) teach riscv JIT to emit compressed insns, from Luke.
5) use build time computed BTF ids in bpf iter, from Yonghong.
====================
Purely independent overlapping changes in both filter.h and xdp.h
Signed-off-by: David S. Miller <davem@davemloft.net>
According to 'man 8 ip-netns', if `ip netns identify` returns an empty string,
there's no net namespace associated with current PID: fix the net ns entrance
logic.
Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Existing BTF_ID_LIST used a local static variable
to store btf_ids. This patch provided a new macro
BTF_ID_LIST_GLOBAL to store btf_ids in a global
variable which can be shared among multiple files.
The existing BTF_ID_LIST is still retained.
Two reasons. First, BTF_ID_LIST is also used to build
btf_ids for helper arguments which typically
is an array of 5. Since typically different
helpers have different signature, it makes
little sense to share them. Second, some
current computed btf_ids are indeed local.
If later those btf_ids are shared between
different files, they can use BTF_ID_LIST_GLOBAL then.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/bpf/20200720163401.1393159-1-yhs@fb.com
Sync kernel header btf_ids.h to tools directory.
Also define macro CONFIG_DEBUG_INFO_BTF before
including btf_ids.h in prog_tests/resolve_btfids.c
since non-stub definitions for BTF_ID_LIST etc. macros
are defined under CONFIG_DEBUG_INFO_BTF. This
prevented test_progs from failing.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200720163359.1393079-1-yhs@fb.com
A handful of samples and selftests fail to build on s390, because
after commit 0ebeea8ca8 ("bpf: Restrict bpf_probe_read{, str}()
only to archs where they work") bpf_probe_read is not available
anymore.
Fix by using bpf_probe_read_kernel.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200720114806.88823-1-iii@linux.ibm.com
OpenBSD netcat (Debian patchlevel 1.195-2) does not seem to react to
SIGINT for whatever reason, causing prefix.pl to hang after
test_lwt_seg6local.sh exits due to netcat inheriting
test_lwt_seg6local.sh's file descriptors.
Fix by using SIGTERM instead.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200720101810.84299-1-iii@linux.ibm.com
When running out of srctree, relative path to lib/test_bpf.ko is
different than when running in srctree. Check $building_out_of_srctree
environment variable and use a different relative path if needed.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200717165326.6786-2-iii@linux.ibm.com
ISA v3.1 does not support the SAO storage control attribute required to
implement PROT_SAO. PROT_SAO was used by specialised system software
(Lx86) that has been discontinued for about 7 years, and is not thought
to be used elsewhere, so removal should not cause problems.
We rather remove it than keep support for older processors, because
live migrating guest partitions to newer processors may not be possible
if SAO is in use (or worse allowed with silent races).
- PROT_SAO stays in the uapi header so code using it would still build.
- arch_validate_prot() is removed, the generic version rejects PROT_SAO
so applications would get a failure at mmap() time.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Drop KVM change for the time being]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200703011958.1166620-3-npiggin@gmail.com
The per_event_excludes test wants to run on Power8 or later. But
currently it checks that AT_BASE_PLATFORM *equals* power8, which means
it only runs on Power8.
Fix it to check for the ISA 2.07 feature, which will be set on Power8
and later CPUs.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200716122142.3776261-1-mpe@ellerman.id.au
With commit 4a4a5e5d2a ("powerpc/pkeys: key allocation/deallocation
must not change pkey registers") we are not updating UAMOR on key
allocation. So don't update the expected uamor value in the test.
Fixes: 4a4a5e5d2a ("powerpc/pkeys: key allocation/deallocation must not change pkey registers")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200709032946.881753-23-aneesh.kumar@linux.ibm.com
Add tdc to existing kselftest infrastructure so that it can be run with
existing kselftests. TDC now generates objects in objdir/kselftest
without cluttering main objdir, leaves source directory clean, and
installs correctly in kselftest_install, properly adding itself to
run_kselftest.sh script.
Add tc-testing as a target of selftests/Makefile. Create tdc.sh to run
tdc.py targets with correct arguments. To support single target from
selftest/Makefile, combine tc-testing/bpf/Makefile and
tc-testing/Makefile. Move action.c up a directory to tc-testing/.
Tested with:
make O=/tmp/{objdir} TARGETS="tc-testing" kselftest
cd /tmp/{objdir}
cd kselftest
cd tc-testing
./tdc.sh
make -C tools/testing/selftests/ TARGETS=tc-testing run_tests
make TARGETS="tc-testing" kselftest
cd tools/testing/selftests
./kselftest_install.sh /tmp/exampledir
My VM doesn't run all the kselftests so I commented out all except my
target and net/pmtu.sh then:
cd /tmp/exampledir && ./run_kselftest.sh
Co-developed-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Briana Oursler <briana.oursler@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend the example program for PTP ancillary functionality with the
ability to configure not only the periodic output's period (frequency),
but also the phase and duty cycle (pulse width) which were newly
introduced.
The ioctl level also needs to be updated to the new PTP_PEROUT_REQUEST2,
since the original PTP_PEROUT_REQUEST doesn't support this
functionality. For an in-tree testing program, not having explicit
backwards compatibility is fine, as it should always be tested with the
current kernel headers and sources.
Tested with an oscilloscope on the felix switch PHC:
echo '2 0' > /sys/class/ptp/ptp1/pins/switch_1588_dat0
./testptp -d /dev/ptp1 -p 1000000000 -w 100000000 -H 1000 -i 0
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since 'perout' holds the nanosecond value of the signal's period, it
should be a 64-bit value. Current assumption is that it cannot be larger
than 1 second.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds a test that changes its UID, uses capabilities to
get CAP_CHECKPOINT_RESTORE and uses clone3() with set_tid to
create a process with a given PID as non-root.
Signed-off-by: Adrian Reber <areber@redhat.com>
Link: https://lore.kernel.org/r/20200719100418.2112740-8-areber@redhat.com
[christian.brauner@ubuntu.com: use TH_LOG() instead of ksft_print_msg()]
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
A fix to the VAS code we merged this cycle, to report the proper error code to
userspace for address translation failures. And a selftest update to match.
Another fix for our pkey handling of PROT_EXEC mappings.
A fix for a crash when booting a "secure VM" under an ultravisor with certain
numbers of CPUs.
Thanks to:
Aneesh Kumar K.V, Haren Myneni, Laurent Dufour, Sandipan Das, Satheesh
Rajendran, Thiago Jung Bauermann.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl8S7Q8THG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgA9WEAC580vTBCte54XEpRPfKLs0g/piM93z
+rDDKFFkHrxI+EpySeg20jlDMc/3EoevLN6UNT5I2hrZ5QcNF17RPVmjRoHP0w2l
ixy9B6Y+auFwos1yEec14ZLJ7iWsnko0SlgtWIAsn9r35hJYFtC3+ho3/XWO0lnF
0jb31uZim4nFQvGSNwe3oZ3/rJsKwWtPZw0WznFr9GB4pMOnrspV/zx796RI9OKY
khwm4y6pas5Duk9GUJPJjOIk4Mag4yLTXuhzJ5G5UeuUchZRxCTVcdnXEdGXeyte
9ZJnRjbvbDjTM9qyk2lPSHv5zFHfEbglDkp2zoKX2Ie083LIcKlkwgeFvlBjhdxQ
qwko27DXIZdmKTsSiFDODI0VlyK3ZHumCX/m2Ctg9/VmeSiYacebQjcS7DmAwQeE
6h2bRL2TiTLRkgWiD4HOvHZZTy22pVgRcYe/pwGzMMXJW6SLQ9GUOhhar4XEYMgj
pzn86uZRVJLf90qdUkI9sl8p/PthGlfehqHivfwgKYk/0H1AyGkChO3NB1mLiCfS
WC+7J/lDIvtAMnC+536LqZT5l46ntt5RQ5tUcHfvn4bFoh5ndeav0Y9hXEXblyYI
32lYj/paAmzR2kuHOzOQAa4hwy9rnKEQiGYsF1RcpMO5zdNupXl/EPY5WaKYyEx7
p+eGalBNTf8zuw==
=eEXz
-----END PGP SIGNATURE-----
Merge tag 'powerpc-5.8-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux into master
Pull powerpc fixes from Michael Ellerman:
"Some more powerpc fixes for 5.8:
- A fix to the VAS code we merged this cycle, to report the proper
error code to userspace for address translation failures. And a
selftest update to match.
- Another fix for our pkey handling of PROT_EXEC mappings.
- A fix for a crash when booting a "secure VM" under an ultravisor
with certain numbers of CPUs.
Thanks to: Aneesh Kumar K.V, Haren Myneni, Laurent Dufour, Sandipan
Das, Satheesh Rajendran, Thiago Jung Bauermann"
* tag 'powerpc-5.8-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
selftests/powerpc: Use proper error code to check fault address
powerpc/vas: Report proper error code for address translation failure
powerpc/pseries/svm: Fix incorrect check for shared_lppaca_size
powerpc/book3s64/pkeys: Fix pkey_access_permitted() for execute disable pkey
Commit 01397e822a ("kunit: Fix TabError, remove defconfig code and
handle when there is no kunitconfig") and commit 45ba7a893a ("kunit:
kunit_tool: Separate out config/build/exec/parse") introduced two
closely related issues which built off of each other: they excessively
created the build directory when not present and modified a constant
(constants in Python only exist by convention).
Together these issues broken a number of unit tests for KUnit tool, so
fix them.
Fixed up commit log to fic checkpatch commit description style error.
Shuah Khan <skhan@linuxfoundation.org>
Fixes: 01397e822a ("kunit: Fix TabError, remove defconfig code and handle when there is no kunitconfig")
Fixes: 45ba7a893a ("kunit: kunit_tool: Separate out config/build/exec/parse")
Signed-off-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Commit ddbd60c779 ("kunit: use --build_dir=.kunit as default") changed
the default build directory for KUnit tests, but failed to update
associated unit tests for kunit_tool, so update them.
Fixes: ddbd60c779 ("kunit: use --build_dir=.kunit as default")
Signed-off-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Direct stderr to subprocess.STDOUT so error messages get included in the
subprocess.CalledProcessError exceptions output field. This results in
more meaningful error messages for the user.
This is already being done in the make_allyesconfig method. Do the same
for make_mrproper, make_olddefconfig, and make methods.
With this, failures on unclean trees [1] will give users an error
message that includes:
"The source tree is not clean, please run 'make ARCH=um mrproper'"
[1] https://bugzilla.kernel.org/show_bug.cgi?id=205219
Signed-off-by: Will Chen <chenwi@google.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Tested-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
When the selftest "step" counter grew beyond 255, non-fatal warnings
were being emitted, which is noisy and pointless. There are selftests
with more than 255 steps (especially those in loops, etc). Instead,
just cap "steps" to 254 and do not report the saturation.
Reported-by: Ralph Campbell <rcampbell@nvidia.com>
Tested-by: Ralph Campbell <rcampbell@nvidia.com>
Fixes: 9847d24af9 ("selftests/harness: Refactor XFAIL into SKIP")
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
There should be no difference between -1 and other negative syscalls
while tracing.
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Will Drewry <wad@chromium.org>
Cc: Will Deacon <will@kernel.org>
Cc: Keno Fischer <keno@juliacomputing.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Now that the selftest harness has variants, use them to eliminate a
bunch of copy/paste duplication.
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Will Drewry <wad@chromium.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The FIXTURE*() macro kern-doc examples had the wrong names for the C code
examples associated with them. Fix those and clarify that FIXTURE_DATA()
usage should be avoided.
Cc: Shuah Khan <shuah@kernel.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Fixes: 74bc7c97fa ("kselftest: add fixture variants")
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Make sure we don't regress the CAP_SYSLOG behavior of the module address
visibility via /proc/modules nor /sys/module/*/sections/*.
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: linux-kselftest@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Andrii reported that sockopt_inherit occasionally hangs up on 5.5 kernel [0].
This can happen if server_thread runs faster than the main thread.
In that case, pthread_cond_wait will wait forever because
pthread_cond_signal was executed before the main thread was blocking.
Let's move pthread_mutex_lock up a bit to make sure server_thread
runs strictly after the main thread goes to sleep.
(Not sure why this is 5.5 specific, maybe scheduling is less
deterministic? But I was able to confirm that it does indeed
happen in a VM.)
[0] https://lore.kernel.org/bpf/CAEf4BzY0-bVNHmCkMFPgObs=isUAyg-dFzGDY7QWYkmm7rmTSg@mail.gmail.com/
Reported-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200715224107.3591967-1-sdf@google.com
Similar to what have been done for DEVMAP, introduce tests to verify
ability to add a XDP program to an entry in a CPUMAP.
Verify CPUMAP programs can not be attached to devices as a normal
XDP program, and only programs with BPF_XDP_CPUMAP attach type can
be loaded in a CPUMAP.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/9c632fcea5382ea7b4578bd06b6eddf382c3550b.1594734381.git.lorenzo@kernel.org
Test that policers shared by different tc filters are correctly
reference counted by observing policers' occupancy via devlink-resource.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Query the maximum number of supported policers using devlink-resource
and test that this number can be reached by configuring tc filters with
police action. Test that an error is returned in case the maximum number
is exceeded.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Test that upper and lower limits on rate and burst size imposed by the
device are rejected by the kernel.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Test tc-police action in various scenarios such as Rx policing, Tx
policing, shared policer and police piped to mirred. The test passes
with both veth pairs and loopbacked ports.
# ./tc_police.sh
TEST: police on rx [ OK ]
TEST: police on tx [ OK ]
TEST: police with shared policer - rx [ OK ]
TEST: police with shared policer - tx [ OK ]
TEST: police rx and mirror [ OK ]
TEST: police tx and mirror [ OK ]
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ERR_NX_TRANSLATION(CSB.CC=5) is for internal to VAS for fault handling
and should not used by OS. ERR_NX_AT_FAULT(CSB.CC=250) is the proper
error code should be reported by OS when NX encounters address
translation failure.
This patch uses CC=250 to determine the fault address when the request
is not successful.
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/0315251705baff94f678c33178491b5008723511.camel@linux.ibm.com
With procfs v3.3.16, the sysctl command doesn't print the set key and
value on error. This change breaks livepatch selftest test-ftrace.sh,
that tests the interaction of sysctl ftrace_enabled:
Make it work with all sysctl versions using '-q' option.
Explicitly print the final status on success so that it can be verified
in the log. The error message is enough on failure.
Reported-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Reviewed-by: Joe Lawrence <joe.lawrence@redhat.com>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Link: https://lore.kernel.org/r/20200714091030.1611-1-pmladek@suse.com
Test whether we can add file descriptors in response to notifications.
This injects the file descriptors via notifications, and then uses kcmp
to determine whether or not it has been successful.
It also includes some basic sanity checking for arguments.
Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Chris Palmer <palmer@google.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Jann Horn <jannh@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Robert Sesek <rsesek@google.com>
Cc: Tycho Andersen <tycho@tycho.ws>
Cc: Matt Denton <mpdenton@google.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Link: https://lore.kernel.org/r/20200603011044.7972-5-sargun@sargun.me
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
During setup():
...
for ns in h0 r1 h1 h2 h3
do
create_ns ${ns}
done
...
while in cleanup():
...
for n in h1 r1 h2 h3 h4
do
ip netns del ${n} 2>/dev/null
done
...
and after removing the stderr redirection in cleanup():
$ sudo ./fib_nexthop_multiprefix.sh
...
TEST: IPv4: host 0 to host 3, mtu 1400 [ OK ]
TEST: IPv6: host 0 to host 3, mtu 1400 [ OK ]
Cannot remove namespace file "/run/netns/h4": No such file or directory
$ echo $?
1
and a non-zero return code, make kselftests fail (even if the test
itself is fine):
...
not ok 34 selftests: net: fib_nexthop_multiprefix.sh # exit=1
...
Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
BusyBox diff doesn't support the GNU diff '--LTYPE-line-format' options
that were used in the selftests to filter older kernel log messages from
dmesg output.
Use "comm" which is more available in smaller boot environments.
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
Reviewed-by: Yannick Cote <ycote@redhat.com>
Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20200710183745.19730-1-joe.lawrence@redhat.com
Alexei Starovoitov says:
====================
pull-request: bpf-next 2020-07-13
The following pull-request contains BPF updates for your *net-next* tree.
We've added 36 non-merge commits during the last 7 day(s) which contain
a total of 62 files changed, 2242 insertions(+), 468 deletions(-).
The main changes are:
1) Avoid trace_printk warning banner by switching bpf_trace_printk to use
its own tracing event, from Alan.
2) Better libbpf support on older kernels, from Andrii.
3) Additional AF_XDP stats, from Ciara.
4) build time resolution of BTF IDs, from Jiri.
5) BPF_CGROUP_INET_SOCK_RELEASE hook, from Stanislav.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a selftest for offloading a mirror action attached to the block
associated with RED early_drop qevent.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reliably remove all the type modifiers from read-only (.rodata) global
variable definitions, including cases of inner field const modifiers and
arrays of const values.
Also modify one of selftests to ensure that const volatile struct doesn't
prevent user-space from modifying .rodata variable.
Fixes: 985ead416d ("bpftool: Add skeleton codegen command")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200713232409.3062144-3-andriin@fb.com
Adding resolve_btfids test under test_progs suite.
It's possible to use btf_ids.h header and its logic in
user space application, so we can add easy test for it.
The test defines BTF_ID_LIST and checks it gets properly
resolved.
For this reason the test_progs binary (and other binaries
that use TRUNNER* macros) is processed with resolve_btfids
tool, which resolves BTF IDs in .BTF_ids section. The BTF
data are taken from btf_data.o object rceated from
progs/btf_data.c.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200711215329.41165-10-jolsa@kernel.org
Pull networking fixes from David Miller:
1) Restore previous behavior of CAP_SYS_ADMIN wrt loading networking
BPF programs, from Maciej Żenczykowski.
2) Fix dropped broadcasts in mac80211 code, from Seevalamuthu
Mariappan.
3) Slay memory leak in nl80211 bss color attribute parsing code, from
Luca Coelho.
4) Get route from skb properly in ip_route_use_hint(), from Miaohe Lin.
5) Don't allow anything other than ARPHRD_ETHER in llc code, from Eric
Dumazet.
6) xsk code dips too deeply into DMA mapping implementation internals.
Add dma_need_sync and use it. From Christoph Hellwig
7) Enforce power-of-2 for BPF ringbuf sizes. From Andrii Nakryiko.
8) Check for disallowed attributes when loading flow dissector BPF
programs. From Lorenz Bauer.
9) Correct packet injection to L3 tunnel devices via AF_PACKET, from
Jason A. Donenfeld.
10) Don't advertise checksum offload on ipa devices that don't support
it. From Alex Elder.
11) Resolve several issues in TCP MD5 signature support. Missing memory
barriers, bogus options emitted when using syncookies, and failure
to allow md5 key changes in established states. All from Eric
Dumazet.
12) Fix interface leak in hsr code, from Taehee Yoo.
13) VF reset fixes in hns3 driver, from Huazhong Tan.
14) Make loopback work again with ipv6 anycast, from David Ahern.
15) Fix TX starvation under high load in fec driver, from Tobias
Waldekranz.
16) MLD2 payload lengths not checked properly in bridge multicast code,
from Linus Lüssing.
17) Packet scheduler code that wants to find the inner protocol
currently only works for one level of VLAN encapsulation. Allow
Q-in-Q situations to work properly here, from Toke
Høiland-Jørgensen.
18) Fix route leak in l2tp, from Xin Long.
19) Resolve conflict between the sk->sk_user_data usage of bpf reuseport
support and various protocols. From Martin KaFai Lau.
20) Fix socket cgroup v2 reference counting in some situations, from
Cong Wang.
21) Cure memory leak in mlx5 connection tracking offload support, from
Eli Britstein.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (146 commits)
mlxsw: pci: Fix use-after-free in case of failed devlink reload
mlxsw: spectrum_router: Remove inappropriate usage of WARN_ON()
net: macb: fix call to pm_runtime in the suspend/resume functions
net: macb: fix macb_suspend() by removing call to netif_carrier_off()
net: macb: fix macb_get/set_wol() when moving to phylink
net: macb: mark device wake capable when "magic-packet" property present
net: macb: fix wakeup test in runtime suspend/resume routines
bnxt_en: fix NULL dereference in case SR-IOV configuration fails
libbpf: Fix libbpf hashmap on (I)LP32 architectures
net/mlx5e: CT: Fix memory leak in cleanup
net/mlx5e: Fix port buffers cell size value
net/mlx5e: Fix 50G per lane indication
net/mlx5e: Fix CPU mapping after function reload to avoid aRFS RX crash
net/mlx5e: Fix VXLAN configuration restore after function reload
net/mlx5e: Fix usage of rcu-protected pointer
net/mxl5e: Verify that rpriv is not NULL
net/mlx5: E-Switch, Fix vlan or qos setting in legacy mode
net/mlx5: Fix eeprom support for SFP module
cgroup: Fix sock_cgroup_data on big-endian.
selftests: bpf: Fix detach from sockmap tests
...
Since the BPF_PROG_TYPE_CGROUP_SOCKOPT verifier test does not set an
attach type, bpf_prog_load_check_attach() disallows loading the program
and the test is always skipped:
#434/p perfevent for cgroup sockopt SKIP (unsupported program type 25)
Fix the issue by setting a valid attach type.
Fixes: 0456ea170c ("bpf: Enable more helpers for BPF_PROG_TYPE_CGROUP_{DEVICE,SYSCTL,SOCKOPT}")
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20200710150439.126627-1-jean-philippe@linaro.org
There should be no difference between -1 and other negative syscalls
while tracing.
Cc: Keno Fischer <keno@juliacomputing.com>
Tested-by: Will Deacon <will@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Now that the selftest harness has variants, use them to eliminate a
bunch of copy/paste duplication.
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Tested-by: Will Deacon <will@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
The FIXTURE*() macro kern-doc examples had the wrong names for the C code
examples associated with them. Fix those and clarify that FIXTURE_DATA()
usage should be avoided.
Cc: Shuah Khan <shuah@kernel.org>
Fixes: 74bc7c97fa ("kselftest: add fixture variants")
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
When SECCOMP_IOCTL_NOTIF_ID_VALID was first introduced it had the wrong
direction flag set. While this isn't a big deal as nothing currently
enforces these bits in the kernel, it should be defined correctly. Fix
the define and provide support for the old command until it is no longer
needed for backward compatibility.
Fixes: 6a21cc50f0 ("seccomp: add a return code to trap to userspace")
Signed-off-by: Kees Cook <keescook@chromium.org>
The user_trap_syscall() helper creates a filter with
SECCOMP_RET_USER_NOTIF. To avoid confusion with SECCOMP_RET_TRAP, rename
the helper to user_notif_syscall().
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Will Drewry <wad@chromium.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: Andrii Nakryiko <andriin@fb.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@chromium.org>
Cc: linux-kselftest@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
The seccomp tests are a bit noisy without CONFIG_CHECKPOINT_RESTORE (due
to missing the kcmp() syscall). The seccomp tests are more accurate with
kcmp(), but it's not strictly required. Refactor the tests to use
alternatives (comparing fd numbers), and provide a central test for
kcmp() so there is a single SKIP instead of many. Continue to produce
warnings for the other tests, though.
Additionally adds some more bad flag EINVAL tests to the addfd selftest.
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Will Drewry <wad@chromium.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: Andrii Nakryiko <andriin@fb.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@chromium.org>
Cc: linux-kselftest@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
The seccomp benchmark calibration loop did not need to take so long.
Instead, use a simple 1 second timeout and multiply up to target. It
does not need to be accurate.
Signed-off-by: Kees Cook <keescook@chromium.org>
As seccomp_benchmark tries to calibrate how many samples will take more
than 5 seconds to execute, it may end up picking up a number of samples
that take 10 (but up to 12) seconds. As the calibration will take double
that time, it takes around 20 seconds. Then, it executes the whole thing
again, and then once more, with some added overhead. So, the thing might
take more than 40 seconds, which is too close to the 45s timeout.
That is very dependent on the system where it's executed, so may not be
observed always, but it has been observed on x86 VMs. Using a 90s timeout
seems safe enough.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Link: https://lore.kernel.org/r/20200601123202.1183526-1-cascardo@canonical.com
Signed-off-by: Kees Cook <keescook@chromium.org>
It's useful to see how much (at a minimum) each filter adds to the
syscall overhead. Add additional calculations.
Signed-off-by: Kees Cook <keescook@chromium.org>
The TSYNC ESRCH flag test will fail for regular users because NNP was
not set yet. Add NNP setting.
Fixes: 51891498f2 ("seccomp: allow TSYNC and USER_NOTIF together")
Cc: stable@vger.kernel.org
Reviewed-by: Tycho Andersen <tycho@tycho.ws>
Signed-off-by: Kees Cook <keescook@chromium.org>
Running the seccomp tests as a regular user shouldn't just fail tests
that require CAP_SYS_ADMIN (for getting a PID namespace). Instead,
detect those cases and SKIP them. Additionally, gracefully SKIP missing
CONFIG_USER_NS (and add to "config" since we'd prefer to actually test
this case).
Signed-off-by: Kees Cook <keescook@chromium.org>
The kselftests will be renaming XFAIL to SKIP in the test harness, and
to avoid painful conflicts, rename XFAIL to SKIP now in a future-proofed
way.
Signed-off-by: Kees Cook <keescook@chromium.org>
Add validating the UDP tunnel infra works.
$ ./udp_tunnel_nic.sh
PASSED all 383 checks
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This Kselftest fixes update for Linux 5.8-rc5 consists of tmp2 test
changes to run on python3 and kselftest framework fix to incorrect
return type.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAl8ImAcACgkQCwJExA0N
QxwEPg//fYS/ilgwYRn+Zc9fV/JPGYBfJKN1DdAua0fRgXjcz2z3dMON7Tt/HBNY
KZDCDJMY4SSPBAdHRt/2duQx21C0i8zVcCBsPgZPhJdPF5tUIbD4oM0qCxrto1hB
Wss1UHWmobB+BPI7mJSPFirYXyVqVXcpdYJmKkRYbpJTFRS2t1g5jFczZVgEbeEv
7lEY4yRi2hlGsCR4J+VqIUzyY+enc0vSYNaSaH8d4P1cC8h5Dl7MoqSIJshrNcDe
kHz5l5R7cQSOQhsSIwvRuEolkmjyB0ejzuUsAQ6fwRKGMTVWlY+2fWcu5dNelYMm
C3BGyVcAPXsrMfX6Vw4AFRd1a/7XiuNOFwH4PkQeomfAh2Ml9m1pO3fdqWQlbnHu
VpkIbPaiTyarEaIYvF6UnxUTB+hNzHxw8Ac6TvwP3xz26k7U4X4ue75s03DVA7qx
63EsxY+gtAhD0ATUSpHq65ZgUSnQh5OAk0UB24h7kzxaE5EZ8gU8ifYqBPGiEot2
3KQ1QLZdCTmuTxsjI+OdXdOAnFuhqNtzk69CCS43YwDJitBLG6kz+aFgVWDZxkx1
MY5msDi/ROyymiWEUCSZhE70qIhIU/tDKQ9+QeL4VqEhzfcGrgePtV+dsp6seKCl
Swi2PY2zoCdq6gZshiiIQln/M5XaW8dPBvz+wKaRo/sMzwts18M=
=pKoZ
-----END PGP SIGNATURE-----
Merge tag 'linux-kselftest-fixes-5.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest fixes from Shuah Khan:
"TPM2 test changes to run on python3 and kselftest framework fix to
incorrect return type"
* tag 'linux-kselftest-fixes-5.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kselftest: ksft_test_num return type should be unsigned
selftests: tpm: upgrade TPM2 tests from Python 2 to Python 3
Fix sockmap tests which rely on old bpf_prog_dispatch behaviour.
In the first case, the tests check that detaching without giving
a program succeeds. Since these are not the desired semantics,
invert the condition. In the second case, the clean up code doesn't
supply the necessary program fds.
Fixes: bb0de3131f ("bpf: sockmap: Require attach_bpf_fd when detaching a program")
Reported-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20200709115151.75829-1-lmb@cloudflare.com
Rationale:
Reduces attack surface on kernel devs opening the links for MITM
as HTTPS traffic is much harder to manipulate.
Deterministic algorithm:
For each file:
If not .svg:
For each line:
If doesn't contain `\bxmlns\b`:
For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
If neither `\bgnu\.org/license`, nor `\bmozilla\.org/MPL\b`:
If both the HTTP and HTTPS versions
return 200 OK and serve the same content:
Replace HTTP with HTTPS.
Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Test port split configuration using previously added number of port lanes
attribute.
Check that all the splittable ports are successfully split to their maximum
number of lanes and below, and that those which are not splittable fail to
be split.
Test output example:
TEST: swp4 is unsplittable [ OK ]
TEST: split port swp53 into 4 [ OK ]
TEST: Unsplit port pci/0000:03:00.0/25 [ OK ]
TEST: split port swp53 into 2 [ OK ]
TEST: Unsplit port pci/0000:03:00.0/25 [ OK ]
Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Several users of kallsyms_show_value() were performing checks not
during "open". Refactor everything needed to gain proper checks against
file->f_cred for modules, kprobes, and bpf.
-----BEGIN PGP SIGNATURE-----
iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAl8GUbMWHGtlZXNjb29r
QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJohnD/9VsPsAMV+8lhsPvkcuW/DkTcAY
qUEzsXU3v06gJ0Z/1lBKtisJ6XmD93wWcZCTFvJ0S8vR3yLZvOVfToVjCMO32Trc
4ZkWTPwpvfeLug6T6CcI2ukQdZ/opI1cSabqGl79arSBgE/tsghwrHuJ8Exkz4uq
0b7i8nZa+RiTezwx4EVeGcg6Dv1tG5UTG2VQvD/+QGGKneBlrlaKlI885N/6jsHa
KxvB7+8ES1pnfGYZenx+RxMdljNrtyptbQEU8gyvoV5YR7635gjZsVsPwWANJo+4
EGcFFpwWOAcVQaC3dareLTM8nVngU6Wl3Rd7JjZtjvtZba8DdCn669R34zDGXbiP
+1n1dYYMSMBeqVUbAQfQyLD0pqMIHdwQj2TN8thSGccr2o3gNk6AXgYq0aYm8IBf
xDCvAansJw9WqmxErIIsD4BFkMqF7MjH3eYZxwCPWSrKGDvKxQSPV5FarnpDC9U7
dYCWVxNPmtn+unC/53yXjEcBepKaYgNR7j5G7uOfkHvU43Bd5demzLiVJ10D8abJ
ezyErxxEqX2Gr7JR2fWv7iBbULJViqcAnYjdl0y0NgK/hftt98iuge6cZmt1z6ai
24vI3X4VhvvVN5/f64cFDAdYtMRUtOo2dmxdXMid1NI07Mj2qFU1MUwb8RHHlxbK
8UegV2zcrBghnVuMkw==
=ib5Q
-----END PGP SIGNATURE-----
Merge tag 'kallsyms_show_value-v5.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull kallsyms fix from Kees Cook:
"Refactor kallsyms_show_value() users for correct cred.
I'm not delighted by the timing of getting these changes to you, but
it does fix a handful of kernel address exposures, and no one has
screamed yet at the patches.
Several users of kallsyms_show_value() were performing checks not
during "open". Refactor everything needed to gain proper checks
against file->f_cred for modules, kprobes, and bpf"
* tag 'kallsyms_show_value-v5.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
selftests: kmod: Add module address visibility test
bpf: Check correct cred for CAP_SYSLOG in bpf_dump_raw_ok()
kprobes: Do not expose probe addresses to non-CAP_SYSLOG
module: Do not expose section addresses to non-CAP_SYSLOG
module: Refactor section attr into bin attribute
kallsyms: Refactor kallsyms_show_value() to take cred
basic functional test, triggering the msk diag interface
code. Require appropriate iproute2 support, skip elsewhere.
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make sure we don't regress the CAP_SYSLOG behavior of the module address
visibility via /proc/modules nor /sys/module/*/sections/*.
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Switch perf_buffer test to use skeleton to avoid use of bpf_prog_load() and
make test a bit more succinct. Also switch BPF program to use tracepoint
instead of kprobe, as that allows to support older kernels, which had
tracepoint support before kprobe support in the form that libbpf expects
(i.e., libbpf expects /sys/bus/event_source/devices/kprobe/type, which doesn't
always exist on old kernels).
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200708015318.3827358-7-andriin@fb.com
Add a test that relies on CO-RE, but doesn't expect any of the recent
features, not available on old kernels. This is useful for Travis CI tests
running against very old kernels (e.g., libbpf has 4.9 kernel testing now), to
verify that CO-RE still works, even if kernel itself doesn't support BTF yet,
as long as there is .BTF embedded into vmlinux image by pahole. Given most of
CO-RE doesn't require any kernel awareness of BTF, it is a useful test to
validate that libbpf's BTF sanitization is working well even with ancient
kernels.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200708015318.3827358-5-andriin@fb.com
There are a number of places in test_progs that use minus-1 as the argument
to exit(). This is confusing as a process exit status is masked to be a
number between 0 and 255 as defined in man exit(3). Thus, users will see
status 255 instead of minus-1.
This patch use positive exit code 3 instead of minus-1. These cases are put
in the same group of infrastructure setup errors.
Fixes: fd27b1835e ("selftests/bpf: Reset process and thread affinity after each test/sub-test")
Fixes: 811d7e375d ("bpf: selftests: Restore netns after each test")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/159410594499.1093222.11080787853132708654.stgit@firesoul
This is a follow up adjustment to commit 6c92bd5cd4 ("selftests/bpf:
Test_progs indicate to shell on non-actions"), that returns shell exit
indication EXIT_FAILURE (value 1) when user selects a non-existing test.
The problem with using EXIT_FAILURE is that a shell script cannot tell
the difference between a non-existing test and the test failing.
This patch uses value 2 as shell exit indication.
(Aside note unrecognized option parameters use value 64).
Fixes: 6c92bd5cd4 ("selftests/bpf: Test_progs indicate to shell on non-actions")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/159410593992.1093222.90072558386094370.stgit@firesoul
An extra count on ebb_state.stats.pmc_count[PMC_INDEX(pmc)] is being per-
formed when count_pmc() is used to reset PMCs on a few selftests. This
extra pmc_count can occasionally invalidate results, such as the ones from
cycles_test shown hereafter. The ebb_check_count() failed with an above
the upper limit error due to the extra value on ebb_state.stats.pmc_count.
Furthermore, this extra count is also indicated by extra PMC1 trace_log on
the output of the cycle test (as well as on pmc56_overflow_test):
==========
...
[21]: counter = 8
[22]: register SPRN_MMCR0 = 0x0000000080000080
[23]: register SPRN_PMC1 = 0x0000000080000004
[24]: counter = 9
[25]: register SPRN_MMCR0 = 0x0000000080000080
[26]: register SPRN_PMC1 = 0x0000000080000004
[27]: counter = 10
[28]: register SPRN_MMCR0 = 0x0000000080000080
[29]: register SPRN_PMC1 = 0x0000000080000004
>> [30]: register SPRN_PMC1 = 0x000000004000051e
PMC1 count (0x280000546) above upper limit 0x2800003e8 (+0x15e)
[FAIL] Test FAILED on line 52
failure: cycles
==========
Signed-off-by: Desnes A. Nunes do Rosario <desnesn@linux.ibm.com>
Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200626164737.21943-1-desnesn@linux.ibm.com
Now that pidfds support CLONE_NEWTIME as well enable testing them in the
setns() testuite.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Dmitry Safonov <dima@arista.com>
Cc: Andrei Vagin <avagin@gmail.com>
Link: https://lore.kernel.org/r/20200706154912.3248030-5-christian.brauner@ubuntu.com
samples/bpf no longer use bpf_map_def_legacy and instead use the
libbpf's bpf_map_def or new BTF-defined MAP format. This commit removes
unused bpf_map_def_legacy struct from selftests/bpf/bpf_legacy.h.
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200707184855.30968-5-danieltimlee@gmail.com
Simple test that enforces a single SOCK_DGRAM socket per cgroup.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200706230128.4073544-5-sdf@google.com
The check if there are any files to install in case of no files
compares "X " with "X" so never false.
Remove extra spaces. It may make sense to use make's $(if) function
here.
Signed-off-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Using one shell for the whole recipe with long lists can cause
make[1]: execvp: /bin/sh: Argument list too long
with some shells. Triggered by commit 309b81f0fd ("selftests/bpf:
Install generated test progs")
It requires to change the rule which rely on the one shell
behaviour (run_tests).
Simplify also INSTALL_SINGLE_RULE, remove extra echo, required to
workaround .ONESHELL.
Signed-off-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Cc: Jiri Benc <jbenc@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Similar to how ENOSYS causes a skip if pidfd_send_signal is not present,
we can do the same for unshare if it fails with EPERM. This way, running
the test without privileges causes four tests to skip but no early bail out.
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Calling ksft_exit_skip after ksft_set_plan results in executing fewer tests
than planned. Use ksft_test_result_skip instead.
The plan passed to ksft_set_plan was wrong, too, so fix it while at it.
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Use a share memory segment to pass string information between forked
test and the test runner for the skip reason.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Since forever the harness output for signed value tests have reported
unsigned values to avoid casting. Instead, actually test the variable
types and perform the correct casts and choose the correct format
specifiers.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Using the kselftest_harness.h would result in non-TAP test reporting,
which didn't make much sense given that all the requirements for using
the low-level API were met. Switch to using ksft_*() helpers while
retaining as much of a human-readability as possible.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Add "how to use this API" documentation to kselftest.h, and include some
addition helpers and notes to make things easier to use.
Additionally removes the incorrect "Bail out!" line from the standard exit
path. The TAP13 specification says that "Bail out!" should be used when
giving up before all tests have been run. For a "normal" execution run,
the selftests should not report "Bail out!".
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The binderfs test mixed the full harness API and the selftest API.
Adjust to use only the harness API so that the harness API can switch
to using the selftest API internally in future patches.
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Remove unused includes of the kselftest.h header.
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Selftest output reporting was happening before the TAP headers and plan
had been emitted. Move the first test reports later.
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Calling ksft_exit_skip after ksft_set_plan results in executing fewer tests
than planned. Move it before.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Calling ksft_exit_skip after ksft_set_plan results in executing fewer tests
than planned. Use ksft_test_result_skip when possible, or just bail out if
memory corruption is detected.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Calling ksft_exit_skip after ksft_set_plan results in executing fewer tests
than planned. Use ksft_test_result_skip for the individual tests.
The call in suspend() is fine, but ksft_set_plan should be after it.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The computation of the test plan uses the available_cpus bitset
before calling sched_getaffinity to fill it in. The resulting
plan is bogus, fix it.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
According to the TAP specification, a skipped test must be marked as "ok"
and annotated with the SKIP directive, for example
ok 23 # skip Insufficient flogiston pressure.
(https://testanything.org/tap-specification.html)
Fix the kselftest infrastructure to match this.
For ksft_exit_skip, it is preferrable to emit a dummy plan line that
indicates the whole test was skipped, but this is not always possible
because of ksft_exit_skip being used as a "shortcut" by the tests.
In that case, print the test counts and a normal "ok" line. The format
is now the same independent of whether msg is NULL or not (but it is
never NULL in any caller right now).
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Fixes a compiler warning:
In file included from sync_test.c:37:
../kselftest.h: In function ‘ksft_print_cnts’:
../kselftest.h:78:16: warning: comparison of integer expressions of different signedness: ‘unsigned int’ and ‘int’ [-Wsign-compare]
if (ksft_plan != ksft_test_num())
^~
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Brian reported a crash in IPv6 code when using rpfilter with a setup
running FRR and external nexthop objects. The root cause of the crash
is fib6_select_path setting fib6_nh in the result to NULL because of
an improper check for nexthop objects.
More specifically, rpfilter invokes ip6_route_lookup with flowi6_oif
set causing fib6_select_path to be called with have_oif_match set.
fib6_select_path has early check on have_oif_match and jumps to the
out label which presumes a builtin fib6_nh. This path is invalid for
nexthop objects; for external nexthops fib6_select_path needs to just
return if the fib6_nh has already been set in the result otherwise it
returns after the call to nexthop_path_fib6_result. Update the check
on have_oif_match to not bail on external nexthops.
Update selftests for this problem.
Fixes: f88d8ea67f ("ipv6: Plumb support for nexthop object in a fib6_info")
Reported-by: Brian Rak <brak@choopa.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Python 2 is no longer supported by the Python upstream project, so
upgrade TPM2 tests to Python 3.
Fixed minor merge conflicts
Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Pengfei Xu <pengfei.xu@intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
When investigating performance issues that involve latency / loss /
reordering it is useful to have the pcap from the sender-side as it
allows to easier infer the state of the sender's congestion-control,
loss-recovery, etc.
Allow the selftests to capture a pcap on both sender and receiver so
that this information is not lost when reproducing.
This patch also improves the file names. Instead of:
ns4-5ee79a56-X4O6gS-ns3-5ee79a56-X4O6gS-MPTCP-MPTCP-10.0.3.1.pcap
We now have something like for the same test:
5ee79a56-X4O6gS-ns3-ns4-MPTCP-MPTCP-10.0.3.1-10030-connector.pcap
5ee79a56-X4O6gS-ns3-ns4-MPTCP-MPTCP-10.0.3.1-10030-listener.pcap
It was a connection from ns3 to ns4, better to start with ns3 then. The
port is also added, easier to find the trace we want.
Co-developed-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Reset MXCSR in kernel_fpu_begin() to prevent using a stale user space
value.
- Prevent writing MSR_TEST_CTRL on CPUs which are not explicitly
whitelisted for split lock detection. Some CPUs which do not support
it crash even when the MSR is written to 0 which is the default value.
- Fix the XEN PV fallout of the entry code rework
- Fix the 32bit fallout of the entry code rework
- Add more selftests to ensure that these entry problems don't come back.
- Disable 16 bit segments on XEN PV. It's not supported because XEN PV
does not implement ESPFIX64
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl8B9JoTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoV8LEAC6QJPDvqYUl4r0rNIRG+S6D99lQOse
1smxvgXX4UaRz5Tgz6kvYUcucqmmnTfvnO8cg82LASeFw1xfVPPAtl3GZjoClwhv
0NJkKYcMm5QUOSVjJmjkcbAld//FyRfxHuJ8HMEtrbvkys2qWBmLzMaUNhFDNhcc
73UMmyuyL4kef9v/iAeR5WXG5+b+j9lZDiC1lTWuEKs10d1EdTwt2O/wtSRRPpMn
kL1qGTJAL+iRyRe7weLOkC2KZ9+Gq2NtyJQutkthZtGe5+pLT3AT6AlWxeg1HU8q
pxaQP25oe8/8naIoOmwiuwAP2qmm5eHedzXoN0h7i2XmofYOJaWeF95K7oDro8Nj
2deCx1bk0wr/RUxbYlfUacs8S+wmMWe7+BPnHXZphkSq5Vx+oXIw6mJOqmNb7Yiv
7ld1QwSD5dyWCEk1af16XKsFvSIRiGh8FypfTiTxyk+z7HIWBNXlu8OWHn1A7Sra
iaolCZfXtTJzm4w5+VVT2FX3s7jJrmMM4iSLtM2ISo2k+1HMlTbgLE6/yGjQ3ZaY
U298W7Pm8CwBRgzyKBvZVfncm0U/B0FNo/8C0jsJKPIOdpoLhs+u7sjpyaNC+toz
GE0skoWZxMhga4xPF84ua/l1VGncVUN1d5/dmnXz8xdyxFlktUtkt2iPE4G0rt3S
Xgh2uLHOgST6Kw==
=lI9c
-----END PGP SIGNATURE-----
Merge tag 'x86-urgent-2020-07-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A series of fixes for x86:
- Reset MXCSR in kernel_fpu_begin() to prevent using a stale user
space value.
- Prevent writing MSR_TEST_CTRL on CPUs which are not explicitly
whitelisted for split lock detection. Some CPUs which do not
support it crash even when the MSR is written to 0 which is the
default value.
- Fix the XEN PV fallout of the entry code rework
- Fix the 32bit fallout of the entry code rework
- Add more selftests to ensure that these entry problems don't come
back.
- Disable 16 bit segments on XEN PV. It's not supported because XEN
PV does not implement ESPFIX64"
* tag 'x86-urgent-2020-07-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/ldt: Disable 16-bit segments on Xen PV
x86/entry/32: Fix #MC and #DB wiring on x86_32
x86/entry/xen: Route #DB correctly on Xen PV
x86/entry, selftests: Further improve user entry sanity checks
x86/entry/compat: Clear RAX high bits on Xen PV SYSENTER
selftests/x86: Consolidate and fix get/set_eflags() helpers
selftests/x86/syscall_nt: Clear weird flags after each test
selftests/x86/syscall_nt: Add more flag combinations
x86/entry/64/compat: Fix Xen PV SYSENTER frame setup
x86/entry: Move SYSENTER's regs->sp and regs->flags fixups into C
x86/entry: Assert that syscalls are on the right stack
x86/split_lock: Don't write MSR_TEST_CTRL on CPUs that aren't whitelisted
x86/fpu: Reset MXCSR to default in kernel_fpu_begin()
Before, clang version 9 threw errors such as: error:
use of GNU old-style field designator extension [-Werror,-Wgnu-designator]
{ tstamp: true, swtstamp: true }
^~~~~~~
.tstamp =
Fix these warnings in tools/testing/selftests/net in the same manner as
commit 121e357ac7 ("selftests/harness: Update named initializer syntax").
N.B. rxtimestamp.c is the only affected file in the directory.
Signed-off-by: Tanner Love <tannerlove@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf-next 2020-07-04
The following pull-request contains BPF updates for your *net-next* tree.
We've added 73 non-merge commits during the last 17 day(s) which contain
a total of 106 files changed, 5233 insertions(+), 1283 deletions(-).
The main changes are:
1) bpftool ability to show PIDs of processes having open file descriptors
for BPF map/program/link/BTF objects, relying on BPF iterator progs
to extract this info efficiently, from Andrii Nakryiko.
2) Addition of BPF iterator progs for dumping TCP and UDP sockets to
seq_files, from Yonghong Song.
3) Support access to BPF map fields in struct bpf_map from programs
through BTF struct access, from Andrey Ignatov.
4) Add a bpf_get_task_stack() helper to be able to dump /proc/*/stack
via seq_file from BPF iterator progs, from Song Liu.
5) Make SO_KEEPALIVE and related options available to bpf_setsockopt()
helper, from Dmitry Yakunin.
6) Optimize BPF sk_storage selection of its caching index, from Martin
KaFai Lau.
7) Removal of redundant synchronize_rcu()s from BPF map destruction which
has been a historic leftover, from Alexei Starovoitov.
8) Several improvements to test_progs to make it easier to create a shell
loop that invokes each test individually which is useful for some CIs,
from Jesper Dangaard Brouer.
9) Fix bpftool prog dump segfault when compiled without skeleton code on
older clang versions, from John Fastabend.
10) Bunch of cleanups and minor improvements, from various others.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the options --ipv4, --ipv6 to specify running over ipv4 and/or
ipv6. If neither is specified, then run both.
Signed-off-by: Tanner Love <tannerlove@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
BPF selftests show a compilation error as follows:
libbpf: invalid relo for 'entries' in special section 0xfff2; forgot to
initialize global var?..
Fix it by initializing 'entries' to zeros.
Fixes: c7568114bc ("selftests/bpf: Add bpf_iter test with bpf_get_task_stack()")
Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200703181719.3747072-1-songliubraving@fb.com
This kselftest fixes update for Linux 5.8-rc4 consists of tpm test
fixes from arkko Sakkinen.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAl79/1wACgkQCwJExA0N
QxxDOg//Q5SUe3nLy3AVKats4Ma2cuIi1hb1vsrLpGbl3OU9Bbf9uV84DBrXzb0C
1Mshp9hxGWphbgkBL9Bc4/DJ3dX6WXbZXDfOblHn15w1SaD6HC/8WAU802KR9gZj
KqfFUMTncDeejk6KET8UVbphG1rVax0AQK0TAB4SnHPLt3noZYRvdVfEiCyKIQ6o
zBvXsweKKH7QehoACtm4myVnQnDhdhvoSvJt8nRox2EU38fCR+MSfYZmDHaakDpp
0JYEPVIvk+vn2qfcGDuFax4AzfNrRfe9NbaqEX9g7vVCrOITNGPQOsH/+3UHj/9r
scTRE8ItN4oHYEVtnWbIIBVYZt6VKVYbWVvfVoS2XQpaEFs3qAw241rCpb7Cg0E6
3f6QEVnhkFCNI9VCA1Z30puWQ6HmuroeHLBAfr8CB2Z14WC4kadMzUbTnj0BgJAh
7GACDPVy6ZOsqS5WUZ2v7kMqxJAJYwPUH+xZaW3M7yb5O9BdXsPDSDXblSOGhJX6
C+n/lK3DtsIYoMLW5eeRFfb5V8YZRfBP6nvjnuopgYm0H7BNfflIpybljR9lpAMo
0LRbkdEjTS9TpX7OravT7qZY3VWyr68wZLTGBQUp6lhFJhS4qjjL9mjefLBy9XQb
37VsnYXOUfnwcFXXXjxFlhVKqAIr14bE+lI0mfbnY4npqKvwjFY=
=RDv/
-----END PGP SIGNATURE-----
Merge tag 'linux-kselftest-fixes-5.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest fixes from Shuah Khan:
"tpm test fixes from Jarkko Sakkinen"
* tag 'linux-kselftest-fixes-5.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: tpm: Use /bin/sh instead of /bin/bash
selftests: tpm: Use 'test -e' instead of 'test -f'
Revert "tpm: selftest: cleanup after unseal with wrong auth/policy test"
This kunit fixes update for Linux 5.8-rc4 consists of fixes to build
and run-times failures. Also includes troubleshooting tips updates
to kunit user documentation.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAl7985cACgkQCwJExA0N
QxxFvhAA6Wa+1UMR4VKLZgfc2dL85LeV9tZO714ssIt1rxcgv2dHswsE0nbJmHyM
DsfEufqOsvpX7/ic/JqrwIl+iDGrKlV9wo+ZLl+tdt5jVeB6OP6Dr5C3jvD3eZhX
zUHxr04QGzQuJnS6gAOIrCa/qBz17duAEij6xj4if/6OAkL2Igb3PGFzhpjVKqJL
TLY5UJ80D+QHJ7o8FWsaB8bNDMu7gmOBgfMb1qGB60cFppE+regoQRtkZefLap26
MixOFgRD5DyNoGqTZzJqSn7IZxvERoHfxKchzpAUHsNn9tI0r15X016Wcgf2+B+T
2eyRJDkTP3dt4oFuML4CXeQvZOgrcZNIWeVFmBK9NcmRg0WDnWPzCE2Mm+lnZD8e
0fefiaLBZw5+ztaz24S/M3mTpZQru8N2FDgLJmpLcPulIuDYpm4tB2PkBc0AmF35
6gC3WDa6cw1qbbDgN83xd9VdlACBe2fYzenhZCqDzgE1zGquORkhuAQYZfdGrixi
ojpn7IKN+JeufiFZuu1xOJeAojIZ4JU42FxM0S1PSXf9deqICzfa1LSOWEaL+V4G
GaPq/nnMhtY2rMGFAQXyCP4YQe2XQU/Jt1SOdFA/UZ1W+oYXwjOlSVo9xpjJV/6y
4TAQ7Yg8S87CUbffYpBLw3Xkg8E0L9ih+E+UOineMcUiu6yxA2Q=
=XeG4
-----END PGP SIGNATURE-----
Merge tag 'linux-kselftest-kunit-fixes-5.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kunit fixes from Shuah Khan
"Fixes for build and run-times failures.
Also includes troubleshooting tips updates to kunit user
documentation"
* tag 'linux-kselftest-kunit-fixes-5.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
Documentation: kunit: Add some troubleshooting tips to the FAQ
kunit: kunit_tool: Fix invalid result when build fails
kunit: show error if kunit results are not present
kunit: kunit_config: Fix parsing of CONFIG options with space
It is common for networking tests creating its netns and making its own
setting under this new netns (e.g. changing tcp sysctl). If the test
forgot to restore to the original netns, it would affect the
result of other tests.
This patch saves the original netns at the beginning and then restores it
after every test. Since the restore "setns()" is not expensive, it does it
on all tests without tracking if a test has created a new netns or not.
The new restore_netns() could also be done in test__end_subtest() such
that each subtest will get an automatic netns reset. However,
the individual test would lose flexibility to have total control
on netns for its own subtests. In some cases, forcing a test to do
unnecessary netns re-configure for each subtest is time consuming.
e.g. In my vm, forcing netns re-configure on each subtest in sk_assign.c
increased the runtime from 1s to 8s. On top of that, test_progs.c
is also doing per-test (instead of per-subtest) cleanup for cgroup.
Thus, this patch also does per-test restore_netns(). The only existing
per-subtest cleanup is reset_affinity() and no test is depending on this.
Thus, it is removed from test__end_subtest() to give a consistent
expectation to the individual tests. test_progs.c only ensures
any affinity/netns/cgroup change made by an earlier test does not
affect the following tests.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200702004858.2103728-1-kafai@fb.com
This patch makes a few changes to the network_helpers.c
1) Enforce SO_RCVTIMEO and SO_SNDTIMEO
This patch enforces timeout to the network fds through setsockopt
SO_RCVTIMEO and SO_SNDTIMEO.
It will remove the need for SOCK_NONBLOCK that requires a more demanding
timeout logic with epoll/select, e.g. epoll_create, epoll_ctrl, and
then epoll_wait for timeout.
That removes the need for connect_wait() from the
cgroup_skb_sk_lookup.c. The needed change is made in
cgroup_skb_sk_lookup.c.
2) start_server():
Add optional addr_str and port to start_server().
That removes the need of the start_server_with_port(). The caller
can pass addr_str==NULL and/or port==0.
I have a future tcp-hdr-opt test that will pass a non-NULL addr_str
and it is in general useful for other future tests.
"int timeout_ms" is also added to control the timeout
on the "accept(listen_fd)".
3) connect_to_fd(): Fully use the server_fd.
The server sock address has already been obtained from
getsockname(server_fd). The sockaddr includes the family,
so the "int family" arg is redundant.
Since the server address is obtained from server_fd, there
is little reason not to get the server's socket type from the
server_fd also. getsockopt(server_fd) can be used to do that,
so "int type" arg is also removed.
"int timeout_ms" is added.
4) connect_fd_to_fd():
"int timeout_ms" is added.
Some code is also refactored to connect_fd_to_addr() which is
shared with connect_to_fd().
5) Preserve errno:
Some callers need to check errno, e.g. cgroup_skb_sk_lookup.c.
Make changes to do it more consistently in save_errno_close()
and log_err().
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200702004852.2103003-1-kafai@fb.com
Add the ktest config option MAIL_MAX_SIZE that will limit the size of the
log file that is placed into the email on failure.
Link: https://lore.kernel.org/r/20200701231756.790637968@goodmis.org
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
If a failure happens and an email is sent, show the contents of the log of
the last test that failed in the email.
Link: http://lore.kernel.org/r/20200701231756.619246244@goodmis.org
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The script generates two random files that are then sent via tcp and
mptcp connections.
In order to compare throughput over consecutive runs add an option
to provide the file size on the command line: "-f 128000".
Also add an option, -t, to enable tcp tests. This is useful to
compare throughput of mptcp connections and tcp connections.
Example: run tests with a 4mb file size, 300ms delay 0.01% loss,
default gso/tso/gro settings and with large write/blocking io:
mptcp_connect.sh -t -f $((4 * 1024 * 1024)) -d 300 -l 0.01% -r 0 -e "" -m mmap
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The program test_progs have some very useful ability to specify a list of
test name substrings for selecting which tests to run.
This patch add the ability to list the selected test names without running
them. This is practical for seeing which tests gets selected with given
select arguments (which can also contain a exclude list via --name-blacklist).
This output can also be used by shell-scripts in a for-loop:
for N in $(./test_progs --list -t xdp); do \
./test_progs -t $N 2>&1 > result_test_${N}.log & \
done ; wait
This features can also be used for looking up a test number and returning
a testname. If the selection was empty then a shell EXIT_FAILURE is
returned. This is useful for scripting. e.g. like this:
n=1;
while [ $(./test_progs --list -n $n) ] ; do \
./test_progs -n $n ; n=$(( n+1 )); \
done
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/159363985751.930467.9610992940793316982.stgit@firesoul
It can be practial to get the number of tests that test_progs contain.
This could for example be used to create a shell for-loop construct that
runs the individual tests.
Like:
for N in $(seq 1 $(./test_progs -c)); do
./test_progs -n $N 2>&1 > result_test_${N}.log &
done ; wait
V2: Add the ability to return the count for the selected tests. This is
useful for getting a count e.g. after excluding some tests with option -b.
The current beakers test script like to report the max test count upfront.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/159363985244.930467.12617117873058936829.stgit@firesoul
When a user selects a non-existing test the summary is printed with
indication 0 for all info types, and shell "success" (EXIT_SUCCESS) is
indicated. This can be understood by a human end-user, but for shell
scripting is it useful to indicate a shell failure (EXIT_FAILURE).
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/159363984736.930467.17956007131403952343.stgit@firesoul
The test_vmlinux test uses hrtimer_nanosleep as hook to test tracing
programs. But in a kernel built by clang, which performs more aggresive
inlining, that function gets inlined into its caller SyS_nanosleep.
Therefore, even though fentry and kprobe do hook on the function,
they aren't triggered by the call to nanosleep in the test.
A possible fix is switching to use a function that is less likely to
be inlined, such as hrtimer_range_start_ns. The EXPORT_SYMBOL functions
shouldn't be inlined based on the description of [1], therefore safe
to use for this test. Also the arguments of this function include the
duration of sleep, therefore suitable for test verification.
[1] af3b56289b time: don't inline EXPORT_SYMBOL functions
Tested:
In a clang build kernel, before this change, the test fails:
test_vmlinux:PASS:skel_open 0 nsec
test_vmlinux:PASS:skel_attach 0 nsec
test_vmlinux:PASS:tp 0 nsec
test_vmlinux:PASS:raw_tp 0 nsec
test_vmlinux:PASS:tp_btf 0 nsec
test_vmlinux:FAIL:kprobe not called
test_vmlinux:FAIL:fentry not called
After switching to hrtimer_range_start_ns, the test passes:
test_vmlinux:PASS:skel_open 0 nsec
test_vmlinux:PASS:skel_attach 0 nsec
test_vmlinux:PASS:tp 0 nsec
test_vmlinux:PASS:raw_tp 0 nsec
test_vmlinux:PASS:tp_btf 0 nsec
test_vmlinux:PASS:kprobe 0 nsec
test_vmlinux:PASS:fentry 0 nsec
Signed-off-by: Hao Luo <haoluo@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200701175315.1161242-1-haoluo@google.com
The log file should be up to date to whatever is happening in ktest.
Disable buffering to the LOG output file handle.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Currently, every write to the log file is done by opening the file, writing
to it, then closing the file. This rather expensive. Just open it at the
beginning and close it at the end.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The new test is similar to other bpf_iter tests. It dumps all
/proc/<pid>/stack to a seq_file. Here is some example output:
pid: 2873 num_entries: 3
[<0>] worker_thread+0xc6/0x380
[<0>] kthread+0x135/0x150
[<0>] ret_from_fork+0x22/0x30
pid: 2874 num_entries: 9
[<0>] __bpf_get_stack+0x15e/0x250
[<0>] bpf_prog_22a400774977bb30_dump_task_stack+0x4a/0xb3c
[<0>] bpf_iter_run_prog+0x81/0x170
[<0>] __task_seq_show+0x58/0x80
[<0>] bpf_seq_read+0x1c3/0x3b0
[<0>] vfs_read+0x9e/0x170
[<0>] ksys_read+0xa7/0xe0
[<0>] do_syscall_64+0x4c/0xa0
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Note: bpf_iter test as-is doesn't print the contents of the seq_file. To
see the example above, it is necessary to add printf() to do_dummy_read.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200630062846.664389-5-songliubraving@fb.com
There is a NOT DEFINED operator, but there is not an operator that can
negate any other expression.
For example: NOT (${FOO} == boot || ${BAR} == run)
Add the keyword NOT to allow the ktest.pl config files to negate operators.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Currently, if a PRE_TEST is defined and ran, but fails, there's nothing
currently available to make the test fail too. Add a PRE_TEST_DIE option that
when set, if a PRE_TEST is defined and fails, the test will die too.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
If a log file is defined and the test were to error, a print statement is
made that shows the user where the log file is to examine it further. But
this is not done if the test were to succeed.
I find it annoying that it does not show where the log file is on success,
as I run several different tests that place their log files in various
locations, and even though the test pass, there's things I want to look at
in the log file (like warnings). It is much easier to find where the log
file is, if it is displayed at the end of a test.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
When performing a automatic config bisect via ktest.pl, it is very useful to
have a copy of each of the bisects used. This way, if a bisect were to go
wrong, it is possible to retrace the steps and continue at the location
before the error was made.
The ktest.pl will make a copy of the good and bad configs, labeled as such,
as well as a number attached to it that represents the iteration of the
bisect. These files are saved in the ktest temp directory where it currently
stores the good and bad config files.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Debuggers expect that doing PTRACE_GETREGS, then poking at a tracee
and maybe letting it run for a while, then doing PTRACE_SETREGS will
put the tracee back where it was. In the specific case of a 32-bit
tracer and tracee, the PTRACE_GETREGS/SETREGS data structure doesn't
have fs_base or gs_base fields, so FSBASE and GSBASE fields are
never stored anywhere. Everything used to still work because
nonzero FS or GS would result full reloads of the segment registers
when the tracee resumes, and the bases associated with FS==0 or
GS==0 are irrelevant to 32-bit code.
Adding FSGSBASE support broke this: when FSGSBASE is enabled, FSBASE
and GSBASE are now restored independently of FS and GS for all tasks
when context-switched in. This means that, if a 32-bit tracer
restores a previous state using PTRACE_SETREGS but the tracee's
pre-restore and post-restore bases don't match, then the tracee is
resumed with the wrong base.
Fix it by explicitly loading the base when a 32-bit tracer pokes FS
or GS on a 64-bit kernel.
Also add a test case.
Fixes: 673903495c ("x86/process/64: Use FSBSBASE in switch_to() if available")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/229cc6a50ecbb701abd50fe4ddaf0eda888898cd.1593192140.git.luto@kernel.org
The manual call to set_thread_area() via int $0x80 was missing any
indication that the descriptor was a pointer, causing gcc to
occasionally generate wrong code. Add the missing constraint.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/432968af67259ca92d68b774a731aff468eae610.1593192140.git.luto@kernel.org
There are several copies of get_eflags() and set_eflags() and they all are
buggy. Consolidate them and fix them. The fixes are:
Add memory clobbers. These are probably unnecessary but they make sure
that the compiler doesn't move something past one of these calls when it
shouldn't.
Respect the redzone on x86_64. There has no failure been observed related
to this, but it's definitely a bug.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/982ce58ae8dea2f1e57093ee894760e35267e751.1593191971.git.luto@kernel.org
Similarly to bpftool Makefile, allow to specify custom location of vmlinux.h
to be used during the build. This allows simpler testing setups with
checked-in pre-generated vmlinux.h.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200630004759.521530-2-andriin@fb.com
Daniel Borkmann says:
====================
pull-request: bpf 2020-06-30
The following pull-request contains BPF updates for your *net* tree.
We've added 28 non-merge commits during the last 9 day(s) which contain
a total of 35 files changed, 486 insertions(+), 232 deletions(-).
The main changes are:
1) Fix an incorrect verifier branch elimination for PTR_TO_BTF_ID pointer
types, from Yonghong Song.
2) Fix UAPI for sockmap and flow_dissector progs that were ignoring various
arguments passed to BPF_PROG_{ATTACH,DETACH}, from Lorenz Bauer & Jakub Sitnicki.
3) Fix broken AF_XDP DMA hacks that are poking into dma-direct and swiotlb
internals and integrate it properly into DMA core, from Christoph Hellwig.
4) Fix RCU splat from recent changes to avoid skipping ingress policy when
kTLS is enabled, from John Fastabend.
5) Fix BPF ringbuf map to enforce size to be the power of 2 in order for its
position masking to work, from Andrii Nakryiko.
6) Fix regression from CAP_BPF work to re-allow CAP_SYS_ADMIN for loading
of network programs, from Maciej Żenczykowski.
7) Fix libbpf section name prefix for devmap progs, from Jesper Dangaard Brouer.
8) Fix formatting in UAPI documentation for BPF helpers, from Quentin Monnet.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add two tests for PTR_TO_BTF_ID vs. null ptr comparison,
one for PTR_TO_BTF_ID in the ctx structure and the
other for PTR_TO_BTF_ID after one level pointer chasing.
In both cases, the test ensures condition is not
removed.
For example, for this test
struct bpf_fentry_test_t {
struct bpf_fentry_test_t *a;
};
int BPF_PROG(test7, struct bpf_fentry_test_t *arg)
{
if (arg == 0)
test7_result = 1;
return 0;
}
Before the previous verifier change, we have xlated codes:
int test7(long long unsigned int * ctx):
; int BPF_PROG(test7, struct bpf_fentry_test_t *arg)
0: (79) r1 = *(u64 *)(r1 +0)
; int BPF_PROG(test7, struct bpf_fentry_test_t *arg)
1: (b4) w0 = 0
2: (95) exit
After the previous verifier change, we have:
int test7(long long unsigned int * ctx):
; int BPF_PROG(test7, struct bpf_fentry_test_t *arg)
0: (79) r1 = *(u64 *)(r1 +0)
; if (arg == 0)
1: (55) if r1 != 0x0 goto pc+4
; test7_result = 1;
2: (18) r1 = map[id:6][0]+48
4: (b7) r2 = 1
5: (7b) *(u64 *)(r1 +0) = r2
; int BPF_PROG(test7, struct bpf_fentry_test_t *arg)
6: (b4) w0 = 0
7: (95) exit
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200630171241.2523875-1-yhs@fb.com
Calling bpf_prog_detach is incorrect, since it takes target_fd as
its argument. The intention here is to pass it as attach_bpf_fd,
so use bpf_prog_detach2 and pass zero for target_fd.
Fixes: 06716e04a0 ("selftests/bpf: Extend test_flow_dissector to cover link creation")
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200629095630.7933-7-lmb@cloudflare.com
Pass 0 as target_fd when attaching and detaching flow dissector.
Additionally, pass the expected program when detaching.
Fixes: 1f043f87bb ("selftests/bpf: Add tests for attaching bpf_link to netns")
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200629095630.7933-6-lmb@cloudflare.com
This case, while not particularly useful, is worth covering because we
expect the operation to succeed as opposed when re-attaching the same
program directly with PROG_ATTACH.
While at it, update the tests summary that fell out of sync when tests
extended to cover links.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200625141357.910330-5-jakub@cloudflare.com
Apart from read and write access, memory protection keys can
also be used for restricting execute permission of pages on
powerpc. This adds a test to verify if the feature works as
expected.
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200604125610.649668-4-sandipan@linux.ibm.com
The Power ISA mandates that all writes to the Authority
Mask Register (AMR) must always be preceded as well as
succeeded by a context synchronizing instruction.
This makes sure that the tests follow this requirement
when attempting to update a pkey's access rights.
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200604125610.649668-2-sandipan@linux.ibm.com
Add tests to check ethtool report about extended state.
The tests configure several states and verify that the correct extended
state is reported by ethtool.
Check extended state with substate (Autoneg) and extended state without
substate (No cable).
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add NETIF_NO_CABLE port to tests topology.
The port can also be declared as an environment variable and tests can be
run like that:
NETIF_NO_CABLE=eth9 ./test.sh eth{1..8}
The NETIF_NO_CABLE port will be used by ethtool_extended_state test.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently different_speeds_get() is used only by ethtool.sh tests.
The function can be useful for another tests that check ethtool
configurations.
Move the function to ethtool_lib in order to allow other tests to use
it.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This test is inspired by the mlxsw RED selftest. It is much simpler to set
up (also because there is no point in testing PRIO / RED encapsulation). It
tests bare RED, ECN and ECN+nodrop modes of operation. On top of that it
tests RED early_drop and mark qevents.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The reverted commit illegitly uses tpm2-tools. External dependencies are
absolutely forbidden from these tests. There is also the problem that
clearing is not necessarily wanted behavior if the test/target computer is
not used only solely for testing.
Fixes: a9920d3bad ("tpm: selftest: cleanup after unseal with wrong auth/policy test")
Cc: Tadeusz Struk <tadeusz.struk@intel.com>
Cc: stable@vger.kernel.org
Cc: linux-integrity@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
In the dim distant past, qemu commands needed to be run from the
rcutorture directory, but this is no longer the case. This commit
therefore removes the now-useless "cd $KVM" from the kvm-test-1-run.sh
script.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Currently, the qemu command is constructed twice, once to dump it
to the qemu-cmd file and again to execute it. This is of course an
accident waiting to happen, but is done to ensure that the remainder
of the script has an accurate idea of the running qemu command's PID.
This commit therefore places both the qemu command and the PID capture
into a new temporary file and sources that temporary file. Thus the
single construction of the qemu command into the qemu-cmd file suffices
for both purposes.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
This commit adds a script that transforms qemu-cmd files to allow them
and the corresponding kernels to be run in contexts other than the one
that they were created for, including on systems other than the one that
they were built on. For example, this allows the build products from a
--buildonly run to be transformed to allow distributed rcutorture testing.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Using --kcsan when the compiler does not support KCSAN results in this:
:CONFIG_KCSAN=y: improperly set
:CONFIG_KCSAN_REPORT_ONCE_IN_MS=100000: improperly set
:CONFIG_KCSAN_VERBOSE=y: improperly set
:CONFIG_KCSAN_INTERRUPT_WATCHER=y: improperly set
Clean KCSAN run in /home/git/linux-rcu/tools/testing/selftests/rcutorture/res/2020.06.16-09.53.16
This is a bit obtuse, so this commit adds checks resulting in this:
:CONFIG_KCSAN=y: improperly set
:CONFIG_KCSAN_REPORT_ONCE_IN_MS=100000: improperly set
:CONFIG_KCSAN_VERBOSE=y: improperly set
:CONFIG_KCSAN_INTERRUPT_WATCHER=y: improperly set
Compiler or architecture does not support KCSAN!
Did you forget to switch your compiler with --kmake-arg CC=<cc-that-supports-kcsan>?
Suggested-by: Marco Elver <elver@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Marco Elver <elver@google.com>
Currently, kvm-recheck.sh complains that qemu failed for --buildonly
runs, which is sort of true given that qemu can hardly succeed if not
invoked in the first place. Nevertheless, this commit swaps the order
of checks in kvm-recheck.sh so that --buildonly runs will be summarized
more straightforwardly.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
We need to pass the arguments provided to --kmake-arg to all make
invocations. In particular, the make invocations generating the configs
need to see the final make arguments, e.g. if config variables depend on
particular variables that are passed to make.
For example, when using '--kcsan --kmake-arg CC=clang-11', we would lose
CONFIG_KCSAN=y due to 'make oldconfig' not seeing that we want to use a
compiler that supports KCSAN.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
This commit pulls the simple pattern-based error detection from the
console log into a new console-badness.sh file. This will enable future
commits to end a run on the first error.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
When bisecting RCU issues, it is often the case that the first error in
an unsuccessful run will happen quickly, but that a successful run must
go on for some time in order to obtain a sufficiently low false-negative
error rate. In many cases, a bisection requires multiple concurrent
runs, in which case the first failure in any run indicates failure,
pure and simple. In such cases, it would speed things up greatly if
the first failure terminated all runs.
This commit therefore adds scripting that checks for a file named "STOP"
in the top-level results directory, terminating the run when it appears.
Note that in-progress builds will continue until completion, but future
builds and all runs will be cut short.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
One reason to do a --buildonly run is to use the build products elsewhere,
for example, to do the actual test on some other system. Part of doing
the test is the actual qemu command, which is not currently produced
by --buildonly runs. This commit therefore causes --buildonly runs to
create this file.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Leaving off the kvm.sh script's --cpus argument results in the script
testing the scenarios sequentially, which can be quite slow. However,
having to specify the actual number of CPUs can be error-prone.
This commit therefore adds a --allcpus argument that causes kvm.sh to
use all available CPUs.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The identify_qemu_vcpus bash function can return numbers including
whitespace characters, which can be a bit annoying in some bash
dollar-sign substitutions. This commit therefore strips all spaces and
tabs from the value that identify_qemu_vcpus outputs.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The current console parsing assumes that console lines containing "!!!"
are statistics lines from which it can parse the number of rcutorture
too-short grace-period failures. This prints confusing output for
other problems, including memory exhaustion. This commit therefore
differentiates between these cases and prints an appropriate error string.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The torture-test recheck logic fails to set the configfile variable to
the current scenario, so this commit properly initializes this variable.
This change isn't critical given that all errors for a given scenario
follow that scenario's heading, but it is easier on the eyes to repeat it.
And this repetition also prevents confusion as to whether a given message
goes with the previous heading or the next one.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
This commit adds a kvm-check-branches.sh script that takes a list
of commits and commit ranges and runs a short rcutorture test on all
scenarios on each specified commit. A summary is printed at the end, and
the script returns success if all rcutorture runs completed without error.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
On some (probably misconfigured) systems, the torture-test scripting
will cause qemu to complain about missing EFI firmware, often because
qemu is trying to traverse broken symbolic links to find that firmware.
Which is a bit silly given that the default torture-test guest OS has
but a single binary for its userspace, and thus is unlikely to do much
in the way of networking in any case.
This commit therefore avoids such problems by specifying "-net none"
to qemu unless the TORTURE_QEMU_INTERACTIVE environment variable is set
(for example, by having specified "--interactive" to kvm.sh), in which
case "-net nic -net user" is specified to qemu instead. Either choice
may be overridden by specifying the "-net" argument of your choice to
the kvm.sh "--qemu-args" parameter.
Link: https://lore.kernel.org/lkml/20190701141403.GA246562@google.com
Reported-by: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
This commit renames the rcutorture config/refperf to config/refscale to
further avoid conflation with the Linux kernel's perf feature.
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
This commit further avoids conflation of refperf with the kernel's perf
feature by renaming kernel/rcu/refperf.c to kernel/rcu/refscale.c,
and also by similarly renaming the functions and variables inside
this file. This has the side effect of changing the names of the
kernel boot parameters, so kernel-parameters.txt and ver_functions.sh
are also updated.
The rcutorture --torture type remains refperf, and this will be
addressed in a separate commit.
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The old Kconfig option name is all too easy to conflate with the
unrelated "perf" feature, so this commit renames RCU_REF_PERF_TEST to
RCU_REF_SCALE_TEST.
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Currently, it is necessary to manually edit the console output to see
anything more than statistics, and sometimes the statistics can indicate
outliers that need more investigation. This commit therefore dumps out
the per-experiment measurements, sorted in ascending order, just before
dumping out the statistics.
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The experiment-number column is currently labeled "Threads", which is
misleading at best. This commit therefore relabels it as "Runs", and
adjusts the scripts accordingly.
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
This commit updates the rcutorture scripting to include the new refperf
torture-test module.
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
I'd like arch-specific tests to XFAIL when on a mismatched architecture
so that we can more easily compare test coverage across all systems.
Lacking kernel configs or CPU features count as a FAIL, not an XFAIL.
Additionally fixes a build failure under 32-bit UML.
Fixes: b09511c253 ("lkdtm: Add a DOUBLE_FAULT crash type on x86")
Fixes: cea23efb4d ("lkdtm/bugs: Make double-fault test always available")
Fixes: 6cb6982f42 ("lkdtm: arm64: test kernel pointer authentication")
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20200625203704.317097-5-keescook@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since we expect to see warnings every time for many tests, just reset
the WARN_ONCE flags each time the script runs.
Fixes: 46d1a0f03d ("selftests/lkdtm: Add tests for LKDTM targets")
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20200625203704.317097-4-keescook@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add a selftest for the usage of FPU code in kernel mode.
Currently only implemented for x86. In the future, kernel FPU testing
could be unified between the different architectures supporting it.
[ bp:
- Split out from a conglomerate patch, put comments over statements.
- run the test only on debugfs write.
- Add bare-minimum run_test_fpu.sh, run 1000 iterations on all CPUs
by default.
- Add conditionally -msse2 so that clang doesn't generate library
calls.
- Use cc-option to detect gcc 7.1 not supporting -mpreferred-stack-boundary=3 (amluto).
- Document stuff so that we don't forget.
- Fix:
ld: lib/test_fpu.o: in function `test_fpu_get':
>> test_fpu.c:(.text+0x16e): undefined reference to `__sanitizer_cov_trace_cmpd'
>> ld: test_fpu.c:(.text+0x1a7): undefined reference to `__sanitizer_cov_trace_cmpd'
ld: test_fpu.c:(.text+0x1e0): undefined reference to `__sanitizer_cov_trace_cmpd'
]
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Petteri Aimonen <jpa@git.mail.kapsi.fi>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lkml.kernel.org/r/20200624114646.28953-3-bp@alien8.de
Validate that BPF object with broken (in multiple ways) BPF program can still
be successfully loaded, if that broken BPF program is disabled.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200625232629.3444003-3-andriin@fb.com
A fix for a crash in nested KVM when CONFIG_DEBUG_VIRTUAL=y.
Two minor build fixes.
Thanks to:
Aneesh Kumar K.V, Arseny Solokha, Harish.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl73L58THG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgFHvD/9InTRjjJ8VPoZri5d4UNP1xPcAMF6P
u9KppOVJDHaRgI58xtWgfH3eXVg90OY/u6nVAfaXIlrhH5BCEnx/+gRRoQXBR8vE
MkoGevVCo6/2FpzAQHAQpaz5BrqipMA0EGBrUvK4+doWJf+lCcuTu4pXCawlbjr+
Hbc4NChEuqLYkEnWidQL+MxAoJH9Z1HiXmilWRj+8rG0HnMMH6oqDurxUhGmA6/I
1LHarXkPeWoxx1UQLMN5kDsTLdxCLmz9oqG5EON2vQjob7Zu8sv4WvEZjx7qibHy
yEe1YUO7xqv6OatGyWvF6FXq6H4udK1djlvFz4FBjOg6YPusrmSIDT5JLth3Q9Fi
P8Z1DsiHAljgq3bfwIFdFQRG35JAzV1LmEI89z6VADGPPZuIwUCoagnfnQUBaPfg
VTlnYJlUEmCDAH6R/OSUIOpVqmblCukeSIObvigcsyamPrAZZKl0PPw3ns+Uzykw
WJ1ukCyppscPwZ00+hByUXOmX6ZBMHoFakgpd3b09RUHiMbrjvpgaUFEwqSl1AsA
yJ83uqaywqjewKH0uMwqRtw7j/BfXa47hYYqNeBcxqu9h5ORMchAIB4rgbO2LDgr
ZsvhiSlBqEL8j5HnzVYEaFPniE6NWesx1JJLOD41fWN+Fa9SFzkVfZy6rQyhOLU8
rjQQ+iJJVFreqw==
=TvGN
-----END PGP SIGNATURE-----
Merge tag 'powerpc-5.8-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- A fix for a crash in nested KVM when CONFIG_DEBUG_VIRTUAL=y.
- Two minor build fixes.
Thanks to: Aneesh Kumar K.V, Arseny Solokha, Harish.
* tag 'powerpc-5.8-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
selftests/powerpc: Fix build failure in ebb tests
powerpc/kvm/book3s64: Fix kernel crash with nested kvm & DEBUG_VIRTUAL
powerpc/fsl_booke/32: Fix build with CONFIG_RANDOMIZE_BASE
- Fix unwinding through vDSO sigreturn trampoline
- Fix build warnings by raising minimum LD version for PAC
- Whitelist some Kryo Cortex-A55 derivatives for Meltdown and SSB
- Fix perf register PC reporting for compat tasks
- Fix 'make clean' warning for arm64 signal selftests
- Fix ftrace when BTI is compiled in
- Avoid building the compat vDSO using GCC plugins
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAl71tgEQHHdpbGxAa2Vy
bmVsLm9yZwAKCRC3rHDchMFjNB5sB/48VLEeDtkRtHVQntLG9SFogwDkHjkRW/lo
kgO5APEcdhZZq3mBY2fIww5iX5Et7vRpx8ovempmqZGhO9B4ZMSNG0DFxoYdtXTU
jgox+LzkW+hYldK1Bv03ioLZgIz6Lc8zyK6kRB7NuDN88VEVds0ksYmcAojeIN9b
vmpquEAoVppm0VPjt6VA0xQ6HtiKfvlV7PW6Pqs0dKovnNY982jRXBMzaGBbDFQ7
3eKmW4PBru/Ew16J172vf/0sBJQBiZrSdXCqv/USKvPHkUDkJiYsaWLpsWx4m4to
bE/OS6aWx94NcgxPUca3y2G2OhPU+VFiXjuJ0kvzt4EJIuW/CGUf
=2kBR
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"The big fix here is to our vDSO sigreturn trampoline as, after a
painfully long stint of debugging, it turned out that fixing some of
our CFI directives in the merge window lit up a bunch of logic in
libgcc which has been shown to SEGV in some cases during asynchronous
pthread cancellation.
It looks like we can fix this by extending the directives to restore
most of the interrupted register state from the sigcontext, but it's
risky and hard to test so we opted to remove the CFI directives for
now and rely on the unwinder fallback path like we used to.
- Fix unwinding through vDSO sigreturn trampoline
- Fix build warnings by raising minimum LD version for PAC
- Whitelist some Kryo Cortex-A55 derivatives for Meltdown and SSB
- Fix perf register PC reporting for compat tasks
- Fix 'make clean' warning for arm64 signal selftests
- Fix ftrace when BTI is compiled in
- Avoid building the compat vDSO using GCC plugins"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Add KRYO{3,4}XX silver CPU cores to SSB safelist
arm64: perf: Report the PC value in REGS_ABI_32 mode
kselftest: arm64: Remove redundant clean target
arm64: kpti: Add KRYO{3, 4}XX silver CPU cores to kpti safelist
arm64: Don't insert a BTI instruction at inner labels
arm64: vdso: Don't use gcc plugins for building vgettimeofday.c
arm64: vdso: Only pass --no-eh-frame-hdr when linker supports it
arm64: Depend on newer binutils when building PAC
arm64: compat: Remove 32-bit sigreturn code from the vDSO
arm64: compat: Always use sigpage for sigreturn trampoline
arm64: compat: Allow 32-bit vdso and sigpage to co-exist
arm64: vdso: Disable dwarf unwinding through the sigreturn trampoline
When separating out different phases of running tests[1]
(build/exec/parse/etc), the format of the KunitResult tuple changed
(adding an elapsed_time variable). This is not populated during a build
failure, causing kunit.py to crash.
This fixes [1] to probably populate the result variable, causing a
failing build to be reported properly.
[1]:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=45ba7a893ad89114e773b3dc32f6431354c465d6
Signed-off-by: David Gow <davidgow@google.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Tested-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Currently, if the kernel is configured incorrectly or if it crashes before any
kunit tests are run, kunit finishes without error, reporting
that 0 test cases were run.
To fix this, an error is shown when the tap header is not found, which
indicates that kunit was not able to run at all.
Signed-off-by: Uriel Guajardo <urielguajardo@google.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Commit 8b59cd81dc ("kbuild: ensure full rebuild when the compiler is
updated") introduced a new CONFIG option CONFIG_CC_VERSION_TEXT. On my
system, this is set to "gcc (GCC) 10.1.0" which breaks KUnit config
parsing which did not like the spaces in the string.
Fix this by updating the regex to allow strings containing spaces.
Fixes: 8b59cd81dc ("kbuild: ensure full rebuild when the compiler is updated")
Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Changeset 1eecbcdca2 ("docs: move protection-keys.rst to the core-api book")
from Jun 7, 2019 converted protection-keys.txt file to ReST.
A recent change at protection_keys.c partially reverted such
changeset, causing it to point to a non-existing file:
- * Tests x86 Memory Protection Keys (see Documentation/core-api/protection-keys.rst)
+ * Tests Memory Protection Keys (see Documentation/vm/protection-keys.txt)
It sounds to me that the changeset that introduced such change
4645e3563673 ("selftests/vm/pkeys: rename all references to pkru to a generic name")
could also have other side effects, as it sounds that it was not
generated against uptream code, but, instead, against a version
older than Jun 7, 2019.
Fixes: 4645e3563673 ("selftests/vm/pkeys: rename all references to pkru to a generic name")
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/cf65aa052669f55b9dc976a5c8026aef5840741d.1592895969.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
We use OUTPUT directory as TMPOUT for checking no-pie option.
Since commit f2f02ebd8f ("kbuild: improve cc-option to clean up all
temporary files") when building powerpc/ from selftests directory, the
OUTPUT directory points to powerpc/pmu/ebb/ and gets removed when
checking for -no-pie option in try-run routine, subsequently build
fails with the following:
$ make -C powerpc
...
TARGET=ebb; BUILD_TARGET=$OUTPUT/$TARGET; mkdir -p $BUILD_TARGET; make OUTPUT=$BUILD_TARGET -k -C $TARGET all
make[2]: Entering directory '/home/linux-master/tools/testing/selftests/powerpc/pmu/ebb'
make[2]: *** No rule to make target 'Makefile'.
make[2]: Failed to remake makefile 'Makefile'.
make[2]: *** No rule to make target 'ebb.c', needed by '/home/linux-master/tools/testing/selftests/powerpc/pmu/ebb/reg_access_test'.
make[2]: *** No rule to make target 'ebb_handler.S', needed by '/home/linux-master/tools/testing/selftests/powerpc/pmu/ebb/reg_access_test'.
make[2]: *** No rule to make target 'trace.c', needed by '/home/linux-master/tools/testing/selftests/powerpc/pmu/ebb/reg_access_test'.
make[2]: *** No rule to make target 'busy_loop.S', needed by '/home/linux-master/tools/testing/selftests/powerpc/pmu/ebb/reg_access_test'.
make[2]: Target 'all' not remade because of errors.
Fix this by adding a suffix to the OUTPUT directory so that the
failure is avoided.
Fixes: 9686813f6e ("selftests/powerpc: Fix try-run when source tree is not writable")
Signed-off-by: Harish <harish@linux.ibm.com>
[mpe: Mention that commit that triggered the breakage]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200625165721.264904-1-harish@linux.ibm.com
Minor overlapping changes in xfrm_device.c, between the double
ESP trailing bug fix setting the XFRM_INIT flag and the changes
in net-next preparing for bonding encryption support.
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
1) Don't insert ESP trailer twice in IPSEC code, from Huy Nguyen.
2) The default crypto algorithm selection in Kconfig for IPSEC is out
of touch with modern reality, fix this up. From Eric Biggers.
3) bpftool is missing an entry for BPF_MAP_TYPE_RINGBUF, from Andrii
Nakryiko.
4) Missing init of ->frame_sz in xdp_convert_zc_to_xdp_frame(), from
Hangbin Liu.
5) Adjust packet alignment handling in ax88179_178a driver to match
what the hardware actually does. From Jeremy Kerr.
6) register_netdevice can leak in the case one of the notifiers fail,
from Yang Yingliang.
7) Use after free in ip_tunnel_lookup(), from Taehee Yoo.
8) VLAN checks in sja1105 DSA driver need adjustments, from Vladimir
Oltean.
9) tg3 driver can sleep forever when we get enough EEH errors, fix from
David Christensen.
10) Missing {READ,WRITE}_ONCE() annotations in various Intel ethernet
drivers, from Ciara Loftus.
11) Fix scanning loop break condition in of_mdiobus_register(), from
Florian Fainelli.
12) MTU limit is incorrect in ibmveth driver, from Thomas Falcon.
13) Endianness fix in mlxsw, from Ido Schimmel.
14) Use after free in smsc95xx usbnet driver, from Tuomas Tynkkynen.
15) Missing bridge mrp configuration validation, from Horatiu Vultur.
16) Fix circular netns references in wireguard, from Jason A. Donenfeld.
17) PTP initialization on recovery is not done properly in qed driver,
from Alexander Lobakin.
18) Endian conversion of L4 ports in filters of cxgb4 driver is wrong,
from Rahul Lakkireddy.
19) Don't clear bound device TX queue of socket prematurely otherwise we
get problems with ktls hw offloading, from Tariq Toukan.
20) ipset can do atomics on unaligned memory, fix from Russell King.
21) Align ethernet addresses properly in bridging code, from Thomas
Martitz.
22) Don't advertise ipv4 addresses on SCTP sockets having ipv6only set,
from Marcelo Ricardo Leitner.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (149 commits)
rds: transport module should be auto loaded when transport is set
sch_cake: fix a few style nits
sch_cake: don't call diffserv parsing code when it is not needed
sch_cake: don't try to reallocate or unshare skb unconditionally
ethtool: fix error handling in linkstate_prepare_data()
wil6210: account for napi_gro_receive never returning GRO_DROP
hns: do not cast return value of napi_gro_receive to null
socionext: account for napi_gro_receive never returning GRO_DROP
wireguard: receive: account for napi_gro_receive never returning GRO_DROP
vxlan: fix last fdb index during dump of fdb with nhid
sctp: Don't advertise IPv4 addresses if ipv6only is set on the socket
tc-testing: avoid action cookies with odd length.
bpf: tcp: bpf_cubic: fix spurious HYSTART_DELAY exit upon drop in min RTT
tcp_cubic: fix spurious HYSTART_DELAY exit upon drop in min RTT
net: dsa: sja1105: fix tc-gate schedule with single element
net: dsa: sja1105: recalculate gating subschedule after deleting tc-gate rules
net: dsa: sja1105: unconditionally free old gating config
net: dsa: sja1105: move sja1105_compose_gating_subschedule at the top
net: macb: free resources on failure path of at91ether_open()
net: macb: call pm_runtime_put_sync on failure path
...
Update odd length cookie hexstrings in csum.json, tunnel_key.json and
bpf.json to be even length to comply with check enforced in commit
0149dabf2a1b ("tc: m_actions: check cookie hexstring len") in iproute2.
Signed-off-by: Briana Oursler <briana.oursler@gmail.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Apply the fix from:
"tcp_cubic: fix spurious HYSTART_DELAY exit upon drop in min RTT"
to the BPF implementation of TCP CUBIC congestion control.
Repeating the commit description here for completeness:
Mirja Kuehlewind reported a bug in Linux TCP CUBIC Hystart, where
Hystart HYSTART_DELAY mechanism can exit Slow Start spuriously on an
ACK when the minimum rtt of a connection goes down. From inspection it
is clear from the existing code that this could happen in an example
like the following:
o The first 8 RTT samples in a round trip are 150ms, resulting in a
curr_rtt of 150ms and a delay_min of 150ms.
o The 9th RTT sample is 100ms. The curr_rtt does not change after the
first 8 samples, so curr_rtt remains 150ms. But delay_min can be
lowered at any time, so delay_min falls to 100ms. The code executes
the HYSTART_DELAY comparison between curr_rtt of 150ms and delay_min
of 100ms, and the curr_rtt is declared far enough above delay_min to
force a (spurious) exit of Slow start.
The fix here is simple: allow every RTT sample in a round trip to
lower the curr_rtt.
Fixes: 6de4a9c430 ("bpf: tcp: Add bpf_cubic example")
Reported-by: Mirja Kuehlewind <mirja.kuehlewind@ericsson.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adjust the SEC("xdp_devmap/") prog type prefix to contain a
slash "/" for expected attach type BPF_XDP_DEVMAP. This is consistent
with other prog types like tracing.
Fixes: 2778797037 ("libbpf: Add SEC name for xdp programs attached to device map")
Suggested-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/159309521882.821855.6873145686353617509.stgit@firesoul
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net, they are:
1) Unaligned atomic access in ipset, from Russell King.
2) Missing module description, from Rob Gill.
3) Patches to fix a module unload causing NULL pointer dereference in
xtables, from David Wilder. For the record, I posting here his cover
letter explaining the problem:
A crash happened on ppc64le when running ltp network tests triggered by
"rmmod iptable_mangle".
See previous discussion in this thread:
https://lists.openwall.net/netdev/2020/06/03/161 .
In the crash I found in iptable_mangle_hook() that
state->net->ipv4.iptable_mangle=NULL causing a NULL pointer dereference.
net->ipv4.iptable_mangle is set to NULL in +iptable_mangle_net_exit() and
called when ip_mangle modules is unloaded. A rmmod task was found running
in the crash dump. A 2nd crash showed the same problem when running
"rmmod iptable_filter" (net->ipv4.iptable_filter=NULL).
To fix this I added .pre_exit hook in all iptable_foo.c. The pre_exit will
un-register the underlying hook and exit would do the table freeing. The
netns core does an unconditional +synchronize_rcu after the pre_exit hooks
insuring no packets are in flight that have picked up the pointer before
completing the un-register.
These patches include changes for both iptables and ip6tables.
We tested this fix with ltp running iptables01.sh and iptables01.sh -6 a
loop for 72 hours.
4) Add a selftest for conntrack helper assignment, from Florian Westphal.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
These newly added macros will be used in subsequent bpf iterator
tcp{4,6} and udp{4,6} programs.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200623230819.3989050-1-yhs@fb.com
Refactor bpf_iter_ipv6_route.c and bpf_iter_netlink.c
so net macros, originally from various include/linux header
files, are moved to a new header file
bpf_tracing_net.h. The goal is to improve reuse so
networking tracing programs do not need to
copy these macros every time they use them.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200623230817.3988962-1-yhs@fb.com
Commit b9f4c01f3e ("selftest/bpf: Make bpf_iter selftest
compilable against old vmlinux.h") and Commit dda18a5c0b
("selftests/bpf: Convert bpf_iter_test_kern{3, 4}.c to define
own bpf_iter_meta") redefined newly introduced types
in bpf programs so the bpf program can still compile
properly with old kernels although loading may fail.
Since this patch set introduced new types and the same
workaround is needed, so let us move the workaround
to a separate header file so they do not clutter
bpf programs.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200623230816.3988656-1-yhs@fb.com
check that 'nft ... ct helper set <foo>' works:
1. configure ftp helper via nft and assign it to
connections on port 2121
2. check with 'conntrack -L' that the next connection
has the ftp helper attached to it.
Also add a test for auto-assign (old behaviour).
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCXvNBJgAKCRCRxhvAZXjc
oulGAPoCPfCguA8TPcy4tq4byGPoThyO4XnWR6XcUDOEzhbzzAEA+s5S7iRV8W92
p2gzbI4Kncq4dQNEtUvfPHQZDAEwTA0=
=eZDz
-----END PGP SIGNATURE-----
Merge tag 'for-linus-2020-06-24' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull thread fix from Christian Brauner:
"This fixes a regression introduced with 303cc571d1 ("nsproxy: attach
to namespaces via pidfds").
The LTP testsuite reported a regression where users would now see
EBADF returned instead of EINVAL when an fd was passed that referred
to an open file but the file was not a namespace file.
Fix this by continuing to report EINVAL and add a regression test"
* tag 'for-linus-2020-06-24' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
tests: test for setns() EINVAL regression
nsproxy: restore EINVAL for non-namespace file descriptor
This patch adds support of SO_KEEPALIVE flag and TCP related options
to bpf_setsockopt() routine. This is helpful if we want to enable or tune
TCP keepalive for applications which don't do it in the userspace code.
v3:
- update kernel-doc in uapi (Nikita Vetoshkin <nekto0n@yandex-team.ru>)
v4:
- update kernel-doc in tools too (Alexei Starovoitov)
- add test to selftests (Alexei Starovoitov)
Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200620153052.9439-3-zeil@yandex-team.ru
./test_progs-no_alu32 -t get_stack_raw_tp
fails due to:
52: (85) call bpf_get_stack#67
53: (bf) r8 = r0
54: (bf) r1 = r8
55: (67) r1 <<= 32
56: (c7) r1 s>>= 32
; if (usize < 0)
57: (c5) if r1 s< 0x0 goto pc+26
R0=inv(id=0,smax_value=800) R1_w=inv(id=0,umax_value=800,var_off=(0x0; 0x3ff)) R6=ctx(id=0,off=0,imm=0) R7=map_value(id=0,off=0,ks=4,vs=1600,imm=0) R8_w=inv(id=0,smax_value=800) R9=inv800
; ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0);
58: (1f) r9 -= r8
; ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0);
59: (bf) r2 = r7
60: (0f) r2 += r1
regs=1 stack=0 before 52: (85) call bpf_get_stack#67
; ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0);
61: (bf) r1 = r6
62: (bf) r3 = r9
63: (b7) r4 = 0
64: (85) call bpf_get_stack#67
R0=inv(id=0,smax_value=800) R1_w=ctx(id=0,off=0,imm=0) R2_w=map_value(id=0,off=0,ks=4,vs=1600,umax_value=800,var_off=(0x0; 0x3ff),s32_max_value=1023,u32_max_value=1023) R3_w=inv(id=0,umax_value=9223372036854776608)
R3 unbounded memory access, use 'var &= const' or 'if (var < const)'
In the C code:
usize = bpf_get_stack(ctx, raw_data, max_len, BPF_F_USER_STACK);
if (usize < 0)
return 0;
ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0);
if (ksize < 0)
return 0;
We used to have problem with pointer arith in R2.
Now it's a problem with two integers in R3.
'if (usize < 0)' is comparing R1 and makes it [0,800], but R8 stays [-inf,800].
Both registers represent the same 'usize' variable.
Then R9 -= R8 is doing 800 - [-inf, 800]
so the result of "max_len - usize" looks unbounded to the verifier while
it's obvious in C code that "max_len - usize" should be [0, 800].
To workaround the problem convert ksize and usize variables from int to long.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The arm64 signal tests generate warnings during build since both they and
the toplevel lib.mk define a clean target:
Makefile:25: warning: overriding recipe for target 'clean'
../../lib.mk:126: warning: ignoring old recipe for target 'clean'
Since the inclusion of lib.mk is in the signal Makefile there is no
situation where this warning could be avoided so just remove the redundant
clean target.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20200624104933.21125-1-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Run rxtimestamp as part of TEST_PROGS. Analogous to other tests, add
new rxtimestamp.sh wrapper script, so that the test runs isolated
from background traffic in a private network namespace.
Also ignore failures of test case #6 by default. This case verifies
that a receive timestamp is not reported if timestamp reporting is
enabled for a socket, but generation is disabled. Receive timestamp
generation has to be enabled globally, as no associated socket is
known yet. A background process that enables rx timestamp generation
therefore causes a false positive. Ntpd is one example that does.
Add a "--strict" option to cause failure in the event that any test
case fails, including test #6. This is useful for environments that
are known to not have such background processes.
Tested:
make -C tools/testing/selftests TARGETS="net" run_tests
Signed-off-by: Tanner Love <tannerlove@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bpf_object__find_program_by_title(), used by CO-RE relocation code, doesn't
return .text "BPF program", if it is a function storage for sub-programs.
Because of that, any CO-RE relocation in helper non-inlined functions will
fail. Fix this by searching for .text-corresponding BPF program manually.
Adjust one of bpf_iter selftest to exhibit this pattern.
Fixes: ddc7c30426 ("libbpf: implement BPF CO-RE offset relocation algorithm")
Reported-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200619230423.691274-1-andriin@fb.com
Extend original variable-length tests with a case to catch a common
existing pattern of testing for < 0 for errors. Note because
verifier also tracks upper bounds and we know it can not be greater
than MAX_LEN here we can skip upper bound check.
In ALU64 enabled compilation converting from long->int return types
in probe helpers results in extra instruction pattern, <<= 32, s >>= 32.
The trade-off is the non-ALU64 case works. If you really care about
every extra insn (XDP case?) then you probably should be using original
int type.
In addition adding a sext insn to bpf might help the verifier in the
general case to avoid these types of tricks.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200623032224.4020118-3-andriin@fb.com
Add selftest that validates variable-length data reading and concatentation
with one big shared data array. This is a common pattern in production use for
monitoring and tracing applications, that potentially can read a lot of data,
but overall read much less. Such pattern allows to determine precisely what
amount of data needs to be sent over perfbuf/ringbuf and maximize efficiency.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200623032224.4020118-2-andriin@fb.com
Before, we took a reference to the creating netns if the new netns was
different. This caused issues with circular references, with two
wireguard interfaces swapping namespaces. The solution is to rather not
take any extra references at all, but instead simply invalidate the
creating netns pointer when that netns is deleted.
In order to prevent this from happening again, this commit improves the
rough object leak tracking by allowing it to account for created and
destroyed interfaces, aside from just peers and keys. That then makes it
possible to check for the object leak when having two interfaces take a
reference to each others' namespaces.
Fixes: e7096c131e ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Validate libbpf is able to handle weak and strong kernel symbol externs in BPF
code correctly.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Hao Luo <haoluo@google.com>
Link: https://lore.kernel.org/bpf/20200619231703.738941-4-andriin@fb.com
Add a test that checks that pedit adjusts port numbers of tcp and udp
packets.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add selftests to test access to map pointers from bpf program for all
map types except struct_ops (that one would need additional work).
verifier test focuses mostly on scenarios that must be rejected.
prog_tests test focuses on accessing multiple fields both scalar and a
nested struct from bpf program and verifies that those fields have
expected values.
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/139a6a17f8016491e39347849b951525335c6eb4.1592600985.git.rdna@fb.com
There are multiple use-cases when it's convenient to have access to bpf
map fields, both `struct bpf_map` and map type specific struct-s such as
`struct bpf_array`, `struct bpf_htab`, etc.
For example while working with sock arrays it can be necessary to
calculate the key based on map->max_entries (some_hash % max_entries).
Currently this is solved by communicating max_entries via "out-of-band"
channel, e.g. via additional map with known key to get info about target
map. That works, but is not very convenient and error-prone while
working with many maps.
In other cases necessary data is dynamic (i.e. unknown at loading time)
and it's impossible to get it at all. For example while working with a
hash table it can be convenient to know how much capacity is already
used (bpf_htab.count.counter for BPF_F_NO_PREALLOC case).
At the same time kernel knows this info and can provide it to bpf
program.
Fill this gap by adding support to access bpf map fields from bpf
program for both `struct bpf_map` and map type specific fields.
Support is implemented via btf_struct_access() so that a user can define
their own `struct bpf_map` or map type specific struct in their program
with only necessary fields and preserve_access_index attribute, cast a
map to this struct and use a field.
For example:
struct bpf_map {
__u32 max_entries;
} __attribute__((preserve_access_index));
struct bpf_array {
struct bpf_map map;
__u32 elem_size;
} __attribute__((preserve_access_index));
struct {
__uint(type, BPF_MAP_TYPE_ARRAY);
__uint(max_entries, 4);
__type(key, __u32);
__type(value, __u32);
} m_array SEC(".maps");
SEC("cgroup_skb/egress")
int cg_skb(void *ctx)
{
struct bpf_array *array = (struct bpf_array *)&m_array;
struct bpf_map *map = (struct bpf_map *)&m_array;
/* .. use map->max_entries or array->map.max_entries .. */
}
Similarly to other btf_struct_access() use-cases (e.g. struct tcp_sock
in net/ipv4/bpf_tcp_ca.c) the patch allows access to any fields of
corresponding struct. Only reading from map fields is supported.
For btf_struct_access() to work there should be a way to know btf id of
a struct that corresponds to a map type. To get btf id there should be a
way to get a stringified name of map-specific struct, such as
"bpf_array", "bpf_htab", etc for a map type. Two new fields are added to
`struct bpf_map_ops` to handle it:
* .map_btf_name keeps a btf name of a struct returned by map_alloc();
* .map_btf_id is used to cache btf id of that struct.
To make btf ids calculation cheaper they're calculated once while
preparing btf_vmlinux and cached same way as it's done for btf_id field
of `struct bpf_func_proto`
While calculating btf ids, struct names are NOT checked for collision.
Collisions will be checked as a part of the work to prepare btf ids used
in verifier in compile time that should land soon. The only known
collision for `struct bpf_htab` (kernel/bpf/hashtab.c vs
net/core/sock_map.c) was fixed earlier.
Both new fields .map_btf_name and .map_btf_id must be set for a map type
for the feature to work. If neither is set for a map type, verifier will
return ENOTSUPP on a try to access map_ptr of corresponding type. If
just one of them set, it's verifier misconfiguration.
Only `struct bpf_array` for BPF_MAP_TYPE_ARRAY and `struct bpf_htab` for
BPF_MAP_TYPE_HASH are supported by this patch. Other map types will be
supported separately.
The feature is available only for CONFIG_DEBUG_INFO_BTF=y and gated by
perfmon_capable() so that unpriv programs won't have access to bpf map
fields.
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/6479686a0cd1e9067993df57b4c3eef0e276fec9.1592600985.git.rdna@fb.com
If the kernel erroneously allows WRGSBASE and user code writes a
negative value, paranoid_entry will get confused. Check for this by
writing a negative value to GSBASE and doing SYSENTER with TF set. A
successful run looks like:
[RUN] SYSENTER with TF, invalid state, and GSBASE < 0
[SKIP] Illegal instruction
A failed run causes a kernel hang, and I believe it's because we
double-fault and then get a never ending series of page faults and,
when we exhaust the double fault stack we double fault again,
starting the process over.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/f4f71efc91b9eae5e3dae21c9aee1c70cf5f370e.1590620529.git.luto@kernel.org
Extend the alignment handler selftest to exercise prefixed load store
instructions. Add tests for prefixed VSX, floating point and integer
instructions.
Skip prefix tests if ISA version does not support prefixed instructions.
Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Tested-by: Alistair Popple <alistair@popple.id.au>
[mpe: Fixup PPC_FEATURE2_ARCH_3_1 naming as noted by Alistair]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200520021103.19798-2-jniethe5@gmail.com
The alignment handler selftest needs cache-inhibited memory and
currently /dev/fb0 is relied on to provided this. This prevents running
the test on systems without /dev/fb0 (e.g., mambo). Read the commandline
arguments for an optional path to be used instead, as well as an
optional offset to be for mmaping this path.
Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Tested-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200520021103.19798-1-jniethe5@gmail.com
Since iproute2 commit f72c3ad00f3b ("tc: m_tunnel_key: add options
support for vxlan"), the geneve opt output use key word "geneve_opts"
instead of "geneve_opt". To make compatibility for both old and new
iproute2, let's accept both "geneve_opt" and "geneve_opts".
Suggested-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Tested-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The new strict mode functionality is tested in different configurations and
on different network namespaces.
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Few ptrace fixes mostly for strace and seccomp_bpf kernel tests
findings.
- Cleanup unused pm callbacks in virtio ccw.
- Replace kmalloc + memset with kzalloc in crypto.
- Use $(LD) for vDSO linkage to make clang happy.
- Fix vDSO clock_getres() to preserve the same behaviour as
posix_get_hrtimer_res().
- Fix workqueue cpumask warning when NUMA=n and nr_node_ids=2.
- Reduce SLSB writes during input processing, improve warnings and
cleanup qdio_data usage in qdio.
- Few fixes to use scnprintf() instead of snprintf().
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAl7uJy8ACgkQjYWKoQLX
FBitMwgAovHP6O19ZS2RE2Ps20CjM+z0sLLGHF6aMrV7OqmOWrNnFzN4jT2j42Ck
idSZ6sehVd3Uj6K8NnzrlSS3sjGRhVaQJEjjN+rLyw0HBwxspJJfW5HgcoMtqNH1
oo+nt+zw5jk+6MqHx4QEwTxN5rgGs6UMhiLIAIlkDu4bivgohvGUxe4RUrN/mINx
cdYqomCkvovLT5sBTaWyXKNCDAdAWgNpOfdqc9MjOUXSbUg3lrUol0gUULzenPo7
wUN+sZ0di0Ox0+2+4m8LU1av/kMTLSSvnR9DW5KdpGTon1nwpZcdJnhI5o1v7uaU
pIaMOYNieEHJ2DnieR9iBBSbGoNCmw==
=gkgN
-----END PGP SIGNATURE-----
Merge tag 's390-5.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Vasily Gorbik:
- a few ptrace fixes mostly for strace and seccomp_bpf kernel tests
findings
- cleanup unused pm callbacks in virtio ccw
- replace kmalloc + memset with kzalloc in crypto
- use $(LD) for vDSO linkage to make clang happy
- fix vDSO clock_getres() to preserve the same behaviour as
posix_get_hrtimer_res()
- fix workqueue cpumask warning when NUMA=n and nr_node_ids=2
- reduce SLSB writes during input processing, improve warnings and
cleanup qdio_data usage in qdio
- a few fixes to use scnprintf() instead of snprintf()
* tag 's390-5.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390: fix syscall_get_error for compat processes
s390/qdio: warn about unexpected SLSB states
s390/qdio: clean up usage of qdio_data
s390/numa: let NODES_SHIFT depend on NEED_MULTIPLE_NODES
s390/vdso: fix vDSO clock_getres()
s390/vdso: Use $(LD) instead of $(CC) to link vDSO
s390/protvirt: use scnprintf() instead of snprintf()
s390: use scnprintf() in sys_##_prefix##_##_name##_show
s390/crypto: use scnprintf() instead of snprintf()
s390/zcrypt: use kzalloc
s390/virtio: remove unused pm callbacks
s390/qdio: reduce SLSB writes during Input Queue processing
selftests/seccomp: s390 shares the syscall and return value register
s390/ptrace: fix setting syscall number
s390/ptrace: pass invalid syscall numbers to tracing
s390/ptrace: return -ENOSYS when invalid syscall is supplied
s390/seccomp: pass syscall arguments via seccomp_data
s390/qdio: fine-tune SLSB update
This Kselftest update for Linux 5.8-rc2 consists of:
- ftrace "requires:" list for simplifying and unifying requirement
checks for each test case, adding "requires:" line instead of
checking required ftrace interfaces in each test case.
- a minor spelling correction patch
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAl7s440ACgkQCwJExA0N
Qxyl1A//YDbxzJlB6zrmSmagIKC0KGKDcSrAJNsV6K5lIEf7yPi2DkvQT6jpo27M
Xtv+GTF8nuAJRuRceawKp622o+jT7dd7dUt0FFInsj2xPwTLFy0E+AvZ5j2cqps7
gqcdOeHU/u6QrprUX4JCoOf0NdjJsKwzfOXiaOlBGG26bqMZFOnhbd/DclwWbStJ
gOIqMxp7feQ9yjm5ngSZVmQ6sbM+6ZwsywD1kXrSL68gQ+s3segnzDoaoHSFoGwj
lysOHbx3+/Q+pH18nnFAp6H9csr+iMVqlj1FGN62JUVhkaMdlFhnk/cpO97Lvuwf
nOU3BC0FWSjvquAhL2ZqGnfeXBoHSO9wfpOcC/Dyro4QozvlzigM1PBS3WoRbxzl
2ArZ985xhSqeuiJKa4CwLQWR495FOasm4gHBe9nVrqaiwAORZfdqeBgTJBTw8Y7K
W3+DmA4ttZE6QNZokfgpGLWT4K9PRm8F5ZdzUX8+sqjAQmlMp30LW7lhhAlbprqc
dKUAIB1dJwwT2gPsa/ntIf1SZEbMD8FlrAZDDBNZkId/IdgtRUrl9OCPGAwyrRJ7
pA4YxqBQqB1hwL51Dmk1KOlhePIfeDu9DFzUPgKETvuZA6ITmyM/artYaexcpDoF
7pvbLNsNsL7yuNF8y7wCSFj4IP8kPYRCBpmXR4hhugJI15M14dw=
=nqUq
-----END PGP SIGNATURE-----
Merge tag 'linux-kselftest-5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest cleanups from Shuah Khan:
- ftrace "requires:" list for simplifying and unifying requirement
checks for each test case, adding "requires:" line instead of
checking required ftrace interfaces in each test case.
- a minor spelling correction patch
* tag 'linux-kselftest-5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/ftrace: Support ":README" suffix for requires
selftests/ftrace: Support ":tracer" suffix for requires
selftests/ftrace: Convert check_filter_file() with requires list
selftests/ftrace: Convert required interface checks into requires list
selftests/ftrace: Add "requires:" list support
selftests/ftrace: Return unsupported for the unconfigured features
selftests/ftrace: Allow ":" in description
tools: testing: ftrace: trigger: fix spelling mistake
The ETF qdisc can queue skbs that it could not pace on the errqueue.
Address a few issues in the selftest
- recv buffer size was too small, and incorrectly calculated
- compared errno to ee_code instead of ee_errno
- missed invalid request error type
v2:
- fix a few checkpatch --strict indentation warnings
Fixes: ea6a547669 ("selftests/net: make so_txtime more robust to timer variance")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added two test_verifier subtests for 32bit pointer/scalar arithmetic
with BPF_SUB operator. They are passing verifier now.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200618234632.3321367-1-yhs@fb.com
Make it bit easier to parse the kernel logs during the selftests by
adding a "===== TEST: $test =====" delimiter when each individual test
begins.
Suggested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Reviewed-by: Yannick Cote <ycote@redhat.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20200618181040.21132-4-joe.lawrence@redhat.com
The livepatch selftests currently grep on "taints" to filter out
"tainting kernel with TAINT_LIVEPATCH" messages which may be logged when
loading livepatch modules.
Further filter the log to drop "loading out-of-tree module taints
kernel" in the rare case the klp_test modules have been built
out-of-tree.
Look for the longer "taints kernel" or "tainting kernel" strings to
avoid inadvertent partial matching.
Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Reviewed-by: Yannick Cote <ycote@redhat.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20200618181040.21132-3-joe.lawrence@redhat.com
Inspired by commit f131d9edc2 ("selftests/lkdtm: Don't clear dmesg
when running tests"), keep a reference dmesg copy when beginning each
test. This way check_result() can compare against the initial copy
rather than relying upon an empty log.
Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Reviewed-by: Yannick Cote <ycote@redhat.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20200618181040.21132-2-joe.lawrence@redhat.com
This validates that GS selector and base are independently preserved in
ptrace commands.
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20200528201402.1708239-17-sashal@kernel.org
The test validates that the selector is not changed when a ptracer writes
the ptracee's GS base.
Originally-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20200528201402.1708239-16-sashal@kernel.org
Alexei Starovoitov says:
====================
pull-request: bpf 2020-06-17
The following pull-request contains BPF updates for your *net* tree.
We've added 10 non-merge commits during the last 2 day(s) which contain
a total of 14 files changed, 158 insertions(+), 59 deletions(-).
The main changes are:
1) Important fix for bpf_probe_read_kernel_str() return value, from Andrii.
2) [gs]etsockopt fix for large optlen, from Stanislav.
3) devmap allocation fix, from Toke.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
We are relying on the fact, that we can pass > sizeof(int) optvals
to the SOL_IP+IP_FREEBIND option (the kernel will take first 4 bytes).
In the BPF program we check that we can only touch PAGE_SIZE bytes,
but the real optlen is PAGE_SIZE * 2. In both cases, we override it to
some predefined value and trim the optlen.
Also, let's modify exiting IP_TOS usecase to test optlen=0 case
where BPF program just bypasses the data as is.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200617010416.93086-2-sdf@google.com
Verify that setns() reports EINVAL when an fd is passed that refers to an
open file but the file is not a file descriptor useable to interact with
namespaces.
Cc: Jan Stancek <jstancek@redhat.com>
Cc: Cyril Hrubis <chrubis@suse.cz>
Link: https://lore.kernel.org/lkml/20200615085836.GR12456@shao2-debian
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
This adds basic tests for the new close_range() syscall.
- test that no invalid flags can be passed
- test that a range of file descriptors is correctly closed
- test that a range of file descriptors is correctly closed if there there
are already closed file descriptors in the range
- test that max_fd is correctly capped to the current fdtable maximum
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Jann Horn <jannh@google.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Dmitry V. Levin <ldv@altlinux.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: linux-api@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Add ":README" suffix support for the requires list, so that
the testcase can list up the required string for README file
to the requires list.
Note that the required string is treated as a fixed string,
instead of regular expression. Also, the testcase can specify
a string containing spaces with quotes. E.g.
# requires: "place: [<module>:]<symbol>":README
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Add ":tracer" suffix support for the requires list, so that
the testcase can list up the required tracer (e.g. function)
to the requires list.
For example, if the testcase requires function_graph tracer,
it can write requires list as below instead of checking
available_tracers.
# requires: function_graph:tracer
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Since check_filter_file() is basically checking the filter
tracefs file, we can convert it into requires list.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Introduce "requires:" list to check required ftrace interface
for each test. This will simplify the interface checking code
and unify the error message. Another good point is, it can
skip the ftrace initializing.
Note that this requires list must be written as a shell
comment.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
As same as other test cases, return unsupported if kprobe_events
or argument access feature are not found.
There can be a new arch which does not port those features yet,
and an older kernel which doesn't support it.
Those can not enable the features.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Allow ":" in the description line. Currently if there is ":"
in the test description line, the description is cut at that
point, but that was unintended.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
s390 cannot set syscall number and reture code at the same time,
so set the appropriate flag to indicate it.
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
There is a regular need in the kernel to provide a way to declare having a
dynamically sized set of trailing elements in a structure. Kernel code should
always use “flexible array members”[1] for these cases. The older style of
one-element or zero-length arrays should no longer be used[2].
[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://github.com/KSPP/linux/issues/21
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Pull networking fixes from David Miller:
1) Fix cfg80211 deadlock, from Johannes Berg.
2) RXRPC fails to send norigications, from David Howells.
3) MPTCP RM_ADDR parsing has an off by one pointer error, fix from
Geliang Tang.
4) Fix crash when using MSG_PEEK with sockmap, from Anny Hu.
5) The ucc_geth driver needs __netdev_watchdog_up exported, from
Valentin Longchamp.
6) Fix hashtable memory leak in dccp, from Wang Hai.
7) Fix how nexthops are marked as FDB nexthops, from David Ahern.
8) Fix mptcp races between shutdown and recvmsg, from Paolo Abeni.
9) Fix crashes in tipc_disc_rcv(), from Tuong Lien.
10) Fix link speed reporting in iavf driver, from Brett Creeley.
11) When a channel is used for XSK and then reused again later for XSK,
we forget to clear out the relevant data structures in mlx5 which
causes all kinds of problems. Fix from Maxim Mikityanskiy.
12) Fix memory leak in genetlink, from Cong Wang.
13) Disallow sockmap attachments to UDP sockets, it simply won't work.
From Lorenz Bauer.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (83 commits)
net: ethernet: ti: ale: fix allmulti for nu type ale
net: ethernet: ti: am65-cpsw-nuss: fix ale parameters init
net: atm: Remove the error message according to the atomic context
bpf: Undo internal BPF_PROBE_MEM in BPF insns dump
libbpf: Support pre-initializing .bss global variables
tools/bpftool: Fix skeleton codegen
bpf: Fix memlock accounting for sock_hash
bpf: sockmap: Don't attach programs to UDP sockets
bpf: tcp: Recv() should return 0 when the peer socket is closed
ibmvnic: Flush existing work items before device removal
genetlink: clean up family attributes allocations
net: ipa: header pad field only valid for AP->modem endpoint
net: ipa: program upper nibbles of sequencer type
net: ipa: fix modem LAN RX endpoint id
net: ipa: program metadata mask differently
ionic: add pcie_print_link_status
rxrpc: Fix race between incoming ACK parser and retransmitter
net/mlx5: E-Switch, Fix some error pointer dereferences
net/mlx5: Don't fail driver on failure to create debugfs
net/mlx5e: CT: Fix ipv6 nat header rewrite actions
...
Alexei Starovoitov says:
====================
pull-request: bpf 2020-06-12
The following pull-request contains BPF updates for your *net* tree.
We've added 26 non-merge commits during the last 10 day(s) which contain
a total of 27 files changed, 348 insertions(+), 93 deletions(-).
The main changes are:
1) sock_hash accounting fix, from Andrey.
2) libbpf fix and probe_mem sanitizing, from Andrii.
3) sock_hash fixes, from Jakub.
4) devmap_val fix, from Jesper.
5) load_bytes_relative fix, from YiFei.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove invalid assumption in libbpf that .bss map doesn't have to be updated
in kernel. With addition of skeleton and memory-mapped initialization image,
.bss doesn't have to be all zeroes when BPF map is created, because user-code
might have initialized those variables from user-space.
Fixes: eba9c5f498 ("libbpf: Refactor global data map initialization")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200612194504.557844-1-andriin@fb.com
- Loongson port
PPC:
- Fixes
ARM:
- Fixes
x86:
- KVM_SET_USER_MEMORY_REGION optimizations
- Fixes
- Selftest fixes
The guest side of the asynchronous page fault work has been delayed to 5.9
in order to sync with Thomas's interrupt entry rework.
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl7icj4UHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroPHGQgAj9+5j+f5v06iMP/+ponWwsVfh+5/
UR1gPbpMSFMKF0U+BCFxsBeGKWPDiz9QXaLfy6UGfOFYBI475Su5SoZ8/i/o6a2V
QjcKIJxBRNs66IG/774pIpONY8/mm/3b6vxmQktyBTqjb6XMGlOwoGZixj/RTp85
+uwSICxMlrijg+fhFMwC4Bo/8SFg+FeBVbwR07my88JaLj+3cV/NPolG900qLSa6
uPqJ289EQ86LrHIHXCEWRKYvwy77GFsmBYjKZH8yXpdzUlSGNexV8eIMAz50figu
wYRJGmHrRqwuzFwEGknv8SA3s2HVggXO4WVkWWCeJyO8nIVfYFUhME5l6Q==
=+Hh0
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull more KVM updates from Paolo Bonzini:
"The guest side of the asynchronous page fault work has been delayed to
5.9 in order to sync with Thomas's interrupt entry rework, but here's
the rest of the KVM updates for this merge window.
MIPS:
- Loongson port
PPC:
- Fixes
ARM:
- Fixes
x86:
- KVM_SET_USER_MEMORY_REGION optimizations
- Fixes
- Selftest fixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (62 commits)
KVM: x86: do not pass poisoned hva to __kvm_set_memory_region
KVM: selftests: fix sync_with_host() in smm_test
KVM: async_pf: Inject 'page ready' event only if 'page not present' was previously injected
KVM: async_pf: Cleanup kvm_setup_async_pf()
kvm: i8254: remove redundant assignment to pointer s
KVM: x86: respect singlestep when emulating instruction
KVM: selftests: Don't probe KVM_CAP_HYPERV_ENLIGHTENED_VMCS when nested VMX is unsupported
KVM: selftests: do not substitute SVM/VMX check with KVM_CAP_NESTED_STATE check
KVM: nVMX: Consult only the "basic" exit reason when routing nested exit
KVM: arm64: Move hyp_symbol_addr() to kvm_asm.h
KVM: arm64: Synchronize sysreg state on injecting an AArch32 exception
KVM: arm64: Make vcpu_cp1x() work on Big Endian hosts
KVM: arm64: Remove host_cpu_context member from vcpu structure
KVM: arm64: Stop sparse from moaning at __hyp_this_cpu_ptr
KVM: arm64: Handle PtrAuth traps early
KVM: x86: Unexport x86_fpu_cache and make it static
KVM: selftests: Ignore KVM 5-level paging support for VM_MODE_PXXV48_4K
KVM: arm64: Save the host's PtrAuth keys in non-preemptible context
KVM: arm64: Stop save/restoring ACTLR_EL1
KVM: arm64: Add emulation for 32bit guests accessing ACTLR2
...
It was reported that older GCCs compile smm_test in a way that breaks
it completely:
kvm_exit: reason EXIT_CPUID rip 0x4014db info 0 0
func 7ffffffd idx 830 rax 0 rbx 0 rcx 0 rdx 0, cpuid entry not found
...
kvm_exit: reason EXIT_MSR rip 0x40abd9 info 0 0
kvm_msr: msr_read 487 = 0x0 (#GP)
...
Note, '7ffffffd' was supposed to be '80000001' as we're checking for
SVM. Dropping '-O2' from compiler flags help. Turns out, asm block in
sync_with_host() is wrong. We us 'in 0xe, %%al' instruction to sync
with the host and in 'AL' register we actually pass the parameter
(stage) but after sync 'AL' gets written to but GCC thinks the value
is still there and uses it to compute 'EAX' for 'cpuid'.
smm_test can't fully use standard ucall() framework as we need to
write a very simple SMI handler there. Fix the immediate issue by
making RAX input/output operand. While on it, make sync_with_host()
static inline.
Reported-by: Marcelo Bandeira Condotta <mcondotta@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200610164116.770811-1-vkuznets@redhat.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM_CAP_HYPERV_ENLIGHTENED_VMCS will be reported as supported even when
nested VMX is not, fix evmcs_test/hyperv_cpuid tests to check for both.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200610135847.754289-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
state_test/smm_test use KVM_CAP_NESTED_STATE check as an indicator for
nested VMX/SVM presence and this is incorrect. Check for the required
features dirrectly.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200610135847.754289-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When cgroup_skb/egress triggers the MAC header is not set. Added a
test that asserts reading MAC header is a -EFAULT but NET header
succeeds. The test result from within the eBPF program is stored in
an 1-element array map that the userspace then reads and asserts on.
Another assertion is added that reading from a large offset, past
the end of packet, returns -EFAULT.
Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/9028ccbea4385a620e69c0a104f469ffd655c01e.1591812755.git.zhuyifei@google.com
The loop exits with "timeout" set to -1 and not to 0 so the test needs to
be fixed.
Fixes: e7b592f6caca ("khugepaged: add self test")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Zi Yan <ziy@nvidia.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Link: http://lkml.kernel.org/r/20200605110736.GH978434@mwanda
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
getopt_long requires the last element to be filled with zeros.
Otherwise, passing an unrecognized option can cause a segfault.
Fixes: 16e7812241 ("selftests/net: Add a test to validate behavior of rx timestamps")
Signed-off-by: Tanner Love <tannerlove@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When using BPF_PROG_ATTACH to attach a program to a cgroup in
BPF_F_ALLOW_MULTI mode, it is not possible to replace a program
with itself. This is because the check for duplicate programs
doesn't take the replacement program into account.
Replacing a program with itself might seem weird, but it has
some uses: first, it allows resetting the associated cgroup storage.
Second, it makes the API consistent with the non-ALLOW_MULTI usage,
where it is possible to replace a program with itself. Third, it
aligns BPF_PROG_ATTACH with bpf_link, where replacing itself is
also supported.
Sice this code has been refactored a few times this change will
only apply to v5.7 and later. Adjustments could be made to
commit 1020c1f24a ("bpf: Simplify __cgroup_bpf_attach") and
commit d7bf2c10af ("bpf: allocate cgroup storage entries on attaching bpf programs")
as well as commit 324bda9e6c ("bpf: multi program support for cgroup+bpf")
Fixes: af6eea5743 ("bpf: Implement bpf_link-based cgroup BPF program attachment")
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200608162202.94002-1-lmb@cloudflare.com
No new features this release. Mostly clean ups, restructuring and
documentation.
- Have ftrace_bug() show ftrace errors before the WARN, as the WARN will
reboot the box before the error messages are printed if panic_on_warn
is set.
- Have traceoff_on_warn disable tracing sooner (before prints)
- Write a message to the trace buffer that its being disabled when
disable_trace_on_warning() is set.
- Separate out synthetic events from histogram code to let it be used by
other parts of the kernel.
- More documentation on histogram design.
- Other small fixes and clean ups.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXt+LEhQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qj2zAP9sD/W4jafYayucj+MvRP7sy+Q0iAH7
WMn8fkk958cgfQD8D1QFtkkx+3O3TRT6ApGf11w5+JgSWUE2gSbW9H4fPQk=
=X5t4
-----END PGP SIGNATURE-----
Merge tag 'trace-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"No new features this release. Mostly clean ups, restructuring and
documentation.
- Have ftrace_bug() show ftrace errors before the WARN, as the WARN
will reboot the box before the error messages are printed if
panic_on_warn is set.
- Have traceoff_on_warn disable tracing sooner (before prints)
- Write a message to the trace buffer that its being disabled when
disable_trace_on_warning() is set.
- Separate out synthetic events from histogram code to let it be used
by other parts of the kernel.
- More documentation on histogram design.
- Other small fixes and clean ups"
* tag 'trace-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Remove obsolete PREEMPTIRQ_EVENTS kconfig option
tracing/doc: Fix ascii-art in histogram-design.rst
tracing: Add a trace print when traceoff_on_warning is triggered
ftrace,bug: Improve traceoff_on_warn
selftests/ftrace: Distinguish between hist and synthetic event checks
tracing: Move synthetic events to a separate file
tracing: Fix events.rst section numbering
tracing/doc: Fix typos in histogram-design.rst
tracing: Add hist_debug trace event files for histogram debugging
tracing: Add histogram-design document
tracing: Check state.disabled in synth event trace functions
tracing/probe: reverse arguments to list_add
tools/bootconfig: Add a summary of test cases and return error
ftrace: show debugging information when panic_on_warn set
This Kunit update for Linux 5.8-rc1 consists of:
- Several config fragment fixes from Anders Roxell to improve
test coverage.
- Improvements to kunit run script to use defconfig as default and
restructure the code for config/build/exec/parse from Vitor Massaru Iha
and David Gow.
- Miscellaneous documentation warn fix.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAl7etrcACgkQCwJExA0N
QxzGYg/+KHpPhB31IAjNFKCRqwDooftst3dohhzguxJLpDHdEmVJ4moQhLr4gL+/
qpi3T9hr4Rx++n/A5NoxDvyJvGr+FAL40U+Of7F2UyHpqQmfKPj37I+yvyeR1JEL
z4+yXEpfQLZaQkmZ7f3GWHyqN3+xwvyTEy7NYUad7xMxLF/99No+I6RMD6yp3srS
wUUeuBIesSFT0LXYrgI+wgsNGUESlj/McjiP5eMj6UtlMgKpzmfzH56Fia8uw1pw
6QtpntxDHjtxVfp8YKM4qExI54YI2t6sgHTIoOUsMWD5Q2kHd8kNf1L+lb1sKYUF
j7lzol5nuqqchAVQYjHzNHa8XKndvexGyWMsPz1gAnkpgVrvBTSFcavdDpDuDZ0T
HoJZnk9XPsguBQjDxapzPYfAQ81Un/rEmZQ8/X2TaNjdSIH1hHljhaP2OZ6eND/Q
iobq9x8nC9D95TIqjDbRw3Sp2na/pZLN8Gp27hmKlc+L1XzV8NuZe/WGOUe3lsrq
fG1ZSLo/iRau8gHuF6fRSrGIzQSCEMGKl3jlQ28OT9HGMAgTlncEwVzQId48/AsS
UOY+bIAnRZuK+B5F/vw6L3o1e3c17z5bruVlb0M0alP5b7P9/3WLNHsHA3r8haZF
F6PwIWu41wdRjJf2HI7zD5LaQe/7oU3jfwvuA7n2z8Py+zGx7m4=
=S+HY
-----END PGP SIGNATURE-----
Merge tag 'linux-kselftest-kunit-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull Kunit updates from Shuah Khan:
"This consists of:
- Several config fragment fixes from Anders Roxell to improve test
coverage.
- Improvements to kunit run script to use defconfig as default and
restructure the code for config/build/exec/parse from Vitor Massaru
Iha and David Gow.
- Miscellaneous documentation warn fix"
* tag 'linux-kselftest-kunit-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
security: apparmor: default KUNIT_* fragments to KUNIT_ALL_TESTS
fs: ext4: default KUNIT_* fragments to KUNIT_ALL_TESTS
drivers: base: default KUNIT_* fragments to KUNIT_ALL_TESTS
lib: Kconfig.debug: default KUNIT_* fragments to KUNIT_ALL_TESTS
kunit: default KUNIT_* fragments to KUNIT_ALL_TESTS
kunit: Kconfig: enable a KUNIT_ALL_TESTS fragment
kunit: Fix TabError, remove defconfig code and handle when there is no kunitconfig
kunit: use KUnit defconfig by default
kunit: use --build_dir=.kunit as default
Documentation: test.h - fix warnings
kunit: kunit_tool: Separate out config/build/exec/parse
This Kselftest update for Linux 5.8-rc1 consists of:
- Several fixes from Masami Hiramatsu to improve coverage for
lib and sysctl tests.
- Clean up to vdso test and a new test for getcpu() from Mark Brown.
- Add new gen_tar selftests Makefile target generate selftest package
running "make gen_tar" in selftests directory from Veronika Kabatova.
- Other miscellaneous fixes to timens, exec, tpm2 tests.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAl7ernMACgkQCwJExA0N
QxzgRg/+PYwHxTtKYX2VyQ533oTUXMdTAW8el8M72/ngRBRJMtkjYu9Jc5PfAaP9
vsyDrC8Kz6D3Nw0Gf+nH2LFJ2hJs0OooBb4HqouAyyNitXGkSiyGb6+1ICmW0Ro8
NIlwCYrIuoPUKS9ivRoKs0Vsnt0xh12asTlEwUPEEjmvE7Jf+6eoWIExu7i3nSnc
symOpc3ElCXex4AlRIMI07naQqlT7rtOSQMDpjEXzMMW4rKMlgO8dFCvLkZQh/lj
gUt45W9vF/Sem6a9jRMPrb8vEGkC2WURNsUVbl4JUbKp8vAXr5bYUD9CMXNdK6r2
o241nNScjr9RkZoN94DtXfcQbpwnvnhomJzqe/BgKVhfROhv4xf6fdvkxnAMYeXD
J/szKvLkKkvZ5O19rBqEUCXxYdiSgFOAgHOvnUKfjKKvhMrTQ//3r8L4K6qv489A
PAyoLuu8GjEVxX2dISB01ASwHeEUTZ6uF2wsu5i2dVWfTAIIpeL2Xnjsa+WmIcH7
iwBI/2RCuIeC3hViwGKuiSKaN70E/rgXcKO8jxnsF/2+hZnCuMg1D7PJLdIomJjI
6EiPukLGkZxJcHK2kdJSXIR0elEZ4pH7kweanIDPTxDiUqGwK7ZAFrA2feDJ11Iz
dye9uC7sjGh27FRbVvoDvZ9659SZ1cqYS+wRcTkPx3K1wS1kuZI=
=IB5c
-----END PGP SIGNATURE-----
Merge tag 'linux-kselftest-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest updates from Shuah Khan:
"This consists of:
- Several fixes from Masami Hiramatsu to improve coverage for lib and
sysctl tests.
- Clean up to vdso test and a new test for getcpu() from Mark Brown.
- Add new gen_tar selftests Makefile target generate selftest package
running "make gen_tar" in selftests directory from Veronika
Kabatova.
- Other miscellaneous fixes to timens, exec, tpm2 tests"
* tag 'linux-kselftest-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/sysctl: Make sysctl test driver as a module
selftests/sysctl: Fix to load test_sysctl module
lib: Make test_sysctl initialized as module
lib: Make prime number generator independently selectable
selftests/ftrace: Return unsupported if no error_log file
selftests/ftrace: Use printf for backslash included command
selftests/timens: handle a case when alarm clocks are not supported
Kernel selftests: Add check if TPM devices are supported
selftests: vdso: Add a selftest for vDSO getcpu()
selftests: vdso: Use a header file to prototype parse_vdso API
selftests: vdso: Rename vdso_test to vdso_test_gettimeofday
selftests/exec: Verify execve of non-regular files fail
selftests: introduce gen_tar Makefile target
Explicitly set the VA width to 48 bits for the x86_64-only PXXV48_4K VM
mode instead of asserting the guest VA width is 48 bits. The fact that
KVM supports 5-level paging is irrelevant unless the selftests opt-in to
5-level paging by setting CR4.LA57 for the guest. The overzealous
assert prevents running the selftests on a kernel with 5-level paging
enabled.
Incorporate LA57 into the assert instead of removing the assert entirely
as a sanity check of KVM's CPUID output.
Fixes: 567a9f1e9d ("KVM: selftests: Introduce VM_MODE_PXXV48_4K")
Reported-by: Sergio Perez Gonzalez <sergio.perez.gonzalez@intel.com>
Cc: Adriana Cervantes Jimenez <adriana.cervantes.jimenez@intel.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200528021530.28091-1-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If user passed an interface option longer than 15 characters, then
device.ifr_name and hwtstamp.ifr_name became non-null-terminated
strings. The compiler warned about this:
timestamping.c:353:2: warning: ‘strncpy’ specified bound 16 equals \
destination size [-Wstringop-truncation]
353 | strncpy(device.ifr_name, interface, sizeof(device.ifr_name));
Fixes: cb9eff0978 ("net: new user space API for time stamping of incoming and outgoing packets")
Signed-off-by: Tanner Love <tannerlove@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This selftest tests for cases where sendfile's 'count'
parameter is provided with a size greater than the intended
file size.
Motivation: When sendfile is provided with 'count' parameter
value that is greater than the size of the file, kTLS example
fails to send the file correctly. Last chunk of the file is
not sent, and the data integrity is compromised.
The reason is that the last chunk has MSG_MORE flag set
because of which it gets added to pending records, but is
not pushed.
Note that if user space were to send SSL_shutdown control
message, pending records would get flushed and the issue
would not happen. So a shutdown control message following
sendfile can mask the issue.
Signed-off-by: Pooja Trivedi <pooja.trivedi@stackpath.com>
Signed-off-by: Mallesham Jatharkonda <mallesham.jatharkonda@oneconvergence.com>
Signed-off-by: Josh Tway <josh.tway@stackpath.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge still more updates from Andrew Morton:
"Various trees. Mainly those parts of MM whose linux-next dependents
are now merged. I'm still sitting on ~160 patches which await merges
from -next.
Subsystems affected by this patch series: mm/proc, ipc, dynamic-debug,
panic, lib, sysctl, mm/gup, mm/pagemap"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (52 commits)
doc: cgroup: update note about conditions when oom killer is invoked
module: move the set_fs hack for flush_icache_range to m68k
nommu: use flush_icache_user_range in brk and mmap
binfmt_flat: use flush_icache_user_range
exec: use flush_icache_user_range in read_code
exec: only build read_code when needed
m68k: implement flush_icache_user_range
arm: rename flush_cache_user_range to flush_icache_user_range
xtensa: implement flush_icache_user_range
sh: implement flush_icache_user_range
asm-generic: add a flush_icache_user_range stub
mm: rename flush_icache_user_range to flush_icache_user_page
arm,sparc,unicore32: remove flush_icache_user_range
riscv: use asm-generic/cacheflush.h
powerpc: use asm-generic/cacheflush.h
openrisc: use asm-generic/cacheflush.h
m68knommu: use asm-generic/cacheflush.h
microblaze: use asm-generic/cacheflush.h
ia64: use asm-generic/cacheflush.h
hexagon: use asm-generic/cacheflush.h
...
Testing is done by a new parameter debug.test_sysctl.boot_int which
defaults to 0 and it's expected that the tester passes a boot parameter
that sets it to 1. The test checks if it's set to 1.
To distinguish true failure from parameter not being set, the test
checks /proc/cmdline for the expected parameter, and whether test_sysctl
is built-in and not a module.
[vbabka@suse.cz: skip the new test if boot_int sysctl is not present]
Link: http://lkml.kernel.org/r/305af605-1e60-cf84-fada-6ce1ca37c102@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Guilherme G . Piccoli" <gpiccoli@canonical.com>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Ivan Teterevkov <ivan.teterevkov@nutanix.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20200427180433.7029-6-vbabka@suse.cz
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The testing script recommends CONFIG_TEST_SYSCTL=y, but actually only
works with CONFIG_TEST_SYSCTL=m. Testing of sysctl setting via boot
param however requires the test to be built-in, so make sure the test
script supports it.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Guilherme G . Piccoli" <gpiccoli@canonical.com>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Ivan Teterevkov <ivan.teterevkov@nutanix.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20200427180433.7029-5-vbabka@suse.cz
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix test race, in which background poll can get either 5 or 6 samples,
depending on timing of notification. Prevent this by open-coding sample
triggering and forcing notification for the very last sample only.
Also switch to using atomic increments and exchanges for more obviously
reliable counting and checking. Additionally, check expected processed sample
counters for single-threaded use cases as well.
Fixes: 9a5f25ad30 ("selftests/bpf: Fix sample_cnt shared between two threads")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200608003615.3549991-1-andriin@fb.com
This change makes the test feel more familiar with narrowing to a
typical usage by operating on a number of identical structure instances
and populating the same two new shadow variables symmetrically while
keeping the same testing and verification criteria for the extra
variables.
Signed-off-by: Yannick Cote <ycote@redhat.com>
Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Joe Lawrence <joe.lawrence@redhat.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20200603182058.109470-4-ycote@redhat.com
The test-klp-callbacks script includes a few tests which rely on kernel
task timings that may not always execute as expected under system load.
These may generate out of sequence kernel log messages that result in
test failure.
Instead of using sleep timing windows to orchestrate these tests, add a
block_transition module parameter to communicate the test purpose and
utilize flush_queue() to serialize the test module's task output.
Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20200603182058.109470-2-ycote@redhat.com
for ntb tests
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEoE9b9c3U2JxX98mqbmZLrHqL0iMFAl7a3aUACgkQbmZLrHqL
0iPmmA//fZ5PuqEfAjsCQjxQjVsvh195pBPL4vcwtpu9R7xDKztoRqOMzHbmvLXK
db+E9erWPFESjJFMqH4u3kAVIGvKSRkbjsVH7rhhdgabjB6IAs4nJr+ucvOD1fp+
OO3AJl8cedJurj5yUhCEJ13lT3Y/90YqJLdtqkAi0m9iABH7J54SmxZosVj1XUBt
PoIyF1PGXCeVv+v0VTjRsm67kGL4K3dggOPJFPZ56trhLshOlCrcaRt/MzVVMAud
P9ZU9h02sp62E87anUhe6TsR6G0BgRbOvvX39VtxoaJjfoMFEBGFzEPEj+3V1tfa
jeSM3jE9sCvbFFxuarvyHNoCRY4lntGjzP8lM1sCatSjp5mJnEFSC3tSGyY+cAFr
LB2How8Bikrq/PQ/H768UXL9ChYv+T5hsHRcz4yllKkyl9OwJAUpqlvBMJUNIMu3
Yvrhj9oG6EH28dK7nuzNxXIPPjBgkbetCK/jhfn6XZT9jP2p5iXv4qA3bjCsn11E
0cPCXVwMAkwcgVaTuPWdNFILXGfijcwfpBlsgHak0MvureQz+ANVJqWpZwJyWQB5
aiLr0xzW9qTVfX+vGAopHAoFD2If1eS/wTqqXF5TYbZT8/cuwjzGEl8aIPEP1ldz
Jyy/tVK97Lk8S6ZXceQucugAy4CKAIcRmlulkxYjH6fbVf2jyfo=
=cp25
-----END PGP SIGNATURE-----
Merge tag 'ntb-5.8' of git://github.com/jonmason/ntb
Pull NTB updates from Jon Mason:
"Intel Icelake NTB support, Intel driver bug fixes, and lots of bug
fixes for ntb tests"
* tag 'ntb-5.8' of git://github.com/jonmason/ntb:
NTB: ntb_test: Fix bug when counting remote files
NTB: perf: Fix race condition when run with ntb_test
NTB: perf: Fix support for hardware that doesn't have port numbers
NTB: perf: Don't require one more memory window than number of peers
NTB: ntb_pingpong: Choose doorbells based on port number
NTB: Fix the default port and peer numbers for legacy drivers
NTB: Revert the change to use the NTB device dev for DMA allocations
NTB: ntb_tool: reading the link file should not end in a NULL byte
ntb_perf: avoid false dma unmap of destination address
ntb_perf: increase sleep time from one milli sec to one sec
ntb_tool: pass correct struct device to dma_alloc_coherent
ntb_perf: pass correct struct device to dma_alloc_coherent
ntb: hw: remove the code that sets the DMA mask
NTB: correct ntb_peer_spad_addr and ntb_peer_spad_read comment typos
ntb: intel: fix static declaration
ntb: intel: add hw workaround for NTB BAR alignment
ntb: intel: Add Icelake (gen4) support for Intel NTB
NTB: Fix static check warning in perf_clear_test
include/ntb: Fix typo in ntb_unregister_device description
When remote files are counted in get_files_count, without using SSH,
the code returns 0 because there is a colon prepended to $LOC. $VPATH
should have been used instead of $LOC.
Fixes: 06bd0407d0 ("NTB: ntb_test: Update ntb_tool Scratchpad tests")
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Allen Hubbe <allenbh@gmail.com>
Tested-by: Alexander Fomichev <fomichev.ru@gmail.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
- Support for userspace to send requests directly to the on-chip GZIP
accelerator on Power9.
- Rework of our lockless page table walking (__find_linux_pte()) to make it
safe against parallel page table manipulations without relying on an IPI for
serialisation.
- A series of fixes & enhancements to make our machine check handling more
robust.
- Lots of plumbing to add support for "prefixed" (64-bit) instructions on
Power10.
- Support for using huge pages for the linear mapping on 8xx (32-bit).
- Remove obsolete Xilinx PPC405/PPC440 support, and an associated sound driver.
- Removal of some obsolete 40x platforms and associated cruft.
- Initial support for booting on Power10.
- Lots of other small features, cleanups & fixes.
Thanks to:
Alexey Kardashevskiy, Alistair Popple, Andrew Donnellan, Andrey Abramov,
Aneesh Kumar K.V, Balamuruhan S, Bharata B Rao, Bulent Abali, Cédric Le
Goater, Chen Zhou, Christian Zigotzky, Christophe JAILLET, Christophe Leroy,
Dmitry Torokhov, Emmanuel Nicolet, Erhard F., Gautham R. Shenoy, Geoff Levand,
George Spelvin, Greg Kurz, Gustavo A. R. Silva, Gustavo Walbon, Haren Myneni,
Hari Bathini, Joel Stanley, Jordan Niethe, Kajol Jain, Kees Cook, Leonardo
Bras, Madhavan Srinivasan., Mahesh Salgaonkar, Markus Elfring, Michael
Neuling, Michal Simek, Nathan Chancellor, Nathan Lynch, Naveen N. Rao,
Nicholas Piggin, Oliver O'Halloran, Paul Mackerras, Pingfan Liu, Qian Cai, Ram
Pai, Raphael Moreira Zinsly, Ravi Bangoria, Sam Bobroff, Sandipan Das, Segher
Boessenkool, Stephen Rothwell, Sukadev Bhattiprolu, Tyrel Datwyler, Wolfram
Sang, Xiongfeng Wang.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl7aYZ8THG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgPiKD/9zNCuZLFMAFrIdbm0HlYA2RGYZFT75
GUHsqYyei1pxA7PgM3KwJiXELVODsBv0eQbgNh1tbecKrxPRegN/cywd1KLjPZ7I
v5/qweQP8MvR0RhzjbhvUcO0jq/f8u2LbJr5mUfVzjU6tAvrvcWo3oZqDElsekCS
kgyOH3r1vZ2PLTMiGFhb0gWi2iqc+6BHU1AFCGPCMjB1Vu5d5+54VvZ/6lllGsOF
yg9CBXmmVvQ+Bn6tH4zdEB78FYxnAIwBqlbmL79i5ca+HQJ0Sw6HuPRy9XYq35p6
2EiXS4Wrgp7i7+1TN3HO362u5Onb8TSyQU7NS6yCFPoJ6JQxcJMBIw6mHhnXOPuZ
CrjgcdwUMjx8uDoKmX1Epbfuex2w+AysW+4yBHPFiSgl3klKC3D0wi95mR485w2F
rN8uzJtrDeFKcYZJG7IoB/cgFCCPKGf9HaXr8q0S/jBKMffx91ul3cfzlfdIXOCw
FDNw/+ZX7UD6ddFEG12ZTO+vdL8yf1uCRT/DIZwUiDMIA0+M6F4nc7j3lfyZfoO1
65f9UlhoLxScq7VH2fKH4UtZatO9cPID2z1CmiY4UbUIPtFDepSuYClgLF+Duf4b
rkfxhKU0+Ja1zNH5XNc+L+Bc5/W4lFiJXz02dYIjtHoUpWkc1aToOETVwzggYFNM
G3PXIBOI0jRgRw==
=o0WU
-----END PGP SIGNATURE-----
Merge tag 'powerpc-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Support for userspace to send requests directly to the on-chip GZIP
accelerator on Power9.
- Rework of our lockless page table walking (__find_linux_pte()) to
make it safe against parallel page table manipulations without
relying on an IPI for serialisation.
- A series of fixes & enhancements to make our machine check handling
more robust.
- Lots of plumbing to add support for "prefixed" (64-bit) instructions
on Power10.
- Support for using huge pages for the linear mapping on 8xx (32-bit).
- Remove obsolete Xilinx PPC405/PPC440 support, and an associated sound
driver.
- Removal of some obsolete 40x platforms and associated cruft.
- Initial support for booting on Power10.
- Lots of other small features, cleanups & fixes.
Thanks to: Alexey Kardashevskiy, Alistair Popple, Andrew Donnellan,
Andrey Abramov, Aneesh Kumar K.V, Balamuruhan S, Bharata B Rao, Bulent
Abali, Cédric Le Goater, Chen Zhou, Christian Zigotzky, Christophe
JAILLET, Christophe Leroy, Dmitry Torokhov, Emmanuel Nicolet, Erhard F.,
Gautham R. Shenoy, Geoff Levand, George Spelvin, Greg Kurz, Gustavo A.
R. Silva, Gustavo Walbon, Haren Myneni, Hari Bathini, Joel Stanley,
Jordan Niethe, Kajol Jain, Kees Cook, Leonardo Bras, Madhavan
Srinivasan., Mahesh Salgaonkar, Markus Elfring, Michael Neuling, Michal
Simek, Nathan Chancellor, Nathan Lynch, Naveen N. Rao, Nicholas Piggin,
Oliver O'Halloran, Paul Mackerras, Pingfan Liu, Qian Cai, Ram Pai,
Raphael Moreira Zinsly, Ravi Bangoria, Sam Bobroff, Sandipan Das, Segher
Boessenkool, Stephen Rothwell, Sukadev Bhattiprolu, Tyrel Datwyler,
Wolfram Sang, Xiongfeng Wang.
* tag 'powerpc-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (299 commits)
powerpc/pseries: Make vio and ibmebus initcalls pseries specific
cxl: Remove dead Kconfig options
powerpc: Add POWER10 architected mode
powerpc/dt_cpu_ftrs: Add MMA feature
powerpc/dt_cpu_ftrs: Enable Prefixed Instructions
powerpc/dt_cpu_ftrs: Advertise support for ISA v3.1 if selected
powerpc: Add support for ISA v3.1
powerpc: Add new HWCAP bits
powerpc/64s: Don't set FSCR bits in INIT_THREAD
powerpc/64s: Save FSCR to init_task.thread.fscr after feature init
powerpc/64s: Don't let DT CPU features set FSCR_DSCR
powerpc/64s: Don't init FSCR_DSCR in __init_FSCR()
powerpc/32s: Fix another build failure with CONFIG_PPC_KUAP_DEBUG
powerpc/module_64: Use special stub for _mcount() with -mprofile-kernel
powerpc/module_64: Simplify check for -mprofile-kernel ftrace relocations
powerpc/module_64: Consolidate ftrace code
powerpc/32: Disable KASAN with pages bigger than 16k
powerpc/uaccess: Don't set KUEP by default on book3s/32
powerpc/uaccess: Don't set KUAP by default on book3s/32
powerpc/8xx: Reduce time spent in allow_user_access() and friends
...
Marcelo reports that kvm selftests fail to build with
"make ARCH=x86_64":
gcc -Wall -Wstrict-prototypes -Wuninitialized -O2 -g -std=gnu99
-fno-stack-protector -fno-PIE -I../../../../tools/include
-I../../../../tools/arch/x86_64/include -I../../../../usr/include/
-Iinclude -Ilib -Iinclude/x86_64 -I.. -c lib/kvm_util.c
-o /var/tmp/20200604202744-bin/lib/kvm_util.o
In file included from lib/kvm_util.c:11:
include/x86_64/processor.h:14:10: fatal error: asm/msr-index.h: No such
file or directory
#include <asm/msr-index.h>
^~~~~~~~~~~~~~~~~
compilation terminated.
"make ARCH=x86", however, works. The problem is that arch specific headers
for x86_64 live in 'tools/arch/x86/include', not in
'tools/arch/x86_64/include'.
Fixes: 66d69e081b ("selftests: fix kvm relocatable native/cross builds and installs")
Reported-by: Marcelo Bandeira Condotta <mcondotta@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200605142028.550068-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Merge yet more updates from Andrew Morton:
- More MM work. 100ish more to go. Mike Rapoport's "mm: remove
__ARCH_HAS_5LEVEL_HACK" series should fix the current ppc issue
- Various other little subsystems
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (127 commits)
lib/ubsan.c: fix gcc-10 warnings
tools/testing/selftests/vm: remove duplicate headers
selftests: vm: pkeys: fix multilib builds for x86
selftests: vm: pkeys: use the correct page size on powerpc
selftests/vm/pkeys: override access right definitions on powerpc
selftests/vm/pkeys: test correct behaviour of pkey-0
selftests/vm/pkeys: introduce a sub-page allocator
selftests/vm/pkeys: detect write violation on a mapped access-denied-key page
selftests/vm/pkeys: associate key on a mapped page and detect write violation
selftests/vm/pkeys: associate key on a mapped page and detect access violation
selftests/vm/pkeys: improve checks to determine pkey support
selftests/vm/pkeys: fix assertion in test_pkey_alloc_exhaust()
selftests/vm/pkeys: fix number of reserved powerpc pkeys
selftests/vm/pkeys: introduce powerpc support
selftests/vm/pkeys: introduce generic pkey abstractions
selftests: vm: pkeys: use the correct huge page size
selftests/vm/pkeys: fix alloc_random_pkey() to make it really random
selftests/vm/pkeys: fix assertion in pkey_disable_set/clear()
selftests/vm/pkeys: fix pkey_disable_clear()
selftests: vm: pkeys: add helpers for pkey bits
...
This ensures that both 32-bit and 64-bit binaries are generated when this
is built on a x86_64 system. Most of the changes have been borrowed from
tools/testing/selftests/x86/Makefile.
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Suchanek <msuchanek@suse.de>
Cc: Shuah Khan <shuah@kernel.org>
Link: http://lkml.kernel.org/r/0326a442214d7a1b970d38296e63df3b217f5912.1585646528.git.sandipan@linux.ibm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Both 4K and 64K pages are supported on powerpc. Parts of the selftest
code perform alignment computations based on the PAGE_SIZE macro which is
currently hardcoded to 64K for powerpc. This causes some test failures on
kernels configured with 4K page size.
In some cases, we need to enforce function alignment on page size. Since
this can only be done at build time, 64K is used as the alignment factor
as that also ensures 4K alignment.
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Suchanek <msuchanek@suse.de>
Cc: Shuah Khan <shuah@kernel.org>
Link: http://lkml.kernel.org/r/5dcdfbf3353acdc90f315172e800b49f5ca21299.1585646528.git.sandipan@linux.ibm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some platforms hardcode the x86 values for PKEY_DISABLE_ACCESS
and PKEY_DISABLE_WRITE such as those in:
/usr/include/bits/mman-shared.h.
This overrides the definitions with correct values for powerpc.
[sandipan@linux.ibm.com: fix powerpc access right definitions]
Link: http://lkml.kernel.org/r/1ba86fd8a94f38131cfe2d9f277001dd1ad1d34e.1588959697.git.sandipan@linux.ibm.com
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Suchanek <msuchanek@suse.de>
Cc: Shuah Khan <shuah@kernel.org>
Link: http://lkml.kernel.org/r/f6eb38cb3a1e12eb2cdc9da6300bc5a5dfba0db9.1585646528.git.sandipan@linux.ibm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ensure that pkey-0 is allocated on start and that it can be attached
dynamically in various modes, without failures.
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Suchanek <msuchanek@suse.de>
Cc: Shuah Khan <shuah@kernel.org>
Link: http://lkml.kernel.org/r/9b7c54a9b4261894fe0c7e884c70b87214ff8fbb.1585646528.git.sandipan@linux.ibm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This introduces a new allocator that allocates 4K hardware pages to back
64K linux pages. This allocator is available only on powerpc.
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Suchanek <msuchanek@suse.de>
Cc: Shuah Khan <shuah@kernel.org>
Link: http://lkml.kernel.org/r/c4a82fa962ec71015b994fab1aaf83bdfd091553.1585646528.git.sandipan@linux.ibm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Detect write-violation on a page to which access-disabled key is
associated much after the page is mapped.
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Suchanek <msuchanek@suse.de>
Cc: Shuah Khan <shuah@kernel.org>
Link: http://lkml.kernel.org/r/6a7dd4069ee18a2a51b207a55aa197f3f3c59753.1585646528.git.sandipan@linux.ibm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Detect write-violation on a page to which write-disabled key is associated
much after the page is mapped.
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Suchanek <msuchanek@suse.de>
Cc: Shuah Khan <shuah@kernel.org>
Link: http://lkml.kernel.org/r/6bfe3b3832f8bcfb07d7f2cf116b45197f4587dd.1585646528.git.sandipan@linux.ibm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Detect access-violation on a page to which access-disabled key is
associated much after the page is mapped.
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Suchanek <msuchanek@suse.de>
Cc: Shuah Khan <shuah@kernel.org>
Link: http://lkml.kernel.org/r/4a19cf9252c03dd883887e9002881599e6900d06.1585646528.git.sandipan@linux.ibm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For the pkeys subsystem to work, both the CPU and the kernel need to have
support. So, additionally check if the kernel supports pkeys apart from
the CPU feature checks.
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Suchanek <msuchanek@suse.de>
Cc: Shuah Khan <shuah@kernel.org>
Link: http://lkml.kernel.org/r/8fb76c63ebdadcf068ecd2d23731032e195cd364.1585646528.git.sandipan@linux.ibm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some pkeys which are valid on the hardware are reserved and not available
for application use. These keys cannot be allocated.
test_pkey_alloc_exhaust() tries to account for these and has an assertion
which validates if all available pkeys have been exahaustively allocated.
However, the expression that is currently used is only valid for x86. On
powerpc, a pkey is additionally reserved as compared to x86. Hence, the
assertion is made to use an arch-specific helper to get the correct count
of reserved pkeys.
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Suchanek <msuchanek@suse.de>
Cc: Shuah Khan <shuah@kernel.org>
Link: http://lkml.kernel.org/r/38b08d0318820ae46af3aa6048384fd8056c3df7.1585646528.git.sandipan@linux.ibm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The number of reserved pkeys in a PowerNV environment is different from
that on PowerVM or KVM.
Tested on PowerVM and PowerNV environments.
Signed-off-by: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com>
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Suchanek <msuchanek@suse.de>
Cc: Shuah Khan <shuah@kernel.org>
Link: http://lkml.kernel.org/r/0341a0ca961166814b44c9e724774672c18d54ca.1585646528.git.sandipan@linux.ibm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This makes use of the abstractions added earlier and introduces support
for powerpc.
For powerpc, after receiving the SIGSEGV, the signal handler must
explicitly restore access permissions for the faulting pkey to allow the
test to continue. As this makes use of pkey_access_allow(), all of its
dependencies and other similar functions have been moved ahead of the
signal handler.
[sandipan@linux.ibm.com: fix powerpc access right updates]
Link: http://lkml.kernel.org/r/5f65cf37be993760de8112a88da194e3ccbb2bf8.1588959697.git.sandipan@linux.ibm.com
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Suchanek <msuchanek@suse.de>
Cc: Shuah Khan <shuah@kernel.org>
Link: http://lkml.kernel.org/r/b121e9fd33789ed9195276e32fe4e80bb6b88a31.1585646528.git.sandipan@linux.ibm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The huge page size can vary across architectures. This will ensure that
the correct huge page size is used when accessing the hugetlb controls
under sysfs. Instead of using a hardcoded page size (i.e. 2MB), this now
uses the HPAGE_SIZE macro which is arch-specific.
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Suchanek <msuchanek@suse.de>
Cc: Shuah Khan <shuah@kernel.org>
Link: http://lkml.kernel.org/r/66882a5d6e45c73c3a52bc4aef9754e48afa4f88.1585646528.git.sandipan@linux.ibm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
alloc_random_pkey() was allocating the same pkey every time. Not all
pkeys were geting tested. This fixes it.
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Suchanek <msuchanek@suse.de>
Cc: Shuah Khan <shuah@kernel.org>
Link: http://lkml.kernel.org/r/0162f55816d4e783a0d6e49e554d0ab9a3c9a23b.1585646528.git.sandipan@linux.ibm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In some cases, a pkey's bits need not necessarily change in a way that the
value of the pkey register increases when performing a pkey_disable_set()
or decreases when performing a pkey_disable_clear().
For example, on powerpc, if a pkey's current state is PKEY_DISABLE_ACCESS
and we perform a pkey_write_disable() on it, the bits still remain the
same. We will observe something similar when the pkey's current state is
0 and a pkey_access_enable() is performed on it.
Either case would cause some assertions to fail. This fixes the problem.
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Suchanek <msuchanek@suse.de>
Cc: Shuah Khan <shuah@kernel.org>
Link: http://lkml.kernel.org/r/8240665131e43fc93eed4eea8194676c1ea39a7f.1585646528.git.sandipan@linux.ibm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, pkey_disable_clear() sets the specified bits instead clearing
them. This has been dead code up to now because its only callers i.e.
pkey_access/write_allow() are also unused.
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Suchanek <msuchanek@suse.de>
Cc: Shuah Khan <shuah@kernel.org>
Link: http://lkml.kernel.org/r/1f70bca60330a85dca42c3cd98212bb1cdf5a076.1585646528.git.sandipan@linux.ibm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This introduces some functions that help with setting or clearing bits of
a particular pkey. This also adds an abstraction for getting a pkey's bit
position in the pkey register as this may vary across architectures.
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Suchanek <msuchanek@suse.de>
Cc: Shuah Khan <shuah@kernel.org>
Link: http://lkml.kernel.org/r/2ad9705f4f68ca7e72155cc583415e5a979546f1.1585646528.git.sandipan@linux.ibm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The size of the pkey register can vary across architectures. This
converts the data type of all its references to u64 in preparation for
multi-arch support.
To keep the definition of the u64 type consistent and remove format
specifier related warnings, __SANE_USERSPACE_TYPES__ is defined as
suggested by Michael Ellerman.
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Suchanek <msuchanek@suse.de>
Cc: Shuah Khan <shuah@kernel.org>
Link: http://lkml.kernel.org/r/d3e271798455d940e395e56e1ff1e82a31bcb7aa.1585646528.git.sandipan@linux.ibm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This will help us ensure we print pkey_reg_t values correctly in different
architectures.
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Suchanek <msuchanek@suse.de>
Cc: Shuah Khan <shuah@kernel.org>
Link: http://lkml.kernel.org/r/b40b7a95fdd4045d62530a2a34452934caf3b0bc.1585646528.git.sandipan@linux.ibm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Moved all the generic definition and helper functions to the
header file.
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Suchanek <msuchanek@suse.de>
Cc: Shuah Khan <shuah@kernel.org>
Link: http://lkml.kernel.org/r/57177f99e92a51295956715d5f2d5688a4d13927.1585646528.git.sandipan@linux.ibm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This renames PKRU references to "pkey_reg" or "pkey" based on
the usage.
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Suchanek <msuchanek@suse.de>
Cc: Shuah Khan <shuah@kernel.org>
Link: http://lkml.kernel.org/r/2c6970bc6d2e99796cd5cc1101bd2ecf7eccb937.1585646528.git.sandipan@linux.ibm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "selftests, powerpc, x86: Memory Protection Keys", v19.
Memory protection keys enables an application to protect its address space
from inadvertent access by its own code.
This feature is now enabled on powerpc and has been available since
4.16-rc1. The patches move the selftests to arch neutral directory and
enhance their test coverage.
Tested on powerpc64 and x86_64 (Skylake-SP).
This patch (of 24):
Move selftest files from tools/testing/selftests/x86/ to
tools/testing/selftests/vm/.
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Michal Suchanek <msuchanek@suse.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Shuah Khan <shuah@kernel.org>
Link: http://lkml.kernel.org/r/14d25194c3e2e652e0047feec4487e269e76e8c9.1585646528.git.sandipan@linux.ibm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Test some bit clears/sets to make sure assembly doesn't change, and that
the set_bit and clear_bit functions work and don't cause sparse warnings.
Instruct Kbuild to build this file with extra warning level -Wextra, to
catch new issues, and also doesn't hurt to build with C=1.
This was used to test changes to arch/x86/include/asm/bitops.h.
In particular, sparse (C=1) was very concerned when the last bit before a
natural boundary, like 7, or 31, was being tested, as this causes sign
extension (0xffffff7f) for instance when clearing bit 7.
Recommended usage:
make defconfig
scripts/config -m CONFIG_TEST_BITOPS
make modules_prepare
make C=1 W=1 lib/test_bitops.ko
objdump -S -d lib/test_bitops.ko
insmod lib/test_bitops.ko
rmmod lib/test_bitops.ko
<check dmesg>, there should be no compiler/sparse warnings and no
error messages in log.
Link: http://lkml.kernel.org/r/20200310221747.2848474-2-jesse.brandeburg@intel.com
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
CcL Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull execve updates from Eric Biederman:
"Last cycle for the Nth time I ran into bugs and quality of
implementation issues related to exec that could not be easily be
fixed because of the way exec is implemented. So I have been digging
into exec and cleanup up what I can.
I don't think I have exec sorted out enough to fix the issues I
started with but I have made some headway this cycle with 4 sets of
changes.
- promised cleanups after introducing exec_update_mutex
- trivial cleanups for exec
- control flow simplifications
- remove the recomputation of bprm->cred
The net result is code that is a bit easier to understand and work
with and a decrease in the number of lines of code (if you don't count
the added tests)"
* 'exec-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (24 commits)
exec: Compute file based creds only once
exec: Add a per bprm->file version of per_clear
binfmt_elf_fdpic: fix execfd build regression
selftests/exec: Add binfmt_script regression test
exec: Remove recursion from search_binary_handler
exec: Generic execfd support
exec/binfmt_script: Don't modify bprm->buf and then return -ENOEXEC
exec: Move the call of prepare_binprm into search_binary_handler
exec: Allow load_misc_binary to call prepare_binprm unconditionally
exec: Convert security_bprm_set_creds into security_bprm_repopulate_creds
exec: Factor security_bprm_creds_for_exec out of security_bprm_set_creds
exec: Teach prepare_exec_creds how exec treats uids & gids
exec: Set the point of no return sooner
exec: Move handling of the point of no return to the top level
exec: Run sync_mm_rss before taking exec_update_mutex
exec: Fix spelling of search_binary_handler in a comment
exec: Move the comment from above de_thread to above unshare_sighand
exec: Rename flush_old_exec begin_new_exec
exec: Move most of setup_new_exec into flush_old_exec
exec: In setup_new_exec cache current in the local variable me
...
Pull proc updates from Eric Biederman:
"This has four sets of changes:
- modernize proc to support multiple private instances
- ensure we see the exit of each process tid exactly
- remove has_group_leader_pid
- use pids not tasks in posix-cpu-timers lookup
Alexey updated proc so each mount of proc uses a new superblock. This
allows people to actually use mount options with proc with no fear of
messing up another mount of proc. Given the kernel's internal mounts
of proc for things like uml this was a real problem, and resulted in
Android's hidepid mount options being ignored and introducing security
issues.
The rest of the changes are small cleanups and fixes that came out of
my work to allow this change to proc. In essence it is swapping the
pids in de_thread during exec which removes a special case the code
had to handle. Then updating the code to stop handling that special
case"
* 'proc-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
proc: proc_pid_ns takes super_block as an argument
remove the no longer needed pid_alive() check in __task_pid_nr_ns()
posix-cpu-timers: Replace __get_task_for_clock with pid_for_clock
posix-cpu-timers: Replace cpu_timer_pid_type with clock_pid_type
posix-cpu-timers: Extend rcu_read_lock removing task_struct references
signal: Remove has_group_leader_pid
exec: Remove BUG_ON(has_group_leader_pid)
posix-cpu-timer: Unify the now redundant code in lookup_task
posix-cpu-timer: Tidy up group_leader logic in lookup_task
proc: Ensure we see the exit of each process tid exactly once
rculist: Add hlists_swap_heads_rcu
proc: Use PIDTYPE_TGID in next_tgid
Use proc_pid_ns() to get pid_namespace from the proc superblock
proc: use named enums for better readability
proc: use human-readable values for hidepid
docs: proc: add documentation for "hidepid=4" and "subset=pid" options and new mount behavior
proc: add option to mount only a pids subset
proc: instantiate only pids that we can ptrace on 'hidepid=4' mount option
proc: allow to mount many instances of proc in one pid namespace
proc: rename struct proc_fs_info to proc_fs_opts
Merge more updates from Andrew Morton:
"More mm/ work, plenty more to come
Subsystems affected by this patch series: slub, memcg, gup, kasan,
pagealloc, hugetlb, vmscan, tools, mempolicy, memblock, hugetlbfs,
thp, mmap, kconfig"
* akpm: (131 commits)
arm64: mm: use ARCH_HAS_DEBUG_WX instead of arch defined
x86: mm: use ARCH_HAS_DEBUG_WX instead of arch defined
riscv: support DEBUG_WX
mm: add DEBUG_WX support
drivers/base/memory.c: cache memory blocks in xarray to accelerate lookup
mm/thp: rename pmd_mknotpresent() as pmd_mkinvalid()
powerpc/mm: drop platform defined pmd_mknotpresent()
mm: thp: don't need to drain lru cache when splitting and mlocking THP
hugetlbfs: get unmapped area below TASK_UNMAPPED_BASE for hugetlbfs
sparc32: register memory occupied by kernel as memblock.memory
include/linux/memblock.h: fix minor typo and unclear comment
mm, mempolicy: fix up gup usage in lookup_node
tools/vm/page_owner_sort.c: filter out unneeded line
mm: swap: memcg: fix memcg stats for huge pages
mm: swap: fix vmstats for huge pages
mm: vmscan: limit the range of LRU type balancing
mm: vmscan: reclaim writepage is IO cost
mm: vmscan: determine anon/file pressure balance at the reclaim root
mm: balance LRU lists based on relative thrashing
mm: only count actual rotations as LRU reclaim cost
...
'max_ptes_shared' specifies how many pages can be shared across multiple
processes. Exceeding the number would block the collapse::
/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_shared
A higher value may increase memory footprint for some workloads.
By default, at least half of pages has to be not shared.
[colin.king@canonical.com: fix several spelling mistakes]
Link: http://lkml.kernel.org/r/20200420084241.65433-1-colin.king@canonical.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Acked-by: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Link: http://lkml.kernel.org/r/20200416160026.16538-9-kirill.shutemov@linux.intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "thp/khugepaged improvements and CoW semantics", v4.
The patchset adds khugepaged selftest (anon-THP only for now), expands
cases khugepaged can handle and switches anon-THP copy-on-write handling
to 4k.
This patch (of 8):
The test checks if khugepaged is able to recover huge page where we expect
to do so. It only covers anon-THP for now.
Currently the test shows few failures. They are going to be addressed by
the following patches.
[colin.king@canonical.com: fix several spelling mistakes]
Link: http://lkml.kernel.org/r/20200420084241.65433-1-colin.king@canonical.com
[aneesh.kumar@linux.ibm.com: replace the usage of system(3) in the test]
Link: http://lkml.kernel.org/r/20200429110727.89388-1-aneesh.kumar@linux.ibm.com
[kirill@shutemov.name: fixup for issues I've noticed]
Link: http://lkml.kernel.org/r/20200429124816.jp272trghrzxx5j5@box
[jhubbard@nvidia.com: add khugepaged to .gitignore]
Link: http://lkml.kernel.org/r/20200517002509.362401-1-jhubbard@nvidia.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Acked-by: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Link: http://lkml.kernel.org/r/20200416160026.16538-1-kirill.shutemov@linux.intel.com
Link: http://lkml.kernel.org/r/20200416160026.16538-2-kirill.shutemov@linux.intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull networking updates from David Miller:
1) Allow setting bluetooth L2CAP modes via socket option, from Luiz
Augusto von Dentz.
2) Add GSO partial support to igc, from Sasha Neftin.
3) Several cleanups and improvements to r8169 from Heiner Kallweit.
4) Add IF_OPER_TESTING link state and use it when ethtool triggers a
device self-test. From Andrew Lunn.
5) Start moving away from custom driver versions, use the globally
defined kernel version instead, from Leon Romanovsky.
6) Support GRO vis gro_cells in DSA layer, from Alexander Lobakin.
7) Allow hard IRQ deferral during NAPI, from Eric Dumazet.
8) Add sriov and vf support to hinic, from Luo bin.
9) Support Media Redundancy Protocol (MRP) in the bridging code, from
Horatiu Vultur.
10) Support netmap in the nft_nat code, from Pablo Neira Ayuso.
11) Allow UDPv6 encapsulation of ESP in the ipsec code, from Sabrina
Dubroca. Also add ipv6 support for espintcp.
12) Lots of ReST conversions of the networking documentation, from Mauro
Carvalho Chehab.
13) Support configuration of ethtool rxnfc flows in bcmgenet driver,
from Doug Berger.
14) Allow to dump cgroup id and filter by it in inet_diag code, from
Dmitry Yakunin.
15) Add infrastructure to export netlink attribute policies to
userspace, from Johannes Berg.
16) Several optimizations to sch_fq scheduler, from Eric Dumazet.
17) Fallback to the default qdisc if qdisc init fails because otherwise
a packet scheduler init failure will make a device inoperative. From
Jesper Dangaard Brouer.
18) Several RISCV bpf jit optimizations, from Luke Nelson.
19) Correct the return type of the ->ndo_start_xmit() method in several
drivers, it's netdev_tx_t but many drivers were using
'int'. From Yunjian Wang.
20) Add an ethtool interface for PHY master/slave config, from Oleksij
Rempel.
21) Add BPF iterators, from Yonghang Song.
22) Add cable test infrastructure, including ethool interfaces, from
Andrew Lunn. Marvell PHY driver is the first to support this
facility.
23) Remove zero-length arrays all over, from Gustavo A. R. Silva.
24) Calculate and maintain an explicit frame size in XDP, from Jesper
Dangaard Brouer.
25) Add CAP_BPF, from Alexei Starovoitov.
26) Support terse dumps in the packet scheduler, from Vlad Buslov.
27) Support XDP_TX bulking in dpaa2 driver, from Ioana Ciornei.
28) Add devm_register_netdev(), from Bartosz Golaszewski.
29) Minimize qdisc resets, from Cong Wang.
30) Get rid of kernel_getsockopt and kernel_setsockopt in order to
eliminate set_fs/get_fs calls. From Christoph Hellwig.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2517 commits)
selftests: net: ip_defrag: ignore EPERM
net_failover: fixed rollback in net_failover_open()
Revert "tipc: Fix potential tipc_aead refcnt leak in tipc_crypto_rcv"
Revert "tipc: Fix potential tipc_node refcnt leak in tipc_rcv"
vmxnet3: allow rx flow hash ops only when rss is enabled
hinic: add set_channels ethtool_ops support
selftests/bpf: Add a default $(CXX) value
tools/bpf: Don't use $(COMPILE.c)
bpf, selftests: Use bpf_probe_read_kernel
s390/bpf: Use bcr 0,%0 as tail call nop filler
s390/bpf: Maintain 8-byte stack alignment
selftests/bpf: Fix verifier test
selftests/bpf: Fix sample_cnt shared between two threads
bpf, selftests: Adapt cls_redirect to call csum_level helper
bpf: Add csum_level helper for fixing up csum levels
bpf: Fix up bpf_skb_adjust_room helper's skb csum setting
sfc: add missing annotation for efx_ef10_try_update_nic_stats_vf()
crypto/chtls: IPv6 support for inline TLS
Crypto/chcr: Fixes a coccinile check error
Crypto/chcr: Fixes compilations warnings
...
- Move the arch-specific code into arch/arm64/kvm
- Start the post-32bit cleanup
- Cherry-pick a few non-invasive pre-NV patches
x86:
- Rework of TLB flushing
- Rework of event injection, especially with respect to nested virtualization
- Nested AMD event injection facelift, building on the rework of generic code
and fixing a lot of corner cases
- Nested AMD live migration support
- Optimization for TSC deadline MSR writes and IPIs
- Various cleanups
- Asynchronous page fault cleanups (from tglx, common topic branch with tip tree)
- Interrupt-based delivery of asynchronous "page ready" events (host side)
- Hyper-V MSRs and hypercalls for guest debugging
- VMX preemption timer fixes
s390:
- Cleanups
Generic:
- switch vCPU thread wakeup from swait to rcuwait
The other architectures, and the guest side of the asynchronous page fault
work, will come next week.
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl7VJcYUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroPf6QgAq4wU5wdd1lTGz/i3DIhNVJNJgJlp
ozLzRdMaJbdbn5RpAK6PEBd9+pt3+UlojpFB3gpJh2Nazv2OzV4yLQgXXXyyMEx1
5Hg7b4UCJYDrbkCiegNRv7f/4FWDkQ9dx++RZITIbxeskBBCEI+I7GnmZhGWzuC4
7kj4ytuKAySF2OEJu0VQF6u0CvrNYfYbQIRKBXjtOwuRK4Q6L63FGMJpYo159MBQ
asg3B1jB5TcuGZ9zrjL5LkuzaP4qZZHIRs+4kZsH9I6MODHGUxKonrkablfKxyKy
CFK+iaHCuEXXty5K0VmWM3nrTfvpEjVjbMc7e1QGBQ5oXsDM0pqn84syRg==
=v7Wn
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"ARM:
- Move the arch-specific code into arch/arm64/kvm
- Start the post-32bit cleanup
- Cherry-pick a few non-invasive pre-NV patches
x86:
- Rework of TLB flushing
- Rework of event injection, especially with respect to nested
virtualization
- Nested AMD event injection facelift, building on the rework of
generic code and fixing a lot of corner cases
- Nested AMD live migration support
- Optimization for TSC deadline MSR writes and IPIs
- Various cleanups
- Asynchronous page fault cleanups (from tglx, common topic branch
with tip tree)
- Interrupt-based delivery of asynchronous "page ready" events (host
side)
- Hyper-V MSRs and hypercalls for guest debugging
- VMX preemption timer fixes
s390:
- Cleanups
Generic:
- switch vCPU thread wakeup from swait to rcuwait
The other architectures, and the guest side of the asynchronous page
fault work, will come next week"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (256 commits)
KVM: selftests: fix rdtsc() for vmx_tsc_adjust_test
KVM: check userspace_addr for all memslots
KVM: selftests: update hyperv_cpuid with SynDBG tests
x86/kvm/hyper-v: Add support for synthetic debugger via hypercalls
x86/kvm/hyper-v: enable hypercalls regardless of hypercall page
x86/kvm/hyper-v: Add support for synthetic debugger interface
x86/hyper-v: Add synthetic debugger definitions
KVM: selftests: VMX preemption timer migration test
KVM: nVMX: Fix VMX preemption timer migration
x86/kvm/hyper-v: Explicitly align hcall param for kvm_hyperv_exit
KVM: x86/pmu: Support full width counting
KVM: x86/pmu: Tweak kvm_pmu_get_msr to pass 'struct msr_data' in
KVM: x86: announce KVM_FEATURE_ASYNC_PF_INT
KVM: x86: acknowledgment mechanism for async pf page ready notifications
KVM: x86: interrupt based APF 'page ready' event delivery
KVM: introduce kvm_read_guest_offset_cached()
KVM: rename kvm_arch_can_inject_async_page_present() to kvm_arch_can_dequeue_async_page_present()
KVM: x86: extend struct kvm_vcpu_pv_apf_data with token info
Revert "KVM: async_pf: Fix #DF due to inject "Page not Present" and "Page Ready" exceptions simultaneously"
KVM: VMX: Replace zero-length array with flexible-array
...
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCXtYhfgAKCRCRxhvAZXjc
oghSAP9uVX3vxYtEtNvu9WtEn1uYZcSKZoF1YrcgY7UfSmna0gEAruzyZcai4CJL
WKv+4aRq2oYk+hsqZDycAxIsEgWvNg8=
=ZWj3
-----END PGP SIGNATURE-----
Merge tag 'threads-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull thread updates from Christian Brauner:
"We have been discussing using pidfds to attach to namespaces for quite
a while and the patches have in one form or another already existed
for about a year. But I wanted to wait to see how the general api
would be received and adopted.
This contains the changes to make it possible to use pidfds to attach
to the namespaces of a process, i.e. they can be passed as the first
argument to the setns() syscall.
When only a single namespace type is specified the semantics are
equivalent to passing an nsfd. That means setns(nsfd, CLONE_NEWNET)
equals setns(pidfd, CLONE_NEWNET).
However, when a pidfd is passed, multiple namespace flags can be
specified in the second setns() argument and setns() will attach the
caller to all the specified namespaces all at once or to none of them.
Specifying 0 is not valid together with a pidfd. Here are just two
obvious examples:
setns(pidfd, CLONE_NEWPID | CLONE_NEWNS | CLONE_NEWNET);
setns(pidfd, CLONE_NEWUSER);
Allowing to also attach subsets of namespaces supports various
use-cases where callers setns to a subset of namespaces to retain
privilege, perform an action and then re-attach another subset of
namespaces.
Apart from significantly reducing the number of syscalls needed to
attach to all currently supported namespaces (eight "open+setns"
sequences vs just a single "setns()"), this also allows atomic setns
to a set of namespaces, i.e. either attaching to all namespaces
succeeds or we fail without having changed anything.
This is centered around a new internal struct nsset which holds all
information necessary for a task to switch to a new set of namespaces
atomically. Fwiw, with this change a pidfd becomes the only token
needed to interact with a container. I'm expecting this to be
picked-up by util-linux for nsenter rather soon.
Associated with this change is a shiny new test-suite dedicated to
setns() (for pidfds and nsfds alike)"
* tag 'threads-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
selftests/pidfd: add pidfd setns tests
nsproxy: attach to namespaces via pidfds
nsproxy: add struct nsset
When running with conntrack rules, the dropped overlap fragments may cause
EPERM to be returned to sendto. Instead of completely failing, just ignore
those errors and continue. If this causes packets with overlap fragments to
be dropped as expected, that is okay. And if it causes packets that are
expected to be received to be dropped, which should not happen, it will be
detected as failure.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This series adds a selftest for hmm_range_fault() and several of the
DEVICE_PRIVATE migration related actions, and another simplification for
hmm_range_fault()'s API.
- Simplify hmm_range_fault() with a simpler return code, no
HMM_PFN_SPECIAL, and no customizable output PFN format
- Add a selftest for hmm_range_fault() and DEVICE_PRIVATE related
functionality
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl7VQr8ACgkQOG33FX4g
mxrpcg/+O+oZ2p8FDTZi/0BTaU0crUiKwJngmmv78UuvD8nzhOZ0fkhK2lsXn9Uo
70lYbfDUSX2TbReP7y39VArW0v+Bj7wo9/7AZ+R2o5A0ajC6kccjGdnb7uEc3L6v
CR+uumRYf/ZNz13cbuRBbYEz477DGnz+3vhBb4FLNTFj9XiNAC61jA1WUI0ep6x3
lDrkhDatqmdBJ+EqZDMq2+UH+lWbkptQT7hPqgEp6o7FqdnySxRd+rT3hALz5wNP
fbryfWXM7V1eh7Kxr2mBJJqIkgbdhGLj2yLl1Iz11BbG6u7AT20r23WTvJ7hUCyt
18574twdltZ81gheqqN7KVYYAo+5seMfP14QdthqzzBMo3pOeLG0JMVqQNisDPgn
Tf4lWF/GR7ajKxyRbLdvUgRE7pFQ9VMAiP86GoIpBFmSZQQDwcecnoYxg60zsTwR
yuf60gopfNsSWNmDqKT3td12PQyFQYHYT6ue1eW6Rb9P+yA++tZaGkvGFn7kHeNV
ZeUqsKEy6a9l6cDrFzNmsCcdNZg/qmw9mKFfa/4RRulU5jlskt/e52NiLaLU2rsr
0Tot3j5tMufLLorZPprMI3Z/M9ohVAS5DkX6ttcZDs5v0iGQEUOOnq0cXmwlJQ9I
0CHr2ImjiDr9v2fS+5ixaRNSHfnQWnHxcqq79UZiTjtPW1Daauo=
=twev
-----END PGP SIGNATURE-----
Merge tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull hmm updates from Jason Gunthorpe:
"This series adds a selftest for hmm_range_fault() and several of the
DEVICE_PRIVATE migration related actions, and another simplification
for hmm_range_fault()'s API.
- Simplify hmm_range_fault() with a simpler return code, no
HMM_PFN_SPECIAL, and no customizable output PFN format
- Add a selftest for hmm_range_fault() and DEVICE_PRIVATE related
functionality"
* tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
MAINTAINERS: add HMM selftests
mm/hmm/test: add selftests for HMM
mm/hmm/test: add selftest driver for HMM
mm/hmm: remove the customizable pfn format from hmm_range_fault
mm/hmm: remove HMM_PFN_SPECIAL
drm/amdgpu: remove dead code after hmm_range_fault()
mm/hmm: make hmm_range_fault return 0 or -1
When using make kselftest TARGETS=bpf, tools/bpf is built with
MAKEFLAGS=rR, which causes $(CXX) to be undefined, which in turn causes
the build to fail with
CXX test_cpp
/bin/sh: 2: g: not found
Fix by adding a default $(CXX) value, like tools/build/feature/Makefile
already does.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200602175649.2501580-3-iii@linux.ibm.com
Since commit 0ebeea8ca8 ("bpf: Restrict bpf_probe_read{, str}() only to
archs where they work") 44 verifier tests fail on s390 due to not having
bpf_probe_read anymore. Fix by using bpf_probe_read_kernel.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200602174448.2501214-1-iii@linux.ibm.com
Adjust verifier test due to addition of new field.
Fixes: c3c16f2ea6 ("bpf: Add rx_queue_mapping to bpf_sock")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Make sample_cnt volatile to fix possible selftests failure due to compiler
optimization preventing latest sample_cnt value to be visible to main thread.
sample_cnt is incremented in background thread, which is then joined into main
thread. So in terms of visibility sample_cnt update is ok. But because it's
not volatile, compiler might make optimizations that would prevent main thread
to see latest updated value. Fix this by marking global variable volatile.
Fixes: cb1c9ddd55 ("selftests/bpf: Add BPF ringbuf selftests")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200602050349.215037-1-andriin@fb.com
Adapt bpf_skb_adjust_room() to pass in BPF_F_ADJ_ROOM_NO_CSUM_RESET flag and
use the new bpf_csum_level() helper to inc/dec the checksum level by one after
the encap/decap.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Link: https://lore.kernel.org/bpf/e7458f10e3f3d795307cbc5ad870112671d9c6f7.1591108731.git.daniel@iogearbox.net
Fix config file to require CONFIG_TEST_SYSCTL=m instead of y
because this driver introduces a test sysctl interfaces which
are normally not used, and only used for the selftest.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Fix to load test_sysctl.ko module correctly.
sysctl.sh checks whether the test module is embedded (or loaded
already) or not at first, and if not, it returns skip error
instead of trying modprobe. Thus, there is no chance to load the
test_sysctl test module.
Instead, this removes that module embedded check and returns
skip error only if it ensures that there is no embedded test
module *and* no loadable test module.
This also avoid referring config file since that is not
installed.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Extend the existing flow_dissector test case to run tests once using direct
prog attachments, and then for the second time using indirect attachment
via link.
The intention is to exercises the newly added high-level API for attaching
programs to network namespace with links (bpf_program__attach_netns).
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200531082846.2117903-13-jakub@cloudflare.com
Switch flow dissector test setup from custom BPF object loader to BPF
skeleton to save boilerplate and prepare for testing higher-level API for
attaching flow dissector with bpf_link.
To avoid depending on program order in the BPF object when populating the
flow dissector PROG_ARRAY map, change the program section names to contain
the program index into the map. This follows the example set by tailcall
tests.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200531082846.2117903-12-jakub@cloudflare.com
test_flow_dissector leaves a TAP device after it's finished, potentially
interfering with other tests that will run after it. Fix it by closing the
TAP descriptor on cleanup.
Fixes: 0905beec9f ("selftests/bpf: run flow dissector tests in skb-less mode")
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200531082846.2117903-11-jakub@cloudflare.com
Extend the existing test case for flow dissector attaching to cover:
- link creation,
- link updates,
- link info querying,
- mixing links with direct prog attachment.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200531082846.2117903-10-jakub@cloudflare.com
- Branch Target Identification (BTI)
* Support for ARMv8.5-BTI in both user- and kernel-space. This
allows branch targets to limit the types of branch from which
they can be called and additionally prevents branching to
arbitrary code, although kernel support requires a very recent
toolchain.
* Function annotation via SYM_FUNC_START() so that assembly
functions are wrapped with the relevant "landing pad"
instructions.
* BPF and vDSO updates to use the new instructions.
* Addition of a new HWCAP and exposure of BTI capability to
userspace via ID register emulation, along with ELF loader
support for the BTI feature in .note.gnu.property.
* Non-critical fixes to CFI unwind annotations in the sigreturn
trampoline.
- Shadow Call Stack (SCS)
* Support for Clang's Shadow Call Stack feature, which reserves
platform register x18 to point at a separate stack for each
task that holds only return addresses. This protects function
return control flow from buffer overruns on the main stack.
* Save/restore of x18 across problematic boundaries (user-mode,
hypervisor, EFI, suspend, etc).
* Core support for SCS, should other architectures want to use it
too.
* SCS overflow checking on context-switch as part of the existing
stack limit check if CONFIG_SCHED_STACK_END_CHECK=y.
- CPU feature detection
* Removed numerous "SANITY CHECK" errors when running on a system
with mismatched AArch32 support at EL1. This is primarily a
concern for KVM, which disabled support for 32-bit guests on
such a system.
* Addition of new ID registers and fields as the architecture has
been extended.
- Perf and PMU drivers
* Minor fixes and cleanups to system PMU drivers.
- Hardware errata
* Unify KVM workarounds for VHE and nVHE configurations.
* Sort vendor errata entries in Kconfig.
- Secure Monitor Call Calling Convention (SMCCC)
* Update to the latest specification from Arm (v1.2).
* Allow PSCI code to query the SMCCC version.
- Software Delegated Exception Interface (SDEI)
* Unexport a bunch of unused symbols.
* Minor fixes to handling of firmware data.
- Pointer authentication
* Add support for dumping the kernel PAC mask in vmcoreinfo so
that the stack can be unwound by tools such as kdump.
* Simplification of key initialisation during CPU bringup.
- BPF backend
* Improve immediate generation for logical and add/sub
instructions.
- vDSO
- Minor fixes to the linker flags for consistency with other
architectures and support for LLVM's unwinder.
- Clean up logic to initialise and map the vDSO into userspace.
- ACPI
- Work around for an ambiguity in the IORT specification relating
to the "num_ids" field.
- Support _DMA method for all named components rather than only
PCIe root complexes.
- Minor other IORT-related fixes.
- Miscellaneous
* Initialise debug traps early for KGDB and fix KDB cacheflushing
deadlock.
* Minor tweaks to early boot state (documentation update, set
TEXT_OFFSET to 0x0, increase alignment of PE/COFF sections).
* Refactoring and cleanup
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAl7U9csQHHdpbGxAa2Vy
bmVsLm9yZwAKCRC3rHDchMFjNLBHCACs/YU4SM7Om5f+7QnxIKao5DBr2CnGGvdC
yTfDghFDTLQVv3MufLlfno3yBe5G8sQpcZfcc+hewfcGoMzVZXu8s7LzH6VSn9T9
jmT3KjDMrg0RjSHzyumJp2McyelTk0a4FiKArSIIKsJSXUyb1uPSgm7SvKVDwEwU
JGDzL9IGilmq59GiXfDzGhTZgmC37QdwRoRxDuqtqWQe5CHoRXYexg87HwBKOQxx
HgU9L7ehri4MRZfpyjaDrr6quJo3TVnAAKXNBh3mZAskVS9ZrfKpEH0kYWYuqybv
znKyHRecl/rrGePV8RTMtrwnSdU26zMXE/omsVVauDfG9hqzqm+Q
=w3qi
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
"A sizeable pile of arm64 updates for 5.8.
Summary below, but the big two features are support for Branch Target
Identification and Clang's Shadow Call stack. The latter is currently
arm64-only, but the high-level parts are all in core code so it could
easily be adopted by other architectures pending toolchain support
Branch Target Identification (BTI):
- Support for ARMv8.5-BTI in both user- and kernel-space. This allows
branch targets to limit the types of branch from which they can be
called and additionally prevents branching to arbitrary code,
although kernel support requires a very recent toolchain.
- Function annotation via SYM_FUNC_START() so that assembly functions
are wrapped with the relevant "landing pad" instructions.
- BPF and vDSO updates to use the new instructions.
- Addition of a new HWCAP and exposure of BTI capability to userspace
via ID register emulation, along with ELF loader support for the
BTI feature in .note.gnu.property.
- Non-critical fixes to CFI unwind annotations in the sigreturn
trampoline.
Shadow Call Stack (SCS):
- Support for Clang's Shadow Call Stack feature, which reserves
platform register x18 to point at a separate stack for each task
that holds only return addresses. This protects function return
control flow from buffer overruns on the main stack.
- Save/restore of x18 across problematic boundaries (user-mode,
hypervisor, EFI, suspend, etc).
- Core support for SCS, should other architectures want to use it
too.
- SCS overflow checking on context-switch as part of the existing
stack limit check if CONFIG_SCHED_STACK_END_CHECK=y.
CPU feature detection:
- Removed numerous "SANITY CHECK" errors when running on a system
with mismatched AArch32 support at EL1. This is primarily a concern
for KVM, which disabled support for 32-bit guests on such a system.
- Addition of new ID registers and fields as the architecture has
been extended.
Perf and PMU drivers:
- Minor fixes and cleanups to system PMU drivers.
Hardware errata:
- Unify KVM workarounds for VHE and nVHE configurations.
- Sort vendor errata entries in Kconfig.
Secure Monitor Call Calling Convention (SMCCC):
- Update to the latest specification from Arm (v1.2).
- Allow PSCI code to query the SMCCC version.
Software Delegated Exception Interface (SDEI):
- Unexport a bunch of unused symbols.
- Minor fixes to handling of firmware data.
Pointer authentication:
- Add support for dumping the kernel PAC mask in vmcoreinfo so that
the stack can be unwound by tools such as kdump.
- Simplification of key initialisation during CPU bringup.
BPF backend:
- Improve immediate generation for logical and add/sub instructions.
vDSO:
- Minor fixes to the linker flags for consistency with other
architectures and support for LLVM's unwinder.
- Clean up logic to initialise and map the vDSO into userspace.
ACPI:
- Work around for an ambiguity in the IORT specification relating to
the "num_ids" field.
- Support _DMA method for all named components rather than only PCIe
root complexes.
- Minor other IORT-related fixes.
Miscellaneous:
- Initialise debug traps early for KGDB and fix KDB cacheflushing
deadlock.
- Minor tweaks to early boot state (documentation update, set
TEXT_OFFSET to 0x0, increase alignment of PE/COFF sections).
- Refactoring and cleanup"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (148 commits)
KVM: arm64: Move __load_guest_stage2 to kvm_mmu.h
KVM: arm64: Check advertised Stage-2 page size capability
arm64/cpufeature: Add get_arm64_ftr_reg_nowarn()
ACPI/IORT: Remove the unused __get_pci_rid()
arm64/cpuinfo: Add ID_MMFR4_EL1 into the cpuinfo_arm64 context
arm64/cpufeature: Add remaining feature bits in ID_AA64PFR1 register
arm64/cpufeature: Add remaining feature bits in ID_AA64PFR0 register
arm64/cpufeature: Add remaining feature bits in ID_AA64ISAR0 register
arm64/cpufeature: Add remaining feature bits in ID_MMFR4 register
arm64/cpufeature: Add remaining feature bits in ID_PFR0 register
arm64/cpufeature: Introduce ID_MMFR5 CPU register
arm64/cpufeature: Introduce ID_DFR1 CPU register
arm64/cpufeature: Introduce ID_PFR2 CPU register
arm64/cpufeature: Make doublelock a signed feature in ID_AA64DFR0
arm64/cpufeature: Drop TraceFilt feature exposure from ID_DFR0 register
arm64/cpufeature: Add explicit ftr_id_isar0[] for ID_ISAR0 register
arm64: mm: Add asid_gen_match() helper
firmware: smccc: Fix missing prototype warning for arm_smccc_version_init
arm64: vdso: Fix CFI directives in sigreturn trampoline
arm64: vdso: Don't prefix sigreturn trampoline with a BTI C instruction
...
This test intended to verify if SO_BINDTODEVICE option works in
bpf_setsockopt. Because we already in the SOL_SOCKET level in this
connect bpf prog its safe to verify the sanity in the beginning of
the connect_v4_prog by calling the bind_to_device test helper.
The testing environment already created by the test_sock_addr.sh
script so this test assume that two netdevices already existing in
the system: veth pair with names test_sock_addr1 and test_sock_addr2.
The test will try to bind the socket to those devices first.
Then the test assume there are no netdevice with "nonexistent_dev"
name so the bpf_setsockopt will give use ENODEV error.
At the end the test remove the device binding from the socket
by binding it to an empty name.
Signed-off-by: Ferenc Fejes <fejes@inf.elte.hu>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/3f055b8e45c65639c5c73d0b4b6c589e60b86f15.1590871065.git.fejes@inf.elte.hu
This adds a test for bpf ingress policy. To ensure data writes happen
as expected with extra TLS headers we run these tests with data
verification enabled by default. This will test receive packets have
"PASS" stamped into the front of the payload.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/159079363965.5745.3390806911628980210.stgit@john-Precision-5820-Tower
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add tests to verify ability to add an XDP program to a
entry in a DEVMAP.
Add negative tests to show DEVMAP programs can not be
attached to devices as a normal XDP program, and accesses
to egress_ifindex require BPF_XDP_DEVMAP attach type.
Signed-off-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20200529220716.75383-6-dsahern@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Extend bench framework with ability to have benchmark-provided child argument
parser for custom benchmark-specific parameters. This makes bench generic code
modular and independent from any specific benchmark.
Also implement a set of benchmarks for new BPF ring buffer and existing perf
buffer. 4 benchmarks were implemented: 2 variations for each of BPF ringbuf
and perfbuf:,
- rb-libbpf utilizes stock libbpf ring_buffer manager for reading data;
- rb-custom implements custom ring buffer setup and reading code, to
eliminate overheads inherent in generic libbpf code due to callback
functions and the need to update consumer position after each consumed
record, instead of batching updates (due to pessimistic assumption that
user callback might take long time and thus could unnecessarily hold ring
buffer space for too long);
- pb-libbpf uses stock libbpf perf_buffer code with all the default
settings, though uses higher-performance raw event callback to minimize
unnecessary overhead;
- pb-custom implements its own custom consumer code to minimize any possible
overhead of generic libbpf implementation and indirect function calls.
All of the test support default, no data notification skipped, mode, as well
as sampled mode (with --rb-sampled flag), which allows to trigger epoll
notification less frequently and reduce overhead. As will be shown, this mode
is especially critical for perf buffer, which suffers from high overhead of
wakeups in kernel.
Otherwise, all benchamrks implement similar way to generate a batch of records
by using fentry/sys_getpgid BPF program, which pushes a bunch of records in
a tight loop and records number of successful and dropped samples. Each record
is a small 8-byte integer, to minimize the effect of memory copying with
bpf_perf_event_output() and bpf_ringbuf_output().
Benchmarks that have only one producer implement optional back-to-back mode,
in which record production and consumption is alternating on the same CPU.
This is the highest-throughput happy case, showing ultimate performance
achievable with either BPF ringbuf or perfbuf.
All the below scenarios are implemented in a script in
benchs/run_bench_ringbufs.sh. Tests were performed on 28-core/56-thread
Intel Xeon CPU E5-2680 v4 @ 2.40GHz CPU.
Single-producer, parallel producer
==================================
rb-libbpf 12.054 ± 0.320M/s (drops 0.000 ± 0.000M/s)
rb-custom 8.158 ± 0.118M/s (drops 0.001 ± 0.003M/s)
pb-libbpf 0.931 ± 0.007M/s (drops 0.000 ± 0.000M/s)
pb-custom 0.965 ± 0.003M/s (drops 0.000 ± 0.000M/s)
Single-producer, parallel producer, sampled notification
========================================================
rb-libbpf 11.563 ± 0.067M/s (drops 0.000 ± 0.000M/s)
rb-custom 15.895 ± 0.076M/s (drops 0.000 ± 0.000M/s)
pb-libbpf 9.889 ± 0.032M/s (drops 0.000 ± 0.000M/s)
pb-custom 9.866 ± 0.028M/s (drops 0.000 ± 0.000M/s)
Single producer on one CPU, consumer on another one, both running at full
speed. Curiously, rb-libbpf has higher throughput than objectively faster (due
to more lightweight consumer code path) rb-custom. It appears that faster
consumer causes kernel to send notifications more frequently, because consumer
appears to be caught up more frequently. Performance of perfbuf suffers from
default "no sampling" policy and huge overhead that causes.
In sampled mode, rb-custom is winning very significantly eliminating too
frequent in-kernel wakeups, the gain appears to be more than 2x.
Perf buffer achieves even more impressive wins, compared to stock perfbuf
settings, with 10x improvements in throughput with 1:500 sampling rate. The
trade-off is that with sampling, application might not get next X events until
X+1st arrives, which is not always acceptable. With steady influx of events,
though, this shouldn't be a problem.
Overall, single-producer performance of ring buffers seems to be better no
matter the sampled/non-sampled modes, but it especially beats ring buffer
without sampling due to its adaptive notification approach.
Single-producer, back-to-back mode
==================================
rb-libbpf 15.507 ± 0.247M/s (drops 0.000 ± 0.000M/s)
rb-libbpf-sampled 14.692 ± 0.195M/s (drops 0.000 ± 0.000M/s)
rb-custom 21.449 ± 0.157M/s (drops 0.000 ± 0.000M/s)
rb-custom-sampled 20.024 ± 0.386M/s (drops 0.000 ± 0.000M/s)
pb-libbpf 1.601 ± 0.015M/s (drops 0.000 ± 0.000M/s)
pb-libbpf-sampled 8.545 ± 0.064M/s (drops 0.000 ± 0.000M/s)
pb-custom 1.607 ± 0.022M/s (drops 0.000 ± 0.000M/s)
pb-custom-sampled 8.988 ± 0.144M/s (drops 0.000 ± 0.000M/s)
Here we test a back-to-back mode, which is arguably best-case scenario both
for BPF ringbuf and perfbuf, because there is no contention and for ringbuf
also no excessive notification, because consumer appears to be behind after
the first record. For ringbuf, custom consumer code clearly wins with 21.5 vs
16 million records per second exchanged between producer and consumer. Sampled
mode actually hurts a bit due to slightly slower producer logic (it needs to
fetch amount of data available to decide whether to skip or force notification).
Perfbuf with wakeup sampling gets 5.5x throughput increase, compared to
no-sampling version. There also doesn't seem to be noticeable overhead from
generic libbpf handling code.
Perfbuf back-to-back, effect of sample rate
===========================================
pb-sampled-1 1.035 ± 0.012M/s (drops 0.000 ± 0.000M/s)
pb-sampled-5 3.476 ± 0.087M/s (drops 0.000 ± 0.000M/s)
pb-sampled-10 5.094 ± 0.136M/s (drops 0.000 ± 0.000M/s)
pb-sampled-25 7.118 ± 0.153M/s (drops 0.000 ± 0.000M/s)
pb-sampled-50 8.169 ± 0.156M/s (drops 0.000 ± 0.000M/s)
pb-sampled-100 8.887 ± 0.136M/s (drops 0.000 ± 0.000M/s)
pb-sampled-250 9.180 ± 0.209M/s (drops 0.000 ± 0.000M/s)
pb-sampled-500 9.353 ± 0.281M/s (drops 0.000 ± 0.000M/s)
pb-sampled-1000 9.411 ± 0.217M/s (drops 0.000 ± 0.000M/s)
pb-sampled-2000 9.464 ± 0.167M/s (drops 0.000 ± 0.000M/s)
pb-sampled-3000 9.575 ± 0.273M/s (drops 0.000 ± 0.000M/s)
This benchmark shows the effect of event sampling for perfbuf. Back-to-back
mode for highest throughput. Just doing every 5th record notification gives
3.5x speed up. 250-500 appears to be the point of diminishing return, with
almost 9x speed up. Most benchmarks use 500 as the default sampling for pb-raw
and pb-custom.
Ringbuf back-to-back, effect of sample rate
===========================================
rb-sampled-1 1.106 ± 0.010M/s (drops 0.000 ± 0.000M/s)
rb-sampled-5 4.746 ± 0.149M/s (drops 0.000 ± 0.000M/s)
rb-sampled-10 7.706 ± 0.164M/s (drops 0.000 ± 0.000M/s)
rb-sampled-25 12.893 ± 0.273M/s (drops 0.000 ± 0.000M/s)
rb-sampled-50 15.961 ± 0.361M/s (drops 0.000 ± 0.000M/s)
rb-sampled-100 18.203 ± 0.445M/s (drops 0.000 ± 0.000M/s)
rb-sampled-250 19.962 ± 0.786M/s (drops 0.000 ± 0.000M/s)
rb-sampled-500 20.881 ± 0.551M/s (drops 0.000 ± 0.000M/s)
rb-sampled-1000 21.317 ± 0.532M/s (drops 0.000 ± 0.000M/s)
rb-sampled-2000 21.331 ± 0.535M/s (drops 0.000 ± 0.000M/s)
rb-sampled-3000 21.688 ± 0.392M/s (drops 0.000 ± 0.000M/s)
Similar benchmark for ring buffer also shows a great advantage (in terms of
throughput) of skipping notifications. Skipping every 5th one gives 4x boost.
Also similar to perfbuf case, 250-500 seems to be the point of diminishing
returns, giving roughly 20x better results.
Keep in mind, for this test, notifications are controlled manually with
BPF_RB_NO_WAKEUP and BPF_RB_FORCE_WAKEUP. As can be seen from previous
benchmarks, adaptive notifications based on consumer's positions provides same
(or even slightly better due to simpler load generator on BPF side) benefits in
favorable back-to-back scenario. Over zealous and fast consumer, which is
almost always caught up, will make thoughput numbers smaller. That's the case
when manual notification control might prove to be extremely beneficial.
Ringbuf back-to-back, reserve+commit vs output
==============================================
reserve 22.819 ± 0.503M/s (drops 0.000 ± 0.000M/s)
output 18.906 ± 0.433M/s (drops 0.000 ± 0.000M/s)
Ringbuf sampled, reserve+commit vs output
=========================================
reserve-sampled 15.350 ± 0.132M/s (drops 0.000 ± 0.000M/s)
output-sampled 14.195 ± 0.144M/s (drops 0.000 ± 0.000M/s)
BPF ringbuf supports two sets of APIs with various usability and performance
tradeoffs: bpf_ringbuf_reserve()+bpf_ringbuf_commit() vs bpf_ringbuf_output().
This benchmark clearly shows superiority of reserve+commit approach, despite
using a small 8-byte record size.
Single-producer, consumer/producer competing on the same CPU, low batch count
=============================================================================
rb-libbpf 3.045 ± 0.020M/s (drops 3.536 ± 0.148M/s)
rb-custom 3.055 ± 0.022M/s (drops 3.893 ± 0.066M/s)
pb-libbpf 1.393 ± 0.024M/s (drops 0.000 ± 0.000M/s)
pb-custom 1.407 ± 0.016M/s (drops 0.000 ± 0.000M/s)
This benchmark shows one of the worst-case scenarios, in which producer and
consumer do not coordinate *and* fight for the same CPU. No batch count and
sampling settings were able to eliminate drops for ringbuffer, producer is
just too fast for consumer to keep up. But ringbuf and perfbuf still able to
pass through quite a lot of messages, which is more than enough for a lot of
applications.
Ringbuf, multi-producer contention
==================================
rb-libbpf nr_prod 1 10.916 ± 0.399M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 2 4.931 ± 0.030M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 3 4.880 ± 0.006M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 4 3.926 ± 0.004M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 8 4.011 ± 0.004M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 12 3.967 ± 0.016M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 16 2.604 ± 0.030M/s (drops 0.001 ± 0.002M/s)
rb-libbpf nr_prod 20 2.233 ± 0.003M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 24 2.085 ± 0.015M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 28 2.055 ± 0.004M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 32 1.962 ± 0.004M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 36 2.089 ± 0.005M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 40 2.118 ± 0.006M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 44 2.105 ± 0.004M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 48 2.120 ± 0.058M/s (drops 0.000 ± 0.001M/s)
rb-libbpf nr_prod 52 2.074 ± 0.024M/s (drops 0.007 ± 0.014M/s)
Ringbuf uses a very short-duration spinlock during reservation phase, to check
few invariants, increment producer count and set record header. This is the
biggest point of contention for ringbuf implementation. This benchmark
evaluates the effect of multiple competing writers on overall throughput of
a single shared ringbuffer.
Overall throughput drops almost 2x when going from single to two
highly-contended producers, gradually dropping with additional competing
producers. Performance drop stabilizes at around 20 producers and hovers
around 2mln even with 50+ fighting producers, which is a 5x drop compared to
non-contended case. Good kernel implementation in kernel helps maintain decent
performance here.
Note, that in the intended real-world scenarios, it's not expected to get even
close to such a high levels of contention. But if contention will become
a problem, there is always an option of sharding few ring buffers across a set
of CPUs.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200529075424.3139988-5-andriin@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Both singleton BPF ringbuf and BPF ringbuf with map-in-map use cases are tested.
Also reserve+submit/discards and output variants of API are validated.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200529075424.3139988-4-andriin@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This commit adds a new MPSC ring buffer implementation into BPF ecosystem,
which allows multiple CPUs to submit data to a single shared ring buffer. On
the consumption side, only single consumer is assumed.
Motivation
----------
There are two distinctive motivators for this work, which are not satisfied by
existing perf buffer, which prompted creation of a new ring buffer
implementation.
- more efficient memory utilization by sharing ring buffer across CPUs;
- preserving ordering of events that happen sequentially in time, even
across multiple CPUs (e.g., fork/exec/exit events for a task).
These two problems are independent, but perf buffer fails to satisfy both.
Both are a result of a choice to have per-CPU perf ring buffer. Both can be
also solved by having an MPSC implementation of ring buffer. The ordering
problem could technically be solved for perf buffer with some in-kernel
counting, but given the first one requires an MPSC buffer, the same solution
would solve the second problem automatically.
Semantics and APIs
------------------
Single ring buffer is presented to BPF programs as an instance of BPF map of
type BPF_MAP_TYPE_RINGBUF. Two other alternatives considered, but ultimately
rejected.
One way would be to, similar to BPF_MAP_TYPE_PERF_EVENT_ARRAY, make
BPF_MAP_TYPE_RINGBUF could represent an array of ring buffers, but not enforce
"same CPU only" rule. This would be more familiar interface compatible with
existing perf buffer use in BPF, but would fail if application needed more
advanced logic to lookup ring buffer by arbitrary key. HASH_OF_MAPS addresses
this with current approach. Additionally, given the performance of BPF
ringbuf, many use cases would just opt into a simple single ring buffer shared
among all CPUs, for which current approach would be an overkill.
Another approach could introduce a new concept, alongside BPF map, to
represent generic "container" object, which doesn't necessarily have key/value
interface with lookup/update/delete operations. This approach would add a lot
of extra infrastructure that has to be built for observability and verifier
support. It would also add another concept that BPF developers would have to
familiarize themselves with, new syntax in libbpf, etc. But then would really
provide no additional benefits over the approach of using a map.
BPF_MAP_TYPE_RINGBUF doesn't support lookup/update/delete operations, but so
doesn't few other map types (e.g., queue and stack; array doesn't support
delete, etc).
The approach chosen has an advantage of re-using existing BPF map
infrastructure (introspection APIs in kernel, libbpf support, etc), being
familiar concept (no need to teach users a new type of object in BPF program),
and utilizing existing tooling (bpftool). For common scenario of using
a single ring buffer for all CPUs, it's as simple and straightforward, as
would be with a dedicated "container" object. On the other hand, by being
a map, it can be combined with ARRAY_OF_MAPS and HASH_OF_MAPS map-in-maps to
implement a wide variety of topologies, from one ring buffer for each CPU
(e.g., as a replacement for perf buffer use cases), to a complicated
application hashing/sharding of ring buffers (e.g., having a small pool of
ring buffers with hashed task's tgid being a look up key to preserve order,
but reduce contention).
Key and value sizes are enforced to be zero. max_entries is used to specify
the size of ring buffer and has to be a power of 2 value.
There are a bunch of similarities between perf buffer
(BPF_MAP_TYPE_PERF_EVENT_ARRAY) and new BPF ring buffer semantics:
- variable-length records;
- if there is no more space left in ring buffer, reservation fails, no
blocking;
- memory-mappable data area for user-space applications for ease of
consumption and high performance;
- epoll notifications for new incoming data;
- but still the ability to do busy polling for new data to achieve the
lowest latency, if necessary.
BPF ringbuf provides two sets of APIs to BPF programs:
- bpf_ringbuf_output() allows to *copy* data from one place to a ring
buffer, similarly to bpf_perf_event_output();
- bpf_ringbuf_reserve()/bpf_ringbuf_commit()/bpf_ringbuf_discard() APIs
split the whole process into two steps. First, a fixed amount of space is
reserved. If successful, a pointer to a data inside ring buffer data area
is returned, which BPF programs can use similarly to a data inside
array/hash maps. Once ready, this piece of memory is either committed or
discarded. Discard is similar to commit, but makes consumer ignore the
record.
bpf_ringbuf_output() has disadvantage of incurring extra memory copy, because
record has to be prepared in some other place first. But it allows to submit
records of the length that's not known to verifier beforehand. It also closely
matches bpf_perf_event_output(), so will simplify migration significantly.
bpf_ringbuf_reserve() avoids the extra copy of memory by providing a memory
pointer directly to ring buffer memory. In a lot of cases records are larger
than BPF stack space allows, so many programs have use extra per-CPU array as
a temporary heap for preparing sample. bpf_ringbuf_reserve() avoid this needs
completely. But in exchange, it only allows a known constant size of memory to
be reserved, such that verifier can verify that BPF program can't access
memory outside its reserved record space. bpf_ringbuf_output(), while slightly
slower due to extra memory copy, covers some use cases that are not suitable
for bpf_ringbuf_reserve().
The difference between commit and discard is very small. Discard just marks
a record as discarded, and such records are supposed to be ignored by consumer
code. Discard is useful for some advanced use-cases, such as ensuring
all-or-nothing multi-record submission, or emulating temporary malloc()/free()
within single BPF program invocation.
Each reserved record is tracked by verifier through existing
reference-tracking logic, similar to socket ref-tracking. It is thus
impossible to reserve a record, but forget to submit (or discard) it.
bpf_ringbuf_query() helper allows to query various properties of ring buffer.
Currently 4 are supported:
- BPF_RB_AVAIL_DATA returns amount of unconsumed data in ring buffer;
- BPF_RB_RING_SIZE returns the size of ring buffer;
- BPF_RB_CONS_POS/BPF_RB_PROD_POS returns current logical possition of
consumer/producer, respectively.
Returned values are momentarily snapshots of ring buffer state and could be
off by the time helper returns, so this should be used only for
debugging/reporting reasons or for implementing various heuristics, that take
into account highly-changeable nature of some of those characteristics.
One such heuristic might involve more fine-grained control over poll/epoll
notifications about new data availability in ring buffer. Together with
BPF_RB_NO_WAKEUP/BPF_RB_FORCE_WAKEUP flags for output/commit/discard helpers,
it allows BPF program a high degree of control and, e.g., more efficient
batched notifications. Default self-balancing strategy, though, should be
adequate for most applications and will work reliable and efficiently already.
Design and implementation
-------------------------
This reserve/commit schema allows a natural way for multiple producers, either
on different CPUs or even on the same CPU/in the same BPF program, to reserve
independent records and work with them without blocking other producers. This
means that if BPF program was interruped by another BPF program sharing the
same ring buffer, they will both get a record reserved (provided there is
enough space left) and can work with it and submit it independently. This
applies to NMI context as well, except that due to using a spinlock during
reservation, in NMI context, bpf_ringbuf_reserve() might fail to get a lock,
in which case reservation will fail even if ring buffer is not full.
The ring buffer itself internally is implemented as a power-of-2 sized
circular buffer, with two logical and ever-increasing counters (which might
wrap around on 32-bit architectures, that's not a problem):
- consumer counter shows up to which logical position consumer consumed the
data;
- producer counter denotes amount of data reserved by all producers.
Each time a record is reserved, producer that "owns" the record will
successfully advance producer counter. At that point, data is still not yet
ready to be consumed, though. Each record has 8 byte header, which contains
the length of reserved record, as well as two extra bits: busy bit to denote
that record is still being worked on, and discard bit, which might be set at
commit time if record is discarded. In the latter case, consumer is supposed
to skip the record and move on to the next one. Record header also encodes
record's relative offset from the beginning of ring buffer data area (in
pages). This allows bpf_ringbuf_commit()/bpf_ringbuf_discard() to accept only
the pointer to the record itself, without requiring also the pointer to ring
buffer itself. Ring buffer memory location will be restored from record
metadata header. This significantly simplifies verifier, as well as improving
API usability.
Producer counter increments are serialized under spinlock, so there is
a strict ordering between reservations. Commits, on the other hand, are
completely lockless and independent. All records become available to consumer
in the order of reservations, but only after all previous records where
already committed. It is thus possible for slow producers to temporarily hold
off submitted records, that were reserved later.
Reservation/commit/consumer protocol is verified by litmus tests in
Documentation/litmus-test/bpf-rb.
One interesting implementation bit, that significantly simplifies (and thus
speeds up as well) implementation of both producers and consumers is how data
area is mapped twice contiguously back-to-back in the virtual memory. This
allows to not take any special measures for samples that have to wrap around
at the end of the circular buffer data area, because the next page after the
last data page would be first data page again, and thus the sample will still
appear completely contiguous in virtual memory. See comment and a simple ASCII
diagram showing this visually in bpf_ringbuf_area_alloc().
Another feature that distinguishes BPF ringbuf from perf ring buffer is
a self-pacing notifications of new data being availability.
bpf_ringbuf_commit() implementation will send a notification of new record
being available after commit only if consumer has already caught up right up
to the record being committed. If not, consumer still has to catch up and thus
will see new data anyways without needing an extra poll notification.
Benchmarks (see tools/testing/selftests/bpf/benchs/bench_ringbuf.c) show that
this allows to achieve a very high throughput without having to resort to
tricks like "notify only every Nth sample", which are necessary with perf
buffer. For extreme cases, when BPF program wants more manual control of
notifications, commit/discard/output helpers accept BPF_RB_NO_WAKEUP and
BPF_RB_FORCE_WAKEUP flags, which give full control over notifications of data
availability, but require extra caution and diligence in using this API.
Comparison to alternatives
--------------------------
Before considering implementing BPF ring buffer from scratch existing
alternatives in kernel were evaluated, but didn't seem to meet the needs. They
largely fell into few categores:
- per-CPU buffers (perf, ftrace, etc), which don't satisfy two motivations
outlined above (ordering and memory consumption);
- linked list-based implementations; while some were multi-producer designs,
consuming these from user-space would be very complicated and most
probably not performant; memory-mapping contiguous piece of memory is
simpler and more performant for user-space consumers;
- io_uring is SPSC, but also requires fixed-sized elements. Naively turning
SPSC queue into MPSC w/ lock would have subpar performance compared to
locked reserve + lockless commit, as with BPF ring buffer. Fixed sized
elements would be too limiting for BPF programs, given existing BPF
programs heavily rely on variable-sized perf buffer already;
- specialized implementations (like a new printk ring buffer, [0]) with lots
of printk-specific limitations and implications, that didn't seem to fit
well for intended use with BPF programs.
[0] https://lwn.net/Articles/779550/
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200529075424.3139988-2-andriin@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
For write-only stacks and queues bpf_map_update_elem should be allowed, but
bpf_map_lookup_elem and bpf_map_lookup_and_delete_elem should fail with EPERM.
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200527185700.14658-6-a.s.protopopov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Make comments inside the test_map_rdonly and test_map_wronly tests
consistent with logic.
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200527185700.14658-4-a.s.protopopov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The test_map_rdonly and test_map_wronly tests should close file descriptors
which they open.
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200527185700.14658-3-a.s.protopopov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Trivial fix to a typo in the test_map_wronly test: "read" -> "write"
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200527185700.14658-2-a.s.protopopov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Lets test using probe* in SCHED_CLS network programs as well just
to be sure these keep working. Its cheap to add the extra test
and provides a second context to test outside of sk_msg after
we generalized probe* helpers to all networking types.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/159033911685.12355.15951980509828906214.stgit@john-Precision-5820-Tower
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The test itself is not particularly useful but it encodes a common
pattern we have.
Namely do a sk storage lookup then depending on data here decide if
we need to do more work or alternatively allow packet to PASS. Then
if we need to do more work consult task_struct for more information
about the running task. Finally based on this additional information
drop or pass the data. In this case the suspicious check is not so
realisitic but it encodes the general pattern and uses the helpers
so we test the workflow.
This is a load test to ensure verifier correctly handles this case.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/159033909665.12355.6166415847337547879.stgit@john-Precision-5820-Tower
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The identation before this code
(`if not os.path.exists(cli_args.build_dir):``)
was with spaces instead of tabs after fixed up merge conflits,
this commit revert spaces to tabs:
[iha@bbking linux]$ tools/testing/kunit/kunit.py run
File "tools/testing/kunit/kunit.py", line 247
if not linux:
^
TabError: inconsistent use of tabs and spaces in indentation
[iha@bbking linux]$ tools/testing/kunit/kunit.py run
Traceback (most recent call last):
File "tools/testing/kunit/kunit.py", line 338, in <module>
main(sys.argv[1:])
File "tools/testing/kunit/kunit.py", line 215, in main
add_config_opts(config_parser)
[iha@bbking linux]$ tools/testing/kunit/kunit.py run
Traceback (most recent call last):
File "tools/testing/kunit/kunit.py", line 337, in <module>
main(sys.argv[1:])
File "tools/testing/kunit/kunit.py", line 255, in main
result = run_tests(linux, request)
File "tools/testing/kunit/kunit.py", line 133, in run_tests
request.defconfig,
AttributeError: 'KunitRequest' object has no attribute 'defconfig'
Handles when there is no .kunitconfig, the error due to merge conflicts
between the following:
commit 9bdf64b351 ("kunit: use KUnit defconfig by default")
commit 45ba7a893a ("kunit: kunit_tool: Separate out
config/build/exec/parse")
[iha@bbking linux]$ tools/testing/kunit/kunit.py run
Traceback (most recent call last):
File "tools/testing/kunit/kunit.py", line 335, in <module>
main(sys.argv[1:])
File "tools/testing/kunit/kunit.py", line 246, in main
linux = kunit_kernel.LinuxSourceTree()
File "../tools/testing/kunit/kunit_kernel.py", line 109, in __init__
self._kconfig.read_from_file(kunitconfig_path)
File "t../ools/testing/kunit/kunit_config.py", line 88, in read_from_file
with open(path, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '.kunit/.kunitconfig'
Signed-off-by: Vitor Massaru Iha <vitor@massaru.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
- RCU-tasks update, including addition of RCU Tasks Trace for
BPF use and TASKS_RUDE_RCU
- kfree_rcu() updates.
- Remove scheduler locking restriction
- RCU CPU stall warning updates.
- Torture-test updates.
- Miscellaneous fixes and other updates.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl7U/r0RHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1hSNxAAirKhPGBoLI9DW1qde4OFhZg+BlIpS+LD
IE/0eGB8hGwhb1793RGbzIJfSnRQpSOPxWbWc6DJZ4Zpi5/ZbVkiPKsuXpM1xGxs
kuBCTOhWy1/p3iCZ1JH/JCrCAdWGZkIzEoaV7ipnHtV/+UrRbCWH5PB7R0fYvcbI
q5bUcWJyEp/bYMxQn8DhAih6SLPHx+F9qaGAqqloLSHstTYG2HkBhBGKnqcd/Jex
twkLK53poCkeP/c08V1dyagU2IRWj2jGB1NjYh/Ocm+Sn/vru15CVGspjVjqO5FF
oq07lad357ddMsZmKoM2F5DhXbOh95A+EqF9VDvIzCvfGMUgqYI1oxWF4eycsGhg
/aYJgYuN23YeEe2DkDzJB67GvBOwl4WgdoFaxKRzOiCSfrhkM8KqM4G9Fz1JIepG
abRJCF85iGcLslU9DkrShQiDsd/CRPzu/jz6ybK0I2II2pICo6QRf76T7TdOvKnK
yXwC6OdL7/dwOht20uT6XfnDXMCWI4MutiUrb8/C1DbaihwEaI2denr3YYL+IwrB
B38CdP6sfKZ5UFxKh0xb+sOzWrw0KA+ThSAXeJhz3tKdxdyB6nkaw3J9lFg8oi20
XGeAujjtjMZG5cxt2H+wO9kZY0RRau/nTqNtmmRrCobd5yJjHHPHH8trEd0twZ9A
X5Wjh11lv3E=
=Yisx
-----END PGP SIGNATURE-----
Merge tag 'core-rcu-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
"The RCU updates for this cycle were:
- RCU-tasks update, including addition of RCU Tasks Trace for BPF use
and TASKS_RUDE_RCU
- kfree_rcu() updates.
- Remove scheduler locking restriction
- RCU CPU stall warning updates.
- Torture-test updates.
- Miscellaneous fixes and other updates"
* tag 'core-rcu-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (103 commits)
rcu: Allow for smp_call_function() running callbacks from idle
rcu: Provide rcu_irq_exit_check_preempt()
rcu: Abstract out rcu_irq_enter_check_tick() from rcu_nmi_enter()
rcu: Provide __rcu_is_watching()
rcu: Provide rcu_irq_exit_preempt()
rcu: Make RCU IRQ enter/exit functions rely on in_nmi()
rcu/tree: Mark the idle relevant functions noinstr
x86: Replace ist_enter() with nmi_enter()
x86/mce: Send #MC singal from task work
x86/entry: Get rid of ist_begin/end_non_atomic()
sched,rcu,tracing: Avoid tracing before in_nmi() is correct
sh/ftrace: Move arch_ftrace_nmi_{enter,exit} into nmi exception
lockdep: Always inline lockdep_{off,on}()
hardirq/nmi: Allow nested nmi_enter()
arm64: Prepare arch_nmi_enter() for recursion
printk: Disallow instrumenting print_nmi_enter()
printk: Prepare for nested printk_nmi_enter()
rcutorture: Convert ULONG_CMP_LT() to time_before()
torture: Add a --kasan argument
torture: Save a few lines by using config_override_param initially
...
- refactor pstore locking for safer module unloading (Kees Cook)
- remove orphaned records from pstorefs when backend unloaded (Kees Cook)
- refactor dump_oops parameter into max_reason (Pavel Tatashin)
- introduce pstore/zone for common code for contiguous storage (WeiXiong Liao)
- introduce pstore/blk for block device backend (WeiXiong Liao)
- introduce mtd backend (WeiXiong Liao)
-----BEGIN PGP SIGNATURE-----
iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAl7UbYYWHGtlZXNjb29r
QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJpkgD/9/09OkJIWydwk2lr2T89HW5fSF
5uBT0a309/QDUpnV9yhcRsrESEicnvbtaGxD0kuYIInkiW/2cj1l689EkyRjUmy9
q3z4GzLqOlC7qvd7LUPFNGHmllBb09H/CxmXDxRP3aynB9oHzdpNQdPcpLBDA00r
0byp/AE48dFbKIhtT0QxpGUYZFOlyc7XVAaOkED4bmu148gx8q7MU1AxFgbx0Feb
9iPV0r6XYMgXJZ3sn/3PJsxF0V/giDSJ8ui2xsYRjCE408zVIYLdDs2e8dz+2yW6
+3Lyankgo+ofZc4XYExTYgn3WjhPFi+pjVRUaj+BcyTk9SLNIj2WmZdmcLMuzanh
BaUurmED7ffTtlsH4PhQgn8/OY4FX2PO2MwUHwlU+87Y8YDiW0lpzTq5H822OO8p
QQ8awql/6lLCJuyzuWIciVUsS65MCPxsZ4+LSiMZzyYpWu1sxrEY8ic3agzCgsA0
0i+4nZFlLG+Aap/oiKpegenkIyAunn2tDXAyFJFH6qLOiZJ78iRuws3XZqjCElhJ
XqvyDJIfjkJhWUb++ckeqX7ThOR4CPSnwba/7GHv7NrQWuk3Cn+GQ80oxydXUY6b
2/4eYjq0wtvf9NeuJ4/LYNXotLR/bq9zS0zqwTWG50v+RPmuC3bNJB+RmF7fCiCG
jo1Sd1LMeTQ7bnULpA==
=7s1u
-----END PGP SIGNATURE-----
Merge tag 'pstore-v5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull pstore updates from Kees Cook:
"Fixes and new features for pstore.
This is a pretty big set of changes (relative to past pstore pulls),
but it has been in -next for a while. The biggest change here is the
ability to support a block device as a pstore backend, which has been
desired for a while. A lot of additional fixes and refactorings are
also included, mostly in support of the new features.
- refactor pstore locking for safer module unloading (Kees Cook)
- remove orphaned records from pstorefs when backend unloaded (Kees
Cook)
- refactor dump_oops parameter into max_reason (Pavel Tatashin)
- introduce pstore/zone for common code for contiguous storage
(WeiXiong Liao)
- introduce pstore/blk for block device backend (WeiXiong Liao)
- introduce mtd backend (WeiXiong Liao)"
* tag 'pstore-v5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (35 commits)
mtd: Support kmsg dumper based on pstore/blk
pstore/blk: Introduce "best_effort" mode
pstore/blk: Support non-block storage devices
pstore/blk: Provide way to query pstore configuration
pstore/zone: Provide way to skip "broken" zone for MTD devices
Documentation: Add details for pstore/blk
pstore/zone,blk: Add ftrace frontend support
pstore/zone,blk: Add console frontend support
pstore/zone,blk: Add support for pmsg frontend
pstore/blk: Introduce backend for block devices
pstore/zone: Introduce common layer to manage storage zones
ramoops: Add "max-reason" optional field to ramoops DT node
pstore/ram: Introduce max_reason and convert dump_oops
pstore/platform: Pass max_reason to kmesg dump
printk: Introduce kmsg_dump_reason_str()
printk: honor the max_reason field in kmsg_dumper
printk: Collapse shutdown types into a single dump reason
pstore/ftrace: Provide ftrace log merging routine
pstore/ram: Refactor ftrace buffer merging
pstore/ram: Refactor DT size parsing
...
Generate packets matching the various control traps and check that the
traps' stats increase accordingly.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
vmx_tsc_adjust_test fails with:
IA32_TSC_ADJUST is -4294969448 (-1 * TSC_ADJUST_VALUE + -2152).
IA32_TSC_ADJUST is -4294969448 (-1 * TSC_ADJUST_VALUE + -2152).
IA32_TSC_ADJUST is 281470681738540 (65534 * TSC_ADJUST_VALUE + 4294962476).
==== Test Assertion Failure ====
x86_64/vmx_tsc_adjust_test.c:153: false
pid=19738 tid=19738 - Interrupted system call
1 0x0000000000401192: main at vmx_tsc_adjust_test.c:153
2 0x00007fe1ef8583d4: ?? ??:0
3 0x0000000000401201: _start at ??:?
Failed guest assert: (adjust <= max)
The problem is that is 'tsc_val' should be u64, not u32 or the reading
gets truncated.
Fixes: 8d7fbf01f9 ("KVM: selftests: VMX preemption timer migration test")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200601154726.261868-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
With synthetic events now a separate config item as a result of
'tracing: Move synthetic events to a separate file', tests that use
both need to explicitly check for hist trigger support rather than
relying on hist triggers to pull in synthetic events.
Add an additional hist trigger check to all the trigger tests that now
require it, otherwise they'll fail if synthetic events but not hist
triggers are enabled.
Link: http://lkml.kernel.org/r/af36c539006ef2768114b4ed38e6b054f7c7a3bd.1590693308.git.zanussi@kernel.org
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Update tests to reflect new CPUID capabilities with SYNDBG.
Check that we get the right number of entries and that
0x40000000.EAX always returns the correct max leaf.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Jon Doron <arilou@gmail.com>
Message-Id: <20200529134543.1127440-7-arilou@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When a nested VM with a VMX-preemption timer is migrated, verify that the
nested VM and its parent VM observe the VMX-preemption timer exit close to
the original expiration deadline.
Signed-off-by: Makarand Sonare <makarandsonare@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Message-Id: <20200526215107.205814-3-makarandsonare@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM_CAP_NESTED_STATE is now supported for AMD too but smm test acts like
it is still Intel only.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200529130407.57176-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The test is similar to the existing one for VMX, but simpler because we
don't have to test shadow VMCS or vmptrld/vmptrst/vmclear.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Many tests will want to check if the CPU is Intel or AMD in
guest code, add cpu_has_svm() and put it as static
inline to svm_util.h.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200529130407.57176-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
xdp_umem.c had overlapping changes between the 64-bit math fix
for the calculation of npgs and the removal of the zerocopy
memory type which got rid of the chunk_size_nohdr member.
The mlx5 Kconfig conflict is a case where we just take the
net-next copy of the Kconfig entry dependency as it takes on
the ESWITCH dependency by one level of indirection which is
what the 'net' conflicting change is trying to ensure.
Signed-off-by: David S. Miller <davem@davemloft.net>
A missing stats_update callback was recently added to act_pedit. Now that
iproute2 supports JSON dumping for pedit, extend the pedit_dsfield selftest
with a check that would have caught the fact that the callback was missing.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using ping in tests is error-prone, because ping is too smart. On a
flaky system (notably in a simulator), when packets don't come quickly
enough, more pings are sent, and that throws off counters. Instead use
mausezahn to generate ICMP echo request packets. That allows us to
send them in quicker succession as well, because the reason the ping
was made slow in the first place was to make the tests work on
simulated systems.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov says:
====================
pull-request: bpf 2020-05-29
The following pull-request contains BPF updates for your *net* tree.
We've added 6 non-merge commits during the last 7 day(s) which contain
a total of 4 files changed, 55 insertions(+), 34 deletions(-).
The main changes are:
1) minor verifier fix for fmod_ret progs, from Alexei.
2) af_xdp overflow check, from Bjorn.
3) minor verifier fix for 32bit assignment, from John.
4) powerpc has non-overlapping addr space, from Petr.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Added a verifier test for assigning 32bit reg states to
64bit where 32bit reg holds a constant value of 0.
Without previous kernel verifier.c fix, the test in
this patch will fail.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/159077335867.6014.2075350327073125374.stgit@john-Precision-5820-Tower
After previous fix for zero extension test_verifier tests #65 and #66 now
fail. Before the fix we can see the alu32 mov op at insn 10
10: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0)
R1_w=invP(id=0,
smin_value=4294967168,smax_value=4294967423,
umin_value=4294967168,umax_value=4294967423,
var_off=(0x0; 0x1ffffffff),
s32_min_value=-2147483648,s32_max_value=2147483647,
u32_min_value=0,u32_max_value=-1)
R10=fp0 fp-8_w=mmmmmmmm
10: (bc) w1 = w1
11: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0)
R1_w=invP(id=0,
smin_value=0,smax_value=2147483647,
umin_value=0,umax_value=4294967295,
var_off=(0x0; 0xffffffff),
s32_min_value=-2147483648,s32_max_value=2147483647,
u32_min_value=0,u32_max_value=-1)
R10=fp0 fp-8_w=mmmmmmmm
After the fix at insn 10 because we have 's32_min_value < 0' the following
step 11 now has 'smax_value=U32_MAX' where before we pulled the s32_max_value
bound into the smax_value as seen above in 11 with smax_value=2147483647.
10: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0)
R1_w=inv(id=0,
smin_value=4294967168,smax_value=4294967423,
umin_value=4294967168,umax_value=4294967423,
var_off=(0x0; 0x1ffffffff),
s32_min_value=-2147483648, s32_max_value=2147483647,
u32_min_value=0,u32_max_value=-1)
R10=fp0 fp-8_w=mmmmmmmm
10: (bc) w1 = w1
11: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0)
R1_w=inv(id=0,
smin_value=0,smax_value=4294967295,
umin_value=0,umax_value=4294967295,
var_off=(0x0; 0xffffffff),
s32_min_value=-2147483648, s32_max_value=2147483647,
u32_min_value=0, u32_max_value=-1)
R10=fp0 fp-8_w=mmmmmmmm
The fall out of this is by the time we get to the failing instruction at
step 14 where previously we had the following:
14: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0)
R1_w=inv(id=0,
smin_value=72057594021150720,smax_value=72057594029539328,
umin_value=72057594021150720,umax_value=72057594029539328,
var_off=(0xffffffff000000; 0xffffff),
s32_min_value=-16777216,s32_max_value=-1,
u32_min_value=-16777216,u32_max_value=-1)
R10=fp0 fp-8_w=mmmmmmmm
14: (0f) r0 += r1
We now have,
14: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0)
R1_w=inv(id=0,
smin_value=0,smax_value=72057594037927935,
umin_value=0,umax_value=72057594037927935,
var_off=(0x0; 0xffffffffffffff),
s32_min_value=-2147483648,s32_max_value=2147483647,
u32_min_value=0,u32_max_value=-1)
R10=fp0 fp-8_w=mmmmmmmm
14: (0f) r0 += r1
In the original step 14 'smin_value=72057594021150720' this trips the logic
in the verifier function check_reg_sane_offset(),
if (smin >= BPF_MAX_VAR_OFF || smin <= -BPF_MAX_VAR_OFF) {
verbose(env, "value %lld makes %s pointer be out of bounds\n",
smin, reg_type_str[type]);
return false;
}
Specifically, the 'smin <= -BPF_MAX_VAR_OFF' check. But with the fix
at step 14 we have bounds 'smin_value=0' so the above check is not tripped
because BPF_MAX_VAR_OFF=1<<29.
We have a smin_value=0 here because at step 10 the smaller smin_value=0 means
the subtractions at steps 11 and 12 bring the smin_value negative.
11: (17) r1 -= 2147483584
12: (17) r1 -= 2147483584
13: (77) r1 >>= 8
Then the shift clears the top bit and smin_value is set to 0. Note we still
have the smax_value in the fixed code so any reads will fail. An alternative
would be to have reg_sane_check() do both smin and smax value tests.
To fix the test we can omit the 'r1 >>=8' at line 13. This will change the
err string, but keeps the intention of the test as suggseted by the title,
"check after truncation of boundary-crossing range". If the verifier logic
changes a different value is likely to be thrown in the error or the error
will no longer be thrown forcing this test to be examined. With this change
we see the new state at step 13.
13: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0)
R1_w=invP(id=0,
smin_value=-4294967168,smax_value=127,
umin_value=0,umax_value=18446744073709551615,
s32_min_value=-2147483648,s32_max_value=2147483647,
u32_min_value=0,u32_max_value=-1)
R10=fp0 fp-8_w=mmmmmmmm
Giving the expected out of bounds error, "value -4294967168 makes map_value
pointer be out of bounds" However, for unpriv case we see a different error
now because of the mixed signed bounds pointer arithmatic. This seems OK so
I've only added the unpriv_errstr for this. Another optino may have been to
do addition on r1 instead of subtraction but I favor the approach above
slightly.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/159077333942.6014.14004320043595756079.stgit@john-Precision-5820-Tower
Add Nik's torture tests as a new set to stress the replace and cleanup
paths.
Torture test created by Nikolay Aleksandrov and then I adapted to
selftest and added IPv6 version.
Signed-off-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check whether error_log file exists in tracing/error_log testcase
and return UNSUPPORTED if no error_log file.
This can happen if we run the ftracetest on the older stable
kernel.
Fixes: 4eab1cc461 ("selftests/ftrace: Add tracing/error_log testcase")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Since the built-in echo has different behavior in POSIX shell
(dash) and bash, kprobe_syntax_errors.tc can fail on dash which
interpret backslash escape automatically.
To fix this issue, we explicitly use printf "%s" (not interpret
backslash escapes) if the command string can include backslash.
Reported-by: Liu Yiding <yidingx.liu@intel.com>
Suggested-by: Xiao Yang <yangx.jy@cn.fujitsu.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Add a couple large ecmp group nexthop selftests to cover
the remnant fixed by d69100b8ee.
The tests create 100 x32 ecmp groups of ipv4 and ipv6 and then
dump them. On kernels without the fix, they will fail due
to data remnant during the dump.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To align with recent recommended values. Will be configurable by future
patches.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The MSCC bug fix in 'net' had to be slightly adjusted because the
register accesses are done slightly differently in net-next.
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
1) Fix RCU warnings in ipv6 multicast router code, from Madhuparna
Bhowmik.
2) Nexthop attributes aren't being checked properly because of
mis-initialized iterator, from David Ahern.
3) Revert iop_idents_reserve() change as it caused performance
regressions and was just working around what is really a UBSAN bug
in the compiler. From Yuqi Jin.
4) Read MAC address properly from ROM in bmac driver (double iteration
proceeds past end of address array), from Jeremy Kerr.
5) Add Microsoft Surface device IDs to r8152, from Marc Payne.
6) Prevent reference to freed SKB in __netif_receive_skb_core(), from
Boris Sukholitko.
7) Fix ACK discard behavior in rxrpc, from David Howells.
8) Preserve flow hash across packet scrubbing in wireguard, from Jason
A. Donenfeld.
9) Cap option length properly for SO_BINDTODEVICE in AX25, from Eric
Dumazet.
10) Fix encryption error checking in kTLS code, from Vadim Fedorenko.
11) Missing BPF prog ref release in flow dissector, from Jakub Sitnicki.
12) dst_cache must be used with BH disabled in tipc, from Eric Dumazet.
13) Fix use after free in mlxsw driver, from Jiri Pirko.
14) Order kTLS key destruction properly in mlx5 driver, from Tariq
Toukan.
15) Check devm_platform_ioremap_resource() return value properly in
several drivers, from Tiezhu Yang.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (71 commits)
net: smsc911x: Fix runtime PM imbalance on error
net/mlx4_core: fix a memory leak bug.
net: ethernet: ti: cpsw: fix ASSERT_RTNL() warning during suspend
net: phy: mscc: fix initialization of the MACsec protocol mode
net: stmmac: don't attach interface until resume finishes
net: Fix return value about devm_platform_ioremap_resource()
net/mlx5: Fix error flow in case of function_setup failure
net/mlx5e: CT: Correctly get flow rule
net/mlx5e: Update netdev txq on completions during closure
net/mlx5: Annotate mutex destroy for root ns
net/mlx5: Don't maintain a case of del_sw_func being null
net/mlx5: Fix cleaning unmanaged flow tables
net/mlx5: Fix memory leak in mlx5_events_init
net/mlx5e: Fix inner tirs handling
net/mlx5e: kTLS, Destroy key object after destroying the TIS
net/mlx5e: Fix allowed tc redirect merged eswitch offload cases
net/mlx5: Avoid processing commands before cmdif is ready
net/mlx5: Fix a race when moving command interface to events mode
net/mlx5: Add command entry handling completion
rxrpc: Fix a memory leak in rxkad_verify_response()
...
Daniel Borkmann says:
====================
pull-request: bpf-next 2020-05-23
The following pull-request contains BPF updates for your *net-next* tree.
We've added 50 non-merge commits during the last 8 day(s) which contain
a total of 109 files changed, 2776 insertions(+), 2887 deletions(-).
The main changes are:
1) Add a new AF_XDP buffer allocation API to the core in order to help
lowering the bar for drivers adopting AF_XDP support. i40e, ice, ixgbe
as well as mlx5 have been moved over to the new API and also gained a
small improvement in performance, from Björn Töpel and Magnus Karlsson.
2) Add getpeername()/getsockname() attach types for BPF sock_addr programs
in order to allow for e.g. reverse translation of load-balancer backend
to service address/port tuple from a connected peer, from Daniel Borkmann.
3) Improve the BPF verifier is_branch_taken() logic to evaluate pointers
being non-NULL, e.g. if after an initial test another non-NULL test on
that pointer follows in a given path, then it can be pruned right away,
from John Fastabend.
4) Larger rework of BPF sockmap selftests to make output easier to understand
and to reduce overall runtime as well as adding new BPF kTLS selftests
that run in combination with sockmap, also from John Fastabend.
5) Batch of misc updates to BPF selftests including fixing up test_align
to match verifier output again and moving it under test_progs, allowing
bpf_iter selftest to compile on machines with older vmlinux.h, and
updating config options for lirc and v6 segment routing helpers, from
Stanislav Fomichev, Andrii Nakryiko and Alan Maguire.
6) Conversion of BPF tracing samples outdated internal BPF loader to use
libbpf API instead, from Daniel T. Lee.
7) Follow-up to BPF kernel test infrastructure in order to fix a flake in
the XDP selftests, from Jesper Dangaard Brouer.
8) Minor improvements to libbpf's internal hashmap implementation, from
Ian Rogers.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
test_lirc_mode2.sh assumes presence of /sys/class/rc/rc0/lirc*/uevent
which will not be present unless CONFIG_LIRC=y
Fixes: 6bdd533cee ("bpf: add selftest for lirc_mode2 type program")
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/1590147389-26482-3-git-send-email-alan.maguire@oracle.com
test_seg6_loop.o uses the helper bpf_lwt_seg6_adjust_srh();
it will not be present if CONFIG_IPV6_SEG6_BPF is not specified.
Fixes: b061017f8b ("selftests/bpf: add realistic loop tests")
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/1590147389-26482-2-git-send-email-alan.maguire@oracle.com
Getting a clean BPF selftests run involves ensuring latest trunk LLVM/clang
are used, pahole is recent (>=1.16) and config matches the specified
config file as closely as possible. Add to bpf_devel_QA.rst and point
tools/testing/selftests/bpf/README.rst to it.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/1590146674-25485-1-git-send-email-alan.maguire@oracle.com
Starting from iputils s20190709 (used in Fedora 31), arping does not
support timeout being specified as a decimal:
$ arping -c 1 -I swp1 -b 192.0.2.66 -q -w 0.1
arping: invalid argument: '0.1'
Previously, such timeouts were rounded to an integer.
Fix this by specifying the timeout as an integer.
Fixes: a5ee171d08 ("selftests: mlxsw: qos_mc_aware: Add a test for UC awareness")
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The variable is used by log_test() to check if the test case completely
successfully or not. In case it is not initialized at the start of a
test case, it is possible for the test case to fail despite not
encountering any errors.
Example:
```
...
TEST: Trap group statistics [ OK ]
TEST: Trap policer [FAIL]
Policer drop counter was not incremented
TEST: Trap policer binding [FAIL]
Policer drop counter was not incremented
```
Failure of trap_policer_test() caused trap_policer_bind_test() to fail
as well.
Fix by adding missing initialization of the variable.
Fixes: 5fbff58e27 ("selftests: netdevsim: Add test cases for devlink-trap policers")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To improve the usability of KUnit, defconfig is used
by default if no kunitconfig is present.
* https://bugzilla.kernel.org/show_bug.cgi?id=205259
Fixed up minor merge conflicts - Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Vitor Massaru Iha <vitor@massaru.org>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Daniel Borkmann says:
====================
pull-request: bpf 2020-05-22
The following pull-request contains BPF updates for your *net* tree.
We've added 3 non-merge commits during the last 3 day(s) which contain
a total of 5 files changed, 69 insertions(+), 11 deletions(-).
The main changes are:
1) Fix to reject mmap()'ing read-only array maps as writable since BPF verifier
relies on such map content to be frozen, from Andrii Nakryiko.
2) Fix breaking audit from secid_to_secctx() LSM hook by avoiding to use
call_int_hook() since this hook is not stackable, from KP Singh.
3) Fix BPF flow dissector program ref leak on netns cleanup, from Jakub Sitnicki.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit adds ipv4 and ipv6 fdb nexthop api tests to fib_nexthops.sh.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To make KUnit easier to use, and to avoid overwriting object and
.config files, the default KUnit build directory is set to .kunit
* Related bug: https://bugzilla.kernel.org/show_bug.cgi?id=205221
Fixed up minor merge conflicts - Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Vitor Massaru Iha <vitor@massaru.org>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
This can happen if a testing node doesn't have RTC (real time clock)
hardware or it doesn't support alarms.
Fixes: 61c5767603 ("selftests/timens: Add Time Namespace test for supported clocks")
Acked-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reported-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
TPM2 tests set uses /dev/tpm0 and /dev/tpmrm0 without check if they
are available. In case, when these devices are not available test
fails, but expected behaviour is skipped test.
Signed-off-by: Nikita Sobolev <Nikita.Sobolev@synopsys.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Reviewed-by: Petr Vorel <petr.vorel@gmail.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Provide a very basic selftest for getcpu() which similarly to our existing
test for gettimeofday() looks up the function in the vDSO and prints the
results it gets if the function exists and succeeds.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Both vdso_test_gettimeofday and vdso_standalone_test_x86 use the library in
parse_vdso.c but each separately declares the API it offers which is not
ideal. Create a header file with prototypes of the functions and use it in
both the library and the tests to ensure that the same prototypes are used
throughout.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Currently the vDSO kselftests have a test called vdso_test which tests
the vDSO implementation of gettimeofday(). In preparation for adding
tests for other vDSO functionality rename this test to reflect what's
going on.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Add a named pipe as an exec target to make sure that non-regular
files are rejected by execve() with EACCES. This can help verify
commit 73601ea5b7 ("fs/open.c: allow opening only regular files
during execve()").
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Adding a printk to test_sk_lookup_kern created the reported failure
where a pointer type is checked twice for NULL. Lets add it to the
progs test test_sk_lookup_kern.c so we test the case from C all the
way into the verifier.
We already have printk's in selftests so seems OK to add another one.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/159009170603.6313.1715279795045285176.stgit@john-Precision-5820-Tower
When we have pointer type that is known to be non-null we only follow
the non-null branch. This adds tests to cover the map_value pointer
returned from a map lookup. To force an error if both branches are
followed we do an ALU op on R10.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/159009168650.6313.7434084136067263554.stgit@john-Precision-5820-Tower
When we have pointer type that is known to be non-null and comparing
against zero we only follow the non-null branch. This adds tests to
cover this case for reference tracking. Also add the other case when
comparison against a non-zero value and ensure we still fail with
unreleased reference.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/159009166599.6313.1593680633787453767.stgit@john-Precision-5820-Tower
While working on commit b5372fe5dc ("exec: load_script: Do not exec
truncated interpreter path"), I wrote a series of test scripts to verify
corner cases. However, soon after, commit 6eb3c3d0a5 ("exec: increase
BINPRM_BUF_SIZE to 256") landed, resulting in the tests needing to be
refactored for the larger BINPRM_BUF_SIZE, which got lost on my TODO
list. During the recent exec refactoring work[1], the need for these tests
resurfaced, so I've finished them up for addition to the kernel selftests.
[1] https://lore.kernel.org/lkml/202005191144.E3112135@keescook/
Link: https://lkml.kernel.org/r/202005200204.D07DF079@keescook
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
gcc-10 switched to defaulting to -fno-common, which broke iproute2-5.4.
This was fixed in iproute-5.6, so switch to that. Because we're after a
stable testing surface, we generally don't like to bump these
unnecessarily, but in this case, being able to actually build is a basic
necessity.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As discussed in [0], it's dangerous to allow mapping BPF map, that's meant to
be frozen and is read-only on BPF program side, because that allows user-space
to actually store a writable view to the page even after it is frozen. This is
exacerbated by BPF verifier making a strong assumption that contents of such
frozen map will remain unchanged. To prevent this, disallow mapping
BPF_F_RDONLY_PROG mmap()'able BPF maps as writable, ever.
[0] https://lore.kernel.org/bpf/CAEf4BzYGWYhXdp6BJ7_=9OQPJxQpgug080MMjdSB72i9R+5c6g@mail.gmail.com/
Fixes: fc9702273e ("bpf: Add mmap() support for BPF_MAP_TYPE_ARRAY")
Suggested-by: Jann Horn <jannh@google.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Jann Horn <jannh@google.com>
Link: https://lore.kernel.org/bpf/20200519053824.1089415-1-andriin@fb.com
The gen_kselftest_tar.sh always packages *all* selftests and doesn't
pass along any variables to `make install` to influence what should be
built. This can result in an early error on the command line ("Unknown
tarball format TARGETS=XXX"), or unexpected test failures as the
tarball contains tests people wanted to skip on purpose.
Since the makefile already contains all the logic, we can add a target
for packaging. Keep the default .gz target the script uses, and actually
extend the supported formats by using tar's autodetection.
To not break current workflows, keep the gen_kselftest_tar.sh script as
it is, with an added suggestion to use the makefile target instead.
Signed-off-by: Veronika Kabatova <vkabatov@redhat.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
b9f4c01f3e ("selftest/bpf: Make bpf_iter selftest compilable against old vmlinux.h")
missed the fact that bpf_iter_test_kern{3,4}.c are not just including
bpf_iter_test_kern_common.h and need similar bpf_iter_meta re-definition
explicitly.
Fixes: b9f4c01f3e ("selftest/bpf: Make bpf_iter selftest compilable against old vmlinux.h")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200519192341.134360-1-andriin@fb.com