Commit Graph

3746 Commits

Author SHA1 Message Date
Andrey Ignatov
6041c67f28 selftests/bpf: Test bpf_sysctl_get_name helper
Test w/ and w/o BPF_F_SYSCTL_BASE_NAME, buffers with enough space and
too small buffers to get E2BIG and truncated result, etc.

  # ./test_sysctl
  ...
  Test case: sysctl_get_name sysctl_value:base ok .. [PASS]
  Test case: sysctl_get_name sysctl_value:base E2BIG truncated .. [PASS]
  Test case: sysctl_get_name sysctl:full ok .. [PASS]
  Test case: sysctl_get_name sysctl:full E2BIG truncated .. [PASS]
  Test case: sysctl_get_name sysctl:full E2BIG truncated small .. [PASS]
  Summary: 11 PASSED, 0 FAILED

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-12 13:54:58 -07:00
Andrey Ignatov
1f5fa9ab6e selftests/bpf: Test BPF_CGROUP_SYSCTL
Add unit test for BPF_PROG_TYPE_CGROUP_SYSCTL program type.

Test that program can allow/deny access.
Test both valid and invalid accesses to ctx->write.

Example of output:
  # ./test_sysctl
  Test case: sysctl wrong attach_type .. [PASS]
  Test case: sysctl:read allow all .. [PASS]
  Test case: sysctl:read deny all .. [PASS]
  Test case: ctx:write sysctl:read read ok .. [PASS]
  Test case: ctx:write sysctl:write read ok .. [PASS]
  Test case: ctx:write sysctl:read write reject .. [PASS]
  Summary: 6 PASSED, 0 FAILED

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-12 13:54:58 -07:00
Andrey Ignatov
7007af63da selftests/bpf: Test sysctl section name
Add unit test to verify that program and attach types are properly
identified for "cgroup/sysctl" section name.

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-12 13:54:58 -07:00
David Ahern
56490b623a selftests: Add debugging options to pmtu.sh
pmtu.sh script runs a number of tests and dumps a summary of pass/fail.
If a test fails, it is near impossible to debug why. For example:

    TEST: ipv6: PMTU exceptions                       [FAIL]

There are a lot of commands run behind the scenes for this test. Which
one is failing?

Add a VERBOSE option to show commands that are run and any output from
those commands. Add a PAUSE_ON_FAIL option to halt the script if a test
fails allowing users to poke around with the setup in the failed state.

In the process, rename tracing to TRACING and move declaration to top
with the new variables.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-11 21:32:00 -07:00
David S. Miller
bb23581b9b Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2019-04-12

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Improve BPF verifier scalability for large programs through two
   optimizations: i) remove verifier states that are not useful in pruning,
   ii) stop walking parentage chain once first LIVE_READ is seen. Combined
   gives approx 20x speedup. Increase limits for accepting large programs
   under root, and add various stress tests, from Alexei.

2) Implement global data support in BPF. This enables static global variables
   for .data, .rodata and .bss sections to be properly handled which allows
   for more natural program development. This also opens up the possibility
   to optimize program workflow by compiling ELFs only once and later only
   rewriting section data before reload, from Daniel and with test cases and
   libbpf refactoring from Joe.

3) Add config option to generate BTF type info for vmlinux as part of the
   kernel build process. DWARF debug info is converted via pahole to BTF.
   Latter relies on libbpf and makes use of BTF deduplication algorithm which
   results in 100x savings compared to DWARF data. Resulting .BTF section is
   typically about 2MB in size, from Andrii.

4) Add BPF verifier support for stack access with variable offset from
   helpers and add various test cases along with it, from Andrey.

5) Extend bpf_skb_adjust_room() growth BPF helper to mark inner MAC header
   so that L2 encapsulation can be used for tc tunnels, from Alan.

6) Add support for input __sk_buff context in BPF_PROG_TEST_RUN so that
   users can define a subset of allowed __sk_buff fields that get fed into
   the test program, from Stanislav.

7) Add bpf fs multi-dimensional array tests for BTF test suite and fix up
   various UBSAN warnings in bpftool, from Yonghong.

8) Generate a pkg-config file for libbpf, from Luca.

9) Dump program's BTF id in bpftool, from Prashant.

10) libbpf fix to use smaller BPF log buffer size for AF_XDP's XDP
    program, from Magnus.

11) kallsyms related fixes for the case when symbols are not present in
    BPF selftests and samples, from Daniel
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-11 17:00:05 -07:00
Florian Westphal
26f7fe4a5d selftests: netfilter: add ebtables broute test case
ebtables -t broute allows to redirect packets in a way that
they get pushed up the stack, even if the interface is part
of a bridge.

In case of IP packets to non-local address, this means
those IP packets are routed instead of bridged-forwarded, just
as if the bridge would not have existed.

Expected test output is:
PASS: netns connectivity: ns1 and ns2 can reach each other
PASS: ns1/ns2 connectivity with active broute rule
PASS: ns1/ns2 connectivity with active broute rule and bridge forward drop

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-12 01:45:58 +02:00
David Ahern
a5f622984a selftests: fib_tests: Fix 'Command line is not complete' errors
A couple of tests are verifying a route has been removed. The helper
expects the prefix as the first part of the expected output. When
checking that a route has been deleted the prefix is empty leading
to an invalid ip command:

  $ ip ro ls match
  Command line is not complete. Try option "help"

Fix by moving the comparison of expected output and output to a new
function that is used by both check_route and check_route6. Use the
new helper for the 2 checks on route removal.

Also, remove the reset of 'set -x' in route_setup which overrides the
user managed setting.

Fixes: d69faad765 ("selftests: fib_tests: Add prefix route tests with metric")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-11 14:17:59 -07:00
Alan Maguire
3ec61df82b selftests_bpf: add L2 encap to test_tc_tunnel
Update test_tc_tunnel to verify adding inner L2 header
encapsulation (an MPLS label or ethernet header) works.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-11 22:50:57 +02:00
Alan Maguire
166b5a7f2c selftests_bpf: extend test_tc_tunnel for UDP encap
commit 868d523535 ("bpf: add bpf_skb_adjust_room encap flags")
introduced support to bpf_skb_adjust_room for GSO-friendly GRE
and UDP encapsulation and later introduced associated test_tc_tunnel
tests.  Here those tests are extended to cover UDP encapsulation also.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-11 22:50:56 +02:00
Vlad Buslov
9e35552ae1 net: sched: flower: use correct ht function to prevent duplicates
Implementation of function rhashtable_insert_fast() check if its internal
helper function __rhashtable_insert_fast() returns non-NULL pointer and
seemingly return -EEXIST in such case. However, since
__rhashtable_insert_fast() is called with NULL key pointer, it never
actually checks for duplicates, which means that -EEXIST is never returned
to the user. Use rhashtable_lookup_insert_fast() hash table API instead. In
order to verify that it works as expected and prevent the problem from
happening in future, extend tc-tests with new test that verifies that no
new filters with existing key can be inserted to flower classifier.

Fixes: 1f17f7742e ("net: sched: flower: insert filter to ht before offloading it to hw")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-11 11:33:06 -07:00
Stanislav Fomichev
3daf8e703e selftests: bpf: add selftest for __sk_buff context in BPF_PROG_TEST_RUN
Simple test that sets cb to {1,2,3,4,5} and priority to 6, runs bpf
program that fails if cb is not what we expect and increments cb[i] and
priority. When the test finishes, we check that cb is now {2,3,4,5,6}
and priority is 7.

We also test the sanity checks:
* ctx_in is provided, but ctx_size_in is zero (same for
  ctx_out/ctx_size_out)
* unexpected non-zero fields in __sk_buff return EINVAL

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-11 10:21:41 +02:00
Ido Schimmel
7052e24363 selftests: mlxsw: Test VRF MAC vetoing
Test that it is possible to set an IP address on a VRF and that it is
not vetoed.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-10 11:57:08 -07:00
Daniel Borkmann
c861168b7c bpf, selftest: add test cases for BTF Var and DataSec
Extend test_btf with various positive and negative tests around
BTF verification of kind Var and DataSec. All passing as well:

  # ./test_btf
  [...]
  BTF raw test[4] (global data test #1): OK
  BTF raw test[5] (global data test #2): OK
  BTF raw test[6] (global data test #3): OK
  BTF raw test[7] (global data test #4, unsupported linkage): OK
  BTF raw test[8] (global data test #5, invalid var type): OK
  BTF raw test[9] (global data test #6, invalid var type (fwd type)): OK
  BTF raw test[10] (global data test #7, invalid var type (fwd type)): OK
  BTF raw test[11] (global data test #8, invalid var size): OK
  BTF raw test[12] (global data test #9, invalid var size): OK
  BTF raw test[13] (global data test #10, invalid var size): OK
  BTF raw test[14] (global data test #11, multiple section members): OK
  BTF raw test[15] (global data test #12, invalid offset): OK
  BTF raw test[16] (global data test #13, invalid offset): OK
  BTF raw test[17] (global data test #14, invalid offset): OK
  BTF raw test[18] (global data test #15, not var kind): OK
  BTF raw test[19] (global data test #16, invalid var referencing sec): OK
  BTF raw test[20] (global data test #17, invalid var referencing var): OK
  BTF raw test[21] (global data test #18, invalid var loop): OK
  BTF raw test[22] (global data test #19, invalid var referencing var): OK
  BTF raw test[23] (global data test #20, invalid ptr referencing var): OK
  BTF raw test[24] (global data test #21, var included in struct): OK
  BTF raw test[25] (global data test #22, array of var): OK
  [...]
  PASS:167 SKIP:0 FAIL:0

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-09 17:05:47 -07:00
Joe Stringer
b915ebe6d9 bpf, selftest: test global data/bss/rodata sections
Add tests for libbpf relocation of static variable references
into the .data, .rodata and .bss sections of the ELF, also add
read-only test for .rodata. All passing:

  # ./test_progs
  [...]
  test_global_data:PASS:load program 0 nsec
  test_global_data:PASS:pass global data run 925 nsec
  test_global_data_number:PASS:relocate .bss reference 925 nsec
  test_global_data_number:PASS:relocate .data reference 925 nsec
  test_global_data_number:PASS:relocate .rodata reference 925 nsec
  test_global_data_number:PASS:relocate .bss reference 925 nsec
  test_global_data_number:PASS:relocate .data reference 925 nsec
  test_global_data_number:PASS:relocate .rodata reference 925 nsec
  test_global_data_number:PASS:relocate .bss reference 925 nsec
  test_global_data_number:PASS:relocate .bss reference 925 nsec
  test_global_data_number:PASS:relocate .rodata reference 925 nsec
  test_global_data_number:PASS:relocate .rodata reference 925 nsec
  test_global_data_number:PASS:relocate .rodata reference 925 nsec
  test_global_data_string:PASS:relocate .rodata reference 925 nsec
  test_global_data_string:PASS:relocate .data reference 925 nsec
  test_global_data_string:PASS:relocate .bss reference 925 nsec
  test_global_data_string:PASS:relocate .data reference 925 nsec
  test_global_data_string:PASS:relocate .bss reference 925 nsec
  test_global_data_struct:PASS:relocate .rodata reference 925 nsec
  test_global_data_struct:PASS:relocate .bss reference 925 nsec
  test_global_data_struct:PASS:relocate .rodata reference 925 nsec
  test_global_data_struct:PASS:relocate .data reference 925 nsec
  test_global_data_rdonly:PASS:test .rodata read-only map 925 nsec
  [...]
  Summary: 229 PASSED, 0 FAILED

Note map helper signatures have been changed to avoid warnings
when passing in const data.

Joint work with Daniel Borkmann.

Signed-off-by: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-09 17:05:47 -07:00
Daniel Borkmann
fb2abb73e5 bpf, selftest: test {rd, wr}only flags and direct value access
Extend test_verifier with various test cases around the two kernel
extensions, that is, {rd,wr}only map support as well as direct map
value access. All passing, one skipped due to xskmap not present
on test machine:

  # ./test_verifier
  [...]
  #948/p XDP pkt read, pkt_meta' <= pkt_data, bad access 1 OK
  #949/p XDP pkt read, pkt_meta' <= pkt_data, bad access 2 OK
  #950/p XDP pkt read, pkt_data <= pkt_meta', good access OK
  #951/p XDP pkt read, pkt_data <= pkt_meta', bad access 1 OK
  #952/p XDP pkt read, pkt_data <= pkt_meta', bad access 2 OK
  Summary: 1410 PASSED, 1 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-09 17:05:47 -07:00
David S. Miller
310655b07a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-04-08 23:39:36 -07:00
Linus Torvalds
869e3305f2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Off by one and bounds checking fixes in NFC, from Dan Carpenter.

 2) There have been many weird regressions in r8169 since we turned ASPM
    support on, some are still not understood nor completely resolved.
    Let's turn this back off for now. From Heiner Kallweit.

 3) Signess fixes for ethtool speed value handling, from Michael
    Zhivich.

 4) Handle timestamps properly in macb driver, from Paul Thomas.

 5) Two erspan fixes, it's the usual "skb ->data potentially reallocated
    and we're holding a stale protocol header pointer". From Lorenzo
    Bianconi.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  bnxt_en: Reset device on RX buffer errors.
  bnxt_en: Improve RX consumer index validity check.
  net: macb driver, check for SKBTX_HW_TSTAMP
  qlogic: qlcnic: fix use of SPEED_UNKNOWN ethtool constant
  broadcom: tg3: fix use of SPEED_UNKNOWN ethtool constant
  ethtool: avoid signed-unsigned comparison in ethtool_validate_speed()
  net: ip6_gre: fix possible use-after-free in ip6erspan_rcv
  net: ip_gre: fix possible use-after-free in erspan_rcv
  r8169: disable ASPM again
  MAINTAINERS: ieee802154: update documentation file pattern
  net: vrf: Fix ping failed when vrf mtu is set to 0
  selftests: add a tc matchall test case
  nfc: nci: Potential off by one in ->pipes[] array
  NFC: nci: Add some bounds checking in nci_hci_cmd_received()
2019-04-08 17:10:46 -10:00
Tadeusz Struk
6da70580af selftests/tpm2: Open tpm dev in unbuffered mode
In order to have control over how many bytes are read or written
the device needs to be opened in unbuffered mode.

Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: James Morris <james.morris@microsoft.com>
2019-04-08 15:58:55 -07:00
Tadeusz Struk
f1a0ba6ccc selftests/tpm2: Extend tests to cover partial reads
Three new tests added:
1. Send get random cmd, read header in 1st read, read the rest in second
   read - expect success
2. Send get random cmd, read only part of the response, send another
   get random command, read the response - expect success
3. Send get random cmd followed by another get random cmd, without
   reading the first response - expect the second cmd to fail with -EBUSY

Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: James Morris <james.morris@microsoft.com>
2019-04-08 15:58:55 -07:00
David Ahern
228ddb3315 selftests: fib_tests: Add tests for ipv6 gateway with ipv4 route
Add tests for ipv6 gateway with ipv4 route. Tests include basic
single path with ping to verify connectivity and multipath.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-08 15:22:41 -07:00
Florian Westphal
6978cdb129 kselftests: extend nft_nat with inet family based nat hooks
With older nft versions, this will cause:
[..]
PASS: ipv6 ping to ns1 was ip6 NATted to ns2
/dev/stdin:4:30-31: Error: syntax error, unexpected to, expecting newline or semicolon
                ip daddr 10.0.1.99 dnat ip to 10.0.2.99
                                           ^^
SKIP: inet nat tests
PASS: ip IP masquerade for ns2
[..]

as there is currently no way to detect if nft will be able to parse
the inet format.

redirect and masquerade tests need to be skipped in this case for inet
too because nft userspace has overzealous family check and rejects their
use in the inet family.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-08 23:03:04 +02:00
Dave Jiang
2170a0d53b tools/testing/nvdimm: Retain security state after overwrite
Overwrite retains the security state after completion of operation.  Fix
nfit_test to reflect this so that the kernel can test the behavior it is
more likely to see in practice.

Fixes: 926f74802c ("tools/testing/nvdimm: Add overwrite support for nfit_test")
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-04-08 09:39:32 -07:00
Florian Westphal
4c145dce26 xfrm: make xfrm modes builtin
after previous changes, xfrm_mode contains no function pointers anymore
and all modules defining such struct contain no code except an init/exit
functions to register the xfrm_mode struct with the xfrm core.

Just place the xfrm modes core and remove the modules,
the run-time xfrm_mode register/unregister functionality is removed.

Before:

    text    data     bss      dec filename
    7523     200    2364    10087 net/xfrm/xfrm_input.o
   40003     628     440    41071 net/xfrm/xfrm_state.o
15730338 6937080 4046908 26714326 vmlinux

    7389     200    2364    9953  net/xfrm/xfrm_input.o
   40574     656     440   41670  net/xfrm/xfrm_state.o
15730084 6937068 4046908 26714060 vmlinux

The xfrm*_mode_{transport,tunnel,beet} modules are gone.

v2: replace CONFIG_INET6_XFRM_MODE_* IS_ENABLED guards with CONFIG_IPV6
    ones rather than removing them.

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2019-04-08 09:15:17 +02:00
Nicolas Dichtel
b959ecf8f9 selftests: add a tc matchall test case
This is a follow up of the commit 0db6f8befc ("net/sched: fix ->get
helper of the matchall cls").

To test it:
$ cd tools/testing/selftests/tc-testing
$ ln -s ../plugin-lib/nsPlugin.py plugins/20-nsPlugin.py
$ ./tdc.py -n -e 2638

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-07 19:31:49 -07:00
Nikolay Aleksandrov
f1054c65bc selftests: forwarding: test for bridge mcast traffic after report and leave
This test is split in two, the first part checks if a report creates a
corresponding mdb entry and if traffic is properly forwarded to it, and
the second part checks if the mdb entry is deleted after a leave and
if traffic is *not* forwarded to it. Since the mcast querier is enabled
we should see standard mcast snooping bridge behaviour.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-06 18:30:04 -07:00
David S. Miller
f83f715195 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Minor comment merge conflict in mlx5.

Staging driver has a fixup due to the skb->xmit_more changes
in 'net-next', but was removed in 'net'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-05 14:14:19 -07:00
Andrey Ignatov
07f9196241 selftests/bpf: Test unbounded var_off stack access
Test the case when reg->smax_value is too small/big and can overflow,
and separately min and max values outside of stack bounds.

Example of output:
  # ./test_verifier
  #856/p indirect variable-offset stack access, unbounded OK
  #857/p indirect variable-offset stack access, max out of bound OK
  #858/p indirect variable-offset stack access, min out of bound OK

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-05 16:50:08 +02:00
Andrey Ignatov
2c6927dbdc selftests/bpf: Test indirect var_off stack access in unpriv mode
Test that verifier rejects indirect stack access with variable offset in
unprivileged mode and accepts same code in privileged mode.

Since pointer arithmetics is prohibited in unprivileged mode verifier
should reject the program even before it gets to helper call that uses
variable offset, at the time when that variable offset is trying to be
constructed.

Example of output:
  # ./test_verifier
  ...
  #859/u indirect variable-offset stack access, priv vs unpriv OK
  #859/p indirect variable-offset stack access, priv vs unpriv OK

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-05 16:50:08 +02:00
Andrey Ignatov
f68a5b4464 selftests/bpf: Test indirect var_off stack access in raw mode
Test that verifier rejects indirect access to uninitialized stack with
variable offset.

Example of output:
  # ./test_verifier
  ...
  #859/p indirect variable-offset stack access, uninitialized OK

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-05 16:50:07 +02:00
Linus Torvalds
0548740e53 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Several hash table refcount fixes in batman-adv, from Sven
    Eckelmann.

 2) Use after free in bpf_evict_inode(), from Daniel Borkmann.

 3) Fix mdio bus registration in ixgbe, from Ivan Vecera.

 4) Unbounded loop in __skb_try_recv_datagram(), from Paolo Abeni.

 5) ila rhashtable corruption fix from Herbert Xu.

 6) Don't allow upper-devices to be added to vrf devices, from Sabrina
    Dubroca.

 7) Add qmi_wwan device ID for Olicard 600, from Bjørn Mork.

 8) Don't leave skb->next poisoned in __netif_receive_skb_list_ptype,
    from Alexander Lobakin.

 9) Missing IDR checks in mlx5 driver, from Aditya Pakki.

10) Fix false connection termination in ktls, from Jakub Kicinski.

11) Work around some ASPM issues with r8169 by disabling rx interrupt
    coalescing on certain chips. From Heiner Kallweit.

12) Properly use per-cpu qstat values on NOLOCK qdiscs, from Paolo
    Abeni.

13) Fully initialize sockaddr_in structures in SCTP, from Xin Long.

14) Various BPF flow dissector fixes from Stanislav Fomichev.

15) Divide by zero in act_sample, from Davide Caratti.

16) Fix bridging multicast regression introduced by rhashtable
    conversion, from Nikolay Aleksandrov.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (106 commits)
  ibmvnic: Fix completion structure initialization
  ipv6: sit: reset ip header pointer in ipip6_rcv
  net: bridge: always clear mcast matching struct on reports and leaves
  libcxgb: fix incorrect ppmax calculation
  vlan: conditional inclusion of FCoE hooks to match netdevice.h and bnx2x
  sch_cake: Make sure we can write the IP header before changing DSCP bits
  sch_cake: Use tc_skb_protocol() helper for getting packet protocol
  tcp: Ensure DCTCP reacts to losses
  net/sched: act_sample: fix divide by zero in the traffic path
  net: thunderx: fix NULL pointer dereference in nicvf_open/nicvf_stop
  net: hns: Fix sparse: some warnings in HNS drivers
  net: hns: Fix WARNING when remove HNS driver with SMMU enabled
  net: hns: fix ICMP6 neighbor solicitation messages discard problem
  net: hns: Fix probabilistic memory overwrite when HNS driver initialized
  net: hns: Use NAPI_POLL_WEIGHT for hns driver
  net: hns: fix KASAN: use-after-free in hns_nic_net_xmit_hw()
  flow_dissector: rst'ify documentation
  ipv6: Fix dangling pointer when ipv6 fragment
  net-gro: Fix GRO flush when receiving a GSO packet.
  flow_dissector: document BPF flow dissector environment
  ...
2019-04-04 18:07:12 -10:00
David S. Miller
5ba5780117 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2019-04-04

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Batch of fixes to the existing BPF flow dissector API to support
   calling BPF programs from the eth_get_headlen context (support for
   latter is planned to be added in bpf-next), from Stanislav.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-04 13:30:55 -07:00
Davide Caratti
fae2708174 net/sched: act_sample: fix divide by zero in the traffic path
the control path of 'sample' action does not validate the value of 'rate'
provided by the user, but then it uses it as divisor in the traffic path.
Validate it in tcf_sample_init(), and return -EINVAL with a proper extack
message in case that value is zero, to fix a splat with the script below:

 # tc f a dev test0 egress matchall action sample rate 0 group 1 index 2
 # tc -s a s action sample
 total acts 1

         action order 0: sample rate 1/0 group 1 pipe
          index 2 ref 1 bind 1 installed 19 sec used 19 sec
         Action statistics:
         Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
         backlog 0b 0p requeues 0
 # ping 192.0.2.1 -I test0 -c1 -q

 divide error: 0000 [#1] SMP PTI
 CPU: 1 PID: 6192 Comm: ping Not tainted 5.1.0-rc2.diag2+ #591
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 RIP: 0010:tcf_sample_act+0x9e/0x1e0 [act_sample]
 Code: 6a f1 85 c0 74 0d 80 3d 83 1a 00 00 00 0f 84 9c 00 00 00 4d 85 e4 0f 84 85 00 00 00 e8 9b d7 9c f1 44 8b 8b e0 00 00 00 31 d2 <41> f7 f1 85 d2 75 70 f6 85 83 00 00 00 10 48 8b 45 10 8b 88 08 01
 RSP: 0018:ffffae320190ba30 EFLAGS: 00010246
 RAX: 00000000b0677d21 RBX: ffff8af1ed9ec000 RCX: 0000000059a9fe49
 RDX: 0000000000000000 RSI: 000000000c7e33b7 RDI: ffff8af23daa0af0
 RBP: ffff8af1ee11b200 R08: 0000000074fcaf7e R09: 0000000000000000
 R10: 0000000000000050 R11: ffffffffb3088680 R12: ffff8af232307f80
 R13: 0000000000000003 R14: ffff8af1ed9ec000 R15: 0000000000000000
 FS:  00007fe9c6d2f740(0000) GS:ffff8af23da80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007fff6772f000 CR3: 00000000746a2004 CR4: 00000000001606e0
 Call Trace:
  tcf_action_exec+0x7c/0x1c0
  tcf_classify+0x57/0x160
  __dev_queue_xmit+0x3dc/0xd10
  ip_finish_output2+0x257/0x6d0
  ip_output+0x75/0x280
  ip_send_skb+0x15/0x40
  raw_sendmsg+0xae3/0x1410
  sock_sendmsg+0x36/0x40
  __sys_sendto+0x10e/0x140
  __x64_sys_sendto+0x24/0x30
  do_syscall_64+0x60/0x210
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
  [...]
  Kernel panic - not syncing: Fatal exception in interrupt

Add a TDC selftest to document that 'rate' is now being validated.

Reported-by: Matteo Croce <mcroce@redhat.com>
Fixes: 5c5670fae4 ("net/sched: Introduce sample tc action")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Yotam Gigi <yotam.gi@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-04 10:46:33 -07:00
Daniel T. Lee
e67b2c7154 samples, selftests/bpf: add NULL check for ksym_search
Since, ksym_search added with verification logic for symbols existence,
it could return NULL when the kernel symbols are not loaded.

This commit will add NULL check logic after ksym_search.

Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-04 16:43:47 +02:00
Daniel T. Lee
0979ff7992 selftests/bpf: ksym_search won't check symbols exists
Currently, ksym_search located at trace_helpers won't check symbols are
existing or not.

In ksym_search, when symbol is not found, it will return &syms[0](_stext).
But when the kernel symbols are not loaded, it will return NULL, which is
not a desired action.

This commit will add verification logic whether symbols are loaded prior
to the symbol search.

Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-04 16:43:46 +02:00
Alexei Starovoitov
8aa2d4b4b9 selftests/bpf: synthetic tests to push verifier limits
Add a test to generate 1m ld_imm64 insns to stress the verifier.

Bump the size of fill_ld_abs_vlan_push_pop test from 4k to 29k
and jump_around_ld_abs from 4k to 5.5k.
Larger sizes are not possible due to 16-bit offset encoding
in jump instructions.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-04 01:27:38 +02:00
Alexei Starovoitov
e5e7a8f2d8 selftests/bpf: add few verifier scale tests
Add 3 basic tests that stress verifier scalability.

test_verif_scale1.c calls non-inlined jhash() function 90 times on
different position in the packet.
This test simulates network packet parsing.
jhash function is ~140 instructions and main program is ~1200 insns.

test_verif_scale2.c force inlines jhash() function 90 times.
This program is ~15k instructions long.

test_verif_scale3.c calls non-inlined jhash() function 90 times on
But this time jhash has to process 32-bytes from the packet
instead of 14-bytes in tests 1 and 2.
jhash function is ~230 insns and main program is ~1200 insns.

$ test_progs -s
can be used to see verifier stats.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-04 01:27:38 +02:00
Stanislav Fomichev
822fe61795 net/flow_dissector: pass flow_keys->n_proto to BPF programs
This is a preparation for the next commit that would prohibit access to
the most fields of __sk_buff from the BPF programs.

Instead of requiring BPF flow dissector programs to look into skb,
pass all input data in the flow_keys.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-03 16:49:48 +02:00
Stanislav Fomichev
2c3af7d901 selftests/bpf: fix vlan handling in flow dissector program
When we tail call PROG(VLAN) from parse_eth_proto we don't need to peek
back to handle vlan proto because we didn't adjust nhoff/thoff yet. Use
flow_keys->n_proto, that we set in parse_eth_proto instead and
properly increment nhoff as well.

Also, always use skb->protocol and don't look at skb->vlan_present.
skb->vlan_present indicates that vlan information is stored out-of-band
in skb->vlan_{tci,proto} and vlan header is already pulled from skb.
That means, skb->vlan_present == true is not relevant for BPF flow
dissector.

Add simple test cases with VLAN tagged frames:
  * single vlan for ipv4
  * double vlan for ipv6

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-03 16:49:48 +02:00
Stanislav Fomichev
7596aa3ea8 selftests: bpf: remove duplicate .flags initialization in ctx_skb.c
verifier/ctx_skb.c:708:11: warning: initializer overrides prior initialization of this subobject [-Winitializer-overrides]
        .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-02 23:17:18 +02:00
Stanislav Fomichev
a918b03e8c selftests: bpf: fix -Wformat-invalid-specifier for bpf_obj_id.c
Use standard C99 %zu for sizeof, not GCC's custom %Zu:
bpf_obj_id.c:76:48: warning: invalid conversion specifier 'Z'

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-02 23:17:18 +02:00
Stanislav Fomichev
94e8f3c712 selftests: bpf: fix -Wformat-security warning for flow_dissector_load.c
flow_dissector_load.c:55:19: warning: format string is not a string literal (potentially insecure)
      [-Wformat-security]
                error(1, errno, command);
                                ^~~~~~~
flow_dissector_load.c:55:19: note: treat the string as an argument to avoid this
                error(1, errno, command);
                                ^
                                "%s",
1 warning generated.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-02 23:17:18 +02:00
Stanislav Fomichev
6b7b6995c4 selftests: bpf: tests.h should depend on .c files, not the output
This makes sure we don't put headers as input files when doing
compilation, because clang complains about the following:

clang-9: error: cannot specify -o when generating multiple output files
../lib.mk:152: recipe for target 'xxx/tools/testing/selftests/bpf/test_verifier' failed
make: *** [xxx/tools/testing/selftests/bpf/test_verifier] Error 1
make: *** Waiting for unfinished jobs....
clang-9: error: cannot specify -o when generating multiple output files
../lib.mk:152: recipe for target 'xxx/tools/testing/selftests/bpf/test_progs' failed

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-02 23:17:18 +02:00
Yonghong Song
9de2640b06 bpf: add bpffs multi-dimensional array tests in test_btf
For multiple dimensional arrays like below,
  int a[2][3]
both llvm and pahole generated one BTF_KIND_ARRAY type like
  . element_type: int
  . index_type: unsigned int
  . number of elements: 6

Such a collapsed BTF_KIND_ARRAY type will cause the divergence
in BTF vs. the user code. In the compile-once-run-everywhere
project, the header file is generated from BTF and used for bpf
program, and the definition in the header file will be different
from what user expects.

But the kernel actually supports chained multi-dimensional array
types properly. The above "int a[2][3]" can be represented as
  Type #n:
    . element_type: int
    . index_type: unsigned int
    . number of elements: 3
  Type #(n+1):
    . element_type: type #n
    . index_type: unsigned int
    . number of elements: 2

The following llvm commit
  https://reviews.llvm.org/rL357215
also enables llvm to generated proper chained multi-dimensional arrays.

The test_btf already has a raw test ("struct test #1") for chained
multi-dimensional arrays. This patch added amended bpffs test for
chained multi-dimensional arrays.

Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-01 15:41:05 +02:00
Linus Torvalds
63fc9c2348 A collection of x86 and ARM bugfixes, and some improvements to documentation.
On top of this, a cleanup of kvm_para.h headers, which were exported by
 some architectures even though they not support KVM at all.  This is
 responsible for all the Kbuild changes in the diffstat.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJcoM5VAAoJEL/70l94x66DU3EH/A8sYdsfeqALWElm2Sy9TYas
 mntz+oTWsl3vDy8s8zp1ET2NpF7oBlBEMmCWhVEJaD+1qW3VpTRAseR3Zr9ML9xD
 k+BQM8SKv47o86ZN+y4XALl30Ckb3DXh/X1xsrV5hF6J3ofC+Ce2tF560l8C9ygC
 WyHDxwNHMWVA/6TyW3mhunzuVKgZ/JND9+0zlyY1LKmUQ0BQLle23gseIhhI0YDm
 B4VGIYU2Mf8jCH5Ir3N/rQ8pLdo8U7f5P/MMfgXQafksvUHJBg6B6vOhLJh94dLh
 J2wixYp1zlT0drBBkvJ0jPZ75skooWWj0o3otEA7GNk/hRj6MTllgfL5SajTHZg=
 =/A7u
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "A collection of x86 and ARM bugfixes, and some improvements to
  documentation.

  On top of this, a cleanup of kvm_para.h headers, which were exported
  by some architectures even though they not support KVM at all. This is
  responsible for all the Kbuild changes in the diffstat"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (28 commits)
  Documentation: kvm: clarify KVM_SET_USER_MEMORY_REGION
  KVM: doc: Document the life cycle of a VM and its resources
  KVM: selftests: complete IO before migrating guest state
  KVM: selftests: disable stack protector for all KVM tests
  KVM: selftests: explicitly disable PIE for tests
  KVM: selftests: assert on exit reason in CR4/cpuid sync test
  KVM: x86: update %rip after emulating IO
  x86/kvm/hyper-v: avoid spurious pending stimer on vCPU init
  kvm/x86: Move MSR_IA32_ARCH_CAPABILITIES to array emulated_msrs
  KVM: x86: Emulate MSR_IA32_ARCH_CAPABILITIES on AMD hosts
  kvm: don't redefine flags as something else
  kvm: mmu: Used range based flushing in slot_handle_level_range
  KVM: export <linux/kvm_para.h> and <asm/kvm_para.h> iif KVM is supported
  KVM: x86: remove check on nr_mmu_pages in kvm_arch_commit_memory_region()
  kvm: nVMX: Add a vmentry check for HOST_SYSENTER_ESP and HOST_SYSENTER_EIP fields
  KVM: SVM: Workaround errata#1096 (insn_len maybe zero on SMAP violation)
  KVM: Reject device ioctls from processes other than the VM's creator
  KVM: doc: Fix incorrect word ordering regarding supported use of APIs
  KVM: x86: fix handling of role.cr4_pae and rename it to 'gpte_size'
  KVM: nVMX: Do not inherit quadrant and invalid for the root shadow EPT
  ...
2019-03-31 08:55:59 -07:00
Dave Jiang
037c8489ad libnvdimm/security: provide fix for secure-erase to use zero-key
Add a zero key in order to standardize hardware that want a key of 0's to
be passed. Some platforms defaults to a zero-key with security enabled
rather than allow the OS to enable the security. The zero key would allow
us to manage those platform as well. This also adds a fix to secure erase
so it can use the zero key to do crypto erase. Some other security commands
already use zero keys. This introduces a standard zero-key to allow
unification of semantics cross nvdimm security commands.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-03-30 08:26:37 -07:00
David S. Miller
22bdf7d459 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2019-03-29

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Bug fix in BTF deduplication that was mishandling an equivalence
   comparison, from Andrii.

2) libbpf Makefile fixes to properly link against libelf for the shared
   object and to actually export AF_XDP's xsk.h header, from Björn.

3) Fix use after free in bpf inode eviction, from Daniel.

4) Fix a bug in skb creation out of cpumap redirect, from Jesper.

5) Remove an unnecessary and triggerable WARN_ONCE() in max number
   of call stack frames checking in verifier, from Paul.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-29 21:00:28 -07:00
Andrey Ignatov
8ff80e96e3 selftests/bpf: Test variable offset stack access
Test different scenarios of indirect variable-offset stack access: out of
bound access (>0), min_off below initialized part of the stack,
max_off+size above initialized part of the stack, initialized stack.

Example of output:
  ...
  #856/p indirect variable-offset stack access, out of bound OK
  #857/p indirect variable-offset stack access, max_off+size > max_initialized OK
  #858/p indirect variable-offset stack access, min_off < min_initialized OK
  #859/p indirect variable-offset stack access, ok OK
  ...

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-29 12:05:35 -07:00
Dmytro Linkin
49b1b4a19c selftests: tc-testing: Add pedit tests
Add 36 pedit action tests to check pedit options described in tc-pedit(8)
man page. Test cases can be specified by categories: actions, pedit,
raw_op, layered_op. RAW_OP cases check offset option for u8, u16 and u32
offset size. LAYERED_OP cases check fields option for eth, ip, ip6,
tcp and udp headers.

Include following tests:
377e - Add pedit action with RAW_OP offset u32
a0ca - Add pedit action with RAW_OP offset u32 (INVALID)
dd8a - Add pedit action with RAW_OP offset u16 u16
53db - Add pedit action with RAW_OP offset u16 (INVALID)
5c7e - Add pedit action with RAW_OP offset u8 add value
2893 - Add pedit action with RAW_OP offset u8 quad
3a07 - Add pedit action with RAW_OP offset u8-u16-u8
ab0f - Add pedit action with RAW_OP offset u16-u8-u8
9d12 - Add pedit action with RAW_OP offset u32 set u16 clear u8 invert
ebfa - Add pedit action with RAW_OP offset overflow u32 (INVALID)
f512 - Add pedit action with RAW_OP offset u16 at offmask shift set
c2cb - Add pedit action with RAW_OP offset u32 retain value
86d4 - Add pedit action with LAYERED_OP eth set src & dst
c715 - Add pedit action with LAYERED_OP eth set src (INVALID)
ba22 - Add pedit action with LAYERED_OP eth type set/clear sequence
5810 - Add pedit action with LAYERED_OP ip set src & dst
1092 - Add pedit action with LAYERED_OP ip set ihl & dsfield
02d8 - Add pedit action with LAYERED_OP ip set ttl & protocol
3e2d - Add pedit action with LAYERED_OP ip set ttl (INVALID)
31ae - Add pedit action with LAYERED_OP ip ttl clear/set
486f - Add pedit action with LAYERED_OP ip set duplicate fields
e790 - Add pedit action with LAYERED_OP ip set ce, df, mf, firstfrag,
nofrag fields
6829 - Add pedit action with LAYERED_OP beyond ip set dport & sport
afd8 - Add pedit action with LAYERED_OP beyond ip set icmp_type &
icmp_code
3143 - Add pedit action with LAYERED_OP beyond ip set dport (INVALID)
fc1f - Add pedit action with LAYERED_OP ip6 set src & dst
6d34 - Add pedit action with LAYERED_OP ip6 dst retain value (INVALID)
6f5e - Add pedit action with LAYERED_OP ip6 flow_lbl
6795 - Add pedit action with LAYERED_OP ip6 set payload_len, nexthdr,
hoplimit
1442 - Add pedit action with LAYERED_OP tcp set dport & sport
b7ac - Add pedit action with LAYERED_OP tcp sport set (INVALID)
cfcc - Add pedit action with LAYERED_OP tcp flags set
3bc4 - Add pedit action with LAYERED_OP tcp set dport, sport & flags
fields
f1c8 - Add pedit action with LAYERED_OP udp set dport & sport
d784 - Add pedit action with mixed RAW/LAYERED_OP #1
70ca - Add pedit action with mixed RAW/LAYERED_OP #2

Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-29 11:15:20 -07:00
Petr Machata
30905dc63b selftests: mlxsw: Add a new test for strict priority
Test that when strict priority is configured on a system, the
higher-priority traffic does actually win all the available bandwidth.
The test uses a similar approach to qos_mc_aware.sh to run and account
the traffic.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-28 17:20:53 -07:00
Petr Machata
573363a68f selftests: mlxsw: Add qos_lib.sh
Extract reusable code from qos_mc_aware.sh and put into a new library.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-28 17:20:52 -07:00
Petr Machata
5dde21b3a7 selftests: mlxsw: qos_mc_aware: Configure shared buffers
This test runs two streams of traffic from two independent ports to
create congestion on one egress port. It is necessary to configure the
shared buffer thresholds correctly, to make sure that there is traffic
from both streams in the shared buffer. Only then can the test actually
test prioritization among these streams.

Without this configuration, it is possible, that one of the streams
takes all of port-pool quota, and the other stream is not even admitted,
thus invalidating the result.

On Spectrum-1, this is not a problem, because MC traffic uses a separate
pool. But for Spectrum-2, MC and UC share the same pool, and the correct
configuration is important.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-28 17:20:52 -07:00
Petr Machata
d04cc726c8 selftests: forwarding: devlink_lib: Add shared buffer helpers
Add helpers to obtain, set, and restore a pool size, and a port-pool and
tc-pool threshold.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-28 17:20:52 -07:00
Petr Machata
8e46aee697 selftests: forwarding: devlink_lib: Simplify deduction of DEVLINK_DEV
Use devlink -j and jq for more accurate querying. Use cut -f-2 instead
of rev-cut-rev combo.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-28 17:20:52 -07:00
Petr Machata
2cca8751af selftests: forwarding: devlink_lib: Avoid double sourcing of lib.sh
Don't source lib.sh twice and make the script work with ifnames passed
on the command line.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-28 17:20:52 -07:00
Danielle Ratson
2fcbc0b15e selftests: forwarding: Test action VLAN modify
Construct a basic topology consisting of two hosts connected using a
VLAN-aware bridge. Put each port in a different VLAN and test that ping
fails.

Add ingress and egress filters with a VLAN modify action and test that
ping passes.

Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-28 17:20:52 -07:00
Amit Cohen
0637e1f878 selftests: forwarding: Add PCP match and VLAN match tests
Send packets with VLAN and PCP set and check that TC flower filters can
match on these keys.

Signed-off-by: Amit Cohen <amitc@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-28 17:20:52 -07:00
Ido Schimmel
ca059af852 selftests: forwarding: Add reverse path forwarding (RPF) test cases
In case a packet is routed using a multicast route whose specified
ingress interface does not match the interface from which the packet was
received, the packet is dropped.

Add IPv4 and IPv6 test cases for above mentioned scenario.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-28 17:20:52 -07:00
Sean Christopherson
0f73bbc851 KVM: selftests: complete IO before migrating guest state
Documentation/virtual/kvm/api.txt states:

  NOTE: For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_PAPR and
        KVM_EXIT_EPR the corresponding operations are complete (and guest
        state is consistent) only after userspace has re-entered the
        kernel with KVM_RUN.  The kernel side will first finish incomplete
        operations and then check for pending signals.  Userspace can
        re-enter the guest with an unmasked signal pending to complete
        pending operations.

Because guest state may be inconsistent, starting state migration after
an IO exit without first completing IO may result in test failures, e.g.
a proposed change to KVM's handling of %rip in its fast PIO handling[1]
will cause the new VM, i.e. the post-migration VM, to have its %rip set
to the IN instruction that triggered KVM_EXIT_IO, leading to a test
assertion due to a stage mismatch.

For simplicitly, require KVM_CAP_IMMEDIATE_EXIT to complete IO and skip
the test if it's not available.  The addition of KVM_CAP_IMMEDIATE_EXIT
predates the state selftest by more than a year.

[1] https://patchwork.kernel.org/patch/10848545/

Fixes: fa3899add1 ("kvm: selftests: add basic test for state save and restore")
Reported-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-28 17:29:09 +01:00
Sean Christopherson
ffac839d04 KVM: selftests: disable stack protector for all KVM tests
Since 4.8.3, gcc has enabled -fstack-protector by default.  This is
problematic for the KVM selftests as they do not configure fs or gs
segments (the stack canary is pulled from fs:0x28).  With the default
behavior, gcc will insert a stack canary on any function that creates
buffers of 8 bytes or more.  As a result, ucall() will hit a triple
fault shutdown due to reading a bad fs segment when inserting its
stack canary, i.e. every test fails with an unexpected SHUTDOWN.

Fixes: 14c47b7530 ("kvm: selftests: introduce ucall")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-28 17:29:08 +01:00
Sean Christopherson
0a3f29b5a7 KVM: selftests: explicitly disable PIE for tests
KVM selftests embed the guest "image" as a function in the test itself
and extract the guest code at runtime by manually parsing the elf
headers.  The parsing is very simple and doesn't supporting fancy things
like position independent executables.  Recent versions of gcc enable
pie by default, which results in triple fault shutdowns in the guest due
to the virtual address in the headers not matching up with the virtual
address retrieved from the function pointer.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-28 17:29:07 +01:00
Sean Christopherson
8df98ae0ab KVM: selftests: assert on exit reason in CR4/cpuid sync test
...so that the test doesn't end up in an infinite loop if it fails for
whatever reason, e.g. SHUTDOWN due to gcc inserting stack canary code
into ucall() and attempting to derefence a null segment.

Fixes: ca35906688 ("kvm: selftests: add cr4_cpuid_sync_test")
Cc: Wei Huang <wei@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-28 17:29:05 +01:00
David S. Miller
356d71e00d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-03-27 17:37:58 -07:00
Linus Torvalds
1a9df9e29c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "Fixes here and there, a couple new device IDs, as usual:

   1) Fix BQL race in dpaa2-eth driver, from Ioana Ciornei.

   2) Fix 64-bit division in iwlwifi, from Arnd Bergmann.

   3) Fix documentation for some eBPF helpers, from Quentin Monnet.

   4) Some UAPI bpf header sync with tools, also from Quentin Monnet.

   5) Set descriptor ownership bit at the right time for jumbo frames in
      stmmac driver, from Aaro Koskinen.

   6) Set IFF_UP properly in tun driver, from Eric Dumazet.

   7) Fix load/store doubleword instruction generation in powerpc eBPF
      JIT, from Naveen N. Rao.

   8) nla_nest_start() return value checks all over, from Kangjie Lu.

   9) Fix asoc_id handling in SCTP after the SCTP_*_ASSOC changes this
      merge window. From Marcelo Ricardo Leitner and Xin Long.

  10) Fix memory corruption with large MTUs in stmmac, from Aaro
      Koskinen.

  11) Do not use ipv4 header for ipv6 flows in TCP and DCCP, from Eric
      Dumazet.

  12) Fix topology subscription cancellation in tipc, from Erik Hugne.

  13) Memory leak in genetlink error path, from Yue Haibing.

  14) Valid control actions properly in packet scheduler, from Davide
      Caratti.

  15) Even if we get EEXIST, we still need to rehash if a shrink was
      delayed. From Herbert Xu.

  16) Fix interrupt mask handling in interrupt handler of r8169, from
      Heiner Kallweit.

  17) Fix leak in ehea driver, from Wen Yang"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (168 commits)
  dpaa2-eth: fix race condition with bql frame accounting
  chelsio: use BUG() instead of BUG_ON(1)
  net: devlink: skip info_get op call if it is not defined in dumpit
  net: phy: bcm54xx: Encode link speed and activity into LEDs
  tipc: change to check tipc_own_id to return in tipc_net_stop
  net: usb: aqc111: Extend HWID table by QNAP device
  net: sched: Kconfig: update reference link for PIE
  net: dsa: qca8k: extend slave-bus implementations
  net: dsa: qca8k: remove leftover phy accessors
  dt-bindings: net: dsa: qca8k: support internal mdio-bus
  dt-bindings: net: dsa: qca8k: fix example
  net: phy: don't clear BMCR in genphy_soft_reset
  bpf, libbpf: clarify bump in libbpf version info
  bpf, libbpf: fix version info and add it to shared object
  rxrpc: avoid clang -Wuninitialized warning
  tipc: tipc clang warning
  net: sched: fix cleanup NULL pointer exception in act_mirr
  r8169: fix cable re-plugging issue
  net: ethernet: ti: fix possible object reference leak
  net: ibm: fix possible object reference leak
  ...
2019-03-27 12:22:57 -07:00
Andrii Nakryiko
eb76899ce7 selftests/bpf: add btf_dedup test for VOID equivalence check
This patch adds specific test exposing bug in btf_dedup_is_equiv() when
comparing candidate VOID type to a non-VOID canonical type. It's
important for canonical type to be anonymous, otherwise name equality
check will do the right thing and will exit early.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-27 08:01:25 -07:00
Paul Chaignon
cabacfbbe5 selftests/bpf: test case for invalid call stack in dead code
This patch adds a test case with an excessive number of call stack frames
in dead code.

Signed-off-by: Paul Chaignon <paul.chaignon@orange.com>
Tested-by: Xiao Han <xiao.han@orange.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-26 13:02:16 -07:00
Stanislav Fomichev
b4b6aa8343 selftests: bpf: don't depend on hardcoded perf sample_freq
When running stacktrace_build_id_nmi, try to query
kernel.perf_event_max_sample_rate sysctl and use it as a sample_freq.
If there was an error reading sysctl, fallback to 5000.

kernel.perf_event_max_sample_rate sysctl can drift and/or can be
adjusted by the perf tool, so assuming a fixed number might be
problematic on a long running machine.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-26 12:47:41 -07:00
Alan Maguire
0c4ea7f87a bpf: test_tc_tunnel.sh needs reverse path filtering disabled
test_tc_tunnel.sh sets up a pair of namespaces connected by a
veth pair to verify encap/decap using bpf_skb_adjust_room.  In
testing this, it uses tunnel links as the peer of the bpf-based
encap/decap.  However because the same IP header is used for inner
and outer IP, when packets arrive at the tunnel interface they will
be dropped by reverse path filtering as those packets are expected
on the veth interface (where the destination IP of the decapped
packet is configured).

To avoid this, ensure reverse path filtering is disabled for the
namespace using tunneling.

Fixes: 98cdabcd07 ("selftests/bpf: bpf tunnel encap test")
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-03-25 15:01:54 +01:00
David S. Miller
27602e2c44 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Alexei Starovoitov says:

====================
pull-request: bpf 2019-03-24

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) libbpf verision fix up from Daniel.

2) fix liveness propagation from Jakub.

3) fix verbose print of refcounted regs from Martin.

4) fix for large map allocations from Martynas.

5) fix use after free in sanitize_ptr_alu from Xu.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-24 23:45:35 -04:00
Peter Oskolkov
7df5e3db8f selftests: bpf: tc-bpf flow shaping with EDT
Add a small test that shows how to shape a TCP flow in tc-bpf
with EDT and ECN.

Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-22 18:16:44 -07:00
Willem de Bruijn
75a1a9fa2e selftests/bpf: convert bpf tunnel test to encap modes
Make the tests correctly annotate skbs with tunnel metadata.

This makes the gso tests succeed. Enable them.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-22 13:52:45 -07:00
Willem de Bruijn
94f16813e1 selftests/bpf: convert bpf tunnel test to BPF_F_ADJ_ROOM_FIXED_GSO
Lower route MTU to ensure packets fit in device MTU after encap, then
skip the gso_size changes.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-22 13:52:45 -07:00
Willem de Bruijn
005edd1656 selftests/bpf: convert bpf tunnel test to BPF_ADJ_ROOM_MAC
Avoid moving the network layer header when prefixing tunnel headers.

This avoids an explicit call to bpf_skb_store_bytes and an implicit
move of the network header bytes in bpf_skb_adjust_room.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-22 13:52:45 -07:00
Willem de Bruijn
8142958954 selftests/bpf: extend bpf tunnel test with tso
Segmentation offload takes a longer path. Verify that the feature
works with large packets.

The test succeeds if not setting dodgy in bpf_skb_adjust_room, as veth
TSO is permissive.

If not setting SKB_GSO_DODGY, this enables tunneled TSO offload on
supporting NICs.

The feature sets SKB_GSO_DODGY because the caller is untrusted. As a
result the packets traverse through the gso stack at least up to TCP.
And fail the gso_type validation, such as the skb->encapsulation check
in gre_gso_segment and the gso_type checks introduced in commit
418e897e07 ("gso: validate gso_type on ipip style tunnel").

This will be addressed in a follow-on feature patch. In the meantime,
disable the new gso tests.

Changes v1->v2:
  - not all netcat versions support flag '-q', use timeout instead

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-22 13:52:44 -07:00
Willem de Bruijn
7255fade7b selftests/bpf: extend bpf tunnel test with gre
GRE is a commonly used protocol. Add GRE cases for both IPv4 and IPv6.

It also inserts different sized headers, which can expose some
unexpected edge cases.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-22 13:52:44 -07:00
Willem de Bruijn
ef81bd0549 selftests/bpf: expand bpf tunnel test to ipv6
The test only uses ipv4 so far, expand to ipv6.
This is mostly a boilerplate near copy of the ipv4 path.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-22 13:52:44 -07:00
Willem de Bruijn
ccd34cd357 selftests/bpf: expand bpf tunnel test with decap
The bpf tunnel test encapsulates using bpf, then decapsulates using
a standard tunnel device to verify correctness.

Once encap is verified, also test decap, by replacing the tunnel
device on decap with another bpf program.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-22 13:52:44 -07:00
Willem de Bruijn
98cdabcd07 selftests/bpf: bpf tunnel encap test
Validate basic tunnel encapsulation using ipip.

Set up two namespaces connected by veth. Connect a client and server.
Do this with and without bpf encap.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-22 13:52:44 -07:00
Jakub Kicinski
83d163124c bpf: verifier: propagate liveness on all frames
Commit 7640ead939 ("bpf: verifier: make sure callees don't prune
with caller differences") connected up parentage chains of all
frames of the stack.  It didn't, however, ensure propagate_liveness()
propagates all liveness information along those chains.

This means pruning happening in the callee may generate explored
states with incomplete liveness for the chains in lower frames
of the stack.

The included selftest is similar to the prior one from commit
7640ead939 ("bpf: verifier: make sure callees don't prune with
caller differences"), where callee would prune regardless of the
difference in r8 state.

Now we also initialize r9 to 0 or 1 based on a result from get_random().
r9 is never read so the walk with r9 = 0 gets pruned (correctly) after
the walk with r9 = 1 completes.

The selftest is so arranged that the pruning will happen in the
callee.  Since callee does not propagate read marks of r8, the
explored state at the pruning point prior to the callee will
now ignore r8.

Propagate liveness on all frames of the stack when pruning.

Fixes: f4d7e40a5b ("bpf: introduce function calls (verification)")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-21 19:57:02 -07:00
Ivan Vecera
f682752627 selftests: bpf: modify urandom_read and link it non-statically
After some experiences I found that urandom_read does not need to be
linked statically. When the 'read' syscall call is moved to separate
non-inlined function then bpf_get_stackid() is able to find
the executable in stack trace and extract its build_id from it.

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-21 19:37:30 -07:00
Lorenz Bauer
bafc0ba826 selftests/bpf: add tests for bpf_tcp_check_syncookie and bpf_skc_lookup_tcp
Add tests which verify that the new helpers work for both IPv4 and
IPv6, by forcing SYN cookies to always on. Use a new network namespace
to avoid clobbering the global SYN cookie settings.

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-21 18:59:11 -07:00
Lorenz Bauer
5792d52df1 selftests/bpf: test references to sock_common
Make sure that returning a struct sock_common * reference invokes
the reference tracking machinery in the verifier.

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-21 18:59:11 -07:00
Lorenz Bauer
dbaf2877e9 selftests/bpf: allow specifying helper for BPF_SK_LOOKUP
Make the BPF_SK_LOOKUP macro take a helper function, to ease
writing tests for new helpers.

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-21 18:59:11 -07:00
Davide Caratti
7e0c8892df net/sched: act_vlan: validate the control action inside init()
the following script:

 # tc qdisc add dev crash0 clsact
 # tc filter add dev crash0 egress matchall \
 > action vlan pop pass index 90
 # tc actions replace action vlan \
 > pop goto chain 42 index 90 cookie c1a0c1a0
 # tc actions show action vlan

had the following output:

 Error: Failed to init TC action chain.
 We have an error talking to the kernel
 total acts 1

         action order 0: vlan  pop goto chain 42
          index 90 ref 2 bind 1
         cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
 #PF error: [normal kernel read fault]
 PGD 800000007974f067 P4D 800000007974f067 PUD 79638067 PMD 0
 Oops: 0000 [#1] SMP PTI
 CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.0.0-rc4.gotochain_crash+ #536
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 RIP: 0010:tcf_action_exec+0xb8/0x100
 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
 RSP: 0018:ffff982dfdb83be0 EFLAGS: 00010246
 RAX: 000000002000002a RBX: ffff982dfc55db00 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: ffff982df97099c0 RDI: ffff982dfc55db00
 RBP: ffff982dfdb83c80 R08: ffff982df983fec8 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000000 R12: ffff982df5aacd00
 R13: ffff982df5aacd08 R14: 0000000000000001 R15: ffff982df97099c0
 FS:  0000000000000000(0000) GS:ffff982dfdb80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 00000000796d0005 CR4: 00000000001606e0
 Call Trace:
  <IRQ>
  tcf_classify+0x58/0x120
  __dev_queue_xmit+0x40a/0x890
  ? ip6_finish_output2+0x369/0x590
  ip6_finish_output2+0x369/0x590
  ? ip6_output+0x68/0x110
  ip6_output+0x68/0x110
  ? nf_hook.constprop.35+0x79/0xc0
  mld_sendpack+0x16f/0x220
  mld_ifc_timer_expire+0x195/0x2c0
  ? igmp6_timer_handler+0x70/0x70
  call_timer_fn+0x2b/0x130
  run_timer_softirq+0x3e8/0x440
  ? enqueue_hrtimer+0x39/0x90
  __do_softirq+0xe3/0x2f5
  irq_exit+0xf0/0x100
  smp_apic_timer_interrupt+0x6c/0x130
  apic_timer_interrupt+0xf/0x20
  </IRQ>
 RIP: 0010:native_safe_halt+0x2/0x10
 Code: 7b ff ff ff 7f f3 c3 65 48 8b 04 25 00 5c 01 00 f0 80 48 02 20 48 8b 00 a8 08 74 8b eb c1 90 90 90 90 90 90 90 90 90 90 fb f4 <c3> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90
 RSP: 0018:ffffa4714038feb8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
 RAX: ffffffff840184f0 RBX: 0000000000000003 RCX: 0000000000000000
 RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000001e57d3f387
 RBP: 0000000000000003 R08: 001125d9ca39e1eb R09: 0000000000000000
 R10: 000000000000027d R11: 000000000009f400 R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
  ? __sched_text_end+0x1/0x1
  default_idle+0x1c/0x140
  do_idle+0x1c4/0x280
  cpu_startup_entry+0x19/0x20
  start_secondary+0x1a7/0x200
  secondary_startup_64+0xa4/0xb0
 Modules linked in: act_vlan veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 snd_hda_codec_generic mbcache crct10dif_pclmul jbd2 snd_hda_intel crc32_pclmul snd_hda_codec ghash_clmulni_intel snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd cryptd glue_helper joydev snd_timer virtio_balloon snd pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect sysimgblt virtio_net fb_sys_fops virtio_blk ttm net_failover virtio_console failover ata_piix drm libata crc32c_intel virtio_pci serio_raw virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
 CR2: 0000000000000000

Validating the control action within tcf_vlan_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f4 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21 13:26:42 -07:00
Davide Caratti
e5fdabacbf net/sched: act_tunnel_key: validate the control action inside init()
the following script:

 # tc qdisc add dev crash0 clsact
 # tc filter add dev crash0 egress matchall \
 > action tunnel_key set src_ip 10.10.10.1 dst_ip 20.20.2 dst_port 3128 \
 > nocsum id 1 pass index 90
 # tc actions replace action tunnel_key \
 > set src_ip 10.10.10.1 dst_ip 20.20.2 dst_port 3128 nocsum id 1 \
 > goto chain 42 index 90 cookie c1a0c1a0
 # tc actions show action tunnel_key

had the following output:

 Error: Failed to init TC action chain.
 We have an error talking to the kernel
 total acts 1

         action order 0: tunnel_key  set
         src_ip 10.10.10.1
         dst_ip 20.20.2.0
         key_id 1
         dst_port 3128
         nocsum goto chain 42
          index 90 ref 2 bind 1
         cookie c1a0c1a0

then, the first packet transmitted by crash0 made the kernel crash:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
 #PF error: [normal kernel read fault]
 PGD 800000002aba4067 P4D 800000002aba4067 PUD 795f9067 PMD 0
 Oops: 0000 [#1] SMP PTI
 CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.0.0-rc4.gotochain_crash+ #536
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 RIP: 0010:tcf_action_exec+0xb8/0x100
 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
 RSP: 0018:ffff9346bdb83be0 EFLAGS: 00010246
 RAX: 000000002000002a RBX: ffff9346bb795c00 RCX: 0000000000000002
 RDX: 0000000000000000 RSI: ffff93466c881700 RDI: 0000000000000246
 RBP: ffff9346bdb83c80 R08: ffff9346b3e1e0c8 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000000 R12: ffff9346b978f000
 R13: ffff9346b978f008 R14: 0000000000000001 R15: ffff93466dceeb40
 FS:  0000000000000000(0000) GS:ffff9346bdb80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 000000007a6c2002 CR4: 00000000001606e0
 Call Trace:
  <IRQ>
  tcf_classify+0x58/0x120
  __dev_queue_xmit+0x40a/0x890
  ? ip6_finish_output2+0x369/0x590
  ip6_finish_output2+0x369/0x590
  ? ip6_output+0x68/0x110
  ip6_output+0x68/0x110
  ? nf_hook.constprop.35+0x79/0xc0
  mld_sendpack+0x16f/0x220
  mld_ifc_timer_expire+0x195/0x2c0
  ? igmp6_timer_handler+0x70/0x70
  call_timer_fn+0x2b/0x130
  run_timer_softirq+0x3e8/0x440
  ? tick_sched_timer+0x37/0x70
  __do_softirq+0xe3/0x2f5
  irq_exit+0xf0/0x100
  smp_apic_timer_interrupt+0x6c/0x130
  apic_timer_interrupt+0xf/0x20
  </IRQ>
 RIP: 0010:native_safe_halt+0x2/0x10
 Code: 55 ff ff ff 7f f3 c3 65 48 8b 04 25 00 5c 01 00 f0 80 48 02 20 48 8b 00 a8 08 74 8b eb c1 90 90 90 90 90 90 90 90 90 90 fb f4 <c3> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90
 RSP: 0018:ffffa48a8038feb8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
 RAX: ffffffffaa8184f0 RBX: 0000000000000003 RCX: 0000000000000000
 RDX: 0000000000000001 RSI: 0000000000000087 RDI: 0000000000000003
 RBP: 0000000000000003 R08: 0011251c6fcfac49 R09: ffff9346b995be00
 R10: ffffa48a805e7ce8 R11: 00000000024c38dd R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
  ? __sched_text_end+0x1/0x1
  default_idle+0x1c/0x140
  do_idle+0x1c4/0x280
  cpu_startup_entry+0x19/0x20
  start_secondary+0x1a7/0x200
  secondary_startup_64+0xa4/0xb0
 Modules linked in: act_tunnel_key veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul crc32_pclmul snd_hda_codec_generic ghash_clmulni_intel mbcache snd_hda_intel jbd2 snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd cryptd glue_helper joydev snd_timer snd pcspkr virtio_balloon soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect virtio_net sysimgblt fb_sys_fops ttm net_failover virtio_console virtio_blk failover drm serio_raw crc32c_intel ata_piix virtio_pci floppy virtio_ring libata virtio dm_mirror dm_region_hash dm_log dm_mod
 CR2: 0000000000000000

Validating the control action within tcf_tunnel_key_init() proved to fix
the above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f4 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21 13:26:42 -07:00
Davide Caratti
7c3d825d12 net/sched: act_skbmod: validate the control action inside init()
the following script:

 # tc qdisc add dev crash0 clsact
 # tc filter add dev crash0 egress matchall \
 > action skbmod set smac 00:c1:a0:c1:a0:00 pass index 90
 # tc actions replace action skbmod \
 > set smac 00:c1:a0:c1:a0:00 goto chain 42 index 90 cookie c1a0c1a0
 # tc actions show action skbmod

had the following output:

 src MAC address <00:c1:a0:c1:a0:00>
 src MAC address <00:c1:a0:c1:a0:00>
 Error: Failed to init TC action chain.
 We have an error talking to the kernel
 total acts 1

         action order 0: skbmod goto chain 42 set smac 00:c1:a0:c1:a0:00
          index 90 ref 2 bind 1
         cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
 #PF error: [normal kernel read fault]
 PGD 800000002d5c7067 P4D 800000002d5c7067 PUD 77e16067 PMD 0
 Oops: 0000 [#1] SMP PTI
 CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.0.0-rc4.gotochain_crash+ #536
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 RIP: 0010:tcf_action_exec+0xb8/0x100
 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
 RSP: 0018:ffff8987ffd83be0 EFLAGS: 00010246
 RAX: 000000002000002a RBX: ffff8987aeb68800 RCX: ffff8987fa263640
 RDX: 0000000000000000 RSI: ffff8987f51c8802 RDI: 00000000000000a0
 RBP: ffff8987ffd83c80 R08: ffff8987f939bac8 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8987f5c77d00
 R13: ffff8987f5c77d08 R14: 0000000000000001 R15: ffff8987f0c29f00
 FS:  0000000000000000(0000) GS:ffff8987ffd80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 000000007832c004 CR4: 00000000001606e0
 Call Trace:
  <IRQ>
  tcf_classify+0x58/0x120
  __dev_queue_xmit+0x40a/0x890
  ? ip6_finish_output2+0x369/0x590
  ip6_finish_output2+0x369/0x590
  ? ip6_output+0x68/0x110
  ip6_output+0x68/0x110
  ? nf_hook.constprop.35+0x79/0xc0
  mld_sendpack+0x16f/0x220
  mld_ifc_timer_expire+0x195/0x2c0
  ? igmp6_timer_handler+0x70/0x70
  call_timer_fn+0x2b/0x130
  run_timer_softirq+0x3e8/0x440
  ? tick_sched_timer+0x37/0x70
  __do_softirq+0xe3/0x2f5
  irq_exit+0xf0/0x100
  smp_apic_timer_interrupt+0x6c/0x130
  apic_timer_interrupt+0xf/0x20
  </IRQ>
 RIP: 0010:native_safe_halt+0x2/0x10
 Code: 56 ff ff ff 7f f3 c3 65 48 8b 04 25 00 5c 01 00 f0 80 48 02 20 48 8b 00 a8 08 74 8b eb c1 90 90 90 90 90 90 90 90 90 90 fb f4 <c3> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90
 RSP: 0018:ffffa2a1c038feb8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
 RAX: ffffffffa94184f0 RBX: 0000000000000003 RCX: 0000000000000001
 RDX: 0000000000000001 RSI: 0000000000000087 RDI: 0000000000000003
 RBP: 0000000000000003 R08: 001123cfc2ba71ac R09: 0000000000000000
 R10: 0000000000000000 R11: 00000000000f4240 R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
  ? __sched_text_end+0x1/0x1
  default_idle+0x1c/0x140
  do_idle+0x1c4/0x280
  cpu_startup_entry+0x19/0x20
  start_secondary+0x1a7/0x200
  secondary_startup_64+0xa4/0xb0
 Modules linked in: act_skbmod veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel mbcache jbd2 snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device aesni_intel crypto_simd cryptd glue_helper snd_pcm joydev pcspkr virtio_balloon snd_timer snd i2c_piix4 soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect virtio_net sysimgblt fb_sys_fops net_failover virtio_console ttm virtio_blk failover drm crc32c_intel serio_raw ata_piix virtio_pci libata virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
 CR2: 0000000000000000

Validating the control action within tcf_skbmod_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f4 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21 13:26:42 -07:00
Davide Caratti
ec7727bb24 net/sched: act_skbedit: validate the control action inside init()
the following script:

 # tc qdisc add dev crash0 clsact
 # tc filter add dev crash0 egress matchall \
 > action skbedit ptype host pass index 90
 # tc actions replace action skbedit \
 > ptype host goto chain 42 index 90 cookie c1a0c1a0
 # tc actions show action skbedit

had the following output:

 Error: Failed to init TC action chain.
 We have an error talking to the kernel
 total acts 1

         action order 0: skbedit  ptype host goto chain 42
          index 90 ref 2 bind 1
         cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
 #PF error: [normal kernel read fault]
 PGD 0 P4D 0
 Oops: 0000 [#1] SMP PTI
 CPU: 3 PID: 3467 Comm: kworker/3:3 Not tainted 5.0.0-rc4.gotochain_crash+ #536
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 Workqueue: ipv6_addrconf addrconf_dad_work
 RIP: 0010:tcf_action_exec+0xb8/0x100
 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
 RSP: 0018:ffffb50a81e1fad0 EFLAGS: 00010246
 RAX: 000000002000002a RBX: ffff9aa47ba4ea00 RCX: 0000000000000001
 RDX: 0000000000000000 RSI: ffff9aa469eeb3c0 RDI: ffff9aa47ba4ea00
 RBP: ffffb50a81e1fb70 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: ffff9aa47bce0638 R12: ffff9aa4793b0c00
 R13: ffff9aa4793b0c08 R14: 0000000000000001 R15: ffff9aa469eeb3c0
 FS:  0000000000000000(0000) GS:ffff9aa474780000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 000000007360e005 CR4: 00000000001606e0
 Call Trace:
  tcf_classify+0x58/0x120
  __dev_queue_xmit+0x40a/0x890
  ? ndisc_next_option+0x50/0x50
  ? ___neigh_create+0x4d5/0x680
  ? ip6_finish_output2+0x1b5/0x590
  ip6_finish_output2+0x1b5/0x590
  ? ip6_output+0x68/0x110
  ip6_output+0x68/0x110
  ? nf_hook.constprop.28+0x79/0xc0
  ndisc_send_skb+0x248/0x2e0
  ndisc_send_ns+0xf8/0x200
  ? addrconf_dad_work+0x389/0x4b0
  addrconf_dad_work+0x389/0x4b0
  ? __switch_to_asm+0x34/0x70
  ? process_one_work+0x195/0x380
  ? addrconf_dad_completed+0x370/0x370
  process_one_work+0x195/0x380
  worker_thread+0x30/0x390
  ? process_one_work+0x380/0x380
  kthread+0x113/0x130
  ? kthread_park+0x90/0x90
  ret_from_fork+0x35/0x40
 Modules linked in: act_skbedit veth ip6table_filter ip6_tables iptable_filter binfmt_misc crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ext4 snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep mbcache snd_hda_core jbd2 snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd cryptd snd_timer glue_helper snd joydev soundcore pcspkr virtio_balloon i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm virtio_net net_failover drm failover virtio_blk virtio_console ata_piix virtio_pci crc32c_intel serio_raw libata virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
 CR2: 0000000000000000

Validating the control action within tcf_skbedit_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f4 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21 13:26:41 -07:00
Davide Caratti
4b006b0c13 net/sched: act_simple: validate the control action inside init()
the following script:

 # tc qdisc add dev crash0 clsact
 # tc filter add dev crash0 egress matchall \
 > action simple sdata hello pass index 90
 # tc actions replace action simple \
 > sdata world goto chain 42 index 90 cookie c1a0c1a0
 # tc action show action simple

had the following output:

 Error: Failed to init TC action chain.
 We have an error talking to the kernel
 total acts 1

         action order 0: Simple <world>
          index 90 ref 2 bind 1
         cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
 #PF error: [normal kernel read fault]
 PGD 800000006a6fb067 P4D 800000006a6fb067 PUD 6aed6067 PMD 0
 Oops: 0000 [#1] SMP PTI
 CPU: 2 PID: 3241 Comm: kworker/2:0 Not tainted 5.0.0-rc4.gotochain_crash+ #536
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 Workqueue: ipv6_addrconf addrconf_dad_work
 RIP: 0010:tcf_action_exec+0xb8/0x100
 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
 RSP: 0018:ffffbe6781763ad0 EFLAGS: 00010246
 RAX: 000000002000002a RBX: ffff9e59bdb80e00 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: ffff9e59b4716738 RDI: ffff9e59ab12d140
 RBP: ffffbe6781763b70 R08: 0000000000000234 R09: 0000000000aaaaaa
 R10: 0000000000000000 R11: ffff9e59b247cd50 R12: ffff9e59b112f100
 R13: ffff9e59b112f108 R14: 0000000000000001 R15: ffff9e59ab12d0c0
 FS:  0000000000000000(0000) GS:ffff9e59b4700000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 000000006af92004 CR4: 00000000001606e0
 Call Trace:
  tcf_classify+0x58/0x120
  __dev_queue_xmit+0x40a/0x890
  ? ndisc_next_option+0x50/0x50
  ? ___neigh_create+0x4d5/0x680
  ? ip6_finish_output2+0x1b5/0x590
  ip6_finish_output2+0x1b5/0x590
  ? ip6_output+0x68/0x110
  ip6_output+0x68/0x110
  ? nf_hook.constprop.28+0x79/0xc0
  ndisc_send_skb+0x248/0x2e0
  ndisc_send_ns+0xf8/0x200
  ? addrconf_dad_work+0x389/0x4b0
  addrconf_dad_work+0x389/0x4b0
  ? __switch_to_asm+0x34/0x70
  ? process_one_work+0x195/0x380
  ? addrconf_dad_completed+0x370/0x370
  process_one_work+0x195/0x380
  worker_thread+0x30/0x390
  ? process_one_work+0x380/0x380
  kthread+0x113/0x130
  ? kthread_park+0x90/0x90
  ret_from_fork+0x35/0x40
 Modules linked in: act_simple veth ip6table_filter ip6_tables iptable_filter binfmt_misc crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ext4 snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep mbcache snd_hda_core jbd2 snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd cryptd snd_timer glue_helper snd joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops virtio_net ttm net_failover virtio_console virtio_blk failover drm crc32c_intel serio_raw floppy ata_piix libata virtio_pci virtio_ring virtio dm_mirror dm_region_hash dm_log dm_mod
 CR2: 0000000000000000

Validating the control action within tcf_simple_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f4 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21 13:26:41 -07:00
Davide Caratti
e8c87c643e net/sched: act_sample: validate the control action inside init()
the following script:

 # tc qdisc add dev crash0 clsact
 # tc filter add dev crash0 egress matchall \
 > action sample rate 1024 group 4 pass index 90
 # tc actions replace action sample \
 > rate 1024 group 4 goto chain 42 index 90 cookie c1a0c1a0
 # tc actions show action sample

had the following output:

 Error: Failed to init TC action chain.
 We have an error talking to the kernel
 total acts 1

         action order 0: sample rate 1/1024 group 4 goto chain 42
          index 90 ref 2 bind 1
         cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
 #PF error: [normal kernel read fault]
 PGD 8000000079966067 P4D 8000000079966067 PUD 7987b067 PMD 0
 Oops: 0000 [#1] SMP PTI
 CPU: 0 PID: 5 Comm: kworker/0:0 Not tainted 5.0.0-rc4.gotochain_crash+ #536
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 Workqueue: ipv6_addrconf addrconf_dad_work
 RIP: 0010:tcf_action_exec+0xb8/0x100
 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
 RSP: 0018:ffffbee60033fad0 EFLAGS: 00010246
 RAX: 000000002000002a RBX: ffff99d7ae6e3b00 RCX: 00000000e555df9b
 RDX: 0000000000000000 RSI: 00000000b0352718 RDI: ffff99d7fda1fcf0
 RBP: ffffbee60033fb70 R08: 0000000070731ab1 R09: 0000000000000400
 R10: 0000000000000000 R11: ffff99d7ac733838 R12: ffff99d7f3c2be00
 R13: ffff99d7f3c2be08 R14: 0000000000000001 R15: ffff99d7f3c2b600
 FS:  0000000000000000(0000) GS:ffff99d7fda00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 00000000797de006 CR4: 00000000001606f0
 Call Trace:
  tcf_classify+0x58/0x120
  __dev_queue_xmit+0x40a/0x890
  ? ndisc_next_option+0x50/0x50
  ? ___neigh_create+0x4d5/0x680
  ? ip6_finish_output2+0x1b5/0x590
  ip6_finish_output2+0x1b5/0x590
  ? ip6_output+0x68/0x110
  ip6_output+0x68/0x110
  ? nf_hook.constprop.28+0x79/0xc0
  ndisc_send_skb+0x248/0x2e0
  ndisc_send_ns+0xf8/0x200
  ? addrconf_dad_work+0x389/0x4b0
  addrconf_dad_work+0x389/0x4b0
  ? __switch_to_asm+0x34/0x70
  ? process_one_work+0x195/0x380
  ? addrconf_dad_completed+0x370/0x370
  process_one_work+0x195/0x380
  worker_thread+0x30/0x390
  ? process_one_work+0x380/0x380
  kthread+0x113/0x130
  ? kthread_park+0x90/0x90
  ret_from_fork+0x35/0x40
 Modules linked in: act_sample psample veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel mbcache jbd2 snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device aesni_intel crypto_simd snd_pcm cryptd glue_helper snd_timer joydev snd pcspkr virtio_balloon i2c_piix4 soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect virtio_net sysimgblt fb_sys_fops net_failover ttm failover virtio_blk virtio_console drm ata_piix serio_raw crc32c_intel libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
 CR2: 0000000000000000

Validating the control action within tcf_sample_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f4 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21 13:26:41 -07:00
Davide Caratti
d6124d6ba6 net/sched: act_police: validate the control action inside init()
the following script:

 # tc qdisc add dev crash0 clsact
 # tc filter add dev crash0 egress matchall \
 > action police rate 3mbit burst 250k pass index 90
 # tc actions replace action police \
 > rate 3mbit burst 250k goto chain 42 index 90 cookie c1a0c1a0
 # tc actions show action police rate 3mbit burst

had the following output:

 Error: Failed to init TC action chain.
 We have an error talking to the kernel
 total acts 1

         action order 0:  police 0x5a rate 3Mbit burst 250Kb mtu 2Kb  action goto chain 42 overhead 0b
         ref 2 bind 1
         cookie c1a0c1a0

Then, when crash0 starts transmitting more than 3Mbit/s, the following
kernel crash is observed:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
 #PF error: [normal kernel read fault]
 PGD 800000007a779067 P4D 800000007a779067 PUD 2ad96067 PMD 0
 Oops: 0000 [#1] SMP PTI
 CPU: 3 PID: 5032 Comm: netperf Not tainted 5.0.0-rc4.gotochain_crash+ #533
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 RIP: 0010:tcf_action_exec+0xb8/0x100
 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
 RSP: 0018:ffffb0e04064fa60 EFLAGS: 00010246
 RAX: 000000002000002a RBX: ffff93bb3322cce0 RCX: 0000000000000005
 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff93bb3322cce0
 RBP: ffffb0e04064fb00 R08: 0000000000000022 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000001 R12: ffff93bb3beed300
 R13: ffff93bb3beed308 R14: 0000000000000001 R15: ffff93bb3b64d000
 FS:  00007f0bc6be5740(0000) GS:ffff93bb3db80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 00000000746a8001 CR4: 00000000001606e0
 Call Trace:
  tcf_classify+0x58/0x120
  __dev_queue_xmit+0x40a/0x890
  ? ipt_do_table+0x31c/0x420 [ip_tables]
  ? ip_finish_output2+0x16f/0x430
  ip_finish_output2+0x16f/0x430
  ? ip_output+0x69/0xe0
  ip_output+0x69/0xe0
  ? ip_forward_options+0x1a0/0x1a0
  __tcp_transmit_skb+0x563/0xa40
  tcp_write_xmit+0x243/0xfa0
  __tcp_push_pending_frames+0x32/0xf0
  tcp_sendmsg_locked+0x404/0xd30
  tcp_sendmsg+0x27/0x40
  sock_sendmsg+0x36/0x40
  __sys_sendto+0x10e/0x140
  ? __sys_connect+0x87/0xf0
  ? syscall_trace_enter+0x1df/0x2e0
  ? __audit_syscall_exit+0x216/0x260
  __x64_sys_sendto+0x24/0x30
  do_syscall_64+0x5b/0x180
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
 RIP: 0033:0x7f0bc5ffbafd
 Code: 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 8b 05 ae c4 2c 00 85 c0 75 2d 45 31 c9 45 31 c0 4c 63 d1 48 63 ff b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 63 63 2c 00 f7 d8 64 89 02 48
 RSP: 002b:00007fffef94b7f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
 RAX: ffffffffffffffda RBX: 0000000000004000 RCX: 00007f0bc5ffbafd
 RDX: 0000000000004000 RSI: 00000000017e5420 RDI: 0000000000000004
 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000004
 R13: 00000000017e51d0 R14: 0000000000000010 R15: 0000000000000006
 Modules linked in: act_police veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 snd_hda_codec_generic mbcache crct10dif_pclmul jbd2 crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd cryptd glue_helper snd_timer snd joydev pcspkr virtio_balloon soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm virtio_blk virtio_net virtio_console net_failover failover crc32c_intel ata_piix libata serio_raw virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
 CR2: 0000000000000000

Validating the control action within tcf_police_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f4 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21 13:26:41 -07:00
Davide Caratti
6ac86ca352 net/sched: act_pedit: validate the control action inside init()
the following script:

 # tc filter add dev crash0 egress matchall \
 > action pedit ex munge ip ttl set 10 pass index 90
 # tc actions replace action pedit \
 > ex munge ip ttl set 10 goto chain 42 index 90 cookie c1a0c1a0
 # tc actions show action pedit

had the following output:

 Error: Failed to init TC action chain.
 We have an error talking to the kernel
 total acts 1

         action order 0:  pedit action goto chain 42 keys 1
          index 90 ref 2 bind 1
          key #0  at ipv4+8: val 0a000000 mask 00ffffff
         cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
 #PF error: [normal kernel read fault]
 PGD 0 P4D 0
 Oops: 0000 [#1] SMP PTI
 CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 RIP: 0010:tcf_action_exec+0xb8/0x100
 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
 RSP: 0018:ffff94a73db03be0 EFLAGS: 00010246
 RAX: 000000002000002a RBX: ffff94a6ee4c0700 RCX: 000000000000000a
 RDX: 0000000000000000 RSI: ffff94a6ed22c800 RDI: 0000000000000000
 RBP: ffff94a73db03c80 R08: ffff94a7386fa4c8 R09: ffff94a73229ea20
 R10: 0000000000000000 R11: 0000000000000000 R12: ffff94a6ed22cb00
 R13: ffff94a6ed22cb08 R14: 0000000000000001 R15: ffff94a6ed22c800
 FS:  0000000000000000(0000) GS:ffff94a73db00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 000000007120e002 CR4: 00000000001606e0
 Call Trace:
  <IRQ>
  tcf_classify+0x58/0x120
  __dev_queue_xmit+0x40a/0x890
  ? ip6_finish_output2+0x369/0x590
  ip6_finish_output2+0x369/0x590
  ? ip6_output+0x68/0x110
  ip6_output+0x68/0x110
  ? nf_hook.constprop.35+0x79/0xc0
  mld_sendpack+0x16f/0x220
  mld_ifc_timer_expire+0x195/0x2c0
  ? igmp6_timer_handler+0x70/0x70
  call_timer_fn+0x2b/0x130
  run_timer_softirq+0x3e8/0x440
  ? tick_sched_timer+0x37/0x70
  __do_softirq+0xe3/0x2f5
  irq_exit+0xf0/0x100
  smp_apic_timer_interrupt+0x6c/0x130
  apic_timer_interrupt+0xf/0x20
  </IRQ>
 RIP: 0010:native_safe_halt+0x2/0x10
 Code: 4e ff ff ff 7f f3 c3 65 48 8b 04 25 00 5c 01 00 f0 80 48 02 20 48 8b 00 a8 08 74 8b eb c1 90 90 90 90 90 90 90 90 90 90 fb f4 <c3> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90
 RSP: 0018:ffffab1740387eb8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
 RAX: ffffffffb18184f0 RBX: 0000000000000002 RCX: 0000000000000001
 RDX: 0000000000000001 RSI: 0000000000000087 RDI: 0000000000000002
 RBP: 0000000000000002 R08: 000f168fa695f9a9 R09: 0000000000000020
 R10: 0000000000000004 R11: 0000000000000000 R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
  ? __sched_text_end+0x1/0x1
  default_idle+0x1c/0x140
  do_idle+0x1c4/0x280
  cpu_startup_entry+0x19/0x20
  start_secondary+0x1a7/0x200
  secondary_startup_64+0xa4/0xb0
 Modules linked in: act_pedit veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 mbcache jbd2 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep aesni_intel snd_hda_core crypto_simd snd_seq cryptd glue_helper snd_seq_device snd_pcm joydev snd_timer pcspkr virtio_balloon snd soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs qxl ata_generic pata_acpi drm_kms_helper virtio_net net_failover syscopyarea sysfillrect sysimgblt failover virtio_blk fb_sys_fops virtio_console ttm drm crc32c_intel serio_raw ata_piix virtio_pci libata virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
 CR2: 0000000000000000

Validating the control action within tcf_pedit_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f4 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21 13:26:41 -07:00
Davide Caratti
1e45d043a8 net/sched: act_nat: validate the control action inside init()
the following script:

 # tc qdisc add dev crash0 clsact
 # tc filter add dev crash0 egress matchall \
 > action nat ingress 1.18.1.1 1.18.2.2 pass index 90
 # tc actions replace action nat \
 > ingress 1.18.1.1 1.18.2.2 goto chain 42 index 90 cookie c1a0c1a0
 # tc actions show action nat

had the following output:

 Error: Failed to init TC action chain.
 We have an error talking to the kernel
 total acts 1

         action order 0:  nat ingress 1.18.1.1/32 1.18.2.2 goto chain 42
          index 90 ref 2 bind 1
         cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
 #PF error: [normal kernel read fault]
 PGD 800000002d180067 P4D 800000002d180067 PUD 7cb8b067 PMD 0
 Oops: 0000 [#1] SMP PTI
 CPU: 3 PID: 164 Comm: kworker/3:1 Not tainted 5.0.0-rc4.gotochain_crash+ #533
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 Workqueue: ipv6_addrconf addrconf_dad_work
 RIP: 0010:tcf_action_exec+0xb8/0x100
 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
 RSP: 0018:ffffae4500e2fad0 EFLAGS: 00010246
 RAX: 000000002000002a RBX: ffff9fa52e28c800 RCX: 0000000001011201
 RDX: 0000000000000000 RSI: 0000000000000056 RDI: ffff9fa52ca12800
 RBP: ffffae4500e2fb70 R08: 0000000000000022 R09: 000000000000000e
 R10: 00000000ffffffff R11: 0000000001011201 R12: ffff9fa52cbc9c00
 R13: ffff9fa52cbc9c08 R14: 0000000000000001 R15: ffff9fa52ca12780
 FS:  0000000000000000(0000) GS:ffff9fa57db80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 0000000073f8c004 CR4: 00000000001606e0
 Call Trace:
  tcf_classify+0x58/0x120
  __dev_queue_xmit+0x40a/0x890
  ? ndisc_next_option+0x50/0x50
  ? ___neigh_create+0x4d5/0x680
  ? ip6_finish_output2+0x1b5/0x590
  ip6_finish_output2+0x1b5/0x590
  ? ip6_output+0x68/0x110
  ip6_output+0x68/0x110
  ? nf_hook.constprop.28+0x79/0xc0
  ndisc_send_skb+0x248/0x2e0
  ndisc_send_ns+0xf8/0x200
  ? addrconf_dad_work+0x389/0x4b0
  addrconf_dad_work+0x389/0x4b0
  ? __switch_to_asm+0x34/0x70
  ? process_one_work+0x195/0x380
  ? addrconf_dad_completed+0x370/0x370
  process_one_work+0x195/0x380
  worker_thread+0x30/0x390
  ? process_one_work+0x380/0x380
  kthread+0x113/0x130
  ? kthread_park+0x90/0x90
  ret_from_fork+0x35/0x40
 Modules linked in: act_nat veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel mbcache jbd2 snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd cryptd glue_helper snd_timer snd joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs qxl ata_generic pata_acpi drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm virtio_net virtio_blk net_failover failover virtio_console drm crc32c_intel floppy ata_piix libata virtio_pci virtio_ring virtio serio_raw dm_mirror dm_region_hash dm_log dm_mod
 CR2: 0000000000000000

Validating the control action within tcf_nat_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f4 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21 13:26:41 -07:00
Davide Caratti
c53075ea5d net/sched: act_connmark: validate the control action inside init()
the following script:

 # tc qdisc add dev crash0 clsact
 # tc filter add dev crash0 egress matchall \
 > action connmark pass index 90
 # tc actions replace action connmark \
 > goto chain 42 index 90 cookie c1a0c1a0
 # tc actions show action connmark

had the following output:

 Error: Failed to init TC action chain.
 We have an error talking to the kernel
 total acts 1

         action order 0: connmark zone 0 goto chain 42
          index 90 ref 2 bind 1
         cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
 #PF error: [normal kernel read fault]
 PGD 0 P4D 0
 Oops: 0000 [#1] SMP PTI
 CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 Workqueue: ipv6_addrconf addrconf_dad_work
 RIP: 0010:tcf_action_exec+0xb8/0x100
 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
 RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246
 RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0
 RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c
 R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00
 R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40
 FS:  0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0
 Call Trace:
  tcf_classify+0x58/0x120
  __dev_queue_xmit+0x40a/0x890
  ? ndisc_next_option+0x50/0x50
  ? ___neigh_create+0x4d5/0x680
  ? ip6_finish_output2+0x1b5/0x590
  ip6_finish_output2+0x1b5/0x590
  ? ip6_output+0x68/0x110
  ip6_output+0x68/0x110
  ? nf_hook.constprop.28+0x79/0xc0
  ndisc_send_skb+0x248/0x2e0
  ndisc_send_ns+0xf8/0x200
  ? addrconf_dad_work+0x389/0x4b0
  addrconf_dad_work+0x389/0x4b0
  ? __switch_to_asm+0x34/0x70
  ? process_one_work+0x195/0x380
  ? addrconf_dad_completed+0x370/0x370
  process_one_work+0x195/0x380
  worker_thread+0x30/0x390
  ? process_one_work+0x380/0x380
  kthread+0x113/0x130
  ? kthread_park+0x90/0x90
  ret_from_fork+0x35/0x40
 Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
 CR2: 0000000000000000

Validating the control action within tcf_connmark_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f4 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21 13:26:41 -07:00
Davide Caratti
ff9721d32b net/sched: act_mirred: validate the control action inside init()
the following script:

 # tc qdisc add dev crash0 clsact
 # tc filter add dev crash0 egress matchall \
 > action mirred ingress mirror dev lo pass
 # tc actions replace action mirred \
 > ingress mirror dev lo goto chain 42 index 90 cookie c1a0c1a0
 # tc actions show action mirred

had the following output:

 Error: Failed to init TC action chain.
 We have an error talking to the kernel
 total acts 1

         action order 0: mirred (Ingress Mirror to device lo) goto chain 42
         index 90 ref 2 bind 1
         cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

 Mirror/redirect action on
 BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
 #PF error: [normal kernel read fault]
 PGD 0 P4D 0
 Oops: 0000 [#1] SMP PTI
 CPU: 3 PID: 47 Comm: kworker/3:1 Not tainted 5.0.0-rc4.gotochain_crash+ #533
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 Workqueue: ipv6_addrconf addrconf_dad_work
 RIP: 0010:tcf_action_exec+0xb8/0x100
 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
 RSP: 0018:ffffa772404b7ad0 EFLAGS: 00010246
 RAX: 000000002000002a RBX: ffff9c5afc3f4300 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: ffff9c5afdba9380 RDI: 0000000000029380
 RBP: ffffa772404b7b70 R08: ffff9c5af7010028 R09: ffff9c5af7010029
 R10: 0000000000000000 R11: ffff9c5af94c6a38 R12: ffff9c5af7953000
 R13: ffff9c5af7953008 R14: 0000000000000001 R15: ffff9c5af7953d00
 FS:  0000000000000000(0000) GS:ffff9c5afdb80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 000000007c514004 CR4: 00000000001606e0
 Call Trace:
  tcf_classify+0x58/0x120
  __dev_queue_xmit+0x40a/0x890
  ? ndisc_next_option+0x50/0x50
  ? ___neigh_create+0x4d5/0x680
  ? ip6_finish_output2+0x1b5/0x590
  ip6_finish_output2+0x1b5/0x590
  ? ip6_output+0x68/0x110
  ip6_output+0x68/0x110
  ? nf_hook.constprop.28+0x79/0xc0
  ndisc_send_skb+0x248/0x2e0
  ndisc_send_ns+0xf8/0x200
  ? addrconf_dad_work+0x389/0x4b0
  addrconf_dad_work+0x389/0x4b0
  ? __switch_to_asm+0x34/0x70
  ? process_one_work+0x195/0x380
  ? addrconf_dad_completed+0x370/0x370
  process_one_work+0x195/0x380
  worker_thread+0x30/0x390
  ? process_one_work+0x380/0x380
  kthread+0x113/0x130
  ? kthread_park+0x90/0x90
  ret_from_fork+0x35/0x40
 Modules linked in: act_mirred veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul snd_hda_codec_generic crc32_pclmul snd_hda_intel snd_hda_codec mbcache ghash_clmulni_intel jbd2 snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer snd crypto_simd cryptd glue_helper soundcore virtio_balloon joydev pcspkr i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops virtio_net ttm virtio_blk net_failover virtio_console failover drm ata_piix crc32c_intel virtio_pci serio_raw libata virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
 CR2: 0000000000000000

Validating the control action within tcf_mirred_init() proved to fix the
above issue. For the same reason, postpone the assignment of tcfa_action
and tcfm_eaction to avoid partial reconfiguration of a mirred rule when
it's replaced by another one that mirrors to a device that does not
exist. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f4 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21 13:26:41 -07:00
Davide Caratti
11a94d7fd8 net/sched: act_ife: validate the control action inside init()
the following script:

 # tc qdisc add dev crash0 clsact
 # tc filter add dev crash0 egress matchall \
 > action ife encode allow mark pass index 90
 # tc actions replace action ife \
 > encode allow mark goto chain 42 index 90 cookie c1a0c1a0
 # tc action show action ife

had the following output:

 IFE type 0xED3E
 IFE type 0xED3E
 Error: Failed to init TC action chain.
 We have an error talking to the kernel
 total acts 1

         action order 0: ife encode action goto chain 42 type 0XED3E
         allow mark
          index 90 ref 2 bind 1
         cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
 #PF error: [normal kernel read fault]
 PGD 800000007b4e7067 P4D 800000007b4e7067 PUD 7b4e6067 PMD 0
 Oops: 0000 [#1] SMP PTI
 CPU: 2 PID: 164 Comm: kworker/2:1 Not tainted 5.0.0-rc4.gotochain_crash+ #533
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 Workqueue: ipv6_addrconf addrconf_dad_work
 RIP: 0010:tcf_action_exec+0xb8/0x100
 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
 RSP: 0018:ffffa6a7c0553ad0 EFLAGS: 00010246
 RAX: 000000002000002a RBX: ffff9796ee1bbd00 RCX: 0000000000000001
 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
 RBP: ffffa6a7c0553b70 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: ffff9797385bb038 R12: ffff9796ead9d700
 R13: ffff9796ead9d708 R14: 0000000000000001 R15: ffff9796ead9d800
 FS:  0000000000000000(0000) GS:ffff97973db00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 000000007c41e006 CR4: 00000000001606e0
 Call Trace:
  tcf_classify+0x58/0x120
  __dev_queue_xmit+0x40a/0x890
  ? ndisc_next_option+0x50/0x50
  ? ___neigh_create+0x4d5/0x680
  ? ip6_finish_output2+0x1b5/0x590
  ip6_finish_output2+0x1b5/0x590
  ? ip6_output+0x68/0x110
  ip6_output+0x68/0x110
  ? nf_hook.constprop.28+0x79/0xc0
  ndisc_send_skb+0x248/0x2e0
  ndisc_send_ns+0xf8/0x200
  ? addrconf_dad_work+0x389/0x4b0
  addrconf_dad_work+0x389/0x4b0
  ? __switch_to_asm+0x34/0x70
  ? process_one_work+0x195/0x380
  ? addrconf_dad_completed+0x370/0x370
  process_one_work+0x195/0x380
  worker_thread+0x30/0x390
  ? process_one_work+0x380/0x380
  kthread+0x113/0x130
  ? kthread_park+0x90/0x90
  ret_from_fork+0x35/0x40
 Modules linked in: act_gact act_meta_mark act_ife dummy veth ip6table_filter ip6_tables iptable_filter binfmt_misc snd_hda_codec_generic ext4 snd_hda_intel snd_hda_codec crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hwdep snd_hda_core ghash_clmulni_intel snd_seq snd_seq_device snd_pcm snd_timer aesni_intel crypto_simd snd cryptd glue_helper virtio_balloon joydev pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl virtio_net drm_kms_helper virtio_blk net_failover syscopyarea failover sysfillrect virtio_console sysimgblt fb_sys_fops ttm drm crc32c_intel serio_raw ata_piix virtio_pci virtio_ring libata virtio floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: act_ife]
 CR2: 0000000000000000

Validating the control action within tcf_ife_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f4 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21 13:26:41 -07:00
Davide Caratti
0da2dbd602 net/sched: act_gact: validate the control action inside init()
the following script:

 # tc qdisc add dev crash0 clsact
 # tc filter add dev crash0 egress matchall \
 > action gact pass index 90
 # tc actions replace action gact \
 > goto chain 42 index 90 cookie c1a0c1a0
 # tc actions show action gact

had the following output:

 Error: Failed to init TC action chain.
 We have an error talking to the kernel
 total acts 1

         action order 0: gact action goto chain 42
          random type none pass val 0
          index 90 ref 2 bind 1
         cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
 #PF error: [normal kernel read fault]
 PGD 0 P4D 0
 Oops: 0000 [#1] SMP PTI
 CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 RIP: 0010:tcf_action_exec+0xb8/0x100
 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
 RSP: 0018:ffff8c2434703be0 EFLAGS: 00010246
 RAX: 000000002000002a RBX: ffff8c23ed6d7e00 RCX: 000000000000005a
 RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8c23ed6d7e00
 RBP: ffff8c2434703c80 R08: ffff8c243b639ac8 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8c2429e68b00
 R13: ffff8c2429e68b08 R14: 0000000000000001 R15: ffff8c2429c5a480
 FS:  0000000000000000(0000) GS:ffff8c2434700000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 000000002dc0e005 CR4: 00000000001606e0
 Call Trace:
  <IRQ>
  tcf_classify+0x58/0x120
  __dev_queue_xmit+0x40a/0x890
  ? ip6_finish_output2+0x369/0x590
  ip6_finish_output2+0x369/0x590
  ? ip6_output+0x68/0x110
  ip6_output+0x68/0x110
  ? nf_hook.constprop.35+0x79/0xc0
  mld_sendpack+0x16f/0x220
  mld_ifc_timer_expire+0x195/0x2c0
  ? igmp6_timer_handler+0x70/0x70
  call_timer_fn+0x2b/0x130
  run_timer_softirq+0x3e8/0x440
  ? tick_sched_timer+0x37/0x70
  __do_softirq+0xe3/0x2f5
  irq_exit+0xf0/0x100
  smp_apic_timer_interrupt+0x6c/0x130
  apic_timer_interrupt+0xf/0x20
  </IRQ>
 RIP: 0010:native_safe_halt+0x2/0x10
 Code: 74 ff ff ff 7f f3 c3 65 48 8b 04 25 00 5c 01 00 f0 80 48 02 20 48 8b 00 a8 08 74 8b eb c1 90 90 90 90 90 90 90 90 90 90 fb f4 <c3> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90
 RSP: 0018:ffff9c8640387eb8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
 RAX: ffffffff8b2184f0 RBX: 0000000000000002 RCX: 0000000000000001
 RDX: 0000000000000001 RSI: 0000000000000087 RDI: 0000000000000002
 RBP: 0000000000000002 R08: 000eb57882b36cc3 R09: 0000000000000020
 R10: 0000000000000004 R11: 0000000000000000 R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
  ? __sched_text_end+0x1/0x1
  default_idle+0x1c/0x140
  do_idle+0x1c4/0x280
  cpu_startup_entry+0x19/0x20
  start_secondary+0x1a7/0x200
  secondary_startup_64+0xa4/0xb0
 Modules linked in: act_gact act_bpf veth ip6table_filter ip6_tables iptable_filter binfmt_misc crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_generic ext4 snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core mbcache jbd2 snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd cryptd glue_helper virtio_balloon joydev pcspkr snd_timer snd i2c_piix4 soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper syscopyarea virtio_net sysfillrect net_failover virtio_blk sysimgblt fb_sys_fops virtio_console ttm failover drm crc32c_intel serio_raw ata_piix libata floppy virtio_pci virtio_ring virtio dm_mirror dm_region_hash dm_log dm_mod [last unloaded: act_bpf]
 CR2: 0000000000000000

Validating the control action within tcf_gact_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f4 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21 13:26:41 -07:00
Davide Caratti
f5c29d8386 net/sched: act_csum: validate the control action inside init()
the following script:

 # tc qdisc add dev crash0 clsact
 # tc filter add dev crash0 egress matchall action csum icmp pass index 90
 # tc actions replace action csum icmp goto chain 42 index 90 \
 > cookie c1a0c1a0
 # tc actions show action csum

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

        action order 0: csum (icmp) action goto chain 42
        index 90 ref 2 bind 1
        cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
 #PF error: [normal kernel read fault]
 PGD 8000000074692067 P4D 8000000074692067 PUD 2e210067 PMD 0
 Oops: 0000 [#1] SMP PTI
 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.0.0-rc4.gotochain_crash+ #533
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 RIP: 0010:tcf_action_exec+0xb8/0x100
 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
 RSP: 0018:ffff93153da03be0 EFLAGS: 00010246
 RAX: 000000002000002a RBX: ffff9314ee40f700 RCX: 0000000000003a00
 RDX: 0000000000000000 RSI: ffff931537c87828 RDI: ffff931537c87818
 RBP: ffff93153da03c80 R08: 00000000527cffff R09: 0000000000000003
 R10: 000000000000003f R11: 0000000000000028 R12: ffff9314edf68400
 R13: ffff9314edf68408 R14: 0000000000000001 R15: ffff9314ed67b600
 FS:  0000000000000000(0000) GS:ffff93153da00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 0000000073e32003 CR4: 00000000001606f0
 Call Trace:
  <IRQ>
  tcf_classify+0x58/0x120
  __dev_queue_xmit+0x40a/0x890
  ? ip6_finish_output2+0x369/0x590
  ip6_finish_output2+0x369/0x590
  ? ip6_output+0x68/0x110
  ip6_output+0x68/0x110
  ? nf_hook.constprop.35+0x79/0xc0
  mld_sendpack+0x16f/0x220
  mld_ifc_timer_expire+0x195/0x2c0
  ? igmp6_timer_handler+0x70/0x70
  call_timer_fn+0x2b/0x130
  run_timer_softirq+0x3e8/0x440
  ? tick_sched_timer+0x37/0x70
  __do_softirq+0xe3/0x2f5
  irq_exit+0xf0/0x100
  smp_apic_timer_interrupt+0x6c/0x130
  apic_timer_interrupt+0xf/0x20
  </IRQ>
 RIP: 0010:native_safe_halt+0x2/0x10
 Code: 66 ff ff ff 7f f3 c3 65 48 8b 04 25 00 5c 01 00 f0 80 48 02 20 48 8b 00 a8 08 74 8b eb c1 90 90 90 90 90 90 90 90 90 90 fb f4 <c3> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90
 RSP: 0018:ffffffff9a803e98 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
 RAX: ffffffff99e184f0 RBX: 0000000000000000 RCX: 0000000000000001
 RDX: 0000000000000001 RSI: 0000000000000087 RDI: 0000000000000000
 RBP: 0000000000000000 R08: 000eb5c4572376b3 R09: 0000000000000000
 R10: ffffa53e806a3ca0 R11: 00000000000f4240 R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
  ? __sched_text_end+0x1/0x1
  default_idle+0x1c/0x140
  do_idle+0x1c4/0x280
  cpu_startup_entry+0x19/0x20
  start_kernel+0x49e/0x4be
  secondary_startup_64+0xa4/0xb0
 Modules linked in: act_csum veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul crc32_pclmul snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel mbcache snd_hda_codec jbd2 snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd cryptd snd_timer glue_helper snd joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect virtio_net sysimgblt net_failover fb_sys_fops virtio_console virtio_blk ttm failover drm ata_piix crc32c_intel floppy virtio_pci serio_raw libata virtio_ring virtio dm_mirror dm_region_hash dm_log dm_mod
 CR2: 0000000000000000

Validating the control action within tcf_csum_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f4 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21 13:26:41 -07:00
Davide Caratti
4e1810049c net/sched: act_bpf: validate the control action inside init()
the following script:

 # tc filter add dev crash0 egress matchall \
 > action bpf bytecode '1,6 0 0 4294967295' pass index 90
 # tc actions replace action bpf \
 > bytecode '1,6 0 0 4294967295' goto chain 42 index 90 cookie c1a0c1a0
 # tc action show action bpf

had the following output:

 Error: Failed to init TC action chain.
 We have an error talking to the kernel
 total acts 1

         action order 0: bpf bytecode '1,6 0 0 4294967295' default-action goto chain 42
         index 90 ref 2 bind 1
         cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

 RIP: 0010:tcf_action_exec+0xb8/0x100
 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
 RSP: 0018:ffffb3a0803dfa90 EFLAGS: 00010246
 RAX: 000000002000002a RBX: ffff942b347ada00 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: ffffb3a08034d038 RDI: ffff942b347ada00
 RBP: ffffb3a0803dfb30 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: ffffb3a0803dfb0c R12: ffff942b3b682b00
 R13: ffff942b3b682b08 R14: 0000000000000001 R15: ffff942b3b682f00
 FS:  00007f6160a72740(0000) GS:ffff942b3da80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 00000000795a4002 CR4: 00000000001606e0
 Call Trace:
  tcf_classify+0x58/0x120
  __dev_queue_xmit+0x40a/0x890
  ? ip_finish_output2+0x16f/0x430
  ip_finish_output2+0x16f/0x430
  ? ip_output+0x69/0xe0
  ip_output+0x69/0xe0
  ? ip_forward_options+0x1a0/0x1a0
  ip_send_skb+0x15/0x40
  raw_sendmsg+0x8e1/0xbd0
  ? sched_clock+0x5/0x10
  ? sched_clock_cpu+0xc/0xa0
  ? try_to_wake_up+0x54/0x480
  ? ldsem_down_read+0x3f/0x280
  ? _cond_resched+0x15/0x40
  ? down_read+0xe/0x30
  ? copy_termios+0x1e/0x70
  ? tty_mode_ioctl+0x1b6/0x4c0
  ? sock_sendmsg+0x36/0x40
  sock_sendmsg+0x36/0x40
  __sys_sendto+0x10e/0x140
  ? do_vfs_ioctl+0xa4/0x640
  ? handle_mm_fault+0xdc/0x210
  ? syscall_trace_enter+0x1df/0x2e0
  ? __audit_syscall_exit+0x216/0x260
  __x64_sys_sendto+0x24/0x30
  do_syscall_64+0x5b/0x180
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
 RIP: 0033:0x7f615f7e3c03
 Code: 48 8b 0d 90 62 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 9d c3 2c 00 00 75 13 49 89 ca b8 2c 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 34 c3 48 83 ec 08 e8 4b cc 00 00 48 89 04 24
 RSP: 002b:00007ffee5d8cc28 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
 RAX: ffffffffffffffda RBX: 000055a4f28f1700 RCX: 00007f615f7e3c03
 RDX: 0000000000000040 RSI: 000055a4f28f1700 RDI: 0000000000000003
 RBP: 00007ffee5d8e340 R08: 000055a4f28ee510 R09: 0000000000000010
 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
 R13: 000055a4f28f16c0 R14: 000055a4f28ef69c R15: 0000000000000080
 Modules linked in: act_bpf veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 mbcache crct10dif_pclmul jbd2 crc32_pclmul snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd cryptd glue_helper pcspkr joydev virtio_balloon snd_timer snd i2c_piix4 soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_blk virtio_net virtio_console net_failover failover syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel ata_piix serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
 CR2: 0000000000000000

Validating the control action within tcf_bpf_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f4 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-21 13:26:41 -07:00
Adrian Ratiu
48e5d98a0e selftests/bpf: Add arm target register definitions
eBPF "restricted C" code can be compiled with LLVM/clang using target
triplets like armv7l-unknown-linux-gnueabihf and loaded/run with small
cross-compiled gobpf/elf [1] programs without requiring a full BCC
port which is also undesirable on small embedded systems due to its
size footprint. The only missing pieces are these helper macros which
otherwise have to be redefined by each eBPF arm program.

[1] https://github.com/iovisor/gobpf/tree/master/elf

Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-20 20:28:58 -07:00
Linus Torvalds
a9dce6679d pidfd patches for v5.1-rc1
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE7btrcuORLb1XUhEwjrBW1T7ssS0FAlx+nn4ACgkQjrBW1T7s
 sS2kwg//aJUCwLIhV91gXUFN2jHTCf0/+5fnigEk7JhAT5wmAykxLM8tprLlIlyp
 HtwNQx54hq/6p010Ulo9K50VS6JRii+2lNSpC6IkqXXdHXXm0ViH+5I9Nru8SVJ+
 avRCYWNjW9Gn1EtcB2yv6KP3XffgnQ6ZLIr4QJwglOxgAqUaWZ68woSUlrIR5yFj
 j48wAxjsC3g2qwGLvXPeiwYZHwk6VnYmrZ3eWXPDthWRDC4zkjyBdchZZzFJagSC
 6sX8T9s5ua5juZMokEJaWjuBQQyfg0NYu41hupSdVjV7/0D3E+5/DiReInvLmSup
 63bZ85uKRqWTNgl4cmJ1W3aVe2RYYemMZCXVVYYvU+IKpvTSzzYY7us+FyMAIRUV
 bT+XPGzTWcGrChzv9bHZcBrkL91XGqyxRJz56jLl6EhRtqxmzmywf6mO6pS2WK4N
 r+aBDgXeJbG39KguCzwUgVX8hC6YlSxSP8Md+2sK+UoAdfTUvFtdCYnjhuACofCt
 saRvDIPF8N9qn4Ch3InzCKkrUTL/H3BZKBl2jo6tYQ9smUsFZW7lQoip5Ui/0VS+
 qksJ91djOc9facGoOorPazojY5fO5Lj3Hg+cGIoxUV0jPH483z7hWH0ALynb0f6z
 EDsgNyEUpIO2nJMJJfm37ysbU/j1gOpzQdaAEaWeknwtfecFPzM=
 =yOWp
 -----END PGP SIGNATURE-----

Merge tag 'pidfd-v5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull pidfd system call from Christian Brauner:
 "This introduces the ability to use file descriptors from /proc/<pid>/
  as stable handles on struct pid. Even if a pid is recycled the handle
  will not change. For a start these fds can be used to send signals to
  the processes they refer to.

  With the ability to use /proc/<pid> fds as stable handles on struct
  pid we can fix a long-standing issue where after a process has exited
  its pid can be reused by another process. If a caller sends a signal
  to a reused pid it will end up signaling the wrong process.

  With this patchset we enable a variety of use cases. One obvious
  example is that we can now safely delegate an important part of
  process management - sending signals - to processes other than the
  parent of a given process by sending file descriptors around via scm
  rights and not fearing that the given process will have been recycled
  in the meantime. It also allows for easy testing whether a given
  process is still alive or not by sending signal 0 to a pidfd which is
  quite handy.

  There has been some interest in this feature e.g. from systems
  management (systemd, glibc) and container managers. I have requested
  and gotten comments from glibc to make sure that this syscall is
  suitable for their needs as well. In the future I expect it to take on
  most other pid-based signal syscalls. But such features are left for
  the future once they are needed.

  This has been sitting in linux-next for quite a while and has not
  caused any issues. It comes with selftests which verify basic
  functionality and also test that a recycled pid cannot be signaled via
  a pidfd.

  Jon has written about a prior version of this patchset. It should
  cover the basic functionality since not a lot has changed since then:

      https://lwn.net/Articles/773459/

  The commit message for the syscall itself is extensively documenting
  the syscall, including it's functionality and extensibility"

* tag 'pidfd-v5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  selftests: add tests for pidfd_send_signal()
  signal: add pidfd_send_signal() syscall
2019-03-16 13:47:14 -07:00
Linus Torvalds
f67e3fb489 device-dax for 5.1
* Replace the /sys/class/dax device model with /sys/bus/dax, and include
   a compat driver so distributions can opt-in to the new ABI.
 
 * Allow for an alternative driver for the device-dax address-range
 
 * Introduce the 'kmem' driver to hotplug / assign a device-dax
   address-range to the core-mm.
 
 * Arrange for the device-dax target-node to be onlined so that the newly
   added memory range can be uniquely referenced by numa apis.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJchWpGAAoJEB7SkWpmfYgCJk8P/0Q1DINszUDO/vKjJ09cDs9P
 Jw3it6GBIL50rDOu9QdcprSpwYDD0h1mLAV/m6oa3bVO+p4uWGvnxaxRx2HN2c/v
 vhZFtUDpHlqR63vzWMNVKRprYixCRJDUr6xQhhCcE3ak/ELN6w7LWfikKVWv15UL
 MfR96IQU38f+xRda/zSXnL9606Dvkvu/inEHj84lRcHIwj3sQAUalrE8bR3O32gZ
 bDg/l5kzT49o8ZXUo/TegvRSSSZpJmOl2DD0RW+ax5q3NI2bOXFrVDUKBKxf/hcQ
 E/V9i57TrqQx0GqRhnU7rN/v53cFZGGs31TEEIB/xs3bzCnADxwXcjL5b5K005J6
 vJjBA2ODBewHFK3uVx46Hy1iV4eCtZWj4QrMnrjdSrjXOfbF5GTbWOhPFgoq7TWf
 S7VqFEf3I2gDPaMq4o8Ej1kLH4HMYeor2NSOZjyvGn87rSZ3ZIQguwbaNIVl+itz
 gdDt0ZOU0BgOBkV+rZIeZDaGdloWCHcDPL15CkZaOZyzdWhfEZ7dod6ad+9udilU
 EUPH62RgzXZtfm5zpebYyjNVLbb9pLZ0nT+UypyGR6zqWx1SqU3mXi63NFXPco+x
 XA9j//edPeI6NHg2CXLEh8DLuCg3dG1zWRJANkiF+niBwyCR8CHtGWAoY6soXbKe
 2UrXGcIfXxyJ8V9v8v4q
 =hfa3
 -----END PGP SIGNATURE-----

Merge tag 'devdax-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull device-dax updates from Dan Williams:
 "New device-dax infrastructure to allow persistent memory and other
  "reserved" / performance differentiated memories, to be assigned to
  the core-mm as "System RAM".

  Some users want to use persistent memory as additional volatile
  memory. They are willing to cope with potential performance
  differences, for example between DRAM and 3D Xpoint, and want to use
  typical Linux memory management apis rather than a userspace memory
  allocator layered over an mmap() of a dax file. The administration
  model is to decide how much Persistent Memory (pmem) to use as System
  RAM, create a device-dax-mode namespace of that size, and then assign
  it to the core-mm. The rationale for device-dax is that it is a
  generic memory-mapping driver that can be layered over any "special
  purpose" memory, not just pmem. On subsequent boots udev rules can be
  used to restore the memory assignment.

  One implication of using pmem as RAM is that mlock() no longer keeps
  data off persistent media. For this reason it is recommended to enable
  NVDIMM Security (previously merged for 5.0) to encrypt pmem contents
  at rest. We considered making this recommendation an actively enforced
  requirement, but in the end decided to leave it as a distribution /
  administrator policy to allow for emulation and test environments that
  lack security capable NVDIMMs.

  Summary:

   - Replace the /sys/class/dax device model with /sys/bus/dax, and
     include a compat driver so distributions can opt-in to the new ABI.

   - Allow for an alternative driver for the device-dax address-range

   - Introduce the 'kmem' driver to hotplug / assign a device-dax
     address-range to the core-mm.

   - Arrange for the device-dax target-node to be onlined so that the
     newly added memory range can be uniquely referenced by numa apis"

NOTE! I'm not entirely happy with the whole "PMEM as RAM" model because
we currently have special - and very annoying rules in the kernel about
accessing PMEM only with the "MC safe" accessors, because machine checks
inside the regular repeat string copy functions can be fatal in some
(not described) circumstances.

And apparently the PMEM modules can cause that a lot more than regular
RAM.  The argument is that this happens because PMEM doesn't necessarily
get scrubbed at boot like RAM does, but that is planned to be added for
the user space tooling.

Quoting Dan from another email:
 "The exposure can be reduced in the volatile-RAM case by scanning for
  and clearing errors before it is onlined as RAM. The userspace tooling
  for that can be in place before v5.1-final. There's also runtime
  notifications of errors via acpi_nfit_uc_error_notify() from
  background scrubbers on the DIMM devices. With that mechanism the
  kernel could proactively clear newly discovered poison in the volatile
  case, but that would be additional development more suitable for v5.2.

  I understand the concern, and the need to highlight this issue by
  tapping the brakes on feature development, but I don't see PMEM as RAM
  making the situation worse when the exposure is also there via DAX in
  the PMEM case. Volatile-RAM is arguably a safer use case since it's
  possible to repair pages where the persistent case needs active
  application coordination"

* tag 'devdax-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  device-dax: "Hotplug" persistent memory for use like normal RAM
  mm/resource: Let walk_system_ram_range() search child resources
  mm/memory-hotplug: Allow memory resources to be children
  mm/resource: Move HMM pr_debug() deeper into resource code
  mm/resource: Return real error codes from walk failures
  device-dax: Add a 'modalias' attribute to DAX 'bus' devices
  device-dax: Add a 'target_node' attribute
  device-dax: Auto-bind device after successful new_id
  acpi/nfit, device-dax: Identify differentiated memory with a unique numa-node
  device-dax: Add /sys/class/dax backwards compatibility
  device-dax: Add support for a dax override driver
  device-dax: Move resource pinning+mapping into the common driver
  device-dax: Introduce bus + driver model
  device-dax: Start defining a dax bus model
  device-dax: Remove multi-resource infrastructure
  device-dax: Kill dax_region base
  device-dax: Kill dax_region ida
2019-03-16 13:05:32 -07:00