The current flags of the SMC_PNET_GET command only allow privileged
users to retrieve entries from the pnet table via netlink. The content
of the pnet table may be useful for all users though, e.g., for
debugging smc connection problems.
This patch removes the GENL_ADMIN_PERM flag so that unprivileged users
can read the pnet table.
Signed-off-by: Hans Wippel <ndev@hwipl.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The following series extends MR creation routines to allow creation of
user MRs through kernel ULPs as a proxy. The immediate use case is to
allow RDS to work over FS-DAX, which requires ODP (on-demand-paging)
MRs to be created and such MRs were not possible to create prior this
series.
The first part of this patchset extends RDMA to have special verb
ib_reg_user_mr(). The common use case that uses this function is a
userspace application that allocates memory for HCA access but the
responsibility to register the memory at the HCA is on an kernel ULP.
This ULP acts as an agent for the userspace application.
The second part provides advise MR functionality for ULPs. This is
integral part of ODP flows and used to trigger pagefaults in advance
to prepare memory before running working set.
The third part is actual user of those in-kernel APIs.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQT1m3YD37UfMCUQBNwp8NhrnBAZsQUCXiVO8AAKCRAp8NhrnBAZ
scTrAP9gb0d3qv0IOtHw5aGI1DAgjTUn/SzUOnsjDEn7DIoh9gEA2+ZmaEyLXKrl
+UcZb31auy5P8ueJYokRLhLAyRcOIAg=
=yaHb
-----END PGP SIGNATURE-----
Merge tag 'rds-odp-for-5.5' of https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma
Leon Romanovsky says:
====================
Use ODP MRs for kernel ULPs
The following series extends MR creation routines to allow creation of
user MRs through kernel ULPs as a proxy. The immediate use case is to
allow RDS to work over FS-DAX, which requires ODP (on-demand-paging)
MRs to be created and such MRs were not possible to create prior this
series.
The first part of this patchset extends RDMA to have special verb
ib_reg_user_mr(). The common use case that uses this function is a
userspace application that allocates memory for HCA access but the
responsibility to register the memory at the HCA is on an kernel ULP.
This ULP acts as an agent for the userspace application.
The second part provides advise MR functionality for ULPs. This is
integral part of ODP flows and used to trigger pagefaults in advance
to prepare memory before running working set.
The third part is actual user of those in-kernel APIs.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
1) Fix non-blocking connect() in x25, from Martin Schiller.
2) Fix spurious decryption errors in kTLS, from Jakub Kicinski.
3) Netfilter use-after-free in mtype_destroy(), from Cong Wang.
4) Limit size of TSO packets properly in lan78xx driver, from Eric
Dumazet.
5) r8152 probe needs an endpoint sanity check, from Johan Hovold.
6) Prevent looping in tcp_bpf_unhash() during sockmap/tls free, from
John Fastabend.
7) hns3 needs short frames padded on transmit, from Yunsheng Lin.
8) Fix netfilter ICMP header corruption, from Eyal Birger.
9) Fix soft lockup when low on memory in hns3, from Yonglong Liu.
10) Fix NTUPLE firmware command failures in bnxt_en, from Michael Chan.
11) Fix memory leak in act_ctinfo, from Eric Dumazet.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (91 commits)
cxgb4: reject overlapped queues in TC-MQPRIO offload
cxgb4: fix Tx multi channel port rate limit
net: sched: act_ctinfo: fix memory leak
bnxt_en: Do not treat DSN (Digital Serial Number) read failure as fatal.
bnxt_en: Fix ipv6 RFS filter matching logic.
bnxt_en: Fix NTUPLE firmware command failures.
net: systemport: Fixed queue mapping in internal ring map
net: dsa: bcm_sf2: Configure IMP port for 2Gb/sec
net: dsa: sja1105: Don't error out on disabled ports with no phy-mode
net: phy: dp83867: Set FORCE_LINK_GOOD to default after reset
net: hns: fix soft lockup when there is not enough memory
net: avoid updating qdisc_xmit_lock_key in netdev_update_lockdep_key()
net/sched: act_ife: initalize ife->metalist earlier
netfilter: nat: fix ICMP header corruption on ICMP errors
net: wan: lapbether.c: Use built-in RCU list checking
netfilter: nf_tables: fix flowtable list del corruption
netfilter: nf_tables: fix memory leak in nf_tables_parse_netdev_hooks()
netfilter: nf_tables: remove WARN and add NLA_STRING upper limits
netfilter: nft_tunnel: ERSPAN_VERSION must not be null
netfilter: nft_tunnel: fix null-attribute check
...
Add packet trap that can report NVE packets that the device decided to
drop because their overlay source MAC is multicast.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add packet traps that can report packets that were dropped during tunnel
decapsulation.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add packet trap that can report packets that reached the router, but are
non-routable. For example, IGMP queries can be flooded by the device in
layer 2 and reach the router. Such packets should not be routed and
instead dropped.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for net-next, they are:
1) Incorrect uapi header comment in bitwise, from Jeremy Sowden.
2) Fetch flow statistics if flow is still active.
3) Restrict flow matching on hardware based on input device.
4) Add nf_flow_offload_work_alloc() helper function.
5) Remove the last client of the FLOW_OFFLOAD_DYING flag, use teardown
instead.
6) Use atomic bitwise operation to operate with flow flags.
7) Add nf_flowtable_hw_offload() helper function to check for the
NF_FLOWTABLE_HW_OFFLOAD flag.
8) Add NF_FLOW_HW_REFRESH to retry hardware offload from the flowtable
software datapath.
9) Remove indirect calls in xt_hashlimit, from Florian Westphal.
10) Add nf_flow_offload_tuple() helper to consolidate code.
11) Add nf_flow_table_offload_cmd() helper function.
12) A few whitespace cleanups in nf_tables in bitwise and the bitmap/hash
set types, from Jeremy Sowden.
13) Cleanup netlink attribute checks in bitwise, from Jeremy Sowden.
14) Replace goto by return in error path of nft_bitwise_dump(), from
Jeremy Sowden.
15) Add bitwise operation netlink attribute, also from Jeremy.
16) Add nft_bitwise_init_bool(), from Jeremy Sowden.
17) Add nft_bitwise_eval_bool(), also from Jeremy.
18) Add nft_bitwise_dump_bool(), from Jeremy Sowden.
19) Disallow hardware offload for other that NFT_BITWISE_BOOL,
from Jeremy Sowden.
20) Add NFTA_BITWISE_DATA netlink attribute, again from Jeremy.
21) Add support for bitwise shift operation, from Jeremy Sowden.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Try prefetching pages when using On-Demand-Paging MR using
ib_advise_mr.
Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
On-Demand-Paging MRs are registered using ib_reg_user_mr and
unregistered with ib_dereg_mr.
Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Mark function parameters as 'const' where possible.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
syzbot reported some bogus lockdep warnings, for example bad unlock
balance in sch_direct_xmit(). They are due to a race condition between
slow path and fast path, that is qdisc_xmit_lock_key gets re-registered
in netdev_update_lockdep_key() on slow path, while we could still
acquire the queue->_xmit_lock on fast path in this small window:
CPU A CPU B
__netif_tx_lock();
lockdep_unregister_key(qdisc_xmit_lock_key);
__netif_tx_unlock();
lockdep_register_key(qdisc_xmit_lock_key);
In fact, unlike the addr_list_lock which has to be reordered when
the master/slave device relationship changes, queue->_xmit_lock is
only acquired on fast path and only when NETIF_F_LLTX is not set,
so there is likely no nested locking for it.
Therefore, we can just get rid of re-registration of
qdisc_xmit_lock_key.
Reported-by: syzbot+4ec99438ed7450da6272@syzkaller.appspotmail.com
Fixes: ab92d68fc2 ("net: core: add generic lockdep keys")
Cc: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter updates for net
The following patchset contains Netfilter fixes for net:
1) Fix use-after-free in ipset bitmap destroy path, from Cong Wang.
2) Missing init netns in entry cleanup path of arp_tables,
from Florian Westphal.
3) Fix WARN_ON in set destroy path due to missing cleanup on
transaction error.
4) Incorrect netlink sanity check in tunnel, from Florian Westphal.
5) Missing sanity check for erspan version netlink attribute, also
from Florian.
6) Remove WARN in nft_request_module() that can be triggered from
userspace, from Florian Westphal.
7) Memleak in NFTA_HOOK_DEVS netlink parser, from Dan Carpenter.
8) List poison from commit path for flowtables that are added and
deleted in the same batch, from Florian Westphal.
9) Fix NAT ICMP packet corruption, from Eyal Birger.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Hitherto nft_bitwise has only supported boolean operations: NOT, AND, OR
and XOR. Extend it to do shifts as well.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add a new bitwise netlink attribute that will be used by shift
operations to store the size of the shift. It is not used by boolean
operations.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Only boolean operations supports offloading, so check the type of the
operation and return an error for other types.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Split the code specific to dumping bitwise boolean operations out into a
separate function. A similar function will be added later for shift
operations.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Split the code specific to evaluating bitwise boolean operations out
into a separate function. Similar functions will be added later for
shift operations.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Split the code specific to initializing bitwise boolean operations out
into a separate function. A similar function will be added later for
shift operations.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add a new bitwise netlink attribute, NFTA_BITWISE_OP, which is set to a
value of a new enum, nft_bitwise_ops. It describes the type of
operation an expression contains. Currently, it only has one value:
NFT_BITWISE_BOOL. More values will be added later to implement shifts.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When dumping a bitwise expression, if any of the puts fails, we use goto
to jump to a label. However, no clean-up is required and the only
statement at the label is a return. Drop the goto's and return
immediately instead.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
In later patches, we will be adding more checks. In order to be
consistent and prevent complaints from checkpatch.pl, replace the
existing comparisons with NULL with logical NOT operators.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Indentation fixes for the parameters of a few nft functions.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
no need, just use a simple boolean to indicate we want to reap all
entries.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
If nf_flow_offload_add() fails to add the flow to hardware, then the
NF_FLOW_HW_REFRESH flag bit is set and the flow remains in the flowtable
software path.
If flowtable hardware offload is enabled, this patch enqueues a new
request to offload this flow to hardware.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This function checks for the NF_FLOWTABLE_HW_OFFLOAD flag, meaning that
the flowtable hardware offload is enabled.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Originally, all flow flag bits were set on only from the workqueue. With
the introduction of the flow teardown state and hardware offload this is
no longer true. Let's be safe and use atomic bitwise operation to
operation with flow flags.
Fixes: 59c466dd68 ("netfilter: nf_flow_table: add a new flow state for tearing down offloading")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The dying bit removes the conntrack entry if the netdev that owns this
flow is going down. Instead, use the teardown mechanism to push back the
flow to conntrack to let the classic software path decide what to do
with it.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add helper function to allocate and initialize flow offload work and use
it to consolidate existing code.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Set on FLOW_DISSECTOR_KEY_META meta key using flow tuple ingress interface.
Fixes: c29f74e0df ("netfilter: nf_flow_table: hardware offload support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Do not fetch statistics if flow has expired since it might not in
hardware anymore. After this update, remove the FLOW_OFFLOAD_HW_DYING
check from nf_flow_offload_stats() since this flag is never set on.
Fixes: c29f74e0df ("netfilter: nf_flow_table: hardware offload support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: wenxu <wenxu@ucloud.cn>
Add code to check if memory intended for RDMA is FS-DAX-memory. RDS
will fail with error code EOPNOTSUPP if FS-DAX-memory is detected.
Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Commit 8303b7e8f0 ("netfilter: nat: fix spurious connection timeouts")
made nf_nat_icmp_reply_translation() use icmp_manip_pkt() as the l4
manipulation function for the outer packet on ICMP errors.
However, icmp_manip_pkt() assumes the packet has an 'id' field which
is not correct for all types of ICMP messages.
This is not correct for ICMP error packets, and leads to bogus bytes
being written the ICMP header, which can be wrongfully regarded as
'length' bytes by RFC 4884 compliant receivers.
Fix by assigning the 'id' field only for ICMP messages that have this
semantic.
Reported-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Fixes: 8303b7e8f0 ("netfilter: nat: fix spurious connection timeouts")
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
syzbot reported following crash:
list_del corruption, ffff88808c9bb000->prev is LIST_POISON2 (dead000000000122)
[..]
Call Trace:
__list_del_entry include/linux/list.h:131 [inline]
list_del_rcu include/linux/rculist.h:148 [inline]
nf_tables_commit+0x1068/0x3b30 net/netfilter/nf_tables_api.c:7183
[..]
The commit transaction list has:
NFT_MSG_NEWTABLE
NFT_MSG_NEWFLOWTABLE
NFT_MSG_DELFLOWTABLE
NFT_MSG_DELTABLE
A missing generation check during DELTABLE processing causes it to queue
the DELFLOWTABLE operation a second time, so we corrupt the list here:
case NFT_MSG_DELFLOWTABLE:
list_del_rcu(&nft_trans_flowtable(trans)->list);
nf_tables_flowtable_notify(&trans->ctx,
because we have two different DELFLOWTABLE transactions for the same
flowtable. We then call list_del_rcu() twice for the same flowtable->list.
The object handling seems to suffer from the same bug so add a generation
check too and only queue delete transactions for flowtables/objects that
are still active in the next generation.
Reported-by: syzbot+37a6804945a3a13b1572@syzkaller.appspotmail.com
Fixes: 3b49e2e94e ("netfilter: nf_tables: add flow table netlink frontend")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Syzbot detected a leak in nf_tables_parse_netdev_hooks(). If the hook
already exists, then the error handling doesn't free the newest "hook".
Reported-by: syzbot+f9d4095107fc8749c69c@syzkaller.appspotmail.com
Fixes: b75a3e8371 ("netfilter: nf_tables: allow netdevice to be used only once per flowtable")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This WARN can trigger because some of the names fed to the module
autoload function can be of arbitrary length.
Remove the WARN and add limits for all NLA_STRING attributes.
Reported-by: syzbot+0e63ae76d117ae1c3a01@syzkaller.appspotmail.com
Fixes: 452238e8d5 ("netfilter: nf_tables: add and use helper for module autoload")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
else we get null deref when one of the attributes is missing, both
must be non-null.
Reported-by: syzbot+76d0b80493ac881ff77b@syzkaller.appspotmail.com
Fixes: aaecfdb5c5 ("netfilter: nf_tables: match on tunnel metadata")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
DSA subsystem takes care of netdev statistics since commit 4ed70ce9f0
("net: dsa: Refactor transmit path to eliminate duplication"), so
any accounting inside tagger callbacks is redundant and can lead to
messing up the stats.
This bug is present in Qualcomm tagger since day 0.
Fixes: cafdc45c94 ("net-next: dsa: add Qualcomm tag RX/TX handler")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Alexander Lobakin <alobakin@dlink.ru>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The correct name is GSWIP (Gigabit Switch IP). Typo was introduced in
875138f81d ("dsa: Move tagger name into its ops structure") while
moving tagger names to their structures.
Fixes: 875138f81d ("dsa: Move tagger name into its ops structure")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Alexander Lobakin <alobakin@dlink.ru>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf 2020-01-15
The following pull-request contains BPF updates for your *net* tree.
We've added 12 non-merge commits during the last 9 day(s) which contain
a total of 13 files changed, 95 insertions(+), 43 deletions(-).
The main changes are:
1) Fix refcount leak for TCP time wait and request sockets for socket lookup
related BPF helpers, from Lorenz Bauer.
2) Fix wrong verification of ARSH instruction under ALU32, from Daniel Borkmann.
3) Batch of several sockmap and related TLS fixes found while operating
more complex BPF programs with Cilium and OpenSSL, from John Fastabend.
4) Fix sockmap to read psock's ingress_msg queue before regular sk_receive_queue()
to avoid purging data upon teardown, from Lingpeng Chen.
5) Fix printing incorrect pointer in bpftool's btf_dump_ptr() in order to properly
dump a BPF map's value with BTF, from Martin KaFai Lau.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When user returns SK_DROP we need to reset the number of copied bytes
to indicate to the user the bytes were dropped and not sent. If we
don't reset the copied arg sendmsg will return as if those bytes were
copied giving the user a positive return value.
This works as expected today except in the case where the user also
pops bytes. In the pop case the sg.size is reduced but we don't correctly
account for this when copied bytes is reset. The popped bytes are not
accounted for and we return a small positive value potentially confusing
the user.
The reason this happens is due to a typo where we do the wrong comparison
when accounting for pop bytes. In this fix notice the if/else is not
needed and that we have a similar problem if we push data except its not
visible to the user because if delta is larger the sg.size we return a
negative value so it appears as an error regardless.
Fixes: 7246d8ed4d ("bpf: helper to pop data from messages")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/bpf/20200111061206.8028-9-john.fastabend@gmail.com
Its possible through a set of push, pop, apply helper calls to construct
a skmsg, which is just a ring of scatterlist elements, with the start
value larger than the end value. For example,
end start
|_0_|_1_| ... |_n_|_n+1_|
Where end points at 1 and start points and n so that valid elements is
the set {n, n+1, 0, 1}.
Currently, because we don't build the correct chain only {n, n+1} will
be sent. This adds a check and sg_chain call to correctly submit the
above to the crypto and tls send path.
Fixes: d3b18ad31f ("tls: add bpf support to sk_msg handling")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/bpf/20200111061206.8028-8-john.fastabend@gmail.com
It is possible to build a plaintext buffer using push helper that is larger
than the allocated encrypt buffer. When this record is pushed to crypto
layers this can result in a NULL pointer dereference because the crypto
API expects the encrypt buffer is large enough to fit the plaintext
buffer. Kernel splat below.
To resolve catch the cases this can happen and split the buffer into two
records to send individually. Unfortunately, there is still one case to
handle where the split creates a zero sized buffer. In this case we merge
the buffers and unmark the split. This happens when apply is zero and user
pushed data beyond encrypt buffer. This fixes the original case as well
because the split allocated an encrypt buffer larger than the plaintext
buffer and the merge simply moves the pointers around so we now have
a reference to the new (larger) encrypt buffer.
Perhaps its not ideal but it seems the best solution for a fixes branch
and avoids handling these two cases, (a) apply that needs split and (b)
non apply case. The are edge cases anyways so optimizing them seems not
necessary unless someone wants later in next branches.
[ 306.719107] BUG: kernel NULL pointer dereference, address: 0000000000000008
[...]
[ 306.747260] RIP: 0010:scatterwalk_copychunks+0x12f/0x1b0
[...]
[ 306.770350] Call Trace:
[ 306.770956] scatterwalk_map_and_copy+0x6c/0x80
[ 306.772026] gcm_enc_copy_hash+0x4b/0x50
[ 306.772925] gcm_hash_crypt_remain_continue+0xef/0x110
[ 306.774138] gcm_hash_crypt_continue+0xa1/0xb0
[ 306.775103] ? gcm_hash_crypt_continue+0xa1/0xb0
[ 306.776103] gcm_hash_assoc_remain_continue+0x94/0xa0
[ 306.777170] gcm_hash_assoc_continue+0x9d/0xb0
[ 306.778239] gcm_hash_init_continue+0x8f/0xa0
[ 306.779121] gcm_hash+0x73/0x80
[ 306.779762] gcm_encrypt_continue+0x6d/0x80
[ 306.780582] crypto_gcm_encrypt+0xcb/0xe0
[ 306.781474] crypto_aead_encrypt+0x1f/0x30
[ 306.782353] tls_push_record+0x3b9/0xb20 [tls]
[ 306.783314] ? sk_psock_msg_verdict+0x199/0x300
[ 306.784287] bpf_exec_tx_verdict+0x3f2/0x680 [tls]
[ 306.785357] tls_sw_sendmsg+0x4a3/0x6a0 [tls]
test_sockmap test signature to trigger bug,
[TEST]: (1, 1, 1, sendmsg, pass,redir,start 1,end 2,pop (1,2),ktls,):
Fixes: d3b18ad31f ("tls: add bpf support to sk_msg handling")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/bpf/20200111061206.8028-7-john.fastabend@gmail.com
Leaving an incorrect end mark in place when passing to crypto
layer will cause crypto layer to stop processing data before
all data is encrypted. To fix clear the end mark on push
data instead of expecting users of the helper to clear the
mark value after the fact.
This happens when we push data into the middle of a skmsg and
have room for it so we don't do a set of copies that already
clear the end flag.
Fixes: 6fff607e2f ("bpf: sk_msg program helper bpf_msg_push_data")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/bpf/20200111061206.8028-6-john.fastabend@gmail.com