ipsec-next-2024-09-10

-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH7ZpcWbFyOOp6OJbrB3Eaf9PW7cFAmbf6xAACgkQrB3Eaf9P
 W7eZQA/9HuHTWBg0V43QDT1rjNnKult+uBKYpKrh045outqMs+cU8bsww5ZuIAKx
 ktN66OCE67d7XeFttb9UAJUPqQ98RjwjVUOpjRJ5iRDtj2bmn/5VGSYuH7zx5so0
 msFs5gkomo2ZZNjcMOSrDVGUoCdlHh1og5L2KN/FgztSA1smDdUBQOWNm1peezbI
 eJFt2Q6KCNfzwPthmQte0dmDnK5gWPducereSx03tMuSyUmPML1zrzOFXBXSg09e
 dAlDTxbAXZDrXS4Ii0y/FEM2Ugkjg9FXbE1kvM0i05GIc/SGnEBGEcdW5YbmRhOL
 4JlLnpiLTmKTaIZ0GdpADv7XZMga6R01AalSPsJz+H7aNAHTKkK+SzQY4YXRucZy
 SsASM39oRLzo9Bm4ZZ773Nw83cxBgO/ZixK4KVvCZI/1ftD+9zn72eqk+CeveSeE
 ChaXGuWpRdfAOsgozFJNFx/ffK5qzxFKkIeN9KN0QYV/XJuZJ7nD6eQkH9ydgvTI
 4cexY+cs4wgfdi9dDkVHPVhCR7mRlfi5r/VL8rtWWnWzR07okKF4rW6dgvx33m60
 9MmF1/EdD2uh3CLcBMjNg6qXdC07VeDpFLqWs+utJvSHMuI43uE4FkRQui/J6T9N
 RX7zzkFBsPvPpm5GHLx2u/wvnzX1co1Rk9xzbC+J6FEPlm2/0vI=
 =ErGl
 -----END PGP SIGNATURE-----

Merge tag 'ipsec-next-2024-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next

Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2024-09-10

1) Remove an unneeded WARN_ON on packet offload.
   From Patrisious Haddad.

2) Add a copy from skb_seq_state to buffer function.
   This is needed for the upcoming IPTFS patchset.
   From Christian Hopps.

3) Spelling fix in xfrm.h.
   From Simon Horman.

4) Speed up xfrm policy insertions.
   From Florian Westphal.

5) Add and revert a patch to support xfrm interfaces
   for packet offload. This patch was just half cooked.

6) Extend usage of the new xfrm_policy_is_dead_or_sk helper.
   From Florian Westphal.

7) Update comments on sdb and xfrm_policy.
   From Florian Westphal.

8) Fix a null pointer dereference in the new policy insertion
   code. From Florian Westphal.

9) Fix an uninitialized variable in the new policy insertion
   code. From Nathan Chancellor.

* tag 'ipsec-next-2024-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
  xfrm: policy: Restore dir assignments in xfrm_hash_rebuild()
  xfrm: policy: fix null dereference
  Revert "xfrm: add SA information to the offloaded packet"
  xfrm: minor update to sdb and xfrm_policy comments
  xfrm: policy: use recently added helper in more places
  xfrm: add SA information to the offloaded packet
  xfrm: policy: remove remaining use of inexact list
  xfrm: switch migrate to xfrm_policy_lookup_bytype
  xfrm: policy: don't iterate inexact policies twice at insert time
  selftests: add xfrm policy insertion speed test script
  xfrm: Correct spelling in xfrm.h
  net: add copy from skb_seq_state to buffer function
  xfrm: Remove documentation WARN_ON to limit return values for offloaded SA
====================

Link: https://patch.msgid.link/20240910065507.2436394-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski 2024-09-10 19:00:47 -07:00
commit ea403549da
7 changed files with 258 additions and 136 deletions


@ -1433,6 +1433,7 @@ void skb_prepare_seq_read(struct sk_buff *skb, unsigned int from,
unsigned int skb_seq_read(unsigned int consumed, const u8 **data,
struct skb_seq_state *st);
void skb_abort_seq_read(struct skb_seq_state *st);
int skb_copy_seq_read(struct skb_seq_state *st, int offset, void *to, int len);
unsigned int skb_find_text(struct sk_buff *skb, unsigned int from,
unsigned int to, struct ts_config *config);


@ -67,27 +67,27 @@
- instance of a transformer, struct xfrm_state (=SA)
- template to clone xfrm_state, struct xfrm_tmpl
SPD is plain linear list of xfrm_policy rules, ordered by priority.
SPD is organized as hash table (for policies that meet minimum address prefix
length setting, net->xfrm.policy_hthresh). Other policies are stored in
lists, sorted into rbtree ordered by destination and source address networks.
See net/xfrm/xfrm_policy.c for details.
(To be compatible with existing pfkeyv2 implementations,
many rules with priority of 0x7fffffff are allowed to exist and
such rules are ordered in an unpredictable way, thanks to bsd folks.)
Lookup is plain linear search until the first match with selector.
If "action" is "block", then we prohibit the flow, otherwise:
if "xfrms_nr" is zero, the flow passes untransformed. Otherwise,
policy entry has list of up to XFRM_MAX_DEPTH transformations,
described by templates xfrm_tmpl. Each template is resolved
to a complete xfrm_state (see below) and we pack bundle of transformations
to a dst_entry returned to requestor.
to a dst_entry returned to requester.
dst -. xfrm .-> xfrm_state #1
|---. child .-> dst -. xfrm .-> xfrm_state #2
|---. child .-> dst -. xfrm .-> xfrm_state #3
|---. child .-> NULL
Bundles are cached at xrfm_policy struct (field ->bundles).
Resolution of xrfm_tmpl
-----------------------
@ -526,6 +526,36 @@ struct xfrm_policy_queue {
unsigned long timeout;
};
/**
* struct xfrm_policy - xfrm policy
* @xp_net: network namespace the policy lives in
* @bydst: hlist node for SPD hash table or rbtree list
* @byidx: hlist node for index hash table
* @lock: serialize changes to policy structure members
* @refcnt: reference count, freed once it reaches 0
* @pos: kernel internal tie-breaker to determine age of policy
* @timer: timer
* @genid: generation, used to invalidate old policies
* @priority: priority, set by userspace
* @index: policy index (autogenerated)
* @if_id: virtual xfrm interface id
* @mark: packet mark
* @selector: selector
* @lft: lifetime configuration data
* @curlft: lifetime state
* @walk: list head on pernet policy list
* @polq: queue to hold packets while acquire operation in progress
* @bydst_reinsert: policy tree node needs to be merged
* @type: XFRM_POLICY_TYPE_MAIN or _SUB
* @action: XFRM_POLICY_ALLOW or _BLOCK
* @flags: XFRM_POLICY_LOCALOK, XFRM_POLICY_ICMP
* @xfrm_nr: number of used templates in @xfrm_vec
* @family: protocol family
* @security: SELinux security label
* @xfrm_vec: array of templates to resolve state
* @rcu: rcu head, used to defer memory release
* @xdo: hardware offload state
*/
struct xfrm_policy {
possible_net_t xp_net;
struct hlist_node bydst;
@ -555,7 +585,6 @@ struct xfrm_policy {
u16 family;
struct xfrm_sec_ctx *security;
struct xfrm_tmpl xfrm_vec[XFRM_MAX_DEPTH];
struct hlist_node bydst_inexact_list;
struct rcu_head rcu;
struct xfrm_dev_offload xdo;
@ -1016,7 +1045,7 @@ void xfrm_dst_ifdown(struct dst_entry *dst, struct net_device *dev);
struct xfrm_if_parms {
int link; /* ifindex of underlying L2 interface */
u32 if_id; /* interface identifyer */
u32 if_id; /* interface identifier */
bool collect_md;
};


@ -4411,6 +4411,41 @@ void skb_abort_seq_read(struct skb_seq_state *st)
}
EXPORT_SYMBOL(skb_abort_seq_read);
/**
* skb_copy_seq_read() - copy from a skb_seq_state to a buffer
* @st: source skb_seq_state
* @offset: offset in source
* @to: destination buffer
* @len: number of bytes to copy
*
* Copy @len bytes from @offset bytes into the source @st to the destination
* buffer @to. `offset` should increase (or be unchanged) with each subsequent
* call to this function. If offset needs to decrease from the previous use `st`
* should be reset first.
*
* Return: 0 on success or -EINVAL if the copy ended early
*/
int skb_copy_seq_read(struct skb_seq_state *st, int offset, void *to, int len)
{
const u8 *data;
u32 sqlen;
for (;;) {
sqlen = skb_seq_read(offset, &data, st);
if (sqlen == 0)
return -EINVAL;
if (sqlen >= len) {
memcpy(to, data, len);
return 0;
}
memcpy(to, data, sqlen);
to += sqlen;
offset += sqlen;
len -= sqlen;
}
}
EXPORT_SYMBOL(skb_copy_seq_read);
#define TS_SKB_CB(state) ((struct skb_seq_state *) &((state)->cb))
static unsigned int skb_ts_get_next_block(unsigned int offset, const u8 **text,
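
The copy loop in skb_copy_seq_read() above can be tried outside the kernel with a small userspace model. This is only a sketch under stated assumptions: `struct chunk`, `struct seq_state`, `seq_read()` and `copy_seq_read()` are hypothetical stand-ins invented for illustration and are not the kernel's `struct skb_seq_state` or `skb_seq_read()`. The model only mimics the contract the loop relies on, namely that the reader returns the number of contiguous bytes available at an absolute offset, or 0 when the data is exhausted.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for fragmented packet data: a list of chunks. */
struct chunk {
	const unsigned char *data;
	unsigned int len;
};

/* Hypothetical stand-in for struct skb_seq_state. */
struct seq_state {
	const struct chunk *chunks;
	size_t nchunks;
};

/*
 * Hypothetical stand-in for skb_seq_read(): point *data at the bytes found
 * at absolute offset `consumed` and return how many contiguous bytes are
 * available there. Returns 0 once the offset is past the end of the data.
 */
static unsigned int seq_read(unsigned int consumed, const unsigned char **data,
			     const struct seq_state *st)
{
	unsigned int start = 0;
	size_t i;

	for (i = 0; i < st->nchunks; i++) {
		unsigned int end = start + st->chunks[i].len;

		if (consumed < end) {
			*data = st->chunks[i].data + (consumed - start);
			return end - consumed;
		}
		start = end;
	}
	return 0;
}

/* Same loop shape as skb_copy_seq_read() in the hunk above. */
int copy_seq_read(const struct seq_state *st, int offset, void *to, int len)
{
	unsigned char *dst = to;
	const unsigned char *data;
	unsigned int sqlen;

	assert(offset >= 0 && len >= 0);
	for (;;) {
		sqlen = seq_read(offset, &data, st);
		if (sqlen == 0)
			return -EINVAL;	/* source ran out before len bytes */
		if (sqlen >= (unsigned int)len) {
			memcpy(dst, data, len);	/* final, possibly partial, chunk */
			return 0;
		}
		memcpy(dst, data, sqlen);	/* take the whole chunk, continue */
		dst += sqlen;
		offset += sqlen;
		len -= sqlen;
	}
}
```

With two chunks "abc" and "defgh", copying 5 bytes from offset 1 crosses the chunk boundary and yields "bcdef", while asking for 5 bytes from offset 6 ends early and returns -EINVAL.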


@ -328,12 +328,8 @@ int xfrm_dev_state_add(struct net *net, struct xfrm_state *x,
/* User explicitly requested packet offload mode and configured
* policy in addition to the XFRM state. So be civil to users,
* and return an error instead of taking fallback path.
*
* This WARN_ON() can be seen as a documentation for driver
* authors to do not return -EOPNOTSUPP in packet offload mode.
*/
WARN_ON(err == -EOPNOTSUPP && is_packet_offload);
if (err != -EOPNOTSUPP || is_packet_offload) {
if ((err != -EOPNOTSUPP && !is_packet_offload) || is_packet_offload) {
NL_SET_ERR_MSG_WEAK(extack, "Device failed to offload this state");
return err;
}
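
The two `if` conditions in the hunk above are logically equivalent, since `(A && !B) || B` reduces to `A || B`. Assuming the shorter form is the removed line and the longer form is the added one (the +/- markers are not preserved in this view), the only behavioural change is that the WARN_ON() no longer fires. The sketch below checks the equivalence exhaustively over a few representative error codes; `old_cond()`, `new_cond()` and `check_conditions_agree()` are names made up for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Condition in its shorter form. */
static bool old_cond(int err, bool is_packet_offload)
{
	return err != -EOPNOTSUPP || is_packet_offload;
}

/* Condition in its longer form. */
static bool new_cond(int err, bool is_packet_offload)
{
	return (err != -EOPNOTSUPP && !is_packet_offload) || is_packet_offload;
}

/*
 * Compare both forms for each sample error code and both offload modes.
 * Returns the number of combinations that were checked.
 */
int check_conditions_agree(void)
{
	const int errs[] = { -EOPNOTSUPP, -EINVAL, -ENOMEM };
	int checked = 0;
	size_t i;
	int b;

	for (i = 0; i < sizeof(errs) / sizeof(errs[0]); i++) {
		for (b = 0; b <= 1; b++) {
			assert(old_cond(errs[i], b) == new_cond(errs[i], b));
			checked++;
		}
	}
	return checked;
}
```

Only one combination makes the condition false: `err == -EOPNOTSUPP` with packet offload not requested, which is the case that takes the fallback path.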


@ -110,7 +110,11 @@ struct xfrm_pol_inexact_node {
* 4. saddr:any list from saddr tree
*
* This result set then needs to be searched for the policy with
* the lowest priority. If two results have same prio, youngest one wins.
* the lowest priority. If two candidates have the same priority, the
* struct xfrm_policy pos member with the lower number is used.
*
* This replicates previous single-list-search algorithm which would
* return first matching policy in the (ordered-by-priority) list.
*/
struct xfrm_pol_inexact_key {
@ -197,8 +201,6 @@ xfrm_policy_inexact_lookup_rcu(struct net *net,
static struct xfrm_policy *
xfrm_policy_insert_list(struct hlist_head *chain, struct xfrm_policy *policy,
bool excl);
static void xfrm_policy_insert_inexact_list(struct hlist_head *chain,
struct xfrm_policy *policy);
static bool
xfrm_policy_find_inexact_candidates(struct xfrm_pol_inexact_candidates *cand,
@ -411,7 +413,6 @@ struct xfrm_policy *xfrm_policy_alloc(struct net *net, gfp_t gfp)
if (policy) {
write_pnet(&policy->xp_net, net);
INIT_LIST_HEAD(&policy->walk.all);
INIT_HLIST_NODE(&policy->bydst_inexact_list);
INIT_HLIST_NODE(&policy->bydst);
INIT_HLIST_NODE(&policy->byidx);
rwlock_init(&policy->lock);
@ -1229,26 +1230,31 @@ xfrm_policy_inexact_insert(struct xfrm_policy *policy, u8 dir, int excl)
return ERR_PTR(-EEXIST);
}
chain = &net->xfrm.policy_inexact[dir];
xfrm_policy_insert_inexact_list(chain, policy);
if (delpol)
__xfrm_policy_inexact_prune_bin(bin, false);
return delpol;
}
static bool xfrm_policy_is_dead_or_sk(const struct xfrm_policy *policy)
{
int dir;
if (policy->walk.dead)
return true;
dir = xfrm_policy_id2dir(policy->index);
return dir >= XFRM_POLICY_MAX;
}
static void xfrm_hash_rebuild(struct work_struct *work)
{
struct net *net = container_of(work, struct net,
xfrm.policy_hthresh.work);
unsigned int hmask;
struct xfrm_policy *pol;
struct xfrm_policy *policy;
struct hlist_head *chain;
struct hlist_head *odst;
struct hlist_node *newpos;
int i;
int dir;
unsigned seq;
u8 lbits4, rbits4, lbits6, rbits6;
@ -1275,13 +1281,10 @@ static void xfrm_hash_rebuild(struct work_struct *work)
struct xfrm_pol_inexact_bin *bin;
u8 dbits, sbits;
if (policy->walk.dead)
if (xfrm_policy_is_dead_or_sk(policy))
continue;
dir = xfrm_policy_id2dir(policy->index);
if (dir >= XFRM_POLICY_MAX)
continue;
if ((dir & XFRM_POLICY_MASK) == XFRM_POLICY_OUT) {
if (policy->family == AF_INET) {
dbits = rbits4;
@ -1312,23 +1315,7 @@ static void xfrm_hash_rebuild(struct work_struct *work)
goto out_unlock;
}
/* reset the bydst and inexact table in all directions */
for (dir = 0; dir < XFRM_POLICY_MAX; dir++) {
struct hlist_node *n;
hlist_for_each_entry_safe(policy, n,
&net->xfrm.policy_inexact[dir],
bydst_inexact_list) {
hlist_del_rcu(&policy->bydst);
hlist_del_init(&policy->bydst_inexact_list);
}
hmask = net->xfrm.policy_bydst[dir].hmask;
odst = net->xfrm.policy_bydst[dir].table;
for (i = hmask; i >= 0; i--) {
hlist_for_each_entry_safe(policy, n, odst + i, bydst)
hlist_del_rcu(&policy->bydst);
}
if ((dir & XFRM_POLICY_MASK) == XFRM_POLICY_OUT) {
/* dir out => dst = remote, src = local */
net->xfrm.policy_bydst[dir].dbits4 = rbits4;
@ -1346,14 +1333,13 @@ static void xfrm_hash_rebuild(struct work_struct *work)
/* re-insert all policies by order of creation */
list_for_each_entry_reverse(policy, &net->xfrm.policy_all, walk.all) {
if (policy->walk.dead)
if (xfrm_policy_is_dead_or_sk(policy))
continue;
dir = xfrm_policy_id2dir(policy->index);
if (dir >= XFRM_POLICY_MAX) {
/* skip socket policies */
continue;
}
hlist_del_rcu(&policy->bydst);
newpos = NULL;
dir = xfrm_policy_id2dir(policy->index);
chain = policy_hash_bysel(net, &policy->selector,
policy->family, dir);
@ -1520,42 +1506,6 @@ static const struct rhashtable_params xfrm_pol_inexact_params = {
.automatic_shrinking = true,
};
static void xfrm_policy_insert_inexact_list(struct hlist_head *chain,
struct xfrm_policy *policy)
{
struct xfrm_policy *pol, *delpol = NULL;
struct hlist_node *newpos = NULL;
int i = 0;
hlist_for_each_entry(pol, chain, bydst_inexact_list) {
if (pol->type == policy->type &&
pol->if_id == policy->if_id &&
!selector_cmp(&pol->selector, &policy->selector) &&
xfrm_policy_mark_match(&policy->mark, pol) &&
xfrm_sec_ctx_match(pol->security, policy->security) &&
!WARN_ON(delpol)) {
delpol = pol;
if (policy->priority > pol->priority)
continue;
} else if (policy->priority >= pol->priority) {
newpos = &pol->bydst_inexact_list;
continue;
}
if (delpol)
break;
}
if (newpos && policy->xdo.type != XFRM_DEV_OFFLOAD_PACKET)
hlist_add_behind_rcu(&policy->bydst_inexact_list, newpos);
else
hlist_add_head_rcu(&policy->bydst_inexact_list, chain);
hlist_for_each_entry(pol, chain, bydst_inexact_list) {
pol->pos = i;
i++;
}
}
static struct xfrm_policy *xfrm_policy_insert_list(struct hlist_head *chain,
struct xfrm_policy *policy,
bool excl)
@ -2295,10 +2245,52 @@ out:
return pol;
}
static u32 xfrm_gen_pos_slow(struct net *net)
{
struct xfrm_policy *policy;
u32 i = 0;
/* oldest entry is last in list */
list_for_each_entry_reverse(policy, &net->xfrm.policy_all, walk.all) {
if (!xfrm_policy_is_dead_or_sk(policy))
policy->pos = ++i;
}
return i;
}
static u32 xfrm_gen_pos(struct net *net)
{
const struct xfrm_policy *policy;
u32 i = 0;
/* most recently added policy is at the head of the list */
list_for_each_entry(policy, &net->xfrm.policy_all, walk.all) {
if (xfrm_policy_is_dead_or_sk(policy))
continue;
if (policy->pos == UINT_MAX)
return xfrm_gen_pos_slow(net);
i = policy->pos + 1;
break;
}
return i;
}
static void __xfrm_policy_link(struct xfrm_policy *pol, int dir)
{
struct net *net = xp_net(pol);
switch (dir) {
case XFRM_POLICY_IN:
case XFRM_POLICY_FWD:
case XFRM_POLICY_OUT:
pol->pos = xfrm_gen_pos(net);
break;
}
list_add(&pol->walk.all, &net->xfrm.policy_all);
net->xfrm.policy_count[dir]++;
xfrm_pol_hold(pol);
@ -2315,7 +2307,6 @@ static struct xfrm_policy *__xfrm_policy_unlink(struct xfrm_policy *pol,
/* Socket policies are not hashed. */
if (!hlist_unhashed(&pol->bydst)) {
hlist_del_rcu(&pol->bydst);
hlist_del_init(&pol->bydst_inexact_list);
hlist_del(&pol->byidx);
}
@ -4438,63 +4429,50 @@ EXPORT_SYMBOL_GPL(xfrm_audit_policy_delete);
#endif
#ifdef CONFIG_XFRM_MIGRATE
static bool xfrm_migrate_selector_match(const struct xfrm_selector *sel_cmp,
const struct xfrm_selector *sel_tgt)
{
if (sel_cmp->proto == IPSEC_ULPROTO_ANY) {
if (sel_tgt->family == sel_cmp->family &&
xfrm_addr_equal(&sel_tgt->daddr, &sel_cmp->daddr,
sel_cmp->family) &&
xfrm_addr_equal(&sel_tgt->saddr, &sel_cmp->saddr,
sel_cmp->family) &&
sel_tgt->prefixlen_d == sel_cmp->prefixlen_d &&
sel_tgt->prefixlen_s == sel_cmp->prefixlen_s) {
return true;
}
} else {
if (memcmp(sel_tgt, sel_cmp, sizeof(*sel_tgt)) == 0) {
return true;
}
}
return false;
}
static struct xfrm_policy *xfrm_migrate_policy_find(const struct xfrm_selector *sel,
u8 dir, u8 type, struct net *net, u32 if_id)
{
struct xfrm_policy *pol, *ret = NULL;
struct hlist_head *chain;
u32 priority = ~0U;
struct xfrm_policy *pol;
struct flowi fl;
spin_lock_bh(&net->xfrm.xfrm_policy_lock);
chain = policy_hash_direct(net, &sel->daddr, &sel->saddr, sel->family, dir);
hlist_for_each_entry(pol, chain, bydst) {
if ((if_id == 0 || pol->if_id == if_id) &&
xfrm_migrate_selector_match(sel, &pol->selector) &&
pol->type == type) {
ret = pol;
priority = ret->priority;
memset(&fl, 0, sizeof(fl));
fl.flowi_proto = sel->proto;
switch (sel->family) {
case AF_INET:
fl.u.ip4.saddr = sel->saddr.a4;
fl.u.ip4.daddr = sel->daddr.a4;
if (sel->proto == IPSEC_ULPROTO_ANY)
break;
}
}
chain = &net->xfrm.policy_inexact[dir];
hlist_for_each_entry(pol, chain, bydst_inexact_list) {
if ((pol->priority >= priority) && ret)
fl.u.flowi4_oif = sel->ifindex;
fl.u.ip4.fl4_sport = sel->sport;
fl.u.ip4.fl4_dport = sel->dport;
break;
if ((if_id == 0 || pol->if_id == if_id) &&
xfrm_migrate_selector_match(sel, &pol->selector) &&
pol->type == type) {
ret = pol;
case AF_INET6:
fl.u.ip6.saddr = sel->saddr.in6;
fl.u.ip6.daddr = sel->daddr.in6;
if (sel->proto == IPSEC_ULPROTO_ANY)
break;
}
fl.u.flowi6_oif = sel->ifindex;
fl.u.ip6.fl4_sport = sel->sport;
fl.u.ip6.fl4_dport = sel->dport;
break;
default:
return ERR_PTR(-EAFNOSUPPORT);
}
xfrm_pol_hold(ret);
rcu_read_lock();
spin_unlock_bh(&net->xfrm.xfrm_policy_lock);
pol = xfrm_policy_lookup_bytype(net, type, &fl, sel->family, dir, if_id);
if (IS_ERR_OR_NULL(pol))
goto out_unlock;
return ret;
if (!xfrm_pol_hold_rcu(pol))
pol = NULL;
out_unlock:
rcu_read_unlock();
return pol;
}
static int migrate_tmpl_match(const struct xfrm_migrate *m, const struct xfrm_tmpl *t)
@ -4631,9 +4609,9 @@ int xfrm_migrate(const struct xfrm_selector *sel, u8 dir, u8 type,
/* Stage 1 - find policy */
pol = xfrm_migrate_policy_find(sel, dir, type, net, if_id);
if (!pol) {
if (IS_ERR_OR_NULL(pol)) {
NL_SET_ERR_MSG(extack, "Target policy not found");
err = -ENOENT;
err = IS_ERR(pol) ? PTR_ERR(pol) : -ENOENT;
goto out;
}
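
The position numbering added by xfrm_gen_pos() and xfrm_gen_pos_slow(), together with the tie-break rule described in the updated lookup comment, can be modelled in userspace roughly as follows. This is a simplified sketch, not kernel code: `struct policy`, its `skip` flag (standing in for xfrm_policy_is_dead_or_sk()), the array ordered newest-first (standing in for the policy_all list) and the `prefer()` helper are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct xfrm_policy. */
struct policy {
	uint32_t priority;
	uint32_t pos;
	bool skip;	/* models xfrm_policy_is_dead_or_sk() returning true */
};

/*
 * Slow path: renumber every live entry starting from the oldest one, which
 * sits at the end of the array. Returns the last number handed out, exactly
 * as the function in the hunk does.
 */
static uint32_t gen_pos_slow(struct policy *list, size_t n)
{
	uint32_t i = 0;
	size_t k;

	for (k = n; k-- > 0;) {
		if (!list[k].skip)
			list[k].pos = ++i;
	}
	return i;
}

/*
 * Fast path: list[0] is the most recently added entry. Take the pos of the
 * first live entry plus one; renumber only when the counter is saturated.
 */
uint32_t gen_pos(struct policy *list, size_t n)
{
	size_t k;

	assert(list != NULL || n == 0);
	for (k = 0; k < n; k++) {
		if (list[k].skip)
			continue;
		if (list[k].pos == UINT32_MAX)
			return gen_pos_slow(list, n);
		return list[k].pos + 1;
	}
	return 0;	/* no live entry yet */
}

/* Lookup tie-break: lowest priority value wins, then the lower pos. */
const struct policy *prefer(const struct policy *a, const struct policy *b)
{
	if (a->priority != b->priority)
		return a->priority < b->priority ? a : b;
	return a->pos <= b->pos ? a : b;
}
```

In this model an insert normally inspects only the newest live entry, so assigning a position is constant-time; the full walk happens only when the newest entry's pos has reached the maximum value.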


@ -56,7 +56,7 @@ TEST_PROGS += ip_local_port_range.sh
TEST_PROGS += rps_default_mask.sh
TEST_PROGS += big_tcp.sh
TEST_PROGS += netns-sysctl.sh
TEST_PROGS_EXTENDED := toeplitz_client.sh toeplitz.sh
TEST_PROGS_EXTENDED := toeplitz_client.sh toeplitz.sh xfrm_policy_add_speed.sh
TEST_GEN_FILES = socket nettest
TEST_GEN_FILES += psock_fanout psock_tpacket msg_zerocopy reuseport_addr_any
TEST_GEN_FILES += tcp_mmap tcp_inq psock_snd txring_overwrite


@ -0,0 +1,83 @@
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
#
source lib.sh
timeout=4m
ret=0
tmp=$(mktemp)
cleanup() {
cleanup_all_ns
rm -f "$tmp"
}
trap cleanup EXIT
maxpolicies=100000
[ "$KSFT_MACHINE_SLOW" = "yes" ] && maxpolicies=10000
do_dummies4() {
local dir="$1"
local max="$2"
local policies
local pfx
pfx=30
policies=0
ip netns exec "$ns" ip xfrm policy flush
for i in $(seq 1 100);do
local s
local d
for j in $(seq 1 255);do
s=$((i+0))
d=$((i+100))
for a in $(seq 1 8 255); do
policies=$((policies+1))
[ "$policies" -gt "$max" ] && return
echo xfrm policy add src 10.$s.$j.0/30 dst 10.$d.$j.$a/$pfx dir $dir action block
done
for a in $(seq 1 8 255); do
policies=$((policies+1))
[ "$policies" -gt "$max" ] && return
echo xfrm policy add src 10.$s.$j.$a/30 dst 10.$d.$j.0/$pfx dir $dir action block
done
done
done
}
setup_ns ns
do_bench()
{
local max="$1"
start=$(date +%s%3N)
do_dummies4 "out" "$max" > "$tmp"
if ! timeout "$timeout" ip netns exec "$ns" ip -batch "$tmp";then
echo "WARNING: policy insertion cancelled after $timeout"
ret=1
fi
stop=$(date +%s%3N)
result=$((stop-start))
policies=$(wc -l < "$tmp")
printf "Inserted %-06s policies in $result ms\n" $policies
have=$(ip netns exec "$ns" ip xfrm policy show | grep "action block" | wc -l)
if [ "$have" -ne "$policies" ]; then
echo "WARNING: mismatch, have $have policies, expected $policies"
ret=1
fi
}
p=100
while [ $p -le "$maxpolicies" ]; do
do_bench "$p"
p="${p}0"
done
exit $ret