License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information in it,
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information.
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging were:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier                             # files
---------------------------------------------------|-------
GPL-2.0                                               11139
and resulted in the first patch in this series.
If that file was under a */uapi/* path, it was "GPL-2.0 WITH
Linux-syscall-note"; otherwise it was "GPL-2.0". The results were:
SPDX license identifier                             # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note                         930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier                             # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note                         270
GPL-2.0+ WITH Linux-syscall-note                        169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)      21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)      17
LGPL-2.1+ WITH Linux-syscall-note                        15
GPL-1.0+ WITH Linux-syscall-note                         14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)      5
LGPL-2.0+ WITH Linux-syscall-note                         4
LGPL-2.1 WITH Linux-syscall-note                          3
((GPL-2.0 WITH Linux-syscall-note) OR MIT)                3
((GPL-2.0 WITH Linux-syscall-note) AND MIT)               1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there were new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with the SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In the initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01 14:07:57 +00:00
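The commit message above notes that header and source .c files need different comment types for the SPDX tag. A minimal sketch of that distinction follows. It is not the script Thomas and Greg used; the helper name spdx_line is hypothetical, and it assumes only two cases: `//` comments for `.c` sources and `/* */` comments for `.h` headers.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from the kernel tree): format the SPDX tag
 * line for a file, choosing the comment style from the extension.
 * Returns the number of characters written, or -1 for file types this
 * sketch does not handle.
 */
int spdx_line(const char *filename, const char *license_id,
              char *buf, size_t len)
{
    const char *ext = strrchr(filename, '.');

    if (ext && strcmp(ext, ".c") == 0)
        return snprintf(buf, len, "// SPDX-License-Identifier: %s",
                        license_id);
    if (ext && strcmp(ext, ".h") == 0)
        return snprintf(buf, len, "/* SPDX-License-Identifier: %s */",
                        license_id);
    return -1; /* Makefiles, scripts, etc. are out of scope here */
}
```

Called with a header name and "GPL-2.0", the helper produces the same form as the first line of this file.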
|
|
|
/* SPDX-License-Identifier: GPL-2.0 */
|
2007-12-16 21:29:36 +00:00
|
|
|
/*
|
|
|
|
* ipv4 in net namespaces
|
|
|
|
*/
|
|
|
|
|
|
|
|
#ifndef __NETNS_IPV4_H__
|
|
|
|
#define __NETNS_IPV4_H__
|
2008-01-10 11:27:51 +00:00
|
|
|
|
2012-05-24 16:34:21 +00:00
|
|
|
#include <linux/uidgid.h>
|
2008-01-22 14:02:14 +00:00
|
|
|
#include <net/inet_frag.h>
|
2015-03-04 23:02:44 +00:00
|
|
|
#include <linux/rcupdate.h>
|
2019-03-27 19:40:33 +00:00
|
|
|
#include <linux/siphash.h>
|
2008-01-22 14:02:14 +00:00
|
|
|
|
2007-12-16 21:31:47 +00:00
|
|
|
struct ctl_table_header;
|
|
|
|
struct ipv4_devconf;
|
2008-01-10 11:27:51 +00:00
|
|
|
struct fib_rules_ops;
|
2008-01-10 11:28:24 +00:00
|
|
|
struct hlist_head;
|
2012-07-06 05:13:13 +00:00
|
|
|
struct fib_table;
|
2008-01-10 11:28:55 +00:00
|
|
|
struct sock;
|
2013-09-28 21:10:59 +00:00
|
|
|
struct local_ports {
|
|
|
|
seqlock_t lock;
|
|
|
|
int range[2];
|
2015-05-27 18:34:37 +00:00
|
|
|
bool warned;
|
2013-09-28 21:10:59 +00:00
|
|
|
};
|
2007-12-16 21:31:47 +00:00
|
|
|
|
2014-05-06 18:02:50 +00:00
|
|
|
struct ping_group_range {
|
|
|
|
seqlock_t lock;
|
|
|
|
kgid_t range[2];
|
|
|
|
};
|
|
|
|
|
2016-12-28 09:52:32 +00:00
|
|
|
struct inet_hashinfo;
|
|
|
|
|
|
|
|
struct inet_timewait_death_row {
|
2022-01-26 18:07:14 +00:00
|
|
|
refcount_t tw_refcount;
|
2016-12-28 09:52:32 +00:00
|
|
|
|
2022-01-26 18:07:14 +00:00
|
|
|
struct inet_hashinfo *hashinfo ____cacheline_aligned_in_smp;
|
2016-12-28 09:52:32 +00:00
|
|
|
int sysctl_max_tw_buckets;
|
|
|
|
};
|
|
|
|
|
2017-09-27 03:35:42 +00:00
|
|
|
struct tcp_fastopen_context;
|
|
|
|
|
2007-12-16 21:29:36 +00:00
|
|
|
struct netns_ipv4 {
|
2022-01-26 18:07:14 +00:00
|
|
|
struct inet_timewait_death_row *tcp_death_row;
|
inet: shrink inet_timewait_death_row by 48 bytes
struct inet_timewait_death_row uses two cache lines, because we want
tw_count to use a full cache line to avoid false sharing.
Rework its definition and placement in netns_ipv4 so that:
1) We add 60 bytes of padding after tw_count to avoid
false sharing, knowing that tcp_death_row will
have ____cacheline_aligned_in_smp attribute.
2) We do not risk padding before tcp_death_row, because
we move it at the beginning of netns_ipv4, even if new
fields are added later.
3) We do not waste 48 bytes of padding after it.
Note that I have not changed dccp.
pahole result for struct netns_ipv4 before/after the patch :
/* size: 832, cachelines: 13, members: 139 */
/* sum members: 721, holes: 12, sum holes: 95 */
/* padding: 16 */
/* paddings: 2, sum paddings: 55 */
->
/* size: 768, cachelines: 12, members: 139 */
/* sum members: 673, holes: 11, sum holes: 39 */
/* padding: 56 */
/* paddings: 2, sum paddings: 7 */
/* forced alignments: 1 */
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-31 17:52:05 +00:00
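The padding argument in the commit message above can be checked with a small user-space sketch. It assumes a 64-byte cache line and uses C11 alignas as a stand-in for ____cacheline_aligned_in_smp; the struct name death_row_sketch and the plain int counter are illustrative, not the kernel definitions.

```c
#include <stdalign.h>
#include <stddef.h>

#define CACHELINE 64 /* assumed cache line size */

struct inet_hashinfo; /* opaque in this sketch */

/* User-space stand-in for the layout discussed above. */
struct death_row_sketch {
    /* frequently written counter, alone on the first cache line */
    int tw_refcount;
    /* alignas forces this member onto the next cache line boundary */
    alignas(CACHELINE) struct inet_hashinfo *hashinfo;
    int sysctl_max_tw_buckets;
};

/* The compiler inserts the padding after the counter for us. */
_Static_assert(offsetof(struct death_row_sketch, hashinfo) == CACHELINE,
               "read-mostly fields start on their own cache line");
_Static_assert(sizeof(struct death_row_sketch) == 2 * CACHELINE,
               "the struct occupies exactly two cache lines");
```

With a 4-byte counter, the gap before hashinfo is the 60 bytes of padding the message mentions.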
|
|
|
|
2008-01-06 07:08:49 +00:00
|
|
|
#ifdef CONFIG_SYSCTL
|
2007-12-16 21:31:47 +00:00
|
|
|
struct ctl_table_header *forw_hdr;
|
2008-01-22 14:08:36 +00:00
|
|
|
struct ctl_table_header *frags_hdr;
|
2008-03-26 08:56:24 +00:00
|
|
|
struct ctl_table_header *ipv4_hdr;
|
2008-07-06 02:02:33 +00:00
|
|
|
struct ctl_table_header *route_hdr;
|
2013-02-06 09:46:33 +00:00
|
|
|
struct ctl_table_header *xfrm4_hdr;
|
2008-01-06 07:08:49 +00:00
|
|
|
#endif
|
2007-12-16 21:31:47 +00:00
|
|
|
struct ipv4_devconf *devconf_all;
|
|
|
|
struct ipv4_devconf *devconf_dflt;
|
2018-03-22 09:45:32 +00:00
|
|
|
struct ip_ra_chain __rcu *ra_chain;
|
2018-03-22 09:45:40 +00:00
|
|
|
struct mutex ra_mutex;
|
2008-01-10 11:27:51 +00:00
|
|
|
#ifdef CONFIG_IP_MULTIPLE_TABLES
|
|
|
|
struct fib_rules_ops *rules_ops;
|
2015-03-04 23:02:44 +00:00
|
|
|
struct fib_table __rcu *fib_main;
|
|
|
|
struct fib_table __rcu *fib_default;
|
inet: shrink netns_ipv4 by another cache line
By shuffling around some fields to remove 8 bytes of hole,
we can save one cache line.
pahole result before/after the patch :
/* size: 768, cachelines: 12, members: 139 */
/* sum members: 673, holes: 11, sum holes: 39 */
/* padding: 56 */
/* paddings: 2, sum paddings: 7 */
/* forced alignments: 1 */
->
/* size: 704, cachelines: 11, members: 139 */
/* sum members: 673, holes: 10, sum holes: 31 */
/* paddings: 2, sum paddings: 7 */
/* forced alignments: 1 */
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-31 17:52:06 +00:00
|
|
|
unsigned int fib_rules_require_fldissect;
|
|
|
|
bool fib_has_custom_rules;
|
2012-07-06 05:13:13 +00:00
|
|
|
#endif
|
2017-09-22 01:18:23 +00:00
|
|
|
bool fib_has_custom_local_routes;
|
2021-03-31 17:52:06 +00:00
|
|
|
bool fib_offload_disabled;
|
2012-07-06 05:13:13 +00:00
|
|
|
#ifdef CONFIG_IP_ROUTE_CLASSID
|
2021-12-02 02:26:35 +00:00
|
|
|
atomic_t fib_num_tclassid_users;
|
2008-01-10 11:27:51 +00:00
|
|
|
#endif
|
2008-01-10 11:28:24 +00:00
|
|
|
struct hlist_head *fib_table_hash;
|
2008-01-10 11:28:55 +00:00
|
|
|
struct sock *fibnl;
|
2008-01-22 14:02:14 +00:00
|
|
|
|
2015-02-25 17:58:35 +00:00
|
|
|
struct sock *mc_autojoin_sk;
|
2015-01-29 23:58:09 +00:00
|
|
|
|
2012-06-08 01:20:41 +00:00
|
|
|
struct inet_peer_base *peers;
|
2019-05-24 16:03:39 +00:00
|
|
|
struct fqdir *fqdir;
|
2008-03-26 08:55:37 +00:00
|
|
|
|
ipv4: shrink netns_ipv4 with sysctl conversions
These sysctls that can fit in one byte instead of one int
are converted to save space and thus reduce cache line misses.
- icmp_echo_ignore_all, icmp_echo_ignore_broadcasts,
- icmp_ignore_bogus_error_responses, icmp_errors_use_inbound_ifaddr
- tcp_ecn, tcp_ecn_fallback
- ip_default_ttl, ip_no_pmtu_disc, ip_fwd_use_pmtu
- ip_nonlocal_bind, ip_autobind_reuse
- ip_dynaddr, ip_early_demux, raw_l3mdev_accept
- nexthop_compat_mode, fwmark_reflect
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-25 18:08:14 +00:00
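The space saving described above can be illustrated with a small sketch. The two struct names are hypothetical and only four of the listed sysctls are shown; uint8_t stands in for the kernel's u8.

```c
#include <stddef.h>
#include <stdint.h>

/* Four flag-like sysctls stored as int (layout before the conversion). */
struct sysctls_as_int {
    int sysctl_icmp_echo_ignore_all;
    int sysctl_icmp_echo_ignore_broadcasts;
    int sysctl_tcp_ecn;
    int sysctl_tcp_ecn_fallback;
};

/* The same four values stored in one byte each. */
struct sysctls_as_u8 {
    uint8_t sysctl_icmp_echo_ignore_all;
    uint8_t sysctl_icmp_echo_ignore_broadcasts;
    uint8_t sysctl_tcp_ecn;
    uint8_t sysctl_tcp_ecn_fallback;
};

/* Bytes saved by the narrower representation for these four fields. */
size_t bytes_saved(void)
{
    return sizeof(struct sysctls_as_int) - sizeof(struct sysctls_as_u8);
}
```

The saving only materializes when the u8 fields are grouped together, as they are in this file; a lone u8 between wider members would be followed by alignment padding.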
|
|
|
u8 sysctl_icmp_echo_ignore_all;
|
2021-03-30 01:45:29 +00:00
|
|
|
u8 sysctl_icmp_echo_enable_probe;
|
2021-03-25 18:08:14 +00:00
|
|
|
u8 sysctl_icmp_echo_ignore_broadcasts;
|
|
|
|
u8 sysctl_icmp_ignore_bogus_error_responses;
|
|
|
|
u8 sysctl_icmp_errors_use_inbound_ifaddr;
|
2008-03-26 08:55:37 +00:00
|
|
|
int sysctl_icmp_ratelimit;
|
|
|
|
int sysctl_icmp_ratemask;
|
2008-07-06 02:02:59 +00:00
|
|
|
|
2022-01-04 10:59:34 +00:00
|
|
|
u32 ip_rt_min_pmtu;
|
2022-01-04 10:59:47 +00:00
|
|
|
int ip_rt_mtu_expires;
|
2022-01-26 07:10:58 +00:00
|
|
|
int ip_rt_min_advmss;
|
2022-01-04 10:59:34 +00:00
|
|
|
|
2014-05-06 18:02:49 +00:00
|
|
|
struct local_ports ip_local_ports;
|
2013-09-28 21:10:59 +00:00
|
|
|
|
2021-03-25 18:08:14 +00:00
|
|
|
u8 sysctl_tcp_ecn;
|
|
|
|
u8 sysctl_tcp_ecn_fallback;
|
tcp: add rfc3168, section 6.1.1.1. fallback
This work is a follow-up to commit f7b3bec6f516 ("net: allow setting ecn
via routing table") and adds RFC3168 section 6.1.1.1. fallback for outgoing
ECN connections. In other words, this work adds a retry with a non-ECN
setup SYN packet, as suggested by the RFC on the first timeout:
[...] A host that receives no reply to an ECN-setup SYN within the
normal SYN retransmission timeout interval MAY resend the SYN and
any subsequent SYN retransmissions with CWR and ECE cleared. [...]
Schematic client-side view when assuming the server is in tcp_ecn=2 mode,
that is, Linux default since 2009 via commit 255cac91c3c9 ("tcp: extend
ECN sysctl to allow server-side only ECN"):
1) Normal ECN-capable path:
SYN ECE CWR ----->
<----- SYN ACK ECE
ACK ----->
2) Path with broken middlebox, when client has fallback:
SYN ECE CWR ----X crappy middlebox drops packet
(timeout, rtx)
SYN ----->
<----- SYN ACK
ACK ----->
In case we would not have the fallback implemented, the middlebox drop
point would basically end up as:
SYN ECE CWR ----X crappy middlebox drops packet
(timeout, rtx)
SYN ECE CWR ----X crappy middlebox drops packet
(timeout, rtx)
SYN ECE CWR ----X crappy middlebox drops packet
(timeout, rtx)
In any case, it's rather a smaller percentage of sites where there would
occur such additional setup latency: it was found in end of 2014 that ~56%
of IPv4 and 65% of IPv6 servers of Alexa 1 million list would negotiate
ECN (aka tcp_ecn=2 default), 0.42% of these webservers will fail to connect
when trying to negotiate with ECN (tcp_ecn=1) due to timeouts, which the
fallback would mitigate with a slight latency trade-off. Recent related
paper on this topic:
Brian Trammell, Mirja Kühlewind, Damiano Boppart, Iain Learmonth,
Gorry Fairhurst, and Richard Scheffenegger:
"Enabling Internet-Wide Deployment of Explicit Congestion Notification."
Proc. PAM 2015, New York.
http://ecn.ethz.ch/ecn-pam15.pdf
Thus, when net.ipv4.tcp_ecn=1 is set, the patch performs the RFC3168,
section 6.1.1.1. fallback on timeout. For users who explicitly do not want
this, which can be the case in DC deployments, we add a
net.ipv4.tcp_ecn_fallback knob that allows disabling the fallback.
tp->ecn_flags are not being cleared in tcp_ecn_clear_syn() on output, but
rather we let tcp_ecn_rcv_synack() take that over on input path in case a
SYN ACK ECE was delayed. Thus a spurious SYN retransmission will not prevent
ECN being negotiated eventually in that case.
Reference: https://www.ietf.org/proceedings/92/slides/slides-92-iccrg-1.pdf
Reference: https://www.ietf.org/proceedings/89/slides/slides-89-tsvarea-1.pdf
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mirja Kühlewind <mirja.kuehlewind@tik.ee.ethz.ch>
Signed-off-by: Brian Trammell <trammell@tik.ee.ethz.ch>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Dave That <dave.taht@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-19 19:04:22 +00:00
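The client-side behaviour in the schematic above can be sketched as a small decision function. This is an illustration under stated assumptions, not the kernel implementation: the function name syn_flags and its parameters are hypothetical, and it models only the choice of flags on the initial SYN and its retransmissions.

```c
#include <stdbool.h>
#include <stdint.h>

/* TCP header flag bits */
#define FLAG_SYN 0x02
#define FLAG_ECE 0x40
#define FLAG_CWR 0x80

/*
 * Hypothetical helper: pick the flags for an outgoing SYN.
 *   tcp_ecn       value of net.ipv4.tcp_ecn (1 = request ECN on
 *                 outgoing connections)
 *   ecn_fallback  value of net.ipv4.tcp_ecn_fallback
 *   retransmits   0 for the initial SYN, >0 after a timeout
 */
uint8_t syn_flags(int tcp_ecn, bool ecn_fallback, unsigned int retransmits)
{
    uint8_t flags = FLAG_SYN;
    bool want_ecn = (tcp_ecn == 1);

    /* ECN-setup SYN unless the fallback applies to this retransmission */
    if (want_ecn && !(ecn_fallback && retransmits > 0))
        flags |= FLAG_ECE | FLAG_CWR;

    return flags;
}
```

With the fallback enabled, the first SYN carries ECE and CWR and every retransmission is a plain SYN; with it disabled, each retransmission repeats the ECN-setup SYN, which is the failing sequence shown in the last diagram.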
|
|
|
|
2021-03-25 18:08:14 +00:00
|
|
|
u8 sysctl_ip_default_ttl;
|
|
|
|
u8 sysctl_ip_no_pmtu_disc;
|
|
|
|
u8 sysctl_ip_fwd_use_pmtu;
|
2021-03-25 18:08:15 +00:00
|
|
|
u8 sysctl_ip_fwd_update_priority;
|
2021-03-25 18:08:14 +00:00
|
|
|
u8 sysctl_ip_nonlocal_bind;
|
|
|
|
u8 sysctl_ip_autobind_reuse;
|
2016-02-15 10:11:29 +00:00
|
|
|
/* Shall we try to damage output packets if routing dev changes? */
|
2021-03-25 18:08:14 +00:00
|
|
|
u8 sysctl_ip_dynaddr;
|
|
|
|
u8 sysctl_ip_early_demux;
|
2018-11-07 15:36:05 +00:00
|
|
|
#ifdef CONFIG_NET_L3_MASTER_DEV
|
2021-03-25 18:08:14 +00:00
|
|
|
u8 sysctl_raw_l3mdev_accept;
|
2018-11-07 15:36:05 +00:00
|
|
|
#endif
|
2021-03-25 18:08:16 +00:00
|
|
|
u8 sysctl_tcp_early_demux;
|
|
|
|
u8 sysctl_udp_early_demux;
|
2013-01-05 16:10:48 +00:00
|
|
|
|
2021-03-25 18:08:14 +00:00
|
|
|
u8 sysctl_nexthop_compat_mode;
|
2020-04-27 20:56:46 +00:00
|
|
|
|
2021-03-25 18:08:14 +00:00
|
|
|
u8 sysctl_fwmark_reflect;
|
2021-03-25 18:08:17 +00:00
|
|
|
u8 sysctl_tcp_fwmark_accept;
|
2015-12-16 21:20:44 +00:00
|
|
|
#ifdef CONFIG_NET_L3_MASTER_DEV
|
2021-03-25 18:08:17 +00:00
|
|
|
u8 sysctl_tcp_l3mdev_accept;
|
2015-12-16 21:20:44 +00:00
|
|
|
#endif
|
2021-03-25 18:08:17 +00:00
|
|
|
u8 sysctl_tcp_mtu_probing;
|
2019-08-07 23:52:29 +00:00
|
|
|
int sysctl_tcp_mtu_probe_floor;
|
2015-02-10 01:53:16 +00:00
|
|
|
int sysctl_tcp_base_mss;
|
2019-06-06 16:15:31 +00:00
|
|
|
int sysctl_tcp_min_snd_mss;
|
2015-03-06 03:18:23 +00:00
|
|
|
int sysctl_tcp_probe_threshold;
|
2015-03-06 03:18:24 +00:00
|
|
|
u32 sysctl_tcp_probe_interval;
|
2014-05-13 17:17:33 +00:00
|
|
|
|
2016-01-07 14:38:43 +00:00
|
|
|
int sysctl_tcp_keepalive_time;
|
2016-01-07 14:38:45 +00:00
|
|
|
int sysctl_tcp_keepalive_intvl;
|
2021-03-25 18:08:17 +00:00
|
|
|
u8 sysctl_tcp_keepalive_probes;
|
2016-01-07 14:38:43 +00:00
|
|
|
|
2021-03-25 18:08:17 +00:00
|
|
|
u8 sysctl_tcp_syn_retries;
|
|
|
|
u8 sysctl_tcp_synack_retries;
|
|
|
|
u8 sysctl_tcp_syncookies;
|
2021-06-12 12:32:14 +00:00
|
|
|
u8 sysctl_tcp_migrate_req;
|
tcp: adjust TSO packet sizes based on min_rtt
Back when tcp_tso_autosize() and TCP pacing were introduced,
our focus was really to reduce burst sizes for long distance
flows.
The simple heuristic of using sk_pacing_rate/1024 has worked
well, but can lead to too small packets for hosts in the same
rack/cluster, when thousands of flows compete for the bottleneck.
Neal Cardwell had the idea of making the TSO burst size
a function of both sk_pacing_rate and tcp_min_rtt()
Indeed, for local flows, sending bigger bursts is better
to reduce cpu costs, as occasional losses can be repaired
quite fast.
This patch is based on Neal Cardwell's implementation
done more than two years ago.
bbr is adjusting max_pacing_rate based on measured bandwidth,
while cubic would over estimate max_pacing_rate.
/proc/sys/net/ipv4/tcp_tso_rtt_log can be used to tune or disable
this new feature, in logarithmic steps.
Tested:
100Gbit NIC, two hosts in the same rack, 4K MTU.
600 flows rate-limited to 20000000 bytes per second.
Before patch: (TSO sizes would be limited to 20000000/1024/4096 -> 4 segments per TSO)
~# echo 0 >/proc/sys/net/ipv4/tcp_tso_rtt_log
~# nstat -n;perf stat ./super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000;nstat|egrep "TcpInSegs|TcpOutSegs|TcpRetransSegs|Delivered"
96005
Performance counter stats for './super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000':
65,945.29 msec task-clock # 2.845 CPUs utilized
1,314,632 context-switches # 19935.279 M/sec
5,292 cpu-migrations # 80.249 M/sec
940,641 page-faults # 14264.023 M/sec
201,117,030,926 cycles # 3049769.216 GHz (83.45%)
17,699,435,405 stalled-cycles-frontend # 8.80% frontend cycles idle (83.48%)
136,584,015,071 stalled-cycles-backend # 67.91% backend cycles idle (83.44%)
53,809,530,436 instructions # 0.27 insn per cycle
# 2.54 stalled cycles per insn (83.36%)
9,062,315,523 branches # 137422329.563 M/sec (83.22%)
153,008,621 branch-misses # 1.69% of all branches (83.32%)
23.182970846 seconds time elapsed
TcpInSegs 15648792 0.0
TcpOutSegs 58659110 0.0 # Average of 3.7 4K segments per TSO packet
TcpExtTCPDelivered 58654791 0.0
TcpExtTCPDeliveredCE 19 0.0
After patch:
~# echo 9 >/proc/sys/net/ipv4/tcp_tso_rtt_log
~# nstat -n;perf stat ./super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000;nstat|egrep "TcpInSegs|TcpOutSegs|TcpRetransSegs|Delivered"
96046
Performance counter stats for './super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000':
48,982.58 msec task-clock # 2.104 CPUs utilized
186,014 context-switches # 3797.599 M/sec
3,109 cpu-migrations # 63.472 M/sec
941,180 page-faults # 19214.814 M/sec
153,459,763,868 cycles # 3132982.807 GHz (83.56%)
12,069,861,356 stalled-cycles-frontend # 7.87% frontend cycles idle (83.32%)
120,485,917,953 stalled-cycles-backend # 78.51% backend cycles idle (83.24%)
36,803,672,106 instructions # 0.24 insn per cycle
# 3.27 stalled cycles per insn (83.18%)
5,947,266,275 branches # 121417383.427 M/sec (83.64%)
87,984,616 branch-misses # 1.48% of all branches (83.43%)
23.281200256 seconds time elapsed
TcpInSegs 1434706 0.0
TcpOutSegs 58883378 0.0 # Average of 41 4K segments per TSO packet
TcpExtTCPDelivered 58878971 0.0
TcpExtTCPDeliveredCE 9664 0.0
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Link: https://lore.kernel.org/r/20220309015757.2532973-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-09 01:57:57 +00:00
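The "20000000/1024/4096 -> 4 segments per TSO" arithmetic in the commit message above can be reproduced with a short sketch. The first function follows the rate-only heuristic as stated in the message. The second adds an RTT term whose exact shape is an assumption made for illustration, since the message does not spell out the formula; the function names, the gso_max_bytes parameter and the 32-bit shift guard are all hypothetical.

```c
#include <stdint.h>

/* Rate-only heuristic from the message: burst = pacing_rate / 1024 bytes. */
uint32_t tso_segs_rate_only(uint64_t pacing_rate, uint32_t mss)
{
    uint64_t bytes = pacing_rate / 1024;

    return (uint32_t)(bytes / mss);
}

/*
 * ASSUMED shape of the RTT-aware variant: the smaller min_rtt is
 * (after scaling by 2^rtt_log), the larger the extra burst allowance.
 */
uint32_t tso_segs_with_rtt(uint64_t pacing_rate, uint32_t mss,
                           uint32_t min_rtt_us, unsigned int rtt_log,
                           uint32_t gso_max_bytes)
{
    uint64_t bytes = pacing_rate / 1024;
    /* avoid undefined behaviour from shifting by 32 or more */
    uint32_t r = rtt_log < 32 ? min_rtt_us >> rtt_log : 0;

    if (r < 32)
        bytes += gso_max_bytes >> r;   /* bonus shrinks as RTT grows */
    if (bytes > gso_max_bytes)
        bytes = gso_max_bytes;         /* never exceed the GSO limit */

    return (uint32_t)(bytes / mss);
}
```

For the test setup in the message (20000000 bytes per second, 4K MTU), tso_segs_rate_only(20000000, 4096) evaluates to 4, matching the "Before patch" estimate.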
|
|
|
u8 sysctl_tcp_comp_sack_nr;
|
2016-02-03 07:46:52 +00:00
|
|
|
int sysctl_tcp_reordering;
|
2021-03-25 18:08:17 +00:00
|
|
|
u8 sysctl_tcp_retries1;
|
|
|
|
u8 sysctl_tcp_retries2;
|
|
|
|
u8 sysctl_tcp_orphan_retries;
|
|
|
|
u8 sysctl_tcp_tw_reuse;
|
2016-02-03 07:46:56 +00:00
|
|
|
int sysctl_tcp_fin_timeout;
|
2016-02-03 07:46:57 +00:00
|
|
|
unsigned int sysctl_tcp_notsent_lowat;
|
2021-03-25 18:08:17 +00:00
|
|
|
u8 sysctl_tcp_sack;
|
|
|
|
u8 sysctl_tcp_window_scaling;
|
|
|
|
u8 sysctl_tcp_timestamps;
|
|
|
|
u8 sysctl_tcp_early_retrans;
|
|
|
|
u8 sysctl_tcp_recovery;
|
|
|
|
u8 sysctl_tcp_thin_linear_timeouts;
|
|
|
|
u8 sysctl_tcp_slow_start_after_idle;
|
|
|
|
u8 sysctl_tcp_retrans_collapse;
|
|
|
|
u8 sysctl_tcp_stdurg;
|
|
|
|
u8 sysctl_tcp_rfc1337;
|
|
|
|
u8 sysctl_tcp_abort_on_overflow;
|
|
|
|
u8 sysctl_tcp_fack; /* obsolete */
|
2017-10-27 04:55:06 +00:00
|
|
|
int sysctl_tcp_max_reordering;
|
2017-10-27 04:55:09 +00:00
|
|
|
int sysctl_tcp_adv_win_scale;
|
2021-03-25 18:08:17 +00:00
|
|
|
u8 sysctl_tcp_dsack;
|
|
|
|
u8 sysctl_tcp_app_win;
|
|
|
|
u8 sysctl_tcp_frto;
|
|
|
|
u8 sysctl_tcp_nometrics_save;
|
|
|
|
u8 sysctl_tcp_no_ssthresh_metrics_save;
|
|
|
|
u8 sysctl_tcp_moderate_rcvbuf;
|
|
|
|
u8 sysctl_tcp_tso_win_divisor;
|
|
|
|
u8 sysctl_tcp_workaround_signed_windows;
|
2017-10-27 14:47:25 +00:00
|
|
|
int sysctl_tcp_limit_output_bytes;
|
2017-10-27 14:47:26 +00:00
|
|
|
int sysctl_tcp_challenge_ack_limit;
|
2017-10-27 14:47:28 +00:00
|
|
|
int sysctl_tcp_min_rtt_wlen;
|
2021-03-25 18:08:17 +00:00
|
|
|
u8 sysctl_tcp_min_tso_segs;
|
tcp: adjust TSO packet sizes based on min_rtt
Back when tcp_tso_autosize() and TCP pacing were introduced,
our focus was really to reduce burst sizes for long distance
flows.
The simple heuristic of using sk_pacing_rate/1024 has worked
well, but can lead to too small packets for hosts in the same
rack/cluster, when thousands of flows compete for the bottleneck.
Neal Cardwell had the idea of making the TSO burst size
a function of both sk_pacing_rate and tcp_min_rtt()
Indeed, for local flows, sending bigger bursts is better
to reduce cpu costs, as occasional losses can be repaired
quite fast.
This patch is based on Neal Cardwell implementation
done more than two years ago.
bbr is adjusting max_pacing_rate based on measured bandwidth,
while cubic would over estimate max_pacing_rate.
/proc/sys/net/ipv4/tcp_tso_rtt_log can be used to tune or disable
this new feature, in logarithmic steps.
Tested:
100Gbit NIC, two hosts in the same rack, 4K MTU.
600 flows rate-limited to 20000000 bytes per second.
Before patch: (TSO sizes would be limited to 20000000/1024/4096 -> 4 segments per TSO)
~# echo 0 >/proc/sys/net/ipv4/tcp_tso_rtt_log
~# nstat -n;perf stat ./super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000;nstat|egrep "TcpInSegs|TcpOutSegs|TcpRetransSegs|Delivered"
96005
Performance counter stats for './super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000':
65,945.29 msec task-clock # 2.845 CPUs utilized
1,314,632 context-switches # 19935.279 M/sec
5,292 cpu-migrations # 80.249 M/sec
940,641 page-faults # 14264.023 M/sec
201,117,030,926 cycles # 3049769.216 GHz (83.45%)
17,699,435,405 stalled-cycles-frontend # 8.80% frontend cycles idle (83.48%)
136,584,015,071 stalled-cycles-backend # 67.91% backend cycles idle (83.44%)
53,809,530,436 instructions # 0.27 insn per cycle
# 2.54 stalled cycles per insn (83.36%)
9,062,315,523 branches # 137422329.563 M/sec (83.22%)
153,008,621 branch-misses # 1.69% of all branches (83.32%)
23.182970846 seconds time elapsed
TcpInSegs 15648792 0.0
TcpOutSegs 58659110 0.0 # Average of 3.7 4K segments per TSO packet
TcpExtTCPDelivered 58654791 0.0
TcpExtTCPDeliveredCE 19 0.0
After patch:
~# echo 9 >/proc/sys/net/ipv4/tcp_tso_rtt_log
~# nstat -n;perf stat ./super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000;nstat|egrep "TcpInSegs|TcpOutSegs|TcpRetransSegs|Delivered"
96046
Performance counter stats for './super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000':
48,982.58 msec task-clock # 2.104 CPUs utilized
186,014 context-switches # 3797.599 M/sec
3,109 cpu-migrations # 63.472 M/sec
941,180 page-faults # 19214.814 M/sec
153,459,763,868 cycles # 3132982.807 GHz (83.56%)
12,069,861,356 stalled-cycles-frontend # 7.87% frontend cycles idle (83.32%)
120,485,917,953 stalled-cycles-backend # 78.51% backend cycles idle (83.24%)
36,803,672,106 instructions # 0.24 insn per cycle
# 3.27 stalled cycles per insn (83.18%)
5,947,266,275 branches # 121417383.427 M/sec (83.64%)
87,984,616 branch-misses # 1.48% of all branches (83.43%)
23.281200256 seconds time elapsed
TcpInSegs 1434706 0.0
TcpOutSegs 58883378 0.0 # Average of 41 4K segments per TSO packet
TcpExtTCPDelivered 58878971 0.0
TcpExtTCPDeliveredCE 9664 0.0
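The arithmetic behind the figures above can be checked with a short C sketch. It assumes the "before" budget is simply the pacing rate divided by 1024 (about 1 ms of data) and then by a 4096-byte segment, as the test description states. It also assumes the annotated averages are TcpOutSegs divided by TcpInSegs, which is an inference from the numbers rather than something the commit message spells out. The function names are illustrative and are not kernel symbols.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Whole segments that fit in roughly 1 ms of pacing budget:
 * rate / 1024 / segment size, as in "20000000/1024/4096".
 */
uint32_t tso_segs_from_rate(uint64_t rate_bytes_per_sec, uint32_t seg_size)
{
	assert(seg_size > 0);
	return (uint32_t)(rate_bytes_per_sec / 1024 / seg_size);
}

/*
 * Estimated average segments per TSO packet, taken here as the ratio
 * of segments sent to segments received (an assumption, see above).
 */
double avg_segs_per_tso(uint64_t out_segs, uint64_t in_segs)
{
	assert(in_segs > 0);
	return (double)out_segs / (double)in_segs;
}
```

With the counters quoted above, the ratio is about 3.7 before the patch and about 41 after it, matching the annotations on the TcpOutSegs lines.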
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Link: https://lore.kernel.org/r/20220309015757.2532973-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-09 01:57:57 +00:00
|
|
|
u8 sysctl_tcp_tso_rtt_log;
|
2021-03-25 18:08:17 +00:00
|
|
|
u8 sysctl_tcp_autocorking;
|
|
|
|
u8 sysctl_tcp_reflect_tos;
|
2017-10-27 14:47:30 +00:00
|
|
|
int sysctl_tcp_invalid_ratelimit;
|
2017-10-27 14:47:31 +00:00
|
|
|
int sysctl_tcp_pacing_ss_ratio;
|
2017-10-27 14:47:32 +00:00
|
|
|
int sysctl_tcp_pacing_ca_ratio;
|
2017-11-07 08:29:28 +00:00
|
|
|
int sysctl_tcp_wmem[3];
|
|
|
|
int sysctl_tcp_rmem[3];
|
2018-05-17 21:47:28 +00:00
|
|
|
unsigned long sysctl_tcp_comp_sack_delay_ns;
|
2020-04-30 17:35:43 +00:00
|
|
|
unsigned long sysctl_tcp_comp_sack_slack_ns;
|
2016-12-28 09:52:33 +00:00
|
|
|
int sysctl_max_syn_backlog;
|
2017-09-27 03:35:40 +00:00
|
|
|
int sysctl_tcp_fastopen;
|
2017-11-14 16:25:49 +00:00
|
|
|
const struct tcp_congestion_ops __rcu *tcp_congestion_control;
|
2017-09-27 03:35:42 +00:00
|
|
|
struct tcp_fastopen_context __rcu *tcp_fastopen_ctx;
|
2017-09-27 03:35:43 +00:00
|
|
|
unsigned int sysctl_tcp_fastopen_blackhole_timeout;
|
|
|
|
atomic_t tfo_active_disable_times;
|
|
|
|
unsigned long tfo_active_disable_stamp;
|
2016-02-03 07:46:51 +00:00
|
|
|
|
2018-03-14 04:57:16 +00:00
|
|
|
int sysctl_udp_wmem_min;
|
|
|
|
int sysctl_udp_rmem_min;
|
|
|
|
|
2021-03-31 17:52:07 +00:00
|
|
|
u8 sysctl_fib_notify_on_flag_change;
|
2021-02-01 19:47:52 +00:00
|
|
|
|
2017-01-26 18:02:24 +00:00
|
|
|
#ifdef CONFIG_NET_L3_MASTER_DEV
|
2021-03-31 17:52:08 +00:00
|
|
|
u8 sysctl_udp_l3mdev_accept;
|
2017-01-26 18:02:24 +00:00
|
|
|
#endif
|
|
|
|
|
2021-03-31 17:52:10 +00:00
|
|
|
u8 sysctl_igmp_llm_reports;
|
2016-02-08 21:29:21 +00:00
|
|
|
int sysctl_igmp_max_memberships;
|
2016-02-08 21:29:22 +00:00
|
|
|
int sysctl_igmp_max_msf;
|
2016-02-08 21:29:24 +00:00
|
|
|
int sysctl_igmp_qrv;
|
2016-02-08 21:29:21 +00:00
|
|
|
|
2014-05-06 18:02:50 +00:00
|
|
|
struct ping_group_range ping_group_range;
|
net: ipv4: add IPPROTO_ICMP socket kind
This patch adds IPPROTO_ICMP socket kind. It makes it possible to send
ICMP_ECHO messages and receive the corresponding ICMP_ECHOREPLY messages
without any special privileges. In other words, the patch makes it
possible to implement setuid-less and CAP_NET_RAW-less /bin/ping. In
order not to increase the kernel's attack surface, the new functionality
is disabled by default, but is enabled at bootup by supporting Linux
distributions, optionally with restriction to a group or a group range
(see below).
Similar functionality is implemented in Mac OS X:
http://www.manpagez.com/man/4/icmp/
A new ping socket is created with
socket(PF_INET, SOCK_DGRAM, IPPROTO_ICMP)
Message identifiers (octets 4-5 of ICMP header) are interpreted as local
ports. Addresses are stored in struct sockaddr_in. No port numbers are
reserved for privileged processes; port 0 is reserved for the API ("let the
kernel pick a free number"). There is no notion of remote ports; remote
port numbers provided by the user (e.g. in connect()) are ignored.
Data sent and received include ICMP headers. This is deliberate to:
1) Avoid the need to transport header values like sequence numbers by
other means.
2) Make it easier to port existing programs using raw sockets.
ICMP headers given to send() are checked and sanitized. The type must be
ICMP_ECHO and the code must be zero (future extensions might relax this,
see below). The id is set to the number (local port) of the socket, and the
checksum is always recomputed.
ICMP reply packets received from the network are demultiplexed according
to their ids, and are returned by recv() without any modifications.
IP header information and ICMP errors of those packets may be obtained
via ancillary data (IP_RECVTTL, IP_RETOPTS, and IP_RECVERR). ICMP source
quenches and redirects are reported as fake errors via the error queue
(IP_RECVERR); the next hop address for redirects is saved to ee_info (in
network order).
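As a rough userspace illustration of the send path described above, the following C sketch opens a ping socket and builds an echo request buffer. It assumes a Linux system with glibc's struct icmphdr from <netinet/ip_icmp.h>. The helper names open_ping_socket() and build_echo_request() are hypothetical. The id and checksum fields are left at zero because the text says the kernel overwrites them.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/ip_icmp.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Returns a ping socket, or -1 (e.g. when ping_group_range forbids it). */
int open_ping_socket(void)
{
	return socket(PF_INET, SOCK_DGRAM, IPPROTO_ICMP);
}

/*
 * Write an ICMP echo request (header plus payload) into buf.
 * Returns the total length, or 0 if buf is too small.
 */
size_t build_echo_request(unsigned char *buf, size_t buflen,
			  unsigned short seq,
			  const void *payload, size_t payload_len)
{
	struct icmphdr hdr;

	assert(buf != NULL);
	if (buflen < sizeof(hdr) + payload_len)
		return 0;

	memset(&hdr, 0, sizeof(hdr));
	hdr.type = ICMP_ECHO;			/* only echo requests are accepted */
	hdr.code = 0;				/* code must be zero */
	hdr.un.echo.sequence = htons(seq);	/* id and checksum stay zero */

	memcpy(buf, &hdr, sizeof(hdr));
	if (payload_len > 0)
		memcpy(buf + sizeof(hdr), payload, payload_len);
	return sizeof(hdr) + payload_len;
}
```

The resulting buffer would be passed to sendto() with a struct sockaddr_in destination. Replies come back from recv() with their ICMP header intact.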
socket(2) is restricted to the group range specified in
"/proc/sys/net/ipv4/ping_group_range". It is "1 0" by default, meaning
that nobody (not even root) may create ping sockets. Setting it to "100
100" would grant permissions to the single group (to either make
/sbin/ping g+s and owned by this group or to grant permissions to the
"netadmins" group), "0 4294967295" would enable it for the world, "100
4294967295" would enable it for the users, but not daemons.
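The range semantics above can be sketched in C as an inclusive interval test. This is not the kernel's implementation. The function names are hypothetical, and the assumption that a match on any one of the caller's groups is sufficient is labeled as such in the code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if gid lies in the inclusive range [low, high]. */
bool gid_in_ping_group_range(uint32_t gid, uint32_t low, uint32_t high)
{
	return low <= gid && gid <= high;
}

/*
 * Assumption: a process may create a ping socket if any one of its
 * group ids falls inside the configured range.
 */
bool any_gid_allowed(const uint32_t *gids, size_t n,
		     uint32_t low, uint32_t high)
{
	size_t i;

	assert(gids != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (gid_in_ping_group_range(gids[i], low, high))
			return true;
	return false;
}
```

With the default "1 0" the interval is empty, so no gid matches, including gid 0. That is consistent with the statement that not even root may create ping sockets by default.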
The existing code might be (in the unlikely case anyone needs it)
extended rather easily to handle other similar pairs of ICMP messages
(Timestamp/Reply, Information Request/Reply, Address Mask Request/Reply
etc.).
Userspace ping util & patch for it:
http://openwall.info/wiki/people/segoon/ping
For Openwall GNU/*/Linux it was the last step on the road to the
setuid-less distro. A revision of this patch (for RHEL5/OpenVZ kernels)
is in use in Owl-current, such as in the 2011/03/12 LiveCD ISOs:
http://mirrors.kernel.org/openwall/Owl/current/iso/
Initially this functionality was written by Pavel Kankovsky for
Linux 2.4.32, but unfortunately it was never made public.
All ping options (-b, -p, -Q, -R, -s, -t, -T, -M, -I) are tested with
the patch.
PATCH v3:
- switched to flowi4.
- minor changes to be consistent with raw sockets code.
PATCH v2:
- changed ping_debug() to pr_debug().
- removed CONFIG_IP_PING.
- removed ping_seq_fops.owner field (unused for procfs).
- switched to proc_net_fops_create().
- switched to %pK in seq_printf().
PATCH v1:
- fixed checksumming bug.
- CAP_NET_RAW may not create icmp sockets anymore.
RFC v2:
- minor cleanups.
- introduced sysctl'able group range to restrict socket(2).
Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-13 10:01:00 +00:00
|
|
|
|
2011-03-25 00:42:21 +00:00
|
|
|
atomic_t dev_addr_genid;
|
2009-01-22 04:56:15 +00:00
|
|
|
|
2014-05-12 23:04:53 +00:00
|
|
|
#ifdef CONFIG_SYSCTL
|
|
|
|
unsigned long *sysctl_local_reserved_ports;
|
2017-01-21 01:49:11 +00:00
|
|
|
int sysctl_ip_prot_sock;
|
2014-05-12 23:04:53 +00:00
|
|
|
#endif
|
|
|
|
|
2009-01-22 04:56:15 +00:00
|
|
|
#ifdef CONFIG_IP_MROUTE
|
ipv4: ipmr: support multiple tables
This patch adds support for multiple independent multicast routing instances,
named "tables".
Userspace multicast routing daemons can bind to a specific table instance by
issuing a setsockopt call using a new option MRT_TABLE. The table number is
stored in the raw socket data and affects all following ipmr setsockopt(),
getsockopt() and ioctl() calls. By default, a single table (RT_TABLE_DEFAULT)
is created with a default routing rule pointing to it. Newly created pimreg
devices have the table number appended ("pimregX"), with the exception of
devices created in the default table, which are named just "pimreg" for
compatibility reasons.
Packets are directed to a specific table instance using routing rules,
similar to how regular routing rules work. Currently iif, oif and mark
are supported as keys, source and destination addresses could be supported
additionally.
Example usage:
- bind pimd/xorp/... to a specific table:
uint32_t table = 123;
setsockopt(fd, IPPROTO_IP, MRT_TABLE, &table, sizeof(table));
- create routing rules directing packets to the new table:
# ip mrule add iif eth0 lookup 123
# ip mrule add oif eth0 lookup 123
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-13 05:03:23 +00:00
|
|
|
#ifndef CONFIG_IP_MROUTE_MULTIPLE_TABLES
|
2010-04-13 05:03:22 +00:00
|
|
|
struct mr_table *mrt;
|
2010-04-13 05:03:23 +00:00
|
|
|
#else
|
|
|
|
struct list_head mr_tables;
|
|
|
|
struct fib_rules_ops *mr_rules_ops;
|
|
|
|
#endif
|
2016-04-07 14:21:00 +00:00
|
|
|
#endif
|
|
|
|
#ifdef CONFIG_IP_ROUTE_MULTIPATH
|
2021-05-17 18:15:18 +00:00
|
|
|
u32 sysctl_fib_multipath_hash_fields;
|
2021-03-31 17:52:09 +00:00
|
|
|
u8 sysctl_fib_multipath_use_neigh;
|
|
|
|
u8 sysctl_fib_multipath_hash_policy;
|
2009-01-22 04:56:15 +00:00
|
|
|
#endif
|
2016-12-03 15:45:06 +00:00
|
|
|
|
2017-08-03 11:28:11 +00:00
|
|
|
struct fib_notifier_ops *notifier_ops;
|
2016-12-03 15:45:06 +00:00
|
|
|
unsigned int fib_seq; /* protected by rtnl_mutex */
|
|
|
|
|
2017-09-27 06:23:13 +00:00
|
|
|
struct fib_notifier_ops *ipmr_notifier_ops;
|
|
|
|
unsigned int ipmr_seq; /* protected by rtnl_mutex */
|
|
|
|
|
2013-07-30 00:33:53 +00:00
|
|
|
atomic_t rt_genid;
|
2019-03-27 19:40:33 +00:00
|
|
|
siphash_key_t ip_id_key;
|
2007-12-16 21:29:36 +00:00
|
|
|
};
|
|
|
|
#endif
|