In order to have better cache layouts of struct sock (separate zones
for rx/tx paths), we need this preliminary patch.
Goal is to transfert fields used at lookup time in the first
read-mostly cache line (inside struct sock_common) and move sk_refcnt
to a separate cache line (only written by rx path)
This patch adds inet_ prefix to daddr, rcv_saddr, dport, num, saddr,
sport and id fields. This allows a future patch to define these
fields as macros, like sk_refcnt, without name clashes.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Atis Elsts wrote:
> Not sure if there is need to fill the mark from skb in tunnel xmit functions. In any case, it's not done for GRE or IPIP tunnels at the moment.
Ok, I'll just drop that part, I'm not sure what should be done in this case.
> Also, in this patch you are doing that for SIT (v6-in-v4) tunnels only, and not doing it for v4-in-v6 or v6-in-v6 tunnels. Any reason for that?
I just sent that patch out too quickly, here's a better one with the updates.
Add support for IPv6 route lookups using sk_mark.
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This provides safety against negative optlen at the type
level instead of depending upon (sometimes non-trivial)
checks against this sprinkled all over the the place, in
each and every implementation.
Based upon work done by Arjan van de Ven and feedback
from Linus Torvalds.
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch addresses:
* assigning -1 to np->tclass as it is currently done is not very meaningful,
since it turns into 0xff;
* RFC 3542, 6.5 allows -1 for clearing the sticky IPV6_TCLASS option
and specifies -1 to mean "use kernel default":
- RFC 2460, 7. requires that the default traffic class must be zero for
all 8 bits,
- this is consistent with RFC 2474, 4.1 which recommends a default PHB of 0,
in combination with a value of the ECN field of "non-ECT" (RFC 3168, 5.).
This patch changes the meaning of -1 from assigning 255 to mean the RFC 2460
default, which at the same time allows to satisfy clearing the sticky TCLASS
option as per RFC 3542, 6.5.
(When passing -1 as ancillary data, the fallback remains np->tclass, which
has either been set via socket options, or contains the default value.)
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
This replaces assignments of the type "int on LHS" = "u8 on RHS" with
simpler code. The LHS can express all of the unsigned right hand side
values, hence the assigned value can not be negative.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
After switch (rthdr->type) {...},the check below is completely useless.Because:
if the type is 2,then hdrlen must be 2 and segments_left must be 1,clearly the
check is redundant;if the type is not 2,then goto sticky_done,the check is useless
too.
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Reviewed-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove some pointless conditionals before kfree_skb().
The semantic match that finds the problem is as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@@
expression E;
@@
- if (E)
- kfree_skb(E);
+ kfree_skb(E);
// </smpl>
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Reported-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
When get receiving interface index while no message is received,
the the value seted with setsockopt() should be returned.
RFC 3542:
Issuing getsockopt() for the above options will return the sticky
option value i.e., the value set with setsockopt(). If no sticky
option value has been set getsockopt() will return the following
values:
- For the IPV6_PKTINFO option, it will return an in6_pktinfo
structure with ipi6_addr being in6addr_any and ipi6_ifindex being
zero.
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are three reasons for me to add this support:
1.When no interface is specified in an IPV6_PKTINFO ancillary data
item, the interface specified in an IPV6_PKTINFO sticky optionis
is used.
RFC3542:
6.7. Summary of Outgoing Interface Selection
This document and [RFC-3493] specify various methods that affect the
selection of the packet's outgoing interface. This subsection
summarizes the ordering among those in order to ensure deterministic
behavior.
For a given outgoing packet on a given socket, the outgoing interface
is determined in the following order:
1. if an interface is specified in an IPV6_PKTINFO ancillary data
item, the interface is used.
2. otherwise, if an interface is specified in an IPV6_PKTINFO sticky
option, the interface is used.
2.When no IPV6_PKTINFO ancillary data is received,getsockopt() should
return the sticky option value which set with setsockopt().
RFC 3542:
Issuing getsockopt() for the above options will return the sticky
option value i.e., the value set with setsockopt(). If no sticky
option value has been set getsockopt() will return the following
values:
3.Make the setsockopt implementation POSIX compliant.
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes two bugs:
1. setsockopt() of anything but a Type 2 routing header should return
EINVAL instead of EPERM. Noticed by Shan Wei
(shanwei@cn.fujitsu.com).
2. setsockopt()/sendmsg() of a Type 2 routing header with invalid
length or segments should return EINVAL. These values are statically
fixed in RFC 3775, unlike the variable Type 0 was.
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When get receiving interface index while no message is received,
the bounded device's index of the socket should be returned.
RFC 3542:
Issuing getsockopt() for the above options will return the sticky
option value i.e., the value set with setsockopt(). If no sticky
option value has been set getsockopt() will return the following
values:
- For the IPV6_PKTINFO option, it will return an in6_pktinfo
structure with ipi6_addr being in6addr_any and ipi6_ifindex being
zero.
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When Set Hop-by-Hop options header with NULL data
pointer and optlen is not zero use setsockopt(),
the kernel successfully return 0 instead of
return error EINVAL or EFAULT.
This patch fix the problem.
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the sticky Hop-by-Hop options header by calling setsockopt()
for IPV6_HOPOPTS with a zero option length, per RFC3542.
Routing header and Destination options header does the same as
Hop-by-Hop options header.
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes CVS keywords that weren't updated for a long time
from comments.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
IPV6_MULTICAST_HOPS, for example, is not valid for stream sockets.
Since they are virtually unavailable for stream sockets,
we should return ENOPROTOOPT instead of EINVAL.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Only 0 and 1 are valid for IPV6_MULTICAST_LOOP socket option,
and we should return an error of EINVAL otherwise, per RFC3493.
Based on patch from Shan Wei <shanwei@cn.fujitsu.com>.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
It is not allowed to change underlying protocol for
int fd = socket(PF_INET6, SOCK_RAW, IPPROTO_UDP);
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
If get destination options with length which is not enough for that
option,getsockopt() will still return the real length of the option,
which is larger then the buffer space.
This is because ipv6_getsockopt_sticky() returns the real length of
the option.
This patch fix this problem.
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
If we pass NULL data buffer to getsockopt(), it will return 0,
and the option length is set to -EFAULT:
getsockopt(sk, IPPROTO_IPV6, IPV6_DSTOPTS, NULL, &len);
This is because ipv6_getsockopt_sticky() will return -EFAULT or
-EINVAL if some error occur.
This patch fix this problem.
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
The last hunk from the commit dae50295 (ipv4/ipv6 compat: Fix SSM
applications on 64bit kernels.) escaped from the compat_ipv6_setsockopt
to the ipv6_getsockopt (I guess due to patch smartness wrt searching
for context) thus breaking 32-bit and 64-bit-without-compat compilation.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for getsockopt for MCAST_MSFILTER for
both IPv4 and IPv6. It depends on the previous setsockopt patch,
and uses the same method.
Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support on 64-bit kernels for seting 32-bit compatible MCAST*
socket options.
Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check length of setsockopt's optval, which provided by user, before copy it
from user space.
For POSIX compliant, return -EINVAL for setsockopt of short lengths.
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
| net/ipv6/ipv6_sockglue.c:162:16: warning: symbol 'net' shadows an earlier one
| net/ipv6/ipv6_sockglue.c:111:13: originally declared here
| net/ipv6/ipv6_sockglue.c:175:16: warning: symbol 'net' shadows an earlier one
| net/ipv6/ipv6_sockglue.c:111:13: originally declared here
| net/ipv6/ip6mr.c:1241:10: warning: symbol 'ret' shadows an earlier one
| net/ipv6/ip6mr.c:1163:6: originally declared here
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
This counter is about to become per-proto-and-per-net, so we'll need
two arguments to determine which cell in this "table" to work with.
All the places, but proc already pass proper net to it - proc will be
tuned a bit later.
Some indentation with spaces in proc files is done to keep the file
coding style consistent.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce per-sock inlines: sock_net(), sock_net_set()
and per-inet_timewait_sock inlines: twsk_net(), twsk_net_set().
Without CONFIG_NET_NS, no namespace other than &init_net exists.
Let's explicitly define them to help compiler optimizations.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Last part of hop-limit determination is always:
hoplimit = dst_metric(dst, RTAX_HOPLIMIT);
if (hoplimit < 0)
hoplimit = ipv6_get_hoplimit(dst->dev).
Let's consolidate it as ip6_dst_hoplimit(dst).
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
This reverts commit db1ed684f6 ("[IPV6]
UDP: Rename IPv6 UDP files."), commit
8be8af8fa4 ("[IPV4] UDP: Move
IPv4-specific bits to other file.") and commit
e898d4db27 ("[UDP]: Allow users to
configure UDP-Lite.").
First, udplite is of such small cost, and it is a core protocol just
like TCP and normal UDP are.
We spent enormous amounts of effort to make udplite share as much code
with core UDP as possible. All of that work is less valuable if we're
just going to slap a config option on udplite support.
It is also causing build failures, as reported on linux-next, showing
that the changeset was not tested very well. In fact, this is the
second build failure resulting from the udplite change.
Finally, the config options provided was a bool, instead of a modular
option. Meaning the udplite code does not even get build tested
by allmodconfig builds, and furthermore the user is not presented
with a reasonable modular build option which is particularly needed
by distribution vendors.
Signed-off-by: David S. Miller <davem@davemloft.net>
1) Cleanups (all functions are prefixed by sock_prot_inuse)
sock_prot_inc_use(prot) -> sock_prot_inuse_add(prot,-1)
sock_prot_dec_use(prot) -> sock_prot_inuse_add(prot,-1)
sock_prot_inuse() -> sock_prot_inuse_get()
New functions :
sock_prot_inuse_init() and sock_prot_inuse_free() to abstract pcounter use.
2) if CONFIG_PROC_FS=n, we can zap 'inuse' member from "struct proto",
since nobody wants to read the inuse value.
This saves 1372 bytes on i386/SMP and some cpu cycles.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patchset makes the different protocols to return an error code, so
the af_inet6 module can check the initialization was correct or not.
The raw6 was taken into account to be consistent with the rest of the
protocols, but the registration is at the same place.
Because the raw6 has its own init function, the proto and the ops structure
can be moved inside the raw6.c file.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If CONFIG_NETFILTER if not selected when compile the kernel source code,
ipv6_getsockopt will returen an EINVAL error if optname is not supported by
the kernel. But if CONFIG_NETFILTER is selected, ENOPROTOOPT error will
be return.
This patch fix to always return ENOPROTOOPT error if optname argument of
ipv6_getsockopt is not supported by the kernel.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
From RFC 3493, Section 5.2:
IPV6_MULTICAST_IF
Set the interface to use for outgoing multicast packets. The
argument is the index of the interface to use. If the
interface index is specified as zero, the system selects the
interface (for example, by looking up the address in a routing
table and using the resulting interface).
This patch adds support for (index == 0) to reset the value to it's
original state, allowing the system to choose the best interface. IPv4
already behaves this way.
Signed-off-by: Brian Haley <brian.haley@hp.com>
Acked-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix a bunch of sparse warnings. Mostly about 0 used as
NULL pointer, and shadowed variable declarations.
One notable case was that hash size should have been unsigned.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes most of the generic device layer network
namespace safe. This patch makes dev_base_head a
network namespace variable, and then it picks up
a few associated variables. The functions:
dev_getbyhwaddr
dev_getfirsthwbytype
dev_get_by_flags
dev_get_by_name
__dev_get_by_name
dev_get_by_index
__dev_get_by_index
dev_ioctl
dev_ethtool
dev_load
wireless_process_ioctl
were modified to take a network namespace argument, and
deal with it.
vlan_ioctl_set and brioctl_set were modified so their
hooks will receive a network namespace argument.
So basically anthing in the core of the network stack that was
affected to by the change of dev_base was modified to handle
multiple network namespaces. The rest of the network stack was
simply modified to explicitly use &init_net the initial network
namespace. This can be fixed when those components of the network
stack are modified to handle multiple network namespaces.
For now the ifindex generator is left global.
Fundametally ifindex numbers are per namespace, or else
we will have corner case problems with migration when
we get that far.
At the same time there are assumptions in the network stack
that the ifindex of a network device won't change. Making
the ifindex number global seems a good compromise until
the network stack can cope with ifindex changes when
you change namespaces, and the like.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add v4mapped address inline to avoid calls to ipv6_addr_type().
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>