Out of tcp6_timewait_sock, that now is just an aggregation of
inet_timewait_sock and inet6_timewait_sock, using tw_ipv6_offset in struct
inet_timewait_sock, that is common to the IPv6 transport protocols that use
timewait sockets, like DCCP and TCP.
tw_ipv6_offset plays the struct inet_sock pinfo6 role, i.e. for the generic
code to find the IPv6 area in a timewait sock.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using sk->sk_protocol instead of IPPROTO_TCP.
Will be used by DCCPv6 in the next changesets.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
AF_UNIX stream socket performance on P4 CPUs tends to suffer due to a
lot of pipeline flushes from atomic operations. The patch below
removes the sock_hold() and sock_put() in unix_stream_sendmsg(). This
should be safe as the socket still holds a reference to its peer which
is only released after the file descriptor's final user invokes
unix_release_sock(). The only consideration is that we must add a
memory barrier before setting the peer initially.
Signed-off-by: Benjamin LaHaise <benjamin.c.lahaise@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It also looks like there were 2 places where the test on sk_err was
missing from the event wait logic (in sk_stream_wait_connect and
sk_stream_wait_memory), while the rest of the sock_error() users look
to be doing the right thing. This version of the patch fixes those,
and cleans up a few places that were testing ->sk_err directly.
Signed-off-by: Benjamin LaHaise <benjamin.c.lahaise@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes dead code. I don't see the reason to keep this cruft
around, besides cluttering the nice and functionally working code.
Signed-off-by: Roberto Nibali <ratz@drugphish.ch>
Signed-off-by: Horms <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since udp_checksum_init always returns 0 there is no point in
having it return a value.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a packet is obtained from skb_recv_datagram with MSG_PEEK enabled
it is left on the socket receive queue. This means that when we detect
a checksum error we have to be careful when trying to free the packet
as someone could have dequeued it in the time being.
Currently this delicate logic is duplicated three times between UDPv4,
UDPv6 and RAWv6. This patch moves them into a one place and simplifies
the code somewhat.
This is based on a suggestion by Eric Dumazet.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
And make the core DCCP code AF agnostic, just like TCP, now its time
to work on net/dccp/ipv6.c, we are close to the end!
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Renaming it to inet_csk_addr2sockaddr.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
And move it to struct inet_connection_sock. DCCP will use it in the
upcoming changesets.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
And inet6_rsk_offset in inet_request_sock, for the same reasons as
inet_sock's pinfo6 member.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
More work is needed tho to introduce inet6_request_sock from
tcp6_request_sock, in the same layout considerations as ipv6_pinfo in
inet_sock, next changeset will do that.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Another spin of Herbert Xu's "safer ip reassembly" patch
for 2.6.16.
(The original patch is here:
http://marc.theaimsgroup.com/?l=linux-netdev&m=112281936522415&w=2
and my only contribution is to have tested it.)
This patch (optionally) does additional checks before accepting IP
fragments, which can greatly reduce the possibility of reassembling
fragments which originated from different IP datagrams.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Arthur Kepner <akepner@sgi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This makes ebt_log and ebt_ulog use the new nf_log api. This enables
the bridging packet filter to log packets e.g. via nfnetlink_log.
Signed-off-by: Bart De Schuymer <bdschuym@pandora.be>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Part of a performance problem with ip_tables is that memory allocation
is not NUMA aware, but 'only' SMP aware (ie each CPU normally touch
separate cache lines)
Even with small iptables rules, the cost of this misplacement can be
high on common workloads. Instead of using one vmalloc() area
(located in the node of the iptables process), we now allocate an area
for each possible CPU, using vmalloc_node() so that memory should be
allocated in the CPU's node if possible.
Port to arp_tables and ip6_tables by Harald Welte.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace existing BIC version 1.1 with new version 2.0.
The main change is to replace the window growth function
with a cubic function as described in:
http://www.csc.ncsu.edu/faculty/rhee/export/bitcp/cubic-paper.pdf
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The latest BICTCP patch at:
http://www.csc.ncsu.edu:8080/faculty/rhee/export/bitcp/index_files/Page546.htm
disables the low_utilization feature of BICTCP because it doesn't work
in some cases. This patch removes it.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch series implements per packet access control via the
extension of the Linux Security Modules (LSM) interface by hooks in
the XFRM and pfkey subsystems that leverage IPSec security
associations to label packets. Extensions to the SELinux LSM are
included that leverage the patch for this purpose.
This patch implements the changes necessary to the XFRM subsystem,
pfkey interface, ipv4/ipv6, and xfrm_user interface to restrict a
socket to use only authorized security associations (or no security
association) to send/receive network packets.
Patch purpose:
The patch is designed to enable access control per packets based on
the strongly authenticated IPSec security association. Such access
controls augment the existing ones based on network interface and IP
address. The former are very coarse-grained, and the latter can be
spoofed. By using IPSec, the system can control access to remote
hosts based on cryptographic keys generated using the IPSec mechanism.
This enables access control on a per-machine basis or per-application
if the remote machine is running the same mechanism and trusted to
enforce the access control policy.
Patch design approach:
The overall approach is that policy (xfrm_policy) entries set by
user-level programs (e.g., setkey for ipsec-tools) are extended with a
security context that is used at policy selection time in the XFRM
subsystem to restrict the sockets that can send/receive packets via
security associations (xfrm_states) that are built from those
policies.
A presentation available at
www.selinux-symposium.org/2005/presentations/session2/2-3-jaeger.pdf
from the SELinux symposium describes the overall approach.
Patch implementation details:
On output, the policy retrieved (via xfrm_policy_lookup or
xfrm_sk_policy_lookup) must be authorized for the security context of
the socket and the same security context is required for resultant
security association (retrieved or negotiated via racoon in
ipsec-tools). This is enforced in xfrm_state_find.
On input, the policy retrieved must also be authorized for the socket
(at __xfrm_policy_check), and the security context of the policy must
also match the security association being used.
The patch has virtually no impact on packets that do not use IPSec.
The existing Netfilter (outgoing) and LSM rcv_skb hooks are used as
before.
Also, if IPSec is used without security contexts, the impact is
minimal. The LSM must allow such policies to be selected for the
combination of socket and remote machine, but subsequent IPSec
processing proceeds as in the original case.
Testing:
The pfkey interface is tested using the ipsec-tools. ipsec-tools have
been modified (a separate ipsec-tools patch is available for version
0.5) that supports assignment of xfrm_policy entries and security
associations with security contexts via setkey and the negotiation
using the security contexts via racoon.
The xfrm_user interface is tested via ad hoc programs that set
security contexts. These programs are also available from me, and
contain programs for setting, getting, and deleting policy for testing
this interface. Testing of sa functions was done by tracing kernel
behavior.
Signed-off-by: Trent Jaeger <tjaeger@cse.psu.edu>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The below "jumbo" patch fixes the following problems in MLDv2.
1) Add necessary "ntohs" to recent "pskb_may_pull" check [breaks
all nonzero source queries on little-endian (!)]
2) Add locking to source filter list [resend of prior patch]
3) fix "mld_marksources()" to
a) send nothing when all queried sources are excluded
b) send full exclude report when source queried sources are
not excluded
c) don't schedule a timer when there's nothing to report
NOTE: RFC 3810 specifies the source list should be saved and each
source reported individually as an IS_IN. This is an obvious DOS
path, requiring the host to store and then multicast as many sources
as are queried (e.g., millions...). This alternative sends a full,
relevant report that's limited to number of sources present on the
machine.
4) fix "add_grec()" to send empty-source records when it should
The original check doesn't account for a non-empty source
list with all sources inactive; the new code keeps that
short-circuit case, and also generates the group header
with an empty list if needed.
5) fix mca_crcount decrement to be after add_grec(), which needs
its original value
These issues (other than item #1 ;-) ) were all found by Yan Zheng,
much thanks!
Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the checks are scattered all over and this leads
to inconsistencies and even cases where the check is not made.
Based upon a patch from Kris Katterjohn.
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to release idev->lcok before we call addrconf_dad_stop().
It calls ipv6_addr_del(), which will hold idev->lock.
Bug spotted by Yasuyuki KOZAKAI <yasuyuki.kozakai@toshiba.co.jp>.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Call nf_bridge_put() before allocating a new nf_bridge structure and
potentially overwriting the pointer to a previously allocated one.
This fixes a memory leak which can occur when the bridge topology
allows for an skb to traverse more than one bridge.
Signed-off-by: David Kimdon <david.kimdon@devicescape.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The existing default of 10 is just way too low.
Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
From: Hiroyuki YAMAMORI <h-yamamo@db3.so-net.ne.jp>
Since regen_count is stored in the public address, we need to reset it
when we start renewing temporary address.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to relesae ifp->lock before we call addrconf_dad_stop(),
which will hold ifp->lock.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The problem is that when new policies are inserted, sockets do not see
the update (but all new route lookups do).
This bug is related to the SA insertion stale route issue solved
recently, and this policy visibility problem can be fixed in a similar
way.
The fix is to flush out the bundles of all policies deeper than the
policy being inserted. Consider beginning state of "outgoing"
direction policy list:
policy A --> policy B --> policy C --> policy D
First, realize that inserting a policy into a list only potentially
changes IPSEC routes for that direction. Therefore we need not bother
considering the policies for other directions. We need only consider
the existing policies in the list we are doing the inserting.
Consider new policy "B'", inserted after B.
policy A --> policy B --> policy B' --> policy C --> policy D
Two rules:
1) If policy A or policy B matched before the insertion, they
appear before B' and thus would still match after inserting
B'
2) Policy C and D, now "shadowed" and after policy B', potentially
contain stale routes because policy B' might be selected
instead of them.
Therefore we only need flush routes assosciated with policies
appearing after a newly inserted policy, if any.
Signed-off-by: David S. Miller <davem@davemloft.net>
I hope to actually change this behaviour shortly but this will help
anybody grepping code at present.
Signed-off-by: Ian McDonald <imcdnzl@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If you add more than one IPv6 address belonging to the same prefix and
delete the address that was last added, routing table entry for that
prefix is also deleted.
Tested on 2.6.14.4
To reproduce:
ip addr add 3ffe::1/64 dev eth0
ip addr add 3ffe::2/64 dev eth0
/* wait DAD */
sleep 1
ip addr del 3ffe::2/64 dev eth0
ip -6 route
(route to 3ffe::/64 should be gone)
In ipv6_del_addr(), if ifa == ifp, we set ifa->if_next to NULL, and later
assign ifap = &ifa->if_next, effectively terminating the for-loop.
This prevents us from checking if there are other addresses using the same
prefix that are valid, and thus resulting in deletion of the prefix.
This applies only if the first entry in idev->addr_list is the address to
be deleted.
Signed-off-by: Kristian Slavov <kristian.slavov@nomadiclab.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In vlan_ioctl_handler() the code misses couple checks for
error return values.
Signed-off-by: Mika Kukkonen <mikukkon@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
I found these while compiling with extra gcc warnings;
considering the indenting surely they are not intentional?
Signed-off-by: Mika Kukkonen <mikukkon@iki.fi>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
A tentative address is not considered "assigned to an interface"
in the traditional sense (RFC2462 Section 4).
Don't try to select such an address for the source address.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
If the link was not available when the interface was created,
run DAD for pending tentative addresses when the link becomes ready.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
NETDEV_UP might be sent even if the link attached to the interface was
not ready. DAD does not make sense in such case, so we won't do so.
After interface
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
gss_create_upcall() should not error just because rpc.gssd closed the
pipe on its end. Instead, it should requeue the pending requests and then
retry.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Make sctp_writeable() use sk_wmem_alloc rather than sk_wmem_queued to
determine the sndbuf space available. It also removes all the modifications
to sk_wmem_queued as it is not currently used in SCTP.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we insert a new xfrm_state which potentially
subsumes an existing one, make sure all cached
bundles are flushed so that the new SA is used
immediately.
Signed-off-by: David S. Miller <davem@davemloft.net>
The route expiration time is stored in rt6i_expires in jiffies.
The argument of rt6_route_add() for adding a route is not the
expiration time in jiffies nor in clock_t, but the lifetime
(or time left before expiration) in clock_t.
Because of the confusion, we sometimes saw several strange errors
(FAILs) in TAHI IPv6 Ready Logo Phase-2 Self Test.
The symptoms were analyzed by Mitsuru Chinen <CHINEN@jp.ibm.com>.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
A typo caused some bridged IPv6 packets to get dropped randomly,
as reported by Sebastien Chaumontet. The patch below fixes this
(using skb->nh.raw instead of raw) and also makes the jumbo packet
length checking up-to-date with the code in
net/ipv6/exthdrs.c::ipv6_hop_jumbo.
Signed-off-by: Bart De Schuymer <bdschuym@pandora.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
IP6_NF_TARGET_NFQUEUE depends on IP6_NF_IPTABLES, not IP_NF_IPTABLES.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
As noticed by Phil Oester, the GRE NAT protocol helper is initialized
before the NAT core, which makes registration fail.
Change the linking order to make NAT be initialized first.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>