This is required to pass namespace context into rt_cache_flush called from
->flush_cache.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pass double tcf_proto pointers to tcf_destroy_chain() to make it
clear the start of the filter list for more consistency.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch add spectrum capability and required information
elements to association request providing AP has requested it and
it is supported by the driver
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Assaf Krauss <assaf.krauss@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch makes mac80211 refuse a WEP key whose length is not WEP40 nor
WEP104.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The workqueue provided by mac80211 should not be used for
scheduled tasks that acquire the RTNL lock. This could be done
when the driver uses the function ieee80211_iterate_active_interfaces()
within the scheduled work. Such behavior will end in locking
dependencies problems when an interface is being removed.
This patch will add a notification about the RTNL locking and
the mac80211 workqueue to prevent driver developers from
blindly using it.
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Some drivers may want to to use the TKIP key offsets for TX and RX
MIC so lets move this out. Lets also clear up a bit how this is used
internally in mac80211.
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
[ Based upon original report and patch by Karsten Keil. Karsten
has verified that this fixes the TAHI test case "ICMPv6 test
v6LC.5.1.2 Part F". -DaveM ]
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
RFC 4960, Section 11.4. Protection of Non-SCTP-Capable Hosts
When an SCTP stack receives a packet containing multiple control or
DATA chunks and the processing of the packet requires the sending of
multiple chunks in response, the sender of the response chunk(s) MUST
NOT send more than one packet. If bundling is supported, multiple
response chunks that fit into a single packet MAY be bundled together
into one single response packet. If bundling is not supported, then
the sender MUST NOT send more than one response chunk and MUST
discard all other responses. Note that this rule does NOT apply to a
SACK chunk, since a SACK chunk is, in itself, a response to DATA and
a SACK does not require a response of more DATA.
We implement this by not servicing our outqueue until we reach the end
of the packet. This enables maximum bundling. We also identify
'response' chunks and make sure that we only send 1 packet when sending
such chunks.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to more easily grep for all things that set
sk->sk_socket, add sk_set_socket() helper inline function.
Suggested (although only half-seriously) by Evgeniy Polyakov.
Signed-off-by: David S. Miller <davem@davemloft.net>
In commits 33c732c361 ([IPV4]: Add raw
drops counter) and a92aa318b4 ([IPV6]:
Add raw drops counter), Wang Chen added raw drops counter for
/proc/net/raw & /proc/net/raw6
This patch adds this capability to UDP sockets too (/proc/net/udp &
/proc/net/udp6).
This means that 'RcvbufErrors' errors found in /proc/net/snmp can be also
be examined for each udp socket.
# grep Udp: /proc/net/snmp
Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors
Udp: 23971006 75 899420 16390693 146348 0
# cat /proc/net/udp
sl local_address rem_address st tx_queue rx_queue tr tm->when retrnsmt ---
uid timeout inode ref pointer drops
75: 00000000:02CB 00000000:0000 07 00000000:00000000 00:00000000 00000000 ---
0 0 2358 2 ffff81082a538c80 0
111: 00000000:006F 00000000:0000 07 00000000:00000000 00:00000000 00000000 ---
0 0 2286 2 ffff81042dd35c80 146348
In this example, only port 111 (0x006F) was flooded by messages that
user program could not read fast enough. 146348 messages were lost.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ROSE network is organized through nodes connected via hamradio or Internet.
AX25 packet radio frames sent to a remote ROSE address destination are routed
through these nodes.
Without the present patch, automatic routing mechanism did not work optimally
due to an improper parameter checking.
rose_get_neigh() function is called either by rose_connect() or by
rose_route_frame().
In the case of a call from rose_connect(), f0 timer is checked to find if a connection
is already pending. In that case it returns the address of the neighbour, or returns a NULL otherwise.
When called by rose_route_frame() the purpose was to route a packet AX25 frame
through an adjacent node given a destination rose address.
However, in that case, t0 timer checked does not indicate if the adjacent node
is actually connected even if the timer is not null. Thus, for each frame sent, the
function often tried to start a new connexion even if the adjacent node was already connected.
The patch adds a "new" parameter that is true when the function is called by
rose route_frame().
This instructs rose_get_neigh() to check node parameter "restarted".
If restarted is true it means that the route to the destination address is opened via a neighbour
node already connected.
If "restarted" is false the function returns a NULL.
In that case the calling function will initiate a new connection as before.
This results in a fast routing of frames, from nodes to nodes, until
destination is reached, as originaly specified by ROSE protocole.
Signed-off-by: Bernard Pidoux <f6bvp@amsat.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix three ct_extend/NAT extension related races:
- When cleaning up the extension area and removing it from the bysource hash,
the nat->ct pointer must not be set to NULL since it may still be used in
a RCU read side
- When replacing a NAT extension area in the bysource hash, the nat->ct
pointer must be assigned before performing the replacement
- When reallocating extension storage in ct_extend, the old memory must
not be freed immediately since it may still be used by a RCU read side
Possibly fixes https://bugzilla.redhat.com/show_bug.cgi?id=449315
and/or http://bugzilla.kernel.org/show_bug.cgi?id=10875
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Three major portions to this change:
1) Add IW_EV_COMPAT_LCP_LEN, IW_EV_COMPAT_POINT_OFF,
and IW_EV_COMPAT_POINT_LEN helper defines.
2) Delete iw_stream_check_add_*(), they are unused.
3) Add iw_request_info argument to iwe_stream_add_*(), and use it to
size the event and pointer lengths correctly depending upon whether
IW_REQUEST_FLAG_COMPAT is set or not.
4) The mechanical transformations to the drivers and wireless stack
bits to get the iw_request_info passed down into the routines
modified in #3. Also, explicit references to IW_EV_LCP_LEN are
replaced with iwe_stream_lcp_len(info).
With a lot of help and bug fixes from Masakazu Mokuno.
Signed-off-by: David S. Miller <davem@davemloft.net>
Next we can kill the hacks in fs/compat_ioctl.c and also
dispatch compat ioctls down into the driver and 80211 protocol
helper layers in order to handle iw_point objects embedded in
stream replies which need to be translated.
Signed-off-by: David S. Miller <davem@davemloft.net>
There are many possible ways to add this "salt", thus I made this
patch to be the last in the series to change it if required.
Currently I propose to use the struct net pointer itself as this
salt, but since this pointer is most often cache-line aligned, shift
this right to eliminate the bits, that are most often zeroed.
After this, simply add this mix to prepared hashfn-s.
For CONFIG_NET_NS=n case this salt is 0 and no changes in hashfn
appear.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Same as for inet_hashfn, prepare its ipv6 incarnation.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Although this hash takes addresses into account, the ehash chains
can also be too long when, for instance, communications via lo occur.
So, prepare the inet_hashfn to take struct net into account.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Listening-on-one-port sockets in many namespaces produce long
chains in the listening_hash-es, so prepare the inet_lhashfn to
take struct net into account.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Binding to some port in many namespaces may create too long
chains in bhash-es, so prepare the hashfn to take struct net
into account.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change struct proto destroy function pointer to return void. Noticed
by Al Viro.
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Take a __le16 directly rather than a host-endian value.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
reorder udp_iter_state to remove padding on 64bit builds
shrinks from 24 to 16 bytes, moving to a smaller slab when
CONFIG_NET_NS is undefined & seq_net_private = {}
Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts two changesets, ec3c0982a2
("[TCP]: TCP_DEFER_ACCEPT updates - process as established") and
the follow-on bug fix 9ae27e0adb
("tcp: Fix slab corruption with ipv6 and tcp6fuzz").
This change causes several problems, first reported by Ingo Molnar
as a distcc-over-loopback regression where connections were getting
stuck.
Ilpo Järvinen first spotted the locking problems. The new function
added by this code, tcp_defer_accept_check(), only has the
child socket locked, yet it is modifying state of the parent
listening socket.
Fixing that is non-trivial at best, because we can't simply just grab
the parent listening socket lock at this point, because it would
create an ABBA deadlock. The normal ordering is parent listening
socket --> child socket, but this code path would require the
reverse lock ordering.
Next is a problem noticed by Vitaliy Gusev, he noted:
----------------------------------------
>--- a/net/ipv4/tcp_timer.c
>+++ b/net/ipv4/tcp_timer.c
>@@ -481,6 +481,11 @@ static void tcp_keepalive_timer (unsigned long data)
> goto death;
> }
>
>+ if (tp->defer_tcp_accept.request && sk->sk_state == TCP_ESTABLISHED) {
>+ tcp_send_active_reset(sk, GFP_ATOMIC);
>+ goto death;
Here socket sk is not attached to listening socket's request queue. tcp_done()
will not call inet_csk_destroy_sock() (and tcp_v4_destroy_sock() which should
release this sk) as socket is not DEAD. Therefore socket sk will be lost for
freeing.
----------------------------------------
Finally, Alexey Kuznetsov argues that there might not even be any
real value or advantage to these new semantics even if we fix all
of the bugs:
----------------------------------------
Hiding from accept() sockets with only out-of-order data only
is the only thing which is impossible with old approach. Is this really
so valuable? My opinion: no, this is nothing but a new loophole
to consume memory without control.
----------------------------------------
So revert this thing for now.
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes CVS keywords that weren't updated for a long time
from comments.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
As we do for other socket/timewait-socket specific parameters,
let the callers pass appropriate arguments to
tcp_v{4,6}_do_calc_md5_hash().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
We can share most part of the hash calculation code because
the only difference between IPv4 and IPv6 is their pseudo headers.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
This pacth makes IPv6 address labels per network namespace.
It keeps the global label tables, ip6addrlbl_table, but
adds a 'net' member to each ip6addrlbl_entry.
This new member is taken into account when matching labels.
Changelog
=========
* v1: Initial version
* v2:
* Minize the penalty when network namespaces are not configured:
* the 'net' member is added only if CONFIG_NET_NS is
defined. This saves space when network namespaces are not
configured.
* 'net' value is retrieved with the inlined function
ip6addrlbl_net() that always return &init_net when
CONFIG_NET_NS is not defined.
* 'net' member in ip6addrlbl_entry renamed to the less generic
'lbl_net' name (helps code search).
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
This patches removes IFA_GLOBAL definition from linux/include/net/if_inet6.h
as it is unused.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Wei Yongjun noticed that we may call reqsk_free on request sock objects where
the opt fields may not be initialized, fix it by introducing inet_reqsk_alloc
where we initialize ->opt to NULL and set ->pktopts to NULL in
inet6_reqsk_alloc.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- The tcp_unhash() method in /include/net/tcp.h is no more needed, as the
unhash method in tcp_prot structure is now inet_unhash (instead of
tcp_unhash in the
past); see tcp_prot structure in net/ipv4/tcp_ipv4.c.
- So, this patch removes tcp_unhash() declaration from include/net/tcp.h
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes nf_ct_ipv4_ct_gather_frags() method declaration from
include/net/netfilter/ipv4/nf_conntrack_ipv4.h, since it is unused in
the Linux kernel.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the last packet of a connection isn't accounted when its causing
abnormal termination.
Introduces nf_ct_kill_acct() which increments the accounting counters on
conntrack kill. The new function was necessary, because there are calls
to nf_ct_kill() which don't need accounting:
nf_conntrack_proto_tcp.c line ~847:
Kills ct and returns NF_REPEAT. We don't want to count twice.
nf_conntrack_proto_tcp.c line ~880:
Kills ct and returns NF_DROP. I think we don't want to count dropped
packets.
nf_conntrack_netlink.c line ~824:
As far as I can see ctnetlink_del_conntrack() is used to destroy a
conntrack on behalf of the user. There is an sk_buff, but I don't think
this is an actual packet. Incrementing counters here is therefore not
desired.
Signed-off-by: Fabian Hugelshofer <hugelshofer2006@gmx.ch>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Encapsulate the common
if (del_timer(&ct->timeout))
ct->timeout.function((unsigned long)ct)
sequence in a new function.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a port of the IPv4 security table for IPv6.
Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The following patch implements a new "security" table for iptables, so
that MAC (SELinux etc.) networking rules can be managed separately to
standard DAC rules.
This is to help with distro integration of the new secmark-based
network controls, per various previous discussions.
The need for a separate table arises from the fact that existing tools
and usage of iptables will likely clash with centralized MAC policy
management.
The SECMARK and CONNSECMARK targets will still be valid in the mangle
table to prevent breakage of existing users.
Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit e9df2e8fd8 ("[IPV6]: Use
appropriate sock tclass setting for routing lookup.") also changed the
way that ECN capable transports mark this capability in IPv6. As a
result, SCTP was not marking ECN capablity because the traffic class
was never set. This patch brings back the markings for IPv6 traffic.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we are trying to fast retransmit the lowest outstanding TSN, we
need to restart the T3-RTX timer, so that subsequent timeouts will
correctly tag all the packets necessary for retransmissions.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Tested-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Correctly keep track of Fast Recovery state and do not reduce
congestion window multiple times during sucht state.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Tested-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IPv6 UDP sockets wth IPv4 mapped address use udp_sendmsg to send the data
actually. In this case ip_flush_pending_frames should be called instead
of ip6_flush_pending_frames.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
- Allow longer lifetimes (>= 0x7fffffff/HZ) on 64bit archs
by using unsigned long.
- Shadow this arithmetic overflow workaround by introducing
helper functions: addrconf_timeout_fixup() and
addrconf_finite_timeout().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Commit 7cbca67c07 ("[IPV6]: Support
Source Address Selection API (RFC5014)") introduced NULL dereference
of asoc to sctp_v6_get_saddr in net/sctp/ipv6.c.
Pointed out by Johann Felix Soden <johfel@users.sourceforge.net>.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Make nlmsg_trim(), nlmsg_cancel(), genlmsg_cancel(), and
nla_nest_cancel() void functions.
Return -EMSGSIZE instead of -1 if the provided message buffer is not
big enough.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch allows to disable FAT channel in specific configurations.
For example the configuration (8, +1), (primary channel 8, extension
channel 12) isn't permitted in U.S., but (8, -1), (primary channel 8,
extension channel 4) is. When FAT channel configuration is not
permitted, FAT channel should be reported as not supported in the
capabilities of the HT IE in association request. And sssociation is
performed on 20Mhz channel.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The purpose of nla_parse_nested_compat() is to parse attributes which
contain a struct followed by a stream of nested attributes. So far,
it called nla_parse_nested() to parse the stream of nested attributes
which was wrong, as nla_parse_nested() expects a container attribute
as data which holds the attribute stream. It needs to call
nla_parse() directly while pointing at the next possible alignment
point after the struct in the beginning of the attribute.
With this patch, I can no longer reproduce the reported leftover
warnings.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch updates mac80211 and drivers to be multi-queue aware and
use that instead of the internal queue mapping. Also does a number
of cleanups in various pieces of the code that fall out and reduces
internal mac80211 state size.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
There really is no reason for a driver to reject a frame on
an A-MPDU queue when it can stop that queue for any period
of time and is given frames one by one. Hence, disallow it
with a big warning and reduce mac80211-internal state.
Also add a warning when we try to fragment a frame destined
for an A-MPDU queue and drop it, the actual bug needs to be
fixed elsewhere but I'm not exactly sure how to yet.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch converts mac80211 and all drivers to have transmit
information and status in skb->cb rather than allocating extra
memory for it and copying all the data around. To make it fit,
a union is used where only data that is necessary for all steps
is kept outside of the union.
A number of fixes were done by Ivo, as well as the rt2x00 part
of this patch.
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch modifies struct ieee80211_tx_control to give band
info and the rate index (instead of rate pointers) to drivers.
This mostly serves to reduce the TX control structure size to
make it fit into skb->cb so that the fragmentation code can
put it there and we can think about passing it to drivers that
way in the future.
The rt2x00 driver update was done by Ivo, thanks.
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Having drivers start queues is just confusing, their ->start()
callback can block and do whatever is necessary, so let mac80211
start queues and have drivers wake queues when necessary (to get
packets flowing again right away.)
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This tunnel uses its own private structure and requires separate
patch to switch from private stats to on-device ones.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
All users already use on-device statistics, so this field can be
safely removed.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Noticed from Al Viro <viro@ftp.linux.org.uk> via David Miller
<davem@davemloft.net>.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This one stores all ctl-heads in one list and restricts the
permissions not give write access to non-init net namespaces.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit e38bad4766
mac80211: make ieee80211_iterate_active_interfaces not need rtnl
rt2500usb and rt73usb broke down due to attempting register access
in atomic context (which is not possible for USB hardware).
This patch restores ieee80211_iterate_active_interfaces() to use RTNL lock,
and provides the non-RTNL version under a new name:
ieee80211_iterate_active_interfaces_atomic()
So far only rt2x00 uses ieee80211_iterate_active_interfaces(), and those
drivers require the RTNL version of ieee80211_iterate_active_interfaces().
Since they already call that function directly, this patch will automatically
fix the USB rt2x00 drivers.
v2: Rename ieee80211_iterate_active_interfaces_rtnl
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
There was some cleanup issues during early mount which would trigger
a kernel bug for certain types of failure. This patch reorganizes the
cleanup to get rid of the bad behavior.
This also merges the 9pnet and 9pnet_fd modules for the purpose of
configuration and initialization. Keeping the fd transport separate
from the core 9pnet code seemed like a good idea at the time, but in
practice has caused more harm and confusion than good.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
The kernel-doc comments of much of the 9p system have been in disarray since
reorganization. This patch fixes those problems, adds additional documentation
and a template book which collects the 9p information.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
trying to clean up the signal/noise code. the previous code in mac80211 had
confusing names for the related variables, did not have much definition of
what units of signal and noise were provided and used implicit mechanisms from
the wireless extensions.
this patch introduces hardware capability flags to let the hardware specify
clearly if it can provide signal and noise level values and which units it can
provide. this also anticipates possible new units like RCPI in the future.
for signal:
IEEE80211_HW_SIGNAL_UNSPEC - unspecified, unknown, hw specific
IEEE80211_HW_SIGNAL_DB - dB difference to unspecified reference point
IEEE80211_HW_SIGNAL_DBM - dBm, difference to 1mW
for noise we currently only have dBm:
IEEE80211_HW_NOISE_DBM - dBm, difference to 1mW
if IEEE80211_HW_SIGNAL_UNSPEC or IEEE80211_HW_SIGNAL_DB is used the driver has
to provide the maximum value (max_signal) it reports in order for applications
to make sense of the signal values.
i tried my best to find out for each driver what it can provide and update it
but i'm not sure (?) for some of them and used the more conservative guess in
doubt. this can be fixed easily after this patch has been merged by changing
the hardware flags of the driver.
DRIVER SIGNAL MAX NOISE QUAL
-----------------------------------------------------------------
adm8211 unspec(?) 100 n/a missing
at76_usb unspec(?) (?) unused missing
ath5k dBm dBm percent rssi
b43legacy dBm dBm percent jssi(?)
b43 dBm dBm percent jssi(?)
iwl-3945 dBm dBm percent snr+more
iwl-4965 dBm dBm percent snr+more
p54 unspec 127 n/a missing
rt2x00 dBm n/a percent rssi+tx/rx frame success
rt2400 dBm n/a
rt2500pci dBm n/a
rt2500usb dBm n/a
rt61pci dBm n/a
rt73usb dBm n/a
rtl8180 unspec(?) 65 n/a (?)
rtl8187 unspec(?) 65 (?) noise(?)
zd1211 dB(?) 100 n/a percent
drivers/net/wireless/ath5k/base.c: Changes-licensed-under: 3-Clause-BSD
Signed-off-by: Bruno Randolf <br1@einfach.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Replace u16ho with put/get_unaligned functions
Signed-off-by: Graf Yang <graf.yang@analog.com>
Signed-off-by: Bryan Wu <cooloney@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch eliminates the (very remote) chance of a crash resulting
from a partially initialized socket or native port unexpectedly
receiving a message. Now, during the creation of a socket or native
port, the underlying generic port's lock is not released until all
initialization required to handle incoming messages has been done.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The syncppp layer wants a mid-level netdev private pointer.
It was using netdev->priv but that only worked by accident,
and thus this scheme was broken when the device private
allocation strategy changed.
Add a proper mid-layer private pointer for uses like this,
update syncppp and all users, and remove the HDLC_PPP broken
tag from drivers/net/wan/Kconfig
Signed-off-by: David S. Miller <davem@davemloft.net>
The specification of sctp_connectx() has been changed to return
an association id. We've added a new socket option that will
return the association id as the return value from the setsockopt()
call. The library that implements sctp_connectx() interface will
implement both socket options.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Brings delayed_ack socket option set/get into line with the latest ietf
socket extensions API draft, while maintaining backwards compatibility.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This
* makes the queue number passed to drivers a u16
(as it will be with skb_get_queue_mapping)
* removes the useless queue number defines
* splits hw->queues into hw->queues/ampdu_queues
* removes the debugfs files for per-queue counters
* removes some dead QoS code
* removes the beacon queue configuration for IBSS
so that the drivers now never get a queue number
bigger than (hw->queues + hw->ampdu_queues - 1)
for tx and only in the range 0..hw->queues-1 for
conf_tx.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The queue info in struct ieee80211_tx_status is never used.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The callback takes a ieee80211_tx_queue_stats with a contained
array of ieee80211_tx_queue_stats_data, remove the former, rename
the latter to ieee80211_tx_queue_stats and make tx_stats() take
the array directly.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
After the bcm43xx removal ieee80211_wx_{get,set}_auth() were no longer
used.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
After the softmac removal ieee80211_tx_frame() was no longer used.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This adds a new flag to the ieee80211_key_conf structure.
This flag will inform the driver the key is pairwise rather then
a shared key.
This is important for drivers who support both types of keys,
and need to be informed which type of key this is. Alternative
would be drivers checking the address argument of set_key(),
but it will be safer when mac80211 is more explicit.
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The hw_key_idx inside the ieee80211_key_conf structure does
not provide all the information drivers might need to perform
hardware encryption.
This is in particular true for rt2x00 who needs to know the
key algorithm and whether it is a shared or pairwise key.
By passing the ieee80211_key_conf pointer it assures us that
drivers can make full use of all information that it should know
about a particular key.
Additionally this patch updates all drivers to grab the hw_key_idx from
the ieee80211_key_conf structure.
v2: Removed bogus u16 cast
v3: Add warning about ieee80211_tx_control pointers
v4: Update warning about ieee80211_tx_control pointers
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (53 commits)
tcp: Overflow bug in Vegas
[IPv4] UFO: prevent generation of chained skb destined to UFO device
iwlwifi: move the selects to the tristate drivers
ipv4: annotate a few functions __init in ipconfig.c
atm: ambassador: vcc_sf semaphore to mutex
MAINTAINERS: The socketcan-core list is subscribers-only.
netfilter: nf_conntrack: padding breaks conntrack hash on ARM
ipv4: Update MTU to all related cache entries in ip_rt_frag_needed()
sch_sfq: use del_timer_sync() in sfq_destroy()
net: Add compat support for getsockopt (MCAST_MSFILTER)
net: Several cleanups for the setsockopt compat support.
ipvs: fix oops in backup for fwmark conn templates
bridge: kernel panic when unloading bridge module
bridge: fix error handling in br_add_if()
netfilter: {nfnetlink,ip,ip6}_queue: fix skb_over_panic when enlarging packets
netfilter: x_tables: fix net namespace leak when reading /proc/net/xxx_tables_names
netfilter: xt_TCPOPTSTRIP: signed tcphoff for ipv6_skip_exthdr() retval
tcp: Limit cwnd growth when deferring for GSO
tcp: Allow send-limited cwnd to grow up to max_burst when gso disabled
[netdrvr] gianfar: Determine TBIPA value dynamically
...
* 'audit.b50' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current:
[PATCH] new predicate - AUDIT_FILETYPE
[patch 2/2] Use find_task_by_vpid in audit code
[patch 1/2] audit: let userspace fully control TTY input auditing
[PATCH 2/2] audit: fix sparse shadowed variable warnings
[PATCH 1/2] audit: move extern declarations to audit.h
Audit: MAINTAINERS update
Audit: increase the maximum length of the key field
Audit: standardize string audit interfaces
Audit: stop deadlock from signals under load
Audit: save audit_backlog_limit audit messages in case auditd comes back
Audit: collect sessionid in netlink messages
Audit: end printk with newline
commit 0794935e "[NETFILTER]: nf_conntrack: optimize hash_conntrack()"
results in ARM platforms hashing uninitialised padding. This padding
doesn't exist on other architectures.
Fix this by replacing NF_CT_TUPLE_U_BLANK() with memset() to ensure
everything is initialised. There were only 4 bytes that
NF_CT_TUPLE_U_BLANK() wasn't clearing anyway (or 12 bytes on ARM).
Signed-off-by: Philip Craig <philipc@snapgear.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add struct net_device parameter to ip_rt_frag_needed() and update MTU to
cache entries where ifindex is specified. This is similar to what is
already done in ip_rt_redirect().
Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for getsockopt for MCAST_MSFILTER for
both IPv4 and IPv6. It depends on the previous setsockopt patch,
and uses the same method.
Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes bug http://bugzilla.kernel.org/show_bug.cgi?id=10556
where conn templates with protocol=IPPROTO_IP can oops backup box.
Result from ip_vs_proto_get() should be checked because
protocol value can be invalid or unsupported in backup. But
for valid message we should not fail for templates which use
IPPROTO_IP. Also, add checks to validate message limits and
connection state. Show state NONE for templates using IPPROTO_IP.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
iwlwifi: Allow building iwl3945 without iwl4965.
wireless: Fix compile error with wifi & leds
tcp: Fix slab corruption with ipv6 and tcp6fuzz
ipv4/ipv6 compat: Fix SSM applications on 64bit kernels.
[IPSEC]: Use digest_null directly for auth
sunrpc: fix missing kernel-doc
can: Fix copy_from_user() results interpretation
Revert "ipv6: Fix typo in net/ipv6/Kconfig"
tipc: endianness annotations
ipv6: result of csum_fold() is already 16bit, no need to cast
[XFRM] AUDIT: Fix flowlabel text format ambibuity.
Previously I added sessionid output to all audit messages where it was
available but we still didn't know the sessionid of the sender of
netlink messages. This patch adds that information to netlink messages
so we can audit who sent netlink messages.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Add support on 64-bit kernels for seting 32-bit compatible MCAST*
socket options.
Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds IPv6 support to the interfaces that are used to express nfsd
exports. All addressed are stored internally as IPv6; backwards
compatibility is maintained using mapped addresses.
Thanks to Bruce Fields, Brian Haley, Neil Brown and Hideaki Joshifuji
for comments
Signed-off-by: Aurelien Charbon <aurelien.charbon@bull.net>
Cc: Neil Brown <neilb@suse.de>
Cc: Brian Haley <brian.haley@hp.com>
Cc: YOSHIFUJI Hideaki / 吉藤英明 <yoshfuji@linux-ipv6.org>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
As it stands it's impossible to use any authentication algorithms
with an ID above 31 portably. It just happens to work on x86 but
fails miserably on ppc64.
The reason is that we're using a bit mask to check the algorithm
ID but the mask is only 32 bits wide.
After looking at how this is used in the field, I have concluded
that in the long term we should phase out state matching by IDs
because this is made superfluous by the reqid feature. For current
applications, the best solution IMHO is to allow all algorithms when
the bit masks are all ~0.
The following patch does exactly that.
This bug was identified by IBM when testing on the ppc64 platform
using the NULL authentication algorithm which has an ID of 251.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This deblats ~200 bytes when ipv6 and dccp are 'y'.
Besides, this will ease compilation issues for patches
I'm working on to make inet hash tables more scalable
wrt net namespaces.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
As I can see from the code, two places (tcp_v6_syn_recv_sock and
dccp_v6_request_recv_sock) that call this one already run with
BHs disabled, so it's safe to call __inet_inherit_port there.
Besides (in case I missed smth with code review) the calltrace
tcp_v6_syn_recv_sock
`- tcp_v4_syn_recv_sock
`- __inet_inherit_port
and the similar for DCCP are valid, but assumes BHs to be disabled.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no need to send BSS changes to driver from beacons processed
during scanning. We are more interested in beacons from an AP with which
we are associated - these will still be used to send updates to driver as
the beacons are received without scanning.
This change·removes the requirement that bss_info_changed needs to be atomic.
The beacons received during scanning are processed from a tasklet, but if we
do not call bss_info_changed for these beacons there is no need for it to be
atomic. This function (bss_info_changed) is called either from workqueue or
ioctl in all other instances.
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Acked-by: Tomas Winkler <tomas.winkler@intel.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This change is necessary to allow cwnd to grow during persistent
reordering. Cwnd moderation is applied when in the disorder state
and an ack that fills the hole comes in. If the hole was greater
than 3 packets, but less than tp->reordering, cwnd will shrink when
it should not have.
Signed-off-by: John Heffner <jheffner@napa.(none)>
Signed-off-by: David S. Miller <davem@davemloft.net>
Protocol control sockets and netlink kernel sockets should not prevent the
namespace stop request. They are initialized and disposed in a special way by
sk_change_net/sk_release_kernel.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make release_net/hold_net noop for performance-hungry people. This is a debug
staff and should be used in the debug mode only.
Add check for net != NULL in hold/release calls. This will be required
later on.
[ Added minor simplifications suggested by Brian Haley. -DaveM ]
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This one is responsible for calling ->dellink on each net
device found in net to help with vlan net_exit hook in the
nearest future.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the elastic array of void * pointer to the struct net.
The access rules are simple:
1. register the ops with register_pernet_gen_device to get
the id of your private pointer
2. call net_assign_generic() to put the private data on the
struct net (most preferably this should be done in the
->init callback of the ops registered)
3. do not store any private reference on the net_generic array;
4. do not change this pointer while the net is alive;
5. use the net_generic() to get the pointer.
When adding a new pointer, I copy the old array, replace it
with a new one and schedule the old for kfree after an RCU
grace period.
Since the net_generic explores the net->gen array inside rcu
read section and once set the net->gen->ptr[x] pointer never
changes, this grants us a safe access to generic pointers.
Quoting Paul: "... RCU is protecting -only- the net_generic
structure that net_generic() is traversing, and the [pointer]
returned by net_generic() is protected by a reference counter
in the upper-level struct net."
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To make some per-net generic pointers, we need some way to address
them, i.e. - IDs. This is simple IDA-based IDs generator for pernet
subsystems.
Addressing questions about potential checkpoint/restart problems:
these IDs are "lite-offsets" within the net structure and are by no
means supposed to be exported to the userspace.
Since it will be used in the nearest future by devices only (tun,
vlan, tunnels, bridge, etc), I make it resemble the functionality
of register_pernet_device().
The new ids is stored in the *id pointer _before_ calling the init
callback to make this id available in this callback.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Even kernel 2.2.26 (sic) already contains the
#undef CONFIG_IRLAN_SEND_GRATUITOUS_ARP
with the comment "but for some reason the machine crashes if you use DHCP".
Either someone finally looks into this or it's simply time to remove
this dead code.
Reported-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch modifies TIPC's socket code to follow the same approach
used by other protocols. This change eliminates the need for a
mutex in the TIPC-specific portion of the socket protocol data
structure -- in its place, the standard Linux socket backlog queue
and associated locking routines are utilized. These changes fix
a long-standing receive queue bug on SMP systems, and also enable
individual read and write threads to utilize a socket without
unnecessarily interfering with each other.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Directly call IPv4 and IPv6 variants where the address family is
easily known.
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Connection tracking helpers (specifically FTP) need to be called
before NAT sequence numbers adjustments are performed to be able
to compare them against previously seen ones. We've introduced
two new hooks around 2.6.11 to maintain this ordering when NAT
modules were changed to get called from conntrack helpers directly.
The cost of netfilter hooks is quite high and sequence number
adjustments are only rarely needed however. Add a RCU-protected
sequence number adjustment function pointer and call it from
IPv4 conntrack after calling the helper.
Signed-off-by: Patrick McHardy <kaber@trash.net>
New extensions may only be added to unconfirmed conntracks to avoid races
when reallocating the storage.
Also change NF_CT_ASSERT to use WARN_ON to get backtraces.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Adding extensions to confirmed conntracks is not allowed to avoid races
on reallocation. Don't setup NAT for confirmed conntracks in case NAT
module is loaded late.
The has one side-effect, the connections existing before the NAT module
was loaded won't enter the bysource hash. The only case where this actually
makes a difference is in case of SNAT to a multirange where the IP before
NAT is also part of the range. Since old connections don't enter the
bysource hash the first new connection from the IP will have a new address
selected. This shouldn't matter at all.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Move to nf_nat_proto_common and rename to nf_nat_proto_... since they're
also used by protocols that don't have port numbers.
Signed-off-by: Patrick McHardy <kaber@trash.net>
This expresses __skb_append in terms of __skb_queue_after, exploiting that
__skb_append(old, new, list) = __skb_queue_after(list, old, new).
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
And replace all its usage with init_net's socket.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
And replace all its usage with init_net's socket.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
According to the overall struct net design, it will be
filled with DCCP-related members.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
No need to create seq_operations for each instance of 'netstat'.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Smack doesn't have the need to create a private copy of the LSM "domain" when
setting NetLabel security attributes like SELinux, however, the current
NetLabel code requires a private copy of the LSM "domain". This patches fixes
that by letting the LSM determine how it wants to pass the domain value.
* NETLBL_SECATTR_DOMAIN_CPY
The current behavior, NetLabel assumes that the domain value is a copy and
frees it when done
* NETLBL_SECATTR_DOMAIN
New, Smack-friendly behavior, NetLabel assumes that the domain value is a
reference to a string managed by the LSM and does not free it when done
Signed-off-by: Paul Moore <paul.moore@hp.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix 3 warnings about discarding const qualifiers:
net/sctp/ulpevent.c:862: warning: passing argument 1 of 'sctp_event2skb' discards qualifiers from pointer target type
net/sctp/sm_statefuns.c:4393: warning: passing argument 1 of 'SCTP_ASOC' discards qualifiers from pointer target type
net/sctp/socket.c:5874: warning: passing argument 1 of 'cmsg_nxthdr' discards qualifiers from pointer target type
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When receiving an error length INIT-ACK during COOKIE-WAIT,
a 0-vtag ABORT will be responsed. This action violates the
protocol apparently. This patch achieves the following things.
1 If the INIT-ACK contains all the fixed parameters, use init-tag
recorded from INIT-ACK as vtag.
2 If the INIT-ACK doesn't contain all the fixed parameters,
just reflect its vtag.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
MIP6_OPT_PAD_X are actually for paddings in destination
option header. Replace them with our standard IPV6_TLV_PADX.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Allow the use of SACK and window scaling when syncookies are used
and the client supports tcp timestamps. Options are encoded into
the timestamp sent in the syn-ack and restored from the timestamp
echo when the ack is received.
Based on earlier work by Glenn Griffin.
This patch avoids increasing the size of structs by encoding TCP
options into the least significant bits of the timestamp and
by not using any 'timestamp offset'.
The downside is that the timestamp sent in the packet after the synack
will increase by several seconds.
changes since v1:
don't duplicate timestamp echo decoding function, put it into ipv4/syncookie.c
and have ipv6/syncookies.c use it.
Feedback from Glenn Griffin: fix line indented with spaces, kill redundant if ()
Reviewed-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
1) Standlaone ip6_null_entry is no longer needed as it is replaced by
the ip6_null_entry member of ipv6 (instance of struct netns_ipv6) in
struct net (as a result of Network Namespaces patches).
2) These 3 methods from this same header are not defined anywhere:
ip6_rt_addr_add(), ip6_rt_addr_del(), rt6_sndmsg()
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes two unused method declarations in
include/net/ndisc.h: ndisc_forwarding_on(void) and
ndisc_forwarding_off(void);
Also igmp6_cleanup(void) appears twice in this header, so one
igmp6_cleanup(void) declaration is removed.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The sk_filter function is too big to be inlined. This saves 2296 bytes
of text on allyesconfig.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add new API to MAC80211 to allow low level driver to
notify MAC with driver status.
Signed-off-by: Mohamed Abbas <mabbas@linux.intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch is necessary for the upcoming Accesspoint patch for p54.
Signed-off-by: Christian Lamparter <chunkeey@web.de>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch adds assocation capability, timestamp (tsf) and beacon interval
to bss_conf. This is required for successful assocation of iwlwifi drivers
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch eliminates the use of conf_ht, replacing it with
bss_info_changed.
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This fixes Bugzilla #10384
tcp_simple_retransmit does L increment without any checking
whatsoever for overflowing S+L when Reno is in use.
The simplest scenario I can currently think of is rather
complex in practice (there might be some more straightforward
cases though). Ie., if mss is reduced during mtu probing, it
may end up marking everything lost and if some duplicate ACKs
arrived prior to that sacked_out will be non-zero as well,
leading to S+L > packets_out, tcp_clean_rtx_queue on the next
cumulative ACK or tcp_fastretrans_alert on the next duplicate
ACK will fix the S counter.
More straightforward (but questionable) solution would be to
just call tcp_reset_reno_sack() in tcp_simple_retransmit but
it would negatively impact the probe's retransmission, ie.,
the retransmissions would not occur if some duplicate ACKs
had arrived.
So I had to add reno sacked_out reseting to CA_Loss state
when the first cumulative ACK arrives (this stale sacked_out
might actually be the explanation for the reports of left_out
overflows in kernel prior to 2.6.23 and S+L overflow reports
of 2.6.24). However, this alone won't be enough to fix kernel
before 2.6.24 because it is building on top of the commit
1b6d427bb7 ([TCP]: Reduce sacked_out with reno when purging
write_queue) to keep the sacked_out from overflowing.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Reported-by: Alessandro Suardi <alessandro.suardi@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a generic requirement, so make inet_ctl_sock_create namespace
aware and create a inet_ctl_sock_destroy wrapper around
sk_release_kernel.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All upper protocol layers are already use sock internally.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This call is nothing common with INET connection sockets code. It
simply creates an unhashes kernel sockets for protocol messages.
Move the new call into af_inet.c after the rename.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This does not look good, but there is no other choice. The compilation
without CONFIG_NET is broken and can not be fixed with ease.
After that there is no need for the following commits:
1567ca7eec3edf8fa5cc2d38f9a4f8
Revert them.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch updates the Linux the Intra-Site Automatic Tunnel Addressing
Protocol (ISATAP) implementation. It places the ISATAP potential router
list (PRL) in the kernel and adds three new private ioctls for PRL
management.
[Add several changes of structure name, constant names etc. - yoshfuji]
Signed-off-by: Fred L. Templin <fred.l.templin@boeing.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Allocate the skb for llc responses with the received packet size by
using the size adjustable llc_frame_alloc.
Don't allocate useless extra payload.
Cleanup magic numbers.
So, this fixes oops.
Reported by Jim Westfall:
kernel: skb_over_panic: text:c0541fc7 len:1000 put:997 head:c166ac00 data:c166ac2f tail:0xc166b017 end:0xc166ac80 dev:eth0
kernel: ------------[ cut here ]------------
kernel: kernel BUG at net/core/skbuff.c:95!
Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Such an accounting would cost us two more dereferences to get the
percpu variable from the struct net, so I make sock_prot_inuse_get
and _add calls work differently depending on CONFIG_NET_NS - without
it old optimized routines are used.
The per-cpu counter for init_net is prepared in core_initcall, so
that even af_inet, that starts as fs_initcall, will already have the
init_net prepared.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This counter is about to become per-proto-and-per-net, so we'll need
two arguments to determine which cell in this "table" to work with.
All the places, but proc already pass proper net to it - proc will be
tuned a bit later.
Some indentation with spaces in proc files is done to keep the file
coding style consistent.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
There's already some stuff on the struct net, that should better
be folded into netns_core structure. I'm making the per-proto inuse
counter be per-net also, which is also a candidate for this, so
introduce this structure and populate it a bit.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
No need to create seq_operations for each instance of 'netstat'.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
An uppercut - do not use the pcounter on struct proto.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Constructive part of the set is finished here. We have to remove the
pcounter, so start with its init and free functions.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
And redirect sock_prot_inuse_add and _get to use one.
As far as the dereferences are concerned. Before the patch we made
1 dereference to proto->inuse.add call, the call itself and then
called the __get_cpu_var() on a static variable. After the patch we
make a direct call, then one dereference to proto->inuse_idx and
then the same __get_cpu_var() on a still static variable. So this
patch doesn't seem to produce performance penalty on SMP.
This is not per-net yet, but I will deliberately make NET_NS=y case
separated from NET_NS=n one, since it'll cost us one-or-two more
dereferences to get the struct net and the inuse counter.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The inuse counters are going to become a per-cpu array. Introduce an
index for this array on the struct proto.
To handle the case of proto register-unregister-register loop the
bitmap is used. All its bits manipulations are protected with
proto_list_lock and a sanity check for the bitmap being exhausted is
also added.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On Fri, 2008-03-28 at 03:24 -0700, Andrew Morton wrote:
> they should all be renamed.
Done for include/net and net
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
kill unnecessary llc_station_mac_sa.
Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patches removes unused declaration of addrconf_forwarding_on() method
in include/net/addrconf.h.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With a was number of callsites sctp_add_cmd_sf wrapper bloats
kernel by some amount. Due to unlikely tracking allyesconfig,
with the initial result were around ~7kB (thus caught my
attention) while a non-debug config produced only ~2.3kB effect.
I (ij) proposed first a patch to uninline it but Vlad responded
with a patch that removed the only sctp_add_cmd call which is
wrapped by sctp_add_cmd_sf (I wasn't sure if I could do that).
I did minor cleanup to Vlad's patch.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes three unused method declarations in include/net/ipv6.h:
inet_getfrag_t(), ipv6_build_nfrag_opts() and ipv6_build_frag_opts().
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes ieee80211_get_channel a static inline defined in
cfg80211's header file which simply calls __ieee80211_get_channel
to avoid symbol clashes with the ieee80211 code.
The problem was pointed out by David Miller, thanks!
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch introduces a new member, fl_net, in struct ip6_flowlabel.
This allows to create labels with the same value in different namespaces.
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make use of the network namespace information to have this protocol to
handle several network namespace.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The IPv6 BEET output function is incorrectly including the inner
header in the payload to be protected. This causes a crash as
the packet doesn't actually have that many bytes for a second
header.
The IPv4 BEET output on the other hand is broken when it comes
to handling an inner IPv6 header since it always assumes an
inner IPv4 header.
This patch fixes both by making sure that neither BEET output
function touches the inner header at all. All access is now
done through the protocol-independent cb structure. Two new
attributes are added to make this work, the IP header length
and the IPv4 option length. They're filled in by the inner
mode's output function.
Thanks to Joakim Koskela for finding this problem.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add some flesh to ipv4_sysctl_init_net and ipv4_sysctl_exit_net,
i.e. copy the table, alter .data pointers and register it per-net.
Other ipv4_table's sysctls are now global, but this is going to
change once sysctl permissions patches migrate from -mm tree to
mainline in 2.6.26 merge window :)
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Initialization is moved to icmp_sk_init, all the places, that
refer to them use init_net for now.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Recent commits from YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
have been introduced a several compilation warnings
'assignment discards qualifiers from pointer target type'
due to extra const modifier in the inline call parameters of
{dev|sock|twsk}_net_set.
Drop it.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for multiple media channels and use it to create
expectations for video streams when present.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Create expectations for incoming signalling connections when seeing
a REGISTER request. This is needed when the registrar uses a
different source port number for signalling messages and for receiving
incoming calls from other endpoints than the registrar.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce expectation classes and policies. An expectation class
is used to distinguish different types of expectations by the
same helper (for example audio/video/t.120). The expectation
policy is used to hold the maximum number of expectations and
the initial timeout for each class.
The individual classes are isolated from each other, which means
that for example an audio expectation will only evict other audio
expectations.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is useful for the SIP helper and signalling expectations.
We don't want to create a full-blown expectation with a wildcard
as source based on a single UDP packet, but need to know the
final port anyways. With inactive expectations we can register
the expectation and reserve the tuple, but wait for confirmation
from the registrar before activating it.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
NF_CT_TUPLE_DUMP prints IPv4 addresses as IPv6, fix this and use printk
(guarded by #ifdef DEBUG) directly instead of pr_debug since the tuple
is usually printed at the end of line and we don't want to include a
log-level.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add ieee80211_get_channel() which gets you a channel struct for a
specific wiphy if that channel is present in that wiphy.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch makes mac80211 able to send a phase1 key for TKIP
decryption.
This is needed for drivers that don't do the rekeying by themselves
(i.e. iwlwifi). Upon IV16 wrap around, the packet is decrypted in SW,
if decryption is ok, mac80211 calls to update_tkip_key with a new
phase 1 RX key.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch makes mac80211 able to compute a TKIP key from an skb.
The requested key can be a phase 1 or a phase 2 key.
This is useful for drivers who need to provide tkip key to their
HW to enable HW encryption.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Introduce an inline net_eq() to compare two namespaces.
Without CONFIG_NET_NS, since no namespace other than &init_net
exists, it is always 1.
We do not need to convert 1) inline vs inline and
2) inline vs &init_net comparisons.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Introduce neigh_parms/pneigh_entry inlines: neigh_parms_net(), pneigh_net().
Without CONFIG_NET_NS, no namespace other than &init_net exists.
Let's explicitly define them to help compiler optimizations.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Introduce per-sock inlines: sock_net(), sock_net_set()
and per-inet_timewait_sock inlines: twsk_net(), twsk_net_set().
Without CONFIG_NET_NS, no namespace other than &init_net exists.
Let's explicitly define them to help compiler optimizations.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Last part of hop-limit determination is always:
hoplimit = dst_metric(dst, RTAX_HOPLIMIT);
if (hoplimit < 0)
hoplimit = ipv6_get_hoplimit(dst->dev).
Let's consolidate it as ip6_dst_hoplimit(dst).
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Each MIPv6 XFRM state (DSTOPT/RH2) holds either destination or source
address to be mangled in the IPv6 header (that is "CoA").
On Inter-MN communication after both nodes binds each other,
they use route optimized traffic two MIPv6 states applied, and
both source and destination address in the IPv6 header
are replaced by the states respectively.
The packet format is correct, however, next-hop routing search
are not.
This patch fixes it by remembering address pairs for later states.
Based on patch from Masahide NAKAMURA <nakam@linux-ipv6.org>.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
IP layer now can handle multiple namespaces normally. So, process such
packets normally and drop them only if the transport layer is not
aware about namespaces.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip_options_compile uses inet_addr_type which requires a namespace. The
packet argument is optional, so parameter is the only way to obtain
it. Pass the init_net there for now.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Proxy neighbors do not have any reference counting, so any caller
of pneigh_lookup (unless it's a netlink triggered add/del routine)
should _not_ perform any actions on the found proxy entry.
There's one exception from this rule - the ipv6's ndisc_recv_ns()
uses found entry to check the flags for NTF_ROUTER.
This creates a race between the ndisc and pneigh_delete - after
the pneigh is returned to the caller, the nd_tbl.lock is dropped
and the deleting procedure may proceed.
One of the fixes would be to add a reference counting, but this
problem exists for ndisc only. Besides such a patch would be too
big for -rc4.
So I propose to introduce a __pneigh_lookup() which is supposed
to be called with the lock held and use it in ndisc code to check
the flags on alive pneigh entry.
Changes from v2:
As David noticed, Exported the __pneigh_lookup() to ipv6 module.
The checkpatch generates a warning on it, since the EXPORT_SYMBOL
does not follow the symbol itself, but in this file all the
exports come at the end, so I decided no to break this harmony.
Changes from v1:
Fixed comments from YOSHIFUJI - indentation of prototype in header
and the pndisc_check_router() name - and a compilation fix, pointed
by Daniel - the is_routed was (falsely) considered as uninitialized
by gcc.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
sctp_datamsg_free and sctp_datamsg_track are just aliases for
sctp_datamsg_put and sctp_chunk_hold, respectively.
Saves 32 Bytes on x86.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
the first u32 copied from syncookie_secret is overwritten by the
minute-counter four lines below. After adjusting the destination
address, the size of syncookie_secret can be reduced accordingly.
AFAICS, the only other user of syncookie_secret[] is the ipv6
syncookie support. Because ipv6 syncookies only grab 44 bytes from
syncookie_secret[], this shouldn't affect them in any way.
With fixes from Glenn Griffin.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Glenn Griffin <ggriffin.kernel@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sorry for the patch sequence confusion :| but I found that the similar
thing can be done for raw sockets easily too late.
Expand the proto.h union with the raw_hashinfo member and use it in
raw_prot and rawv6_prot. This allows to drop the protocol specific
versions of hash and unhash callbacks.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
After this we have only udp_lib_get_port to get the port and two
stubs for ipv4 and ipv6. No difference in udp and udplite except
for initialized h.udp_hash member.
I tried to find a graceful way to drop the only difference between
udp_v4_get_port and udp_v6_get_port (i.e. the rcv_saddr comparison
routine), but adding one more callback on the struct proto didn't
appear such :( Maybe later.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Inspired by the commit ab1e0a13 ([SOCK] proto: Add hashinfo member to
struct proto) from Arnaldo, I made similar thing for UDP/-Lite IPv4
and -v6 protocols.
The result is not that exciting, but it removes some levels of
indirection in udpxxx_get_port and saves some space in code and text.
The first step is to union existing hashinfo and new udp_hash on the
struct proto and give a name to this union, since future initialization
of tcpxxx_prot, dccp_vx_protinfo and udpxxx_protinfo will cause gcc
warning about inability to initialize anonymous member this way.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip_options->is_data is assigned only and never checked. The structure is
not a part of kernel interface to the userspace. So, it is safe to remove
this field.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change TCP_DEFER_ACCEPT implementation so that it transitions a
connection to ESTABLISHED after handshake is complete instead of
leaving it in SYN-RECV until some data arrvies. Place connection in
accept queue when first data packet arrives from slow path.
Benefits:
- established connection is now reset if it never makes it
to the accept queue
- diagnostic state of established matches with the packet traces
showing completed handshake
- TCP_DEFER_ACCEPT timeouts are expressed in seconds and can now be
enforced with reasonable accuracy instead of rounding up to next
exponential back-off of syn-ack retry.
Signed-off-by: Patrick McManus <mcmanus@ducksong.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the inline trick (same as pr_debug) to get checking of debug
statements even if no code is generated.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduced by 270637abff
("[SCTP]: Fix a race between module load and protosw access")
Reported by Gabriel C:
In file included from net/sctp/sm_statetable.c:50:
include/net/sctp/sctp.h: In function 'sctp_v6_pf_init':
include/net/sctp/sctp.h:392: warning: 'return' with a value, in function returning void
In file included from net/sctp/sm_statefuns.c:62:
include/net/sctp/sctp.h: In function 'sctp_v6_pf_init':
include/net/sctp/sctp.h:392: warning: 'return' with a value, in function returning void
...
Signed-off-by: David S. Miller <davem@davemloft.net>
The proc init/exit functions take a new network namespace parameter in
order to register/unregister /proc/net/udp6 for a namespace.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch, like udp proc, makes the proc functions to take care of
which namespace the socket belongs.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes the common udp proc functions to take care of which
socket they should show taking into account the namespace it belongs.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update: My mailer ate one of Jarek's feedback mails... Fixed the
parameter in netif_set_gso_max_size() to be u32, not u16. Fixed the
whitespace issue due to a patch import botch. Changed the types from
u32 to unsigned int to be more consistent with other variables in the
area. Also brought the patch up to the latest net-2.6.26 tree.
Update: Made gso_max_size container 32 bits, not 16. Moved the
location of gso_max_size within netdev to be less hotpath. Made more
consistent names between the sock and netdev layers, and added a
define for the max GSO size.
Update: Respun for net-2.6.26 tree.
Update: changed max_gso_frame_size and sk_gso_max_size from signed to
unsigned - thanks Stephen!
This patch adds the ability for device drivers to control the size of
the TSO frames being sent to them, per TCP connection. By setting the
netdevice's gso_max_size value, the socket layer will set the GSO
frame size based on that value. This will propogate into the TCP
layer, and send TSO's of that size to the hardware.
This can be desirable to help tune the bursty nature of TSO on a
per-adapter basis, where one may have 1 GbE and 10 GbE devices
coexisting in a system, one running multiqueue and the other not, etc.
This can also be desirable for devices that cannot support full 64 KB
TSO's, but still want to benefit from some level of segmentation
offloading.
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a race is SCTP between the loading of the module
and the access by the socket layer to the protocol functions.
In particular, a list of addresss that SCTP maintains is
not initialized prior to the registration with the protosw.
Thus it is possible for a user application to gain access
to SCTP functions before everything has been initialized.
The problem shows up as odd crashes during connection
initializtion when we try to access the SCTP address list.
The solution is to refactor how we do registration and
initialize the lists prior to registering with the protosw.
Care must be taken since the address list initialization
depends on some other pieces of SCTP initialization. Also
the clean-up in case of failure now also needs to be refactored.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Acked-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Comparing with kernel 2.6.24, tbench result has regression with
2.6.25-rc1.
1) On 2 quad-core processor stoakley: 4%.
2) On 4 quad-core processor tigerton: more than 30%.
bisect located below patch.
b4ce92775c is first bad commit
commit b4ce92775c
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date: Tue Nov 13 21:33:32 2007 -0800
[IPV6]: Move nfheader_len into rt6_info
The dst member nfheader_len is only used by IPv6. It's also currently
creating a rather ugly alignment hole in struct dst. Therefore this patch
moves it from there into struct rt6_info.
Above patch changes the cache line alignment, especially member
__refcnt. I did a testing by adding 2 unsigned long pading before
lastuse, so the 3 members, lastuse/__refcnt/__use, are moved to next
cache line. The performance is recovered.
I created a patch to rearrange the members in struct dst_entry.
With Eric and Valdis Kletnieks's suggestion, I made finer arrangement.
1) Move tclassid under ops in case CONFIG_NET_CLS_ROUTE=y. So
sizeof(dst_entry)=200 no matter if CONFIG_NET_CLS_ROUTE=y/n. I
tested many patches on my 16-core tigerton by moving tclassid to
different place. It looks like tclassid could also have impact on
performance. If moving tclassid before metrics, or just don't move
tclassid, the performance isn't good. So I move it behind metrics.
2) Add comments before __refcnt.
On 16-core tigerton:
If CONFIG_NET_CLS_ROUTE=y, the result with below patch is about 18%
better than the one without the patch;
If CONFIG_NET_CLS_ROUTE=n, the result with below patch is about 30%
better than the one without the patch.
With 32bit 2.6.25-rc1 on 8-core stoakley, the new patch doesn't
introduce regression.
Thank Eric, Valdis, and David!
Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
Acked-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There's a horrible slab abuse in net/netfilter/nf_conntrack_extend.c
that can be replaced with a call to ksize().
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch clarifies the use of IEEE80211_TXCTL_OFDM_HT flag.
Can by united with patch "mac80211: adding mac80211_tx_control
flags and HT flags"
Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch makes enum from the defines previously dwelled inside
ieee80211_tx_control for better readability.
The patch also addes HT flags, for 802.11n drivers:
- IEEE80211_TXCTL_OFDM_HT: request low-level driver to use HT OFDM rates
- IEEE80211_TXCTL_GREEN_FIELD: use green field protection
- IEEE80211_TXCTL_DUP_DATA: duplicate data on both 20 Mhz channels
- IEEE80211_TXCTL_40_MHZ_WIDTH: send this frame in 40Mhz width
- IEEE80211_TXCTL_SHORT_GI: send this frame with short guard interval
Tx command can be a combination of any of these flags, along with
bitrate represented by ieee80211_rate. this will allow legacy drivers to
switch easily to any 11n rate representation.
Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
CC: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch make use of the network namespace information at the right
places to handle the multicast for several network namespaces. It
makes the socket control to be per namespace too.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of having a tcp6_socket global to all the namespace, there is
tcp6 socket control per namespace. That is consistent with which
namespace sent a RST and allows to pass the socket to the underlying
function to retrieve the network namespace.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make ndisc socket control per namespace.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current /proc/net is done with so called "shadows", but current
implementation is broken and has little chances to get fixed.
The problem is that dentries subtree of /proc/net directory has
fancy revalidation rules to make processes living in different
net namespaces see different entries in /proc/net subtree, but
currently, tasks see in the /proc/net subdir the contents of any
other namespace, depending on who opened the file first.
The proposed fix is to turn /proc/net into a symlink, which points
to /proc/self/net, which in turn shows what previously was in
/proc/net - the network-related info, from the net namespace the
appropriate task lives in.
# ls -l /proc/net
lrwxrwxrwx 1 root root 8 Mar 5 15:17 /proc/net -> self/net
In other words - this behaves like /proc/mounts, but unlike
"mounts", "net" is not a file, but a directory.
Changes from v2:
* Fixed discrepancy of /proc/net nlink count and selinux labeling
screwup pointed out by Stephen.
To get the correct nlink count the ->getattr callback for /proc/net
is overridden to read one from the net->proc_net entry.
To make selinux still work the net->proc_net entry is initialized
properly, i.e. with the "net" name and the proc_net parent.
Selinux fixes are
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Changes from v1:
* Fixed a task_struct leak in get_proc_task_net, pointed out by Paul.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit db1ed684f6 ("[IPV6]
UDP: Rename IPv6 UDP files."), commit
8be8af8fa4 ("[IPV4] UDP: Move
IPv4-specific bits to other file.") and commit
e898d4db27 ("[UDP]: Allow users to
configure UDP-Lite.").
First, udplite is of such small cost, and it is a core protocol just
like TCP and normal UDP are.
We spent enormous amounts of effort to make udplite share as much code
with core UDP as possible. All of that work is less valuable if we're
just going to slap a config option on udplite support.
It is also causing build failures, as reported on linux-next, showing
that the changeset was not tested very well. In fact, this is the
second build failure resulting from the udplite change.
Finally, the config options provided was a bool, instead of a modular
option. Meaning the udplite code does not even get build tested
by allmodconfig builds, and furthermore the user is not presented
with a reasonable modular build option which is particularly needed
by distribution vendors.
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch eliminates warnings about undeclared symbols.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes code associated with optional, user-specified
fields of the TIPC message header. Such fields were never
utilized by TIPC, and have now been removed from the protocol
specification.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Quite a while ago I started this book. The required kernel-doc
patches have since gone into the tree so it is now possible to
build the book in mainline.
The actual documentation is still rather incomplete and not all
things are linked into the book, but this enables us to edit
the documentation collaboratively, hopefully driver authors can
add documentation based on their experience with mac80211.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Various cleanups, reducing the #ifdef mess and other things.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Added support for mesh id and mesh path operation as well as
station structure dumping.
Signed-off-by: Luis Carlos Cobo <luisca@cozybit.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
hci_sock_cleanup() always returns 0 and its return value isn't used
anywhere in the code.
Compile-tested with 'make allyesconfig && make net/bluetooth/bluetooth.ko'
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
(Anonymous) unions can help us to avoid ugly casts.
A common cast it the (struct rtable *)skb->dst one.
Defining an union like :
union {
struct dst_entry *dst;
struct rtable *rtable;
};
permits to use skb->rtable in place.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add an netns parameter to ip6_route_output. That will allow to access
to the right routing table for outgoing traffic.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch propagates the network namespace pointer to the address
configuration routines which need it, which means adding a new
parameter to these functions, and make them use it instead of using
the initial network namespace.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
If all of the entropy is in the local and foreign addresses,
but xor'ing together would cancel out that entropy, the
current hash performs poorly.
Suggested by Cosmin Ratiu:
Basically, the situation is as follows: There is a client
machine and a server machine. Both create 15000 virtual
interfaces, open up a socket for each pair of interfaces and
do SIP traffic. By profiling I noticed that there is a lot of
time spent walking the established hash chains with this
particular setup.
The addresses were distributed like this: client interfaces
were 198.18.0.1/16 with increments of 1 and server interfaces
were 198.18.128.1/16 with increments of 1. As I said, there
were 15000 interfaces. Source and destination ports were 5060
for each connection. So in this case, ports don't matter for
hashing purposes, and the bits from the address pairs used
cancel each other, meaning there are no differences in the
whole lot of pairs, so they all end up in the same hash chain.
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes the necessary changes to make IPv6 dst_entry garbage
collection work with multiple network namespaces.
In ip6_dst_gc(), static local variables are now declared
per-namespace.
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ip6_dst_ops is moved inside the network namespace structure. All
references to this structure are now relative to the initial network
namespace.
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The rt6_info structures are moved inside the network namespace
structure. All references to these structures are now relative to the
initial network namespace.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch make mindless changes and prepares the code to use dynamic
allocation for rt6_info structure. The code accesses the rt6_info
structure as a pointer instead of a global static variable.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes the routing engine use the network namespaces to
access routing informations: Add a network namespace parameter to
ipv6_route_ioctl and propagate the network namespace value to all the
routing code that have not yet been changed.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a network namespace parameter to rt6_purge_dflt_routers. This is
needed to call fib6_get_table with the appropriate network namespace.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a network namespace parameter to rt6_lookup().
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The rt6_stats is now per namespace with this patch. It is allocated
when a network namespace is created and freed when the network
namespace exits and references are relative to the network namespace.
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch allocates the rt6_stats struct dynamically when the fib6 is
initialized. That provides the ability to create several instances of
this structure for the network namespaces.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The fib6_rules_ops is moved to the network namespace structure. All
references are changed to have it relatively to it.
Each time a network namespace is created a new fib6_rules_ops is
allocated, initialized and stored into the network namespace
structure.
The common part of the fib rules is namespace aware, so it is quite
easy to retrieve the network namespace from the rules and use it in
the different callbacks.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the timer initialization at the network namespace creation and
store the network namespace in the timer argument.
That enables multiple timers (one per network namespace) to do garbage
collecting.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The fib tables are now relative to the network namespace. When the
garbage collector timer expires, we must have a network namespace
parameter in order to retrieve the tables. For now this is the
init_net, but we should be able to have a timer per namespace and use
the timer callback parameter to pass the network namespace from the
expired timer.
The timer callback, fib6_run_gc, is actually used to be called
synchronously by some functions and asynchronously when the timer
expires.
When the timer expires, the delay specified for fib6_run_gc parameter
is always zero. So, I changed fib6_run_gc to not be a timer callback
but a function called by the timer callback and I added a timer
callback where its work is just to retrieve from the data arg of the
timer the network namespace and call fib6_run_gc with zero expiring
time and the network namespace parameters. That makes the code cleaner
for the fib6_run_gc callers.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function fib6_clean_all takes the network namespace as
parameter. That allows to flush the routes related to a specific
network namespace.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The fib table for ipv6 are moved to the network namespace structure.
All references to them are made relatively to the network namespace.
All external calls to the ip6_fib functions taking the network
namespace parameter are made using the init_net variable, so the
ip6_fib engine is ready for the namespaces but the callers not yet.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
For later use, this patch is renaming ndisc_dst_alloc()
(and related function/structures) to icmp6_dst_alloc()
(and so on). This patch also removing unused function-
pointer argument for it.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Since most users of ipv6_get_saddr() pass non-NULL as
dst argument, use ipv6_dev_get_saddr() directly.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
neigh_is_connected() is not popular at all, and the only user
drivers/net/cxgb3/l2t.c:t3_l2t_update() also have raw (expanded) expression.
Let's expand it and remove the inline function.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Updated to incorporate Eric's suggestion of using a per cpu buffer
rather than allocating on the stack. Just a two line change, but will
resend in it's entirety.
Signed-off-by: Glenn Griffin <ggriffin.kernel@gmail.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
This patch clarifies the use of the irqsafe vs. non-irq-safe
functions and their respective locking requirements.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This reworks the code for TX filtered frames, splitting it out to
a new function to handle those cases, making the clear instruction
a flag and renaming a few things to be easier to understand and
less Atheros hardware specific. Finally, it also makes the comments
explain more.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
enable IBSS cell merging. if an IBSS beacon with the same channel, same ESSID
and a TSF higher than the local TSF (mactime) is received, we have to join its
BSSID. while this might not be immediately apparent from reading the 802.11
standard it is compliant and necessary to make IBSS mode functional in many
cases. most drivers have a similar behaviour.
* move the relevant code section (previously only containing debug code) down
to the end of the function, so we can reuse the bss structure.
* we have to compare the mactime (TSF at the time of packet receive) rather
than the current TSF. since mactime is defined as the time the first data
symbol arrived we add the time until byte 24 where the timestamp resides, since
this is how the beacon timestamp is defined. as some some drivers are not able
to give a reliable mactime we fall back to use the current TSF, which will be
enough to catch most (but not all) cases where an IBSS merge is necessary.
* in IBSS mode we want to allow beacons to override probe response info so we
can correctly do merges.
* we don't only configure beacons based on scan results, so change that
message.
* to enable this we have to let all beacons thru in IBSS mode, even if they
have a different BSSID.
Signed-off-by: Bruno Randolf <bruno@thinktube.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
define mactime as the time when the first data symbol arrived at the HW. the
old definition was questionable because 802.11 defines timestamp only for
beacon and probe response frames, and there it means the timestamp field.
a stricter definition of mactime is necessary for correct merging of IBSS.
note that it is up to the driver to convert whatever its hardware returns to
this definition. unfortunately we don't know for example when atheros hardware
takes its rx timestamp exactly :(
Signed-off-by: Bruno Randolf <bruno@thinktube.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This extends the filter flags documentation to make it clear
what clearing a flag really means.
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This changes mac80211 to pass the burst time to conf_tx in txop
units rather than 0.1msec units. 0.1msec units are only required
by atheros hardware (according to current driver support), all
other drivers do other calculations or require the txop value.
Therefore, it results in fewer calculations and more precision
if we just pass the txop value through to the driver.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This allows precise control over what a monitor interface shows.
Signed-off-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch creates new cfg80211 wiphy API for channel and bitrate
registration and converts mac80211 and drivers to the new API. The
old mac80211 API is completely ripped out. All drivers (except ath5k)
are updated to the new API, in many cases I expect that optimisations
can be done.
Along with the regulatory code I've also ripped out the
IEEE80211_HW_DEFAULT_REG_DOMAIN_CONFIGURED flag, I believe it to be
unnecessary if the hardware simply gives us whatever channels it wants
to support and we then enable/disable them as required, which is pretty
much required for travelling.
Additionally, the patch adds proper "basic" rate handling for STA
mode interface, AP mode interface will have to have new API added
to allow userspace to set the basic rate set, currently it'll be
empty... However, the basic rate handling will need to be moved to
the BSS conf stuff.
I do expect there to be bugs in this, especially wrt. transmit
power handling where I'm basically clueless about how it should work.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch adds fields to ieee80211_tx_status in order to allow block ack
information exchange between low-level driver,mac80211 and rate scaling
module.
Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch allows qdisc support in A-MPDU Tx. a method to
handle QoS <-> TID switches is present in this patch.
Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch adds the API for 3 stages in A-MPDU Tx session flow:
- request mac80211 to start/stop A-MPDU Tx session for specific TID. such a
request should be issued by a load aware element, either mac80211 itself
or external element.
- requests by mac80211 to low-level driver to start/stop Tx aggregation.
notice that low level driver responds now with Starting Sequence Number.
- async feedback by low-level to mac80211 to inform that HW is ready for
next A-MPDU Tx state.
Changes in API to Rx A-MPDU were also made, reflected in iwlwifi changes as
well.
Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
After the patch:
$ git-grep llc_addrany | wc -l
0
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After the patch:
$ git-grep sctp_sysctl_jiffies_ms | wc -l
0
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It looks like dst parameter is used in this API due to historical
reasons. Actually, it is really used in the direct call to
tcp_v4_send_synack only. So, create a wrapper for tcp_v4_send_synack
and remove dst from rtx_syn_ack.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
RFC 3873 specifies several MIB objects that can't be obtained by the
current data set exported by /proc/sys/net/sctp/assoc. This patch
adds the missing pieces of data that allow us to compute all the
objects in the sctpAssocTable object.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All preparations are done. Now just add a hook to perform an
initialization on namespace startup and replace icmpv6_sk macro with
proper inline call. Actual namespace the packet belongs too will be
passed later along with the one for the routing.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All preparations are done. Now just add a hook to perform an
initialization on namespace startup and replace icmp_sk macro with
proper inline call.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This staff will be needed for non-netlink kernel sockets, which should
also not pin a namespace like tcp_socket and icmp_socket.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
icmp_init could fail and this is normal for namespace other than initial.
So, the panic should be triggered only on init_net initialization path.
Additionally create rollback path for icmp_init as a separate function.
It will also be used later during namespace destruction.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct net_proto_family* is not used in icmp[v6]_init, ndisc_init,
igmp_init and tcp_v4_init. Remove it.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change xfrm_policy and xfrm_state walking algorithm from O(n^2) to O(n).
This is achieved adding the entries to one more list which is used
solely for walking the entries.
This also fixes some races where the dump can have duplicate or missing
entries when the SPD/SADB is modified during an ongoing dump.
Dumping SADB with 20000 entries using "time ip xfrm state" the sys
time dropped from 1.012s to 0.080s.
Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Skip the prefix length matching in source address selection for
orchid -> non-orchid addresses.
Overlay Routable Cryptographic Hash IDentifiers (RFC 4843,
2001:10::/28) are currenty not globally reachable. Without this
check a host with an ORCHID address can end up preferring those over
regular addresses when talking to other regular hosts in the 2001::/16
range thus breaking non-orchid connections.
Signed-off-by: Juha-Matti Tapio <jmtapio@verkkotelakka.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The new SCTP socket api (draft 16) updates the AUTH API structures.
We never exported these since we knew they would change.
Update the rest to match the draft.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Use the added dev_alloc_name() call to create tunnel device name,
rather than iterate in a hand-made loop with an artificial limit.
Thanks Patrick for noticing this.
[ The way this works is, when the device is actually registered,
the generic code noticed the '%' in the name and invokes
dev_alloc_name() to fully resolve the name. -DaveM ]
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add missing structure kernel-doc descriptions to sock.h & skbuff.h
to fix kernel-doc warnings.
(I think that Stephen H. sent a similar patch, but I can't find it.
I just want to kill the warnings, with either patch.)
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro spotted a bogus use of u64 on the input sequence number which
is big-endian. This patch fixes it by giving the input sequence number
its own member in the xfrm_skb_cb structure.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes unused declaration of dflt_rt_lookup() method in
include/net/ndisc.h
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch changes current use of: init_timer(), add_timer()
and del_timer() to setup_timer() with mod_timer(), which
should be safer anyway.
Reported-by: Jann Traschewski <jann@gmx.de>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
According to one of Jann's OOPS reports it looks like
BUG_ON(timer_pending(timer)) triggers during add_timer()
in ax25_start_t1timer(). This patch changes current use
of: init_timer(), add_timer() and del_timer() to
setup_timer() with mod_timer(), which should be safer
anyway.
Reported-by: Jann Traschewski <jann@gmx.de>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All these static inlines are unused:
in_own_zone 1 (net/tipc/addr.h)
msg_dataoctet 1 (net/tipc/msg.h)
msg_direct 1 (include/net/tipc/tipc_msg.h)
msg_options 1 (include/net/tipc/tipc_msg.h)
tipc_nmap_get 1 (net/tipc/bcast.h)
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes some unused definitions and one method typedef
declaration (f_pnode)
in include/net/ip6_fib.h, as they are not used in the kernel.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove IP6_RT_PRIO_FW and IP6_RT_FLOW_MASK definitions in
include/net/ip6_route.h, as they are not used in the kernel.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ->move operation has two bugs:
- It is called with the same extension as source and destination,
so it doesn't update the new extension.
- The address of the old extension is calculated incorrectly,
instead of (void *)ct->ext + ct->ext->offset[i] it uses
ct->ext + ct->ext->offset[i].
Fixes a crash on x86_64 reported by Chuck Ebbert <cebbert@redhat.com>
and Thomas Woerner <twoerner@redhat.com>.
Tested-by: Thomas Woerner <twoerner@redhat.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This merges the mux.c (including the connection interface) with trans_fd
in preparation for transport API changes. Ultimately, trans_fd will need
to be rewritten to clean it up and simplify the implementation, but this
reorganization is viewed as the first step.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
GDM gets unhappy if /var/gdm doesn't have the sticky bit set. This patch adds
support for the sticky bit in much the same way setuid/setgid is supported.
With this patch, I can launch X from a v9fs rootfs (although I quickly run out
of fds in the server once gnome starts up).
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Acked-by: Eric Van Hensbergen <ericvh@gmail.com>
This replaces the console-based virto client with a block-based
client using a single request queue.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Add a new transport function which allows a cut-thru directly to
the transport instead of processing request through the mux if the
cut-thru exists.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Add a new set of configuration functions to the NetLabel/LSM API so that
LSMs can perform their own configuration of the NetLabel subsystem without
relying on assistance from userspace.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: James Morris <jmorris@namei.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I was notified by Randy Stewart that lksctp claims to be
"the reference implementation". First of all, "the
refrence implementation" was the original implementation
of SCTP in usersapce written ty Randy and a few others.
Second, after looking at the definiton of 'reference implementation',
we don't really meet the requirements.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
The port offset calculations depend on the protocol family, but, as
Adrian noticed, I broke this logic with the commit
5ee31fc1ec
[INET]: Consolidate inet(6)_hash_connect.
Return this logic back, by passing the port offset directly into the
consolidated function.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Noticed-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move ipv6_icmp_sysctl_init and ipv6_route_sysctl_init into the right
ifdef section otherwise that does not compile when CONFIG_SYSCTL=yes
and CONFIG_PROC_FS=no
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
compile error building without CONFIG_FS_PROC:
net/ipv4/fib_frontend.c: In function 'fib_net_init':
net/ipv4/fib_frontend.c:1032: error: implicit declaration of function 'fib_proc_
init'
net/ipv4/fib_frontend.c: In function 'fib_net_exit':
net/ipv4/fib_frontend.c:1047: error: implicit declaration of function 'fib_proc_
exit'
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This way we can remove TCP and DCCP specific versions of
sk->sk_prot->get_port: both v4 and v6 use inet_csk_get_port
sk->sk_prot->hash: inet_hash is directly used, only v6 need
a specific version to deal with mapped sockets
sk->sk_prot->unhash: both v4 and v6 use inet_hash directly
struct inet_connection_sock_af_ops also gets a new member, bind_conflict, so
that inet_csk_get_port can find the per family routine.
Now only the lookup routines receive as a parameter a struct inet_hashtable.
With this we further reuse code, reducing the difference among INET transport
protocols.
Eventually work has to be done on UDP and SCTP to make them share this
infrastructure and get as a bonus inet_diag interfaces so that iproute can be
used with these protocols.
net-2.6/net/ipv4/inet_hashtables.c:
struct proto | +8
struct inet_connection_sock_af_ops | +8
2 structs changed
__inet_hash_nolisten | +18
__inet_hash | -210
inet_put_port | +8
inet_bind_bucket_create | +1
__inet_hash_connect | -8
5 functions changed, 27 bytes added, 218 bytes removed, diff: -191
net-2.6/net/core/sock.c:
proto_seq_show | +3
1 function changed, 3 bytes added, diff: +3
net-2.6/net/ipv4/inet_connection_sock.c:
inet_csk_get_port | +15
1 function changed, 15 bytes added, diff: +15
net-2.6/net/ipv4/tcp.c:
tcp_set_state | -7
1 function changed, 7 bytes removed, diff: -7
net-2.6/net/ipv4/tcp_ipv4.c:
tcp_v4_get_port | -31
tcp_v4_hash | -48
tcp_v4_destroy_sock | -7
tcp_v4_syn_recv_sock | -2
tcp_unhash | -179
5 functions changed, 267 bytes removed, diff: -267
net-2.6/net/ipv6/inet6_hashtables.c:
__inet6_hash | +8
1 function changed, 8 bytes added, diff: +8
net-2.6/net/ipv4/inet_hashtables.c:
inet_unhash | +190
inet_hash | +242
2 functions changed, 432 bytes added, diff: +432
vmlinux:
16 functions changed, 485 bytes added, 492 bytes removed, diff: -7
/home/acme/git/net-2.6/net/ipv6/tcp_ipv6.c:
tcp_v6_get_port | -31
tcp_v6_hash | -7
tcp_v6_syn_recv_sock | -9
3 functions changed, 47 bytes removed, diff: -47
/home/acme/git/net-2.6/net/dccp/proto.c:
dccp_destroy_sock | -7
dccp_unhash | -179
dccp_hash | -49
dccp_set_state | -7
dccp_done | +1
5 functions changed, 1 bytes added, 242 bytes removed, diff: -241
/home/acme/git/net-2.6/net/dccp/ipv4.c:
dccp_v4_get_port | -31
dccp_v4_request_recv_sock | -2
2 functions changed, 33 bytes removed, diff: -33
/home/acme/git/net-2.6/net/dccp/ipv6.c:
dccp_v6_get_port | -31
dccp_v6_hash | -7
dccp_v6_request_recv_sock | +5
3 functions changed, 5 bytes added, 38 bytes removed, diff: -33
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The namespace is not available in the fib_sync_down_addr, add it as a
parameter.
Looking up a device by the pointer to it is OK. Looking up using a
result from fib_trie/fib_hash table lookup is also safe. No need to
fix that at all. So, just fix lookup by address and insertion to the
hash table path.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is required to make fib_info lookups namespace aware. In the
other case initial namespace devices are marked as dead in the local
routing table during other namespace stop.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
fib_sync_down can be called with an address and with a device. In
reality it is called either with address OR with a device. The
codepath inside is completely different, so lets separate it into two
calls for these two cases.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current ip route cache implementation is not suited to large caches.
We can consume a lot of CPU when cache must be invalidated, since we
currently need to evict all cache entries, and this eviction is
sometimes asynchronous. min_delay & max_delay can somewhat control this
asynchronism behavior, but whole thing is a kludge, regularly triggering
infamous soft lockup messages. When entries are still in use, this also
consumes a lot of ram, filling dst_garbage.list.
A better scheme is to use a generation identifier on each entry,
so that cache invalidation can be performed by changing the table
identifier, without having to scan all entries.
No more delayed flushing, no more stalling when secret_interval expires.
Invalidated entries will then be freed at GC time (controled by
ip_rt_gc_timeout or stress), or when an invalidated entry is found
in a chain when an insert is done.
Thus we keep a normal equilibrium.
This patch :
- renames rt_hash_rnd to rt_genid (and makes it an atomic_t)
- Adds a new rt_genid field to 'struct rtable' (filling a hole on 64bit)
- Checks entry->rt_genid at appropriate places :
Add a net argument to inet6_lookup and propagate it further.
Actually, this is tcp-v6 implementation of what was done for
tcp-v4 sockets in a previous patch.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a net argument to inet_lookup and propagate it further
into lookup calls. Plus tune the __inet_check_established.
The dccp and inet_diag, which use that lookup functions
pass the init_net into them.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This tags the inet_bind_bucket struct with net pointer,
initializes it during creation and makes a filtering
during lookup.
A better hashfn, that takes the net into account is to
be done in the future, but currently all bind buckets
with similar port will be in one hash chain.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
These two functions are the same except for what they call
to "check_established" and "hash" for a socket.
This saves half-a-kilo for ipv4 and ipv6.
add/remove: 1/0 grow/shrink: 1/4 up/down: 582/-1128 (-546)
function old new delta
__inet_hash_connect - 577 +577
arp_ignore 108 113 +5
static.hint 8 4 -4
rt_worker_func 376 372 -4
inet6_hash_connect 584 25 -559
inet_hash_connect 586 25 -561
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Constify a few data tables use const qualifiers on variables where
possible in the nf_*_proto_tcp sources.
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rename all "conntrack" variables to "ct" for more consistency and
avoiding some overly long lines.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reorder struct nf_conntrack_l4proto so all members used during packet
processing are in the same cacheline.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
nf_ct_tuple_src_equal() and nf_ct_tuple_dst_equal() both compare the protocol
numbers. Unfortunately gcc doesn't optimize out the second comparison, so
remove it and prefix both functions with __ to indicate that they should not
be used directly.
Saves another 16 byte of text in __nf_conntrack_find() on x86_64:
nf_conntrack_tuple_taken | -20 # 320 -> 300, size inlines: 181 -> 161
__nf_conntrack_find | -16 # 267 -> 251, size inlines: 127 -> 115
__nf_conntrack_confirm | -40 # 875 -> 835, size inlines: 570 -> 537
3 functions changed, 76 bytes removed
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ignoring specific entries in __nf_conntrack_find() is only needed by NAT
for nf_conntrack_tuple_taken(). Remove it from __nf_conntrack_find()
and make nf_conntrack_tuple_taken() search the hash itself.
Saves 54 bytes of text in the hotpath on x86_64:
__nf_conntrack_find | -54 # 321 -> 267, # inlines: 3 -> 2, size inlines: 181 -> 127
nf_conntrack_tuple_taken | +305 # 15 -> 320, lexblocks: 0 -> 3, # inlines: 0 -> 3, size inlines: 0 -> 181
nf_conntrack_find_get | -2 # 90 -> 88
3 functions changed, 305 bytes added, 56 bytes removed, diff: +249
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the RCU conversion only write_lock usages of nf_conntrack_lock are
left (except one read_lock that should actually use write_lock in the
H.323 helper). Switch to a spinlock.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use RCU for expectation hash. This doesn't buy much for conntrack
runtime performance, but allows to reduce the use of nf_conntrack_lock
for /proc and nf_netlink_conntrack.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The hashtable size is really unsigned so sparse complains when you pass
a signed integer. Change all uses to make it consistent.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now it's possible to list and manipulate per-netns ip6tables rules.
Filtering decisions are based on init_net's table so far.
P.S.: remove init_net check in inet6_create() to see the effect
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now, iptables show and configure different set of rules in different
netnss'. Filtering decisions are still made by consulting only
init_net's set.
Changes are identical except naming so no splitting.
P.S.: one need to remove init_net checks in nf_sockopt.c and inet_create()
to see the effect.
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
In fact all we want is per-netns set of rules, however doing that will
unnecessary complicate routines such as ipt_hook()/ipt_do_table, so
make full xt_table array per-netns.
Every user stubbed with init_net for a while.
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The address of IPv6 raw sockets was shown in the wrong format, from
IPv4 ones. The problem has been introduced by the commit
42a73808ed ("[RAW]: Consolidate proc
interface.")
Thanks to Adrian Bunk who originally noticed the problem.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Different hashtables are used for IPv6 and IPv4 raw sockets, so no
need to check the socket family in the iterator over hashtables. Clean
this out.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
A userspace program may wish to set the mark for each packets its send
without using the netfilter MARK target. Changing the mark can be used
for mark based routing without netfilter or for packet filtering.
It requires CAP_NET_ADMIN capability.
Signed-off-by: Laszlo Attila Toth <panther@balabit.hu>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for combined mode algorithms with GCM being
the first algorithm supported.
Combined mode algorithms can be added through the xfrm_user interface
using the new algorithm payload type XFRMA_ALG_AEAD. Each algorithms
is identified by its name and the ICV length.
For the purposes of matching algorithms in xfrm_tmpl structures,
combined mode algorithms occupy the same name space as encryption
algorithms. This is in line with how they are negotiated using IKE.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch converts ESP to use the crypto_aead interface and in particular
the authenc algorithm. This lays the foundations for future support of
combined mode algorithms.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Most trusted OSs, with the exception of Linux, have the ability to specify
static security labels for unlabeled networks. This patch adds this ability to
the NetLabel packet labeling framework.
If the NetLabel subsystem is called to determine the security attributes of an
incoming packet it first checks to see if any recognized NetLabel packet
labeling protocols are in-use on the packet. If none can be found then the
unlabled connection table is queried and based on the packets incoming
interface and address it is matched with a security label as configured by the
administrator using the netlabel_tools package. The matching security label is
returned to the caller just as if the packet was explicitly labeled using a
labeling protocol.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
In order to do any sort of IP header inspection of incoming packets we need to
know which address family, AF_INET/AF_INET6/etc., it belongs to and since the
sk_buff structure does not store this information we need to pass along the
address family separate from the packet itself.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
This patch adds support to the NetLabel LSM secattr struct for a secid token
and a type field, paving the way for full LSM/SELinux context support and
"static" or "fallback" labels. In addition, this patch adds a fair amount
of documentation to the core NetLabel structures used as part of the
NetLabel kernel API.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
Basically, this piece looks relatively easy. Namespace is already
available on the dst entry via device and the device is safe to
dereferrence. Compare it with one of a searcher and skip entry if
appropriate.
The only exception is ip_rt_frag_needed. So, add namespace parameter to it.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip_route_connect and ip_route_newports are a part of routing API
presented to the socket layer. The namespace is available inside them
through a socket.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert packet schedulers to use the netlink API. Unfortunately a gradual
conversion is not possible without breaking compilation in the middle or
adding lots of casts, so this patch converts them all in one step. The
patch has been mostly generated automatically with some minor edits to
at least allow seperate conversion of classifiers and actions.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Used to append data to a message without a header or padding.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Needed to propagate it down to the ip_route_output_flow.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Needed to propagate it down to the __ip_route_output_key.
Signed_off_by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is only required to propagate it down to the
ip_route_output_slow.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently fib_select_default calls fib_get_table() with the
init_net. Prepare it to provide a correct namespace to lookup default
route.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The difference in the implementation of the fib_select_default when
CONFIG_IP_MULTIPLE_TABLES is (not) defined looks
negligible. Consolidate it and place into fib_frontend.c.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Two small issues fixed:
- fib_select_multipath is exported from fib_semantics.c rather than from
fib_frontend.c. So, move the declaration below appropriate comment.
- struct rt_entry declaration is not used. Drop it.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
On x86_64, sizeof(struct rtable) is 0x148, which is rounded up to
0x180 bytes by SLAB allocator.
We can reduce this to exactly 0x140 bytes, without alignment overhead,
and store 12 struct rtable per PAGE instead of 10.
rate_tokens is currently defined as an "unsigned long", while its
content should not exceed 6*HZ. It can safely be converted to an
unsigned int.
Moving tclassid right after rate_tokens to fill the 4 bytes hole
permits to save 8 bytes on 'struct dst_entry', which finally permits
to save 8 bytes on 'struct rtable'
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On namespace start we mainly prepare the ctl variables.
When the namespace is stopped we have to kill all the fragments that
point to this namespace. The inet_frags_exit_net() handles it.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The inet_frags.lru_list is used for evicting only, so we have
to make it per-namespace, to evict only those fragments, who's
namespace exceeded its high threshold, but not the whole hash.
Besides, this helps to avoid long loops in evictor.
The spinlock is not per-namespace because it protects the
hash table as well, which is global.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since we have one hashtable to lookup the fragment, having
different secret_interval-s for hash rebuild doesn't make
sense, so move this one to inet_frags.
The inet_frags_ctl becomes empty after this, so remove it.
The appropriate ctl table is kept read-only in namespaces.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is the same as with the timeout variable.
Currently, after exceeding the high threshold _all_
the fragments are evicted, but it will be fixed in
later patch.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move it to the netns_frags, adjust the usage and
make the appropriate ctl table writable.
Now fragment, that live in different namespaces can
live for different times.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Each namespace has to have own tables to tune their
different parameters, so duplicate the tables and
register them.
All the tables in sub-namespaces are temporarily made
read-only.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is also simple, but introduces more changes, since
then mem counter is altered in more places.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is simple - just move the variable from struct inet_frags
to struct netns_frags and adjust the usage appropriately.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since fragment management code is consolidated, we cannot have the
pointer from inet_frag_queue to struct net, since we must know what
king of fragment this is.
So, I introduce the netns_frags structure. This one is currently
empty, but will be eventually filled with per-namespace
attributes. Each inet_frag_queue is tagged with this one.
The conntrack_reasm is not "netns-izated", so it has one static
netns_frags instance to keep working in init namespace.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a preparation for sysctl netns-ization.
Move the ctl tables to the files, where the tuning
variables reside. Plus make the helpers to register
the tables.
This will simplify the later patches and will keep
similar things closer to each other.
ipv4, ipv6 and conntrack_reasm are patched differently,
but the result is all the tables are in appropriate files.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the following sparse warnings:
| net/ipv6/route.c:2491:18: warning: symbol 'ipv6_route_sysctl_init' was not declared. Should it be static?
| net/ipv6/icmp.c:922:18: warning: symbol 'ipv6_icmp_sysctl_init' was not declared. Should it be static?
| net/ipv6/reassembly.c:628:6: warning: symbol 'ipv6_frag_sysctl_init' was not declared. Should it be static?
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
This patch (based on Ron Rindjunsky's) creates a framework for
a unified way to pass BSS configuration to drivers that require
the information, e.g. for implementing power save mode.
This patch introduces new ieee80211_bss_conf structure that is
passed to the driver via the new bss_info_changed() callback
when the BSS configuration changes.
This new BSS configuration infrastructure adds the following
new features:
* drivers are notified of their association AID
* drivers are notified of association status
and replaces the erp_ie_changed() callback. The patch also does
the relevant driver updates for the latter change.
Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Drivers that support mixed AP/STA operation may well need to
know the type of a virtual interface when iterating over them.
The easiest way to support that is to move the interface type
variable into the vif structure.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch gets rid of the if_id stuff where possible in favour of
a new per-virtual-interface structure "struct ieee80211_vif". This
structure is located at the end of the per-interface structure and
contains a variable length driver-use data area.
This has two advantages:
* removes the need to look up interfaces by if_id, this is better
for working with network namespaces and performance
* allows drivers to store and retrieve per-interface data without
having to allocate own lists/hash tables
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This short patch modifies the IPv4 networking to enable use of the
240.0.0.0/4 (aka "class-E") address space as propsed in the internet
draft draft-fuller-240space-00.txt.
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Save namespace context on the fib rule at the rule creation time and
call routing lookup in the correct namespace.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The backward link from FIB rules operations to the network namespace
will allow to simplify the API a bit.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes IrPORT and the old dongle drivers (all off them
have replacement drivers).
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The network namespace pointer can be stored into the dst_ops structure.
This is usefull when there are multiple instances of the dst_ops for a
protocol. When there are no several instances, this field will be never
used in the protocol. So there is no impact for the protocols which do
implement the network namespaces.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The garbage collection function receive the dst_ops structure as
parameter. This is useful for the next incoming patchset because it
will need the dst_ops (there will be several instances) and the
network namespace pointer (contained in the dst_ops).
The protocols which do not take care of the namespaces will not be
impacted by this change (expect for the function signature), they do
just ignore the parameter.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Remove declarations of non-existing variables and functions
- Move helper init/cleanup function declarations to nf_conntrack_helper.h
- Remove unneeded __nf_conntrack_attach declaration and make it static
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Initialization of the slab cache's should be done when IP is
initialized to make sure of available memory, and that code can be
marked __init.
Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make them static.
[ Moved the inline before, instead of after, call sites. -DaveM ]
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
fib_rules_unregister is called only after successful register and the
return code is never checked.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull the struct net pointer up to the showing functions
to filter the sockets depending on their namespaces.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Looks if the address is belonging to the network namespace, otherwise
discard the address for the check.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The inet6_addr_lst is browsed taking into account the network
namespace specified as parameter. If an address does not belong
to the specified namespace, it is ignored.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a new address is added, we must check if the new address does not
already exists. This patch makes this check to be aware of a network
namespace, so the check will look if the address already exists for
the specified network namespace. While the addresses are browsed, the
addresses which do not belong to the namespace are discarded.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
When I studied the neighbor code I puzzled over what the NUD can mean
for quite a long time.
Finally I asked Alexey and he said that this was smth like "neighbor
unreachability detection".
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Otherwise we beat heavily on the global tcp_memory atomics
when all of the sockets in the system are slowly sending
perioding packet clumps.
Noticed and suggested by Eric Dumazet.
Signed-off-by: David S. Miller <davem@davemloft.net>
This is the core. Declare and register the pernet subsys for
addrconf. The init callback the will create the devconf-s.
The init_net will reuse the existing statically declared confs,
so that accessing them from inside the ipv6 code will still
work.
The register_pernet_subsys() is moved above the ipv6_add_dev()
call for loopback, because this function will need the
net->devconf_dflt pointer to be already set.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
seq_open_net requires that first field of the seq->private data to be
struct seq_net_private. In reality this is a single pointer to a
struct net for now. The patch makes code consistent.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
... up to rtentry_to_fib_config
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes the netlink socket to be per namespace. That allows
to have each namespace its own socket for routing queries.
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The preparatory work has been done. All we need is to substitute
fib_table_hash with net->ipv4.fib_table_hash. Netns context is
available when required.
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The final trick for rules: place fib4_rules_ops into struct net and
modify initialization path for this.
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
nl_info is used to track the end-user destination of routing change
notification. This is a natural object to hold a namespace on. Place
it there and utilize the context in the appropriate places.
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch extends the inet_addr_type and inet_dev_addr_type with the
network namespace pointer. That allows to access the different tables
relatively to the network namespace.
The modification of the signature function is reported in all the
callers of the inet_addr_type using the pointer to the well known
init_net.
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch extends the fib_get_table and the fib_new_table functions
with the network namespace pointer. That will allow to access the
table relatively from the network namespace.
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace the direct pointers to local and main tables with
calls to fib_get_table() with appropriate argument.
This doesn't introduce additional dereferences, but makes the access to fib
tables uniform in any (CONFIG_IP_MULTIPLE_TABLES) case.
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes the fib to be initialized as a subsystem for the
network namespaces. The code does not handle several namespaces yet,
so in case of a creation of a network namespace, the
creation/initialization will not occur.
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds error paths into both versions of fib4_rules_init
(with/without CONFIG_IP_MULTIPLE_TABLES) and returns error code to the
caller.
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds netns parameter to fib_proc_init/exit and replaces __init
specifier with __net_init. After this, we will not yet have these proc
files show info from the specific namespace - this will be done when
these tables become namespaced.
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move static rules_ops & rules_mod_lock to the struct net, register the
pernet subsys to init them and enjoy the fact that the core rules
infrastructure works in the namespace.
Real IPv4 fib rules virtualization requires fib tables support in the
namespace and will be done seriously later in the patchset.
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
fib_rules_ops contains operations and the list of configured rules. ops will
become per/namespace soon, so we need them to be known in the default_pref
callback.
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch extends the different fib rules API in order to pass the
network namespace pointer. That will allow to access the different
tables from a namespace relative object. As usual, the pointer to the
init_net variable is passed as parameter so we don't break the
network.
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch moves the icmpv6_time sysctl to the network namespace
structure.
Because the ipv6 protocol is not yet per namespace, the variable is
accessed relatively to the initial network namespace.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All the sysctl concerning the routes are moved to the network
namespace structure. A helper function is called to initialize the
variables.
Because the ipv6 protocol is not yet per namespace, the variables are
accessed relatively from the network namespace.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ip6_frags is moved to the network namespace structure. Because
there can be multiple instances of the network namespaces, and the
ip6_frags is no longer a global static variable, a helper function has
been added to facilitate the initialization of the variables.
Until the ipv6 protocol is not per namespace, the variables are
accessed relatively from the initial network namespace.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch moves the bindv6only sysctl to the network namespace
structure. Until the ipv6 protocol is not per namespace, the sysctl
variable is always from the initial network namespace.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Each network namespace wants its own set of sysctl value, eg. we
should not be able from a namespace to set a sysctl value for another
namespace , especially for the initial network namespace.
This patch duplicates the sysctl table when we register a new network
namespace for ipv6. The duplicated table are postfixed with the
"template" word to notify the developper the table is cloned.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Like the ipv4 part, this patch adds an ipv6 structure in the net
structure to aggregate the different resources to make ipv6 per
namespace.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes the function ipv6_sysctl_register to return a
value. The af_inet6 init function is now able to handle an error and
catch it from the initialization of the sysctl.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The conntracks subsystem has a similar infrastructure
to maintain ctl_paths, but since we already have it
on the generic level, I think it's OK to switch to
using it.
So, basically, this patch just replaces the ctl_table-s
with ctl_path-s, nf_register_sysctl_table with
register_sysctl_paths() and removes no longer needed code.
After this the net/netfilter/nf_sysctl.c file contains
the paths only.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This includes the most simple cases for netfilter.
The first part is tne queue modules for ipv4 and ipv6,
on which the net/ipv4/ and net/ipv6/ paths are reused
from the appropriate ipv4 and ipv6 code.
The conntrack module is also patched, but this hunk is
very small and simple.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The feature of ipvs ctls is that the net/ipv4/vs path
is common for core ipvs ctls and for two schedulers,
so I make it exported and re-use it in modules.
Two other .c files required linux/sysctl.h to make the
extern declaration of this path compile well.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some devices have a seperate LED which indicates if the radio is
enabled or not. This adds a LED trigger to mac80211 where drivers
can hook into when they are interested in radio status changes.
v2: Check hw.conf.radio_enabled when calling start().
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the API to perform A-MPDU actions between mac80211 and low
level driver.
Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The info placeholder member of dst_entry seems to be unused in the
network stack.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since __xfrm_policy_destroy is used to destory the resources
allocated by xfrm_policy_alloc. So using the name
__xfrm_policy_destroy is not correspond with xfrm_policy_alloc.
Rename it to xfrm_policy_destroy.
And along with some instances that call xfrm_policy_alloc
but not using xfrm_policy_destroy to destroy the resource,
fix them.
Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previous NETNS patches broke CONFIG_SYSCTL=n case
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1) Cleanups (all functions are prefixed by sock_prot_inuse)
sock_prot_inc_use(prot) -> sock_prot_inuse_add(prot,-1)
sock_prot_dec_use(prot) -> sock_prot_inuse_add(prot,-1)
sock_prot_inuse() -> sock_prot_inuse_get()
New functions :
sock_prot_inuse_init() and sock_prot_inuse_free() to abstract pcounter use.
2) if CONFIG_PROC_FS=n, we can zap 'inuse' member from "struct proto",
since nobody wants to read the inuse value.
This saves 1372 bytes on i386/SMP and some cpu cycles.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In include/net/ip_vs.h:
- The ip_vs_secure_tcp_set() method is not implemented anywhere.
- IP_VS_APP_TYPE_FTP is an unused definition.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These three declarations in include/net/ip.h are not implemented
anywhere:
ip_mc_dropsocket(), ip_mc_dropdevice() and ip_net_unreachable().
Also, correct a comment to be "Functions provided by ip_fragment.c"
(instead of by ip_fragment.o) in consistency with the other comments
in this header.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For five years we had two xfrm_policy_flush prototypes and every time that
function's signature changed people have been diligently updating both of
them without noticing :)
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The snd_up check should be enough. I suspect this has been
there to provide a minor optimization in clean_rtx_queue which
used to have a small if (!->sacked) block which could skip
snd_up check among the other work.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces new memory accounting functions for each network
protocol. Most of them are renamed from memory accounting functions
for stream protocols. At the same time, some stream memory accounting
functions are removed since other functions do same thing.
Renaming:
sk_stream_free_skb() -> sk_wmem_free_skb()
__sk_stream_mem_reclaim() -> __sk_mem_reclaim()
sk_stream_mem_reclaim() -> sk_mem_reclaim()
sk_stream_mem_schedule -> __sk_mem_schedule()
sk_stream_pages() -> sk_mem_pages()
sk_stream_rmem_schedule() -> sk_rmem_schedule()
sk_stream_wmem_schedule() -> sk_wmem_schedule()
sk_charge_skb() -> sk_mem_charge()
Removeing
sk_stream_rfree(): consolidates into sock_rfree()
sk_stream_set_owner_r(): consolidates into skb_set_owner_r()
sk_stream_mem_schedule()
The following functions are added.
sk_has_account(): check if the protocol supports accounting
sk_mem_uncharge(): do the opposite of sk_mem_charge()
In addition, to achieve consolidation, updating sk_wmem_queued is
removed from sk_mem_charge().
Next, to consolidate memory accounting functions, this patch adds
memory accounting calls to network core functions. Moreover, present
memory accounting call is renamed to new accounting call.
Finally we replace present memory accounting calls with new interface
in TCP and SCTP.
Signed-off-by: Takahiro Yasui <tyasui@redhat.com>
Signed-off-by: Hideo Aoki <haoki@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sk_forward_alloc being signed, we should take care of divides by
SK_STREAM_MEM_QUANTUM we do in sk_stream_pages() and
__sk_stream_mem_reclaim()
This patchs introduces SK_STREAM_MEM_QUANTUM_SHIFT, defined
as ilog2(SK_STREAM_MEM_QUANTUM), to be able to use right
shifts instead of plain divides.
This should help compiler to choose right shifts instead of
expensive divides (as seen with CONFIG_CC_OPTIMIZE_FOR_SIZE=y on x86)
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I'm actually surprised at how much was involved. At first glance it
appears that the neighbour table data structures are already split by
network device so all that should be needed is to modify the user
interface commands to filter the set of neighbours by the network
namespace of their devices.
However a couple things turned up while I was reading through the
code. The proxy neighbour table allows entries with no network
device, and the neighbour parms are per network device (except for the
defaults) so they now need a per network namespace default.
So I updated the two structures (which surprised me) with their very
own network namespace parameter. Updated the relevant lookup and
destroy routines with a network namespace parameter and modified the
code that interacts with users to filter out neighbour table entries
for devices of other namespaces.
I'm a little concerned that we can modify and display the global table
configuration and from all network namespaces. But this appears good
enough for now.
I keep thinking modifying the neighbour table to have per network
namespace instances of each table type would should be cleaner. The
hash table is already dynamically sized so there are it is not a
limiter. The default parameter would be straight forward to take care
of. However when I look at the how the network table is built and
used I still find some assumptions that there is only a single
neighbour table for each type of table in the kernel. The netlink
operations, neigh_seq_start, the non-core network users that call
neigh_lookup. So while it might be doable it would require more
refactoring than my current approach of just doing a little extra
filtering in the code.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a number of new IPsec audit events to meet the auditing
requirements of RFC4303. This includes audit hooks for the following events:
* Could not find a valid SA [sections 2.1, 3.4.2]
. xfrm_audit_state_notfound()
. xfrm_audit_state_notfound_simple()
* Sequence number overflow [section 3.3.3]
. xfrm_audit_state_replay_overflow()
* Replayed packet [section 3.4.3]
. xfrm_audit_state_replay()
* Integrity check failure [sections 3.4.4.1, 3.4.4.2]
. xfrm_audit_state_icvfail()
While RFC4304 deals only with ESP most of the changes in this patch apply to
IPsec in general, i.e. both AH and ESP. The one case, integrity check
failure, where ESP specific code had to be modified the same was done to the
AH code for the sake of consistency.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Because sk_wmem_queued, sk_sndbuf are signed, a divide per two
may force compiler to use an integer divide.
We can instead use a right shift.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Several length variables cannot be negative, so convert int to
unsigned int. This also allows us to do sane shift operations
on those variables.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
After a station is added to the kernel's structures, userspace
has to be able to retrieve statistics about that station, especially
whether the station was idle and how much bytes were transferred
to and from it. This adds the necessary code to nl80211.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds station handling to cfg80211/nl80211.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds the necessary API to cfg80211/nl80211 to allow
changing beaconing settings.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This implements cfg80211's get_key() to allow retrieving the sequence
counter for a TKIP or CCMP key from userspace. It also cleans up and
documents the associated low-level driver interface.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This introduces key handling to cfg80211/nl80211. Default
and group keys can be added, changed and removed; sequence
counters for each key can be retrieved.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are various decisions influencing the decision whether to buffer
a frame for after the next DTIM beacon. The "do we have stations in PS
mode" condition cannot be tested by the driver so mac80211 has to do
that. To ease driver writing for hardware that can buffer frames until
after the next DTIM beacon, introduce a new txctl flag telling the
driver to buffer a specific frame.
While at it, restructure and comment the code for multicast buffering
and remove spurious "inline" directives.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The previous patch left only one user of the ieee80211_is_eapol()
function and that user can be eliminated easily by introducing
a new "frame is EAPOL" flag to handle the frame specially (we
already have this information) instead of doing the (expensive)
ieee80211_is_eapol() all the time.
Also, allow unencrypted frames to be sent when they are injected.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes a number of small but potentially troublesome things in the
XFRM/IPsec code:
* Use the 'audit_enabled' variable already in include/linux/audit.h
Removed the need for extern declarations local to each XFRM audit fuction
* Convert 'sid' to 'secid' everywhere we can
The 'sid' name is specific to SELinux, 'secid' is the common naming
convention used by the kernel when refering to tokenized LSM labels,
unfortunately we have to leave 'ctx_sid' in 'struct xfrm_sec_ctx' otherwise
we risk breaking userspace
* Convert address display to use standard NIP* macros
Similar to what was recently done with the SPD audit code, this also also
includes the removal of some unnecessary memcpy() calls
* Move common code to xfrm_audit_common_stateinfo()
Code consolidation from the "less is more" book on software development
* Proper spacing around commas in function arguments
Minor style tweak since I was already touching the code
Signed-off-by: Paul Moore <paul.moore@hp.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This statistics is shown factor dropped by transformation
at /proc/net/xfrm_stat for developer.
It is a counter designed from current transformation source code
and defined as linux private MIB.
See Documentation/networking/xfrm_proc.txt for the detail.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
IPv6 specific thing is wrongly removed from transformation at net-2.6.25.
This patch recovers it with current design.
o Update "path" of xfrm_dst since IPv6 transformation should
care about routing changes. It is required by MIPv6 and
off-link destined IPsec.
o Rename nfheader_len which is for non-fragment transformation used by
MIPv6 to rt6i_nfheader_len as IPv6 name space.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is -700 bytes from the net/ipv4/built-in.o
add/remove: 1/0 grow/shrink: 1/3 up/down: 340/-1040 (-700)
function old new delta
__inet_lookup_established - 339 +339
tcp_sacktag_write_queue 2254 2255 +1
tcp_v4_err 1304 973 -331
tcp_v4_rcv 2089 1744 -345
tcp_v4_do_rcv 826 462 -364
Exporting is for dccp module (used via e.g. inet_lookup).
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This one is used in quite many places in the networking code and
seems to big to be inline.
After the patch net/ipv4/build-in.o loses ~650 bytes:
add/remove: 2/0 grow/shrink: 0/5 up/down: 461/-1114 (-653)
function old new delta
__inet_hash_nolisten - 282 +282
__inet_hash - 179 +179
tcp_sacktag_write_queue 2255 2254 -1
__inet_lookup_listener 284 274 -10
tcp_v4_syn_recv_sock 755 493 -262
tcp_v4_hash 389 35 -354
inet_hash_connect 1086 599 -487
This version addresses the issue pointed by Eric, that
while being inline this function was optimized by gcc
in respect to the 'listen_possible' argument.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
ADD-IP spec has a special case for processing ABORTs:
F4) ... One special consideration is that ABORT
Chunks arriving destined to the IP address being deleted MUST be
ignored (see Section 5.3.1 for further details).
Check if the address we received on is in the DEL state, and if
so, ignore the ABORT.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The processing of the ASCONF chunks has changed a lot in the
spec. New items are:
1. A list of ASCONF-ACK chunks is now cached
2. The source of the packet is used in response.
3. New handling for unexpect ASCONF chunks.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ADD-IP "Set Primary IP Address" parameter is allowed in the
INIT/INIT-ACK exchange. Allow processing of this parameter during
the INIT/INIT-ACK.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Address Parameter in the parameter list of the ASCONF chunk
may be a wildcard address. In this case special processing
is required. For the 'add' case, the source IP of the packet is
added. In the 'del' case, all addresses except the source IP
of packet are removed. In the "mark primary" case, the source
address is marked as primary.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The SNMP macros use raw_smp_processor_id() in process context
which is illegal because the process may be preempted and then
migrated to another CPU.
This patch makes it use get_cpu/put_cpu to disable preemption.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
A few netfilter modules provide their own union of IPv4 and IPv6
address storage. Will unify that in this patch series.
(1/4): Rename union nf_conntrack_address to union nf_inet_addr and
move it to x_tables.h.
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
nf_nat_setup_info gets the hook number and translates that to the
manip type to perform. This is a relict from the time when one
manip per hook could exist, the exact hook number doesn't matter
anymore, its converted to the manip type. Most callers already
know what kind of NAT they want to perform, so pass the maniptype
in directly.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This makes mac80211 include the low-level MAC timestamp
in the radiotap header if the driver indicated (by a new
RX flag) that the timestamp is valid.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The crc32c library used an identical table and algorithm
as SCTP. Switch to using the library instead of carrying
our own table. Using crypto layer proved to have too
much overhead compared to using the library directly.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is the core.
Add all and default pointers on the netns_ipv4 and register
a new pernet subsys to initialize them.
Also add the ctl_table_header to register the
net.ipv4.ip_forward ctl.
I don't allocate additional memory for init_net, but use
global devinets.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This one will need to set the IPV4_DEVCONF_ALL(PROXY_ARP), but
there's no ways to get the net right in place, so we have to
pull one from the inet_ioctl's struct sock.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ipv4 will store its parameters inside this structure.
This one is empty now, but it will be eventually filled.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
X86_32 was the last user of the FASTCALL macro, now that it
uses regparm(3) by default, this macro expands to nothing.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RFC 4301 requires us to relookup ICMP traffic that does not match any
policies using the reverse of its payload. This patch implements this
for ICMP traffic that originates from or terminates on localhost.
This is activated on outbound with the new policy flag XFRM_POLICY_ICMP,
and on inbound by the new state flag XFRM_STATE_ICMP.
On inbound the policy check is now performed by the ICMP protocol so
that it can repeat the policy check where necessary.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
RFC 4301 requires us to relookup ICMP traffic that does not match any
policies using the reverse of its payload. This patch adds the functions
xfrm_decode_session_reverse and xfrmX_policy_check_reverse so we can get
the reverse flow to perform such a lookup.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces an enum for bits in the flags argument of xfrm_lookup.
This is so that we can cram more information into it later.
Since all current users use just the values 0 and 1, XFRM_LOOKUP_WAIT has
been added with the value 1 << 0 to represent the current meaning of flags.
The test in __xfrm_lookup has been changed accordingly.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Recently David Miller and Herbert Xu pointed out that struct net becomes
overbloated and un-maintainable. There are two solutions:
- provide a pointer to a network subsystem definition from struct net.
This costs an additional dereferrence
- place sub-system definition into the structure itself. This will speedup
run-time access at the cost of recompilation time
The second approach looks better for us. Other sub-systems will follow.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patchset makes the different protocols to return an error code, so
the af_inet6 module can check the initialization was correct or not.
The raw6 was taken into account to be consistent with the rest of the
protocols, but the registration is at the same place.
Because the raw6 has its own init function, the proto and the ops structure
can be moved inside the raw6.c file.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes the inet6_register_protosw to return an error code.
The different protocols can be aware the registration was successful or
not and can pass the error to the initial caller, af_inet6.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes the frag_init to return an error code, so the af_inet6
module can handle the error.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch factorize the code for the differents init functions for rthdr,
nodata, destopt in a single function exthdrs_init.
This function returns an error so the af_inet6 module can check correctly
the initialization.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes the flowlab subsystem to return an error code and makes
some cleanup with procfs ifdefs.
The af_inet6 will use the flowlabel init return code to check the initialization
was correct.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the xfrm_input_state helper function which returns the
current xfrm state being processed on the input path given an sk_buff.
This is currently only used by xfrm_input but will be used by ESP upon
asynchronous resumption.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
With fixes from Arnaldo Carvalho de Melo.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch defines the usual static inline functions when the code is
disabled for fib6_rules. That's allow to remove some ifdef in route.c
file and make the code a little more clear.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The following patch create the usual static inline functions to disable
the xfrm6_init and xfrm6_fini function when XFRM is off.
That's allow to remove some ifdef and make the code a little more clear.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Just move the variable on the struct net and adjust
its usage.
Others sysctls from sys.net.core table are more
difficult to virtualize (i.e. make them per-namespace),
but I'll look at them as well a bit later.
Signed-off-by: Pavel Emelyanov <xemul@oenvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Making them per-namespace is required for the following
two reasons:
First, some ctl values have a per-namespace meaning.
Second, making them writable from the sub-namespace
is an isolation hole.
So I introduce the pernet operations to create these
tables. For init_net I use the existing statically
declared tables, for sub-namespace they are duplicated
and the write bits are removed from the mode.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The SNMP_INC_STATS_OFFSET_BH is used only by ICMP6_INC_STATS_OFFSET_BH.
The ICMP6_INC_STATS_OFFSET_BH is unused.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are only 2 users and it doesn't hurt to call fib_get_table
instead, and it makes it easier to make the fib network namespace
aware.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The route initialization function does not return any value to notify
if the initialization is successful or not. This patch checks all
calls made for the initilization in order to return a value for the
caller.
Unfortunately, proc_net_fops_create will return a NULL pointer if
CONFIG_PROC_FS is off, so we can not check the return code without an
ifdef CONFIG_PROC_FS block in the ip6_route_init function.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the fib_rules initialization finished, no return code is provided
so there is no way to know, for the caller, if the initialization has
been successful or has failed. This patch fix that.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The xfrm initialization function does not return any error code, so if
there is an error, the caller can not be advise of that. This patch
checks the return code of the different called functions in order to
return a successful or failed initialization.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
If there is an error in the initialization function, nothing is
followed up to the caller. So I add a return value to be set for the
init function.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The previous move of the the UDP inDatagrams counter caused the
counting of encapsulated packets, SUNRPC data (as opposed to call)
packets and RXRPC packets to go missing.
This patch restores all of these.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is the same as I did for the net/core/ table in the
second patch in his series: use the paths and isolate the
whole table in the .c file.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using ctl paths we can put all the stuff, related to net/core/
sysctl table, into one file and remove all the references on it.
As a good side effect this hides the "core_table" name from
the global scope :)
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move common fields for queue management to struct nf_info and rename it
to struct nf_queue_entry. The avoids one allocation/free per packet and
simplifies the code a bit.
Alternatively we could add some private room at the tail, but since
all current users use identical structs this seems easier.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add new rate estimator target (using gen_estimator). In combination with
the rateest match (next patch) this can be used for load-based multipath
routing.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Constify include/net/dsfield.h
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Address type search can be limited to an interface by
inet_dev_addr_type function.
Signed-off-by: Laszlo Attila Toth <panther@balabit.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch speedups compilation when net_namespace.h is changed.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When merging the input paths of IPsec I accidentally left a hard-coded
AF_INET for the state lookup call. This broke IPv6 obviously. This
patch fixes by getting the input callers to specify the family through
skb->cb.
Credit goes to Kazunori Miyazawa for diagnosing this and providing an
initial patch.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pointing to the next skb is necessary to avoid referencing
already SACKed skbs which will soon be on a separate list.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch configures the 802.11n mode of operation
internally in ieee80211_conf structure and in the low-level
driver as well (through op conf_ht).
It does not include AP configuration flows.
Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
New structures:
- ieee80211_ht_info: describing STA's HT capabilities
- ieee80211_ht_bss_info: describing BSS's HT characteristics
Changed structures:
- ieee80211_hw_mode: now also holds PHY HT capabilities for each HW mode
- ieee80211_conf: ht_conf holds current self HT configuration
ht_bss_conf holds current BSS HT configuration
- flag IEEE80211_CONF_SUPPORT_HT_MODE added to indicate if HT use is
desired
- sta_info: now also holds Peer's HT capabilities
Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Interface iteration in mac80211 can be done without holding any
locks because I converted it to RCU. Initially, I thought this
wouldn't be needed for ieee80211_iterate_active_interfaces but
it's turning out that multi-BSS AP support can be much simpler
in a driver if ieee80211_iterate_active_interfaces can be called
without holding locks. This converts it to use RCU, it adds a
requirement that the callback it invokes cannot sleep.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is the core.
* add the ctl_table_header on the struct net;
* make the unix_sysctl_register and _unregister clone the table;
* moves calls to them into per-net init and exit callbacks;
* move the .data pointer in the proper place.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This will make all the sub-namespaces always use the
default value (10) and leave the tuning via sysctl
to the init namespace only.
Per-namespace tuning is coming.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the struct net * argument to both of them to use in
the future. Also make the register one return an error code.
It is useless right now, but will make the future patches
much simpler.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The user interface is: register_net_sysctl_table and
unregister_net_sysctl_table. Very much like the current
interface except there is a network namespace parameter.
With this any sysctl registered with register_net_sysctl_table
will only show up to tasks in the same network namespace.
All other sysctls continue to be globally visible.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Daniel Lezcano <dlezcano@fr.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This allows to get rid of the CONFIG_NETFILTER dependency of NET_ACT_NAT.
This patch redefines the old names to keep the noise low, the next patch
converts all users.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch includes support for the Intra-Site Automatic Tunnel
Addressing Protocol (ISATAP) per RFC4214. It uses the SIT
module, and is configured using extensions to the "iproute2"
utility. The diffs are specific to the Linux 2.6.24-rc2 kernel
distribution.
This version includes the diff for ./include/linux/if.h which was
missing in the v2.4 submission and is needed to make the
patch compile. The patch has been installed, compiled and
tested in a clean 2.6.24-rc2 kernel build area.
Signed-off-by: Fred L. Templin <fred.l.templin@boeing.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The 3rd argument is always zero (according to grep :) Eliminate
it and merge the function with sk_stream_alloc_skb.
This saves 44 more bytes, and together with the previous patch
we have:
add/remove: 1/0 grow/shrink: 0/8 up/down: 183/-751 (-568)
function old new delta
sk_stream_alloc_skb - 183 +183
ip_rt_init 529 525 -4
arp_ignore 112 107 -5
__inet_lookup_listener 284 274 -10
tcp_sendmsg 2583 2481 -102
tcp_sendpage 1449 1300 -149
tso_fragment 417 258 -159
tcp_fragment 1149 988 -161
__tcp_push_pending_frames 1998 1837 -161
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This function seems too big for inlining. Indeed, it saves
half-a-kilo when uninlined:
add/remove: 1/0 grow/shrink: 0/7 up/down: 195/-719 (-524)
function old new delta
sk_stream_alloc_pskb - 195 +195
ip_rt_init 529 525 -4
__inet_lookup_listener 284 274 -10
tcp_sendmsg 2583 2486 -97
tcp_sendpage 1449 1305 -144
tso_fragment 417 267 -150
tcp_fragment 1149 992 -157
__tcp_push_pending_frames 1998 1841 -157
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Better place exists in update_send_head (other non-queue related
adjustments are done there as well) which is the only caller of
tcp_advance_send_head (now that the bogus call from mtu_probe is
gone).
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sometimes drivers need to know which interfaces are associated with
their hardware. Rather than forcing those drivers to keep track of
the interfaces that were added, this adds an iteration function to
mac80211.
As it is intended to be used from the interface add/remove callbacks,
the iteration function may currently only be called under RTNL.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Both ipv6/raw.c and ipv4/raw.c use the seq files to walk
through the raw sockets hash and show them.
The "walking" code is rather huge, but is identical in both
cases. The difference is the hash table to walk over and
the protocol family to check (this was not in the first
virsion of the patch, which was noticed by YOSHIFUJI)
Make the ->open store the needed hash table and the family
on the allocated raw_iter_state and make the start/next/stop
callbacks work with it.
This removes most of the code.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Same as the ->hash one, this is easily consolidated.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Having the raw_hashinfo it's easy to consolidate the
raw[46]_hash functions.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ipv4/raw.c and ipv6/raw.c contain many common code (most
of which is proc interface) which can be consolidated.
Most of the places to consolidate deal with the raw sockets
hashtable, so introduce a struct raw_hashinfo which describes
the raw sockets hash.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Same as in the previous patch for ipv4, compact the
API and hide hash table and rwlock inside the raw.c
file.
Plus fix some "bad" places from checkpatch.pl point
of view (assignments inside if()).
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The raw sockets functions are explicitly used from
inside the kernel in two places:
1. in ip_local_deliver_finish to intercept skb-s
2. in icmp_error
For this purposes many functions and even data structures,
that are naturally internal for raw protocol, are exported.
Compact the API to two functions and hide all the other
(including hash table and rwlock) inside the net/ipv4/raw.c
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is done by making packet_sklist_lock and packet_sklist per
network namespace and adding an additional filter condition on
received packets to ensure they came from the proper network
namespace.
Changes from v1:
- prohibit to call inet_dgram_ops.ioctl in other than init_net
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After this patch none of the netlink callback support anything
except the initial network namespace but the rtnetlink infrastructure
now handles multiple network namespaces.
Changes from v2:
- IPv6 addrlabel processing
Changes from v1:
- no need for special rtnl_unlock handling
- fixed IPv6 ndisc
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Key points of this patch are:
- In case new SACK information is advance only type, no skb
processing below previously discovered highest point is done
- Optimize cases below highest point too since there's no need
to always go up to highest point (which is very likely still
present in that SACK), this is not entirely true though
because I'm dropping the fastpath_skb_hint which could
previously optimize those cases even better. Whether that's
significant, I'm not too sure.
Currently it will provide skipping by walking. Combined with
RB-tree, all skipping would become fast too regardless of window
size (can be done incrementally later).
Previously a number of cases in TCP SACK processing fails to
take advantage of costly stored information in sack_recv_cache,
most importantly, expected events such as cumulative ACK and new
hole ACKs. Processing on such ACKs result in rather long walks
building up latencies (which easily gets nasty when window is
huge). Those latencies are often completely unnecessary
compared with the amount of _new_ information received, usually
for cumulative ACK there's no new information at all, yet TCP
walks whole queue unnecessary potentially taking a number of
costly cache misses on the way, etc.!
Since the inclusion of highest_sack, there's a lot information
that is very likely redundant (SACK fastpath hint stuff,
fackets_out, highest_sack), though there's no ultimate guarantee
that they'll remain the same whole the time (in all unearthly
scenarios). Take advantage of this knowledge here and drop
fastpath hint and use direct access to highest SACKed skb as
a replacement.
Effectively "special cased" fastpath is dropped. This change
adds some complexity to introduce better coveraged "fastpath",
though the added complexity should make TCP behave more cache
friendly.
The current ACK's SACK blocks are compared against each cached
block individially and only ranges that are new are then scanned
by the high constant walk. For other parts of write queue, even
when in previously known part of the SACK blocks, a faster skip
function is used (if necessary at all). In addition, whenever
possible, TCP fast-forwards to highest_sack skb that was made
available by an earlier patch. In typical case, no other things
but this fast-forward and mandatory markings after that occur
making the access pattern quite similar to the former fastpath
"special case".
DSACKs are special case that must always be walked.
The local to recv_sack_cache copying could be more intelligent
w.r.t DSACKs which are likely to be there only once but that
is left to a separate patch.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is going to replace the sack fastpath hint quite soon... :-)
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
The sock_valbool_flag() helper is used in setsockopt to
set or reset some flag on the sock. This helper is required
in the net/socket.c only, so move it there.
Besides, patch two places in sys_setsockopt() that repeat
this helper functionality manually.
Since this is not a bugfix, but a trivial cleanup, I
prepared this patch against net-2.6.25, but it also
applies (with a single offset) to the latest net-2.6.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Qdisc_class_ops are const, and Qdisc_ops are mostly read.
Using "const" and "__read_mostly" qualifiers helps to reduce false
sharing.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Policy table is implemented as an RCU linear list since we do not expect
large list nor frequent updates.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
After changeset:
[NETFILTER]: Introduce NF_INET_ hook values
It always evaluates to NF_INET_POST_ROUTING.
Signed-off-by: David S. Miller <davem@davemloft.net>
The IPv4 and IPv6 hook values are identical, yet some code tries to figure
out the "correct" value by looking at the address family. Introduce NF_INET_*
values for both IPv4 and IPv6. The old values are kept in a #ifndef __KERNEL__
section for userspace compatibility.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for async resumptions on input. To do so, the
transform would return -EINPROGRESS and subsequently invoke the
function xfrm_input_resume to resume processing.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The nhoff field isn't actually necessary in xfrm_input. For tunnel
mode transforms we now throw away the output IP header so it makes no
sense to fill in the nexthdr field. For transport mode we can now let
the function transport_finish do the setting and it knows where the
nexthdr field is.
The only other thing that needs the nexthdr field to be set is the
header extraction code. However, we can simply move the protocol
extraction out of the generic header extraction.
We want to minimise the amount of info we have to carry around between
transforms as this simplifies the resumption process for async crypto.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently x->lastused is u64 which means that it cannot be
read/written atomically on all architectures. David Miller observed
that the value stored in it is only an unsigned long which is always
atomic.
So based on his suggestion this patch changes the internal
representation from u64 to unsigned long while the user-interface
still refers to it as u64.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
As part of the work on asynchronous cryptographic operations, we need
to be able to resume from the spot where they occur. As such, it
helps if we isolate them to one spot.
This patch moves most of the remaining family-specific processing into
the common input code.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for async resumptions on output. To do so,
the transform would return -EINPROGRESS and subsequently invoke the
function xfrm_output_resume to resume processing.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
As part of the work on asynchrnous cryptographic operations, we need
to be able to resume from the spot where they occur. As such, it
helps if we isolate them to one spot.
This patch moves most of the remaining family-specific processing into
the common output code.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Most callers of the LOCAL_OUT chain will set the IP packet length
before doing so. They also share the same output function dst_output.
This patch creates a new function called ip6_local_out which does all
of that and converts the appropriate users over to it.
Apart from removing duplicate code, it will also help in merging the
IPsec output path.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Most callers of the LOCAL_OUT chain will set the IP packet length and
header checksum before doing so. They also share the same output
function dst_output.
This patch creates a new function called ip_local_out which does all
of that and converts the appropriate users over to it.
Apart from removing duplicate code, it will also help in merging the
IPsec output path once the same thing is done for IPv6.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
With inter-family transforms the inner mode differs from the outer
mode. Attempting to handle both sides from the same function means
that it needs to handle both IPv4 and IPv6 which creates duplication
and confusion.
This patch separates the two parts on the input path so that each
function deals with one family only.
In particular, the functions xfrm4_extract_inut/xfrm6_extract_inut
moves the pertinent fields from the IPv4/IPv6 IP headers into a
neutral format stored in skb->cb. This is then used by the inner mode
input functions to modify the inner IP header. In this way the input
function no longer has to know about the outer address family.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
With inter-family transforms the inner mode differs from the outer
mode. Attempting to handle both sides from the same function means
that it needs to handle both IPv4 and IPv6 which creates duplication
and confusion.
This patch separates the two parts on the output path so that each
function deals with one family only.
In particular, the functions xfrm4_extract_output/xfrm6_extract_output
moves the pertinent fields from the IPv4/IPv6 IP headers into a
neutral format stored in skb->cb. This is then used by the outer mode
output functions to write the outer IP header. In this way the output
function no longer has to know about the inner address family.
Since the extract functions are only called by tunnel modes (the only
modes that can support inter-family transforms), I've also moved the
xfrm*_tunnel_check_size calls into them. This allows the correct ICMP
message to be sent as opposed to now where you might call icmp_send
with an IPv6 packet and vice versa.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch changes the prototype of ipv4_copy_dscp and ipv6_copy_dscp so
that they directly take the outer DSCP rather than the outer IP header.
This will help us to unify the code for inter-family tunnels.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Half of the code in xfrm4_bundle_create and xfrm6_bundle_create are
common. This patch extracts that logic and puts it into
xfrm_bundle_create. The rest of it are then accessed through afinfo.
As a result this fixes the problem with inter-family transforms where
we treat every xfrm dst in the bundle as if it belongs to the top
family.
This patch also fixes a long-standing error-path bug where we may free
the xfrm states twice.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch moves the flow construction from the callers of
xfrm_dst_lookup into that function. It also changes xfrm_dst_lookup
so that it takes an xfrm state as its argument instead of explicit
addresses.
This removes any address-specific logic from the callers of
xfrm_dst_lookup which is needed to correctly support inter-family
transforms.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The functions local_addr and remote_addr are more than what they're
needed for. The same thing can be done easily with flags on the type
object. This patch does that and simplifies the wrapper functions in
xfrm6_policy accordingly.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The file net/netevent.h only refers to struct dst_entry * so it
doesn't need to include dst.h. I've replaced it with a forward
declaration.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
We have a number of copies of dst_discard scattered around the place
which all do the same thing, namely free a packet on the input or
output paths.
This patch deletes all of them except dst_discard and points all the
users to it.
The only non-trivial bit is decnet where it returns an error.
However, conceptually this is identical to the blackhole functions
used in IPv4 and IPv6 which do not return errors. So they should
either all return errors or all return zero. For now I've stuck with
the majority and picked zero as the return value.
It doesn't really matter in practice since few if any driver would
react differently depending on a zero return value or NET_RX_DROP.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The dst member nfheader_len is only used by IPv6. It's also currently
creating a rather ugly alignment hole in struct dst. Therefore this patch
moves it from there into struct rt6_info.
It also reorders the fields in rt6_info to minimize holes.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add raw drops counter for IPv4 in /proc/net/raw .
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
An IPoIB subnet on an IB fabric that spans multiple IB subnets can't
use link-local scope in multicast GIDs. The existing routines that
map IP/IPv6 multicast addresses into IB link-level addresses hard-code
the scope to link-local, and they also leave the partition key field
uninitialised. This patch adds a parameter (the link-level broadcast
address) to the mapping routines, allowing them to initialise both the
scope and the P_Key appropriately, and fixes up the call sites.
The next step will be to add a way to configure the scope for an IPoIB
interface.
Signed-off-by: Rolf Manderscheid <rvm@obsidianresearch.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
It seems commit fda9ef5d67 introduced a RCU
protection for sk_filter(), without a rcu_dereference()
Either we need a rcu_dereference(), either a comment should explain why we
dont need it. I vote for the former.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
alg_key_len is the length in bits of the key, not in bytes.
Best way to fix this is to move alg_len() function from net/xfrm/xfrm_user.c
to include/net/xfrm.h, and to use it in xfrm_algo_clone()
alg_len() is renamed to xfrm_alg_len() because of its global exposition.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Both NetLabel and SELinux (other LSMs may grow to use it as well) rely
on the 'iif' field to determine the receiving network interface of
inbound packets. Unfortunately, at present this field is not
preserved across a skb clone operation which can lead to garbage
values if the cloned skb is sent back through the network stack. This
patch corrects this problem by properly copying the 'iif' field in
__skb_clone() and removing the 'iif' field assignment from
skb_act_clone() since it is no longer needed.
Also, while we are here, put the assignments in the same order as the
offsets to reduce cacheline bounces.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The even should be called SCTP_AUTHENTICATION_INDICATION.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move veth.h from net/ to linux/ since it is a user api, and add it to
user header processing Kbuild.
[ Use header-y as suggested by Sam Ravnborg. -DaveM ]
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some users do "modprobe ip_conntrack hashsize=...". Since we have the
module aliases this loads nf_conntrack_ipv4 and nf_conntrack, the
hashsize parameter is unknown for nf_conntrack_ipv4 however and makes
it fail.
Allow to specify hashsize= for both nf_conntrack and nf_conntrack_ipv4.
Note: the nf_conntrack message in the ringbuffer will display an
incorrect hashsize since nf_conntrack is first pulled in as a
dependency and calculates the size itself, then it gets changed
through a call to nf_conntrack_set_hashsize().
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
During accept/migrate the code attempts to copy the addresses from
the parent endpoint to the new endpoint. However, if the parent
was bound to a wildcard address, then we end up pointlessly copying
all of the current addresses on the system.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip_rt_advice has been gone, so no need to keep prototype and debug message.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
SCTP-AUTH requires selection of CRYPTO, HMAC and SHA1 since
SHA1 is a MUST requirement for AUTH. We also support SHA256,
but that's optional, so fix the code to treat it as such.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
The inet_ehash_locks_alloc() looks like this:
#ifdef CONFIG_NUMA
if (size > PAGE_SIZE)
x = vmalloc(...);
else
#endif
x = kmalloc(...);
Unlike it, the inet_ehash_locks_alloc() looks like this:
#ifdef CONFIG_NUMA
if (size > PAGE_SIZE)
vfree(x);
else
#else
kfree(x);
#endif
The error is obvious - if the NUMA is on and the size
is less than the PAGE_SIZE we leak the pointer (kfree is
inside the #else branch).
Compiler doesn't warn us because after the kfree(x) there's
a "x = NULL" assignment, so here's another (minor?) bug: we
don't set x to NULL under certain circumstances.
Boring explanation, I know... Patch explains it better.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
if (net_ratelimit())
IEEE80211_DEBUG_DROP(...)
can pollute the logs with messages like:
printk: 1 messages suppressed.
printk: 2 messages suppressed.
printk: 7 messages suppressed.
if debugging information is disabled. These messages are printed by
net_ratelimit(). Add a wrapper to net_ratelimit() that takes into account
the log level, so that net_ratelimit() is called only when we really want
to print something.
Signed-off-by: Guillaume Chazarain <guichaz@yahoo.fr>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When the abstraction functions got added, conversion here was
made incorrectly. As a result, the skb may end up pointing
to skb which got included to the probe skb and then was freed.
For it to trigger, however, skb_transmit must fail sending as
well.
Signed-off-by: Ilpo Jrvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Switch the remaining IPVS sysctl entries over to to use CTL_UNNUMBERED,
I stronly doubt that anyone is using the sys_sysctl interface to
these variables.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
sysctl table check failed: /net/ipv4/vs/lblc_expiration .3.5.21.19 Missing strategy
[...]
sysctl table check failed: /net/ipv4/vs/lblcr_expiration .3.5.21.20 Missing strategy
Switch these entried over to use CTL_UNNUMBERED as clearly
the sys_syscal portion wasn't working.
This is along the same lines as Christian Borntraeger's patch that fixes
up entries with no stratergy in net/ipv4/ipvs/ip_vs_ctl.c
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Running the latest git code I get the following messages during boot:
sysctl table check failed: /net/ipv4/vs/drop_entry .3.5.21.4 Missing strategy
[...]
sysctl table check failed: /net/ipv4/vs/drop_packet .3.5.21.5 Missing strategy
[...]
sysctl table check failed: /net/ipv4/vs/secure_tcp .3.5.21.6 Missing strategy
[...]
sysctl table check failed: /net/ipv4/vs/sync_threshold .3.5.21.24 Missing strategy
I removed the binary sysctl handler for those messages and also removed
the definitions in ip_vs.h. The alternative would be to implement a
proper strategy handler, but syscall sysctl is deprecated.
There are other sysctl definitions that are commented out or work with
the default sysctl_data strategy. I did not touch these.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Indeed my previous change to alloc_pskb has made it possible
for the TCP header to be misaligned iff the MTU is not a multiple
of 4 (and less than a page). So I suspect the optimised IPsec
MTU calculation is giving you just such an MTU :)
This patch fixes it by changing alloc_pskb to make sure that
the size is at least 32-bit aligned. This does not cause the
problem fixed by the previous patch because max_header is always
32-bit aligned which means that in the SG/NOTSO case this will
be a no-op.
I thought about putting this in the callers but all the current
callers are from TCP. If and when we get a non-TCP caller we
can always create a TCP wrapper for this function and move the
alignment over there.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The request_sock_queue's listen_opt is either vmalloc-ed or
kmalloc-ed depending on the number of table entries. Thus it
is expected to be handled properly on free, which is done in
the reqsk_queue_destroy().
However the error path in inet_csk_listen_start() calls
the lite version of reqsk_queue_destroy, called
__reqsk_queue_destroy, which calls the kfree unconditionally.
Fix this and move the __reqsk_queue_destroy into a .c file as
it looks too big to be inline.
As David also noticed, this is an error recovery path only,
so no locking is required and the lopt is known to be not NULL.
reqsk_queue_yank_listen_sk is also now only used in
net/core/request_sock.c so we should move it there too.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We round up the header size in sk_stream_alloc_pskb so that
TSO packets get zero tail room. Unfortunately this rounding
up is not coordinated with the select_size() function used by
TCP to calculate the second parameter of sk_stream_alloc_pskb.
As a result, we may allocate more than a page of data in the
non-TSO case when exactly one page is desired.
In fact, rounding up the head room is detrimental in the non-TSO
case because it makes memory that would otherwise be available to
the payload head room. TSO doesn't need this either, all it wants
is the guarantee that there is no tail room.
So this patch fixes this by adjusting the skb_reserve call so that
exactly the requested amount (which all callers have calculated in
a precise way) is made available as tail room.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch reverts Eric's commit 2b008b0a8e
It diets .text & .data section of the kernel if CONFIG_NET_NS is not set.
This is safe after list operations cleanup.
Signed-of-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The inetpeer.c tracks the LRU list of inet_perr-s, but makes
it by hands. Use the list_head-s for this.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes a small memory leak. Default fib rules can be deleted by
the user if the rule does not carry FIB_RULE_PERMANENT flag, f.e. by
ip rule flush
Such a rule will not be freed as the ref-counter has 2 on start and becomes
clearly unreachable after removal.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
This counter is _always_ modified under the unix_gc_lock spinlock,
so its atomicity can be provided w/o additional efforts.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver operations set_ieee8021x(), set_port_auth() and
set_privacy_invoked() are not used by any drivers, except
set_privacy_invoked() they aren't even used by mac80211.
Remove them at least until we need to support drivers with
mac80211 that require getting this information.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This allows a driver to ask for a specific rate control algorithm.
The rate control algorithm asked for must be registered and be
available as a module or built-in.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
There are many places that get the dst entry, increase the
__use counter and set the "lastuse" time stamp.
Make a helper for this.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
SCTP-AUTH and future ADD-IP updates have a requirement to
do additional verification of parameters and an ability to
ABORT the association if verification fails. So, introduce
additional return code so that we can clear signal a required
action.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
This patch adds a tunable that will allow ADD_IP to work without
AUTH for backward compatibility. The default value is off since
the default value for ADD_IP is off as well. People who need
to use ADD-IP with older implementations take risks of connection
hijacking and should consider upgrading or turning this tunable on.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
After learning more about rcu, it looks like the ADD-IP hadling
doesn't need to call call_rcu_bh. All the rcu critical sections
use rcu_read_lock, so using call_rcu_bh is wrong here.
Now, restore the local_bh_disable() code blocks and use normal
call_rcu() calls. Also restore the missing return statement.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Commit d0ce92910b broke several retransmit
cases including fast retransmit. The reason is that we should
only delay by rto while doing retranmists as a result of a timeout.
Retransmit as a result of path mtu discover, fast retransmit, or
other evernts that should trigger immidiate retransmissions got broken.
Also, since rto is doubled prior to marking of packets elegable for
retransmission, we never marked correct chunks anyway.
The fix is provide a reason for a given retransmission so that we
can mark chunks appropriately and to save the old rto value to do
comparisons against.
All regressions tests passed with this code.
Spotted by Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
As done two years ago on IP route cache table (commit
22c047ccbc) , we can avoid using one
lock per hash bucket for the huge TCP/DCCP hash tables.
On a typical x86_64 platform, this saves about 2MB or 4MB of ram, for
litle performance differences. (we hit a different cache line for the
rwlock, but then the bucket cache line have a better sharing factor
among cpus, since we dirty it less often). For netstat or ss commands
that want a full scan of hash table, we perform fewer memory accesses.
Using a 'small' table of hashed rwlocks should be more than enough to
provide correct SMP concurrency between different buckets, without
using too much memory. Sizing of this table depends on
num_possible_cpus() and various CONFIG settings.
This patch provides some locking abstraction that may ease a future
work using a different model for TCP/DCCP table.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes the master daemon to sync the connection when it is about
to close. This makes the connections on the backup to close or timeout
according their state. Before the sync was performed only if the
connection is in ESTABLISHED state which always made the connections to
timeout in the hard coded 3 minutes. However the Andy Gospodarek's patch
([IPVS]: use proper timeout instead of fixed value) effectively did nothing
more than increasing this to 15 minutes (Established state timeout). So
this patch makes use of proper timeout since it syncs the connections on
status changes to FIN_WAIT (2min timeout) and CLOSE (10sec timeout).
However if the backup misses CLOSE hopefully it did not miss FIN_WAIT.
Otherwise we will just have to wait for the ESTABLISHED state timeout. As
it is without this patch. This way the number of the hanging connections
on the backup is kept to minimum. And very few of them will be left to
timeout with a long timeout.
This is important if we want to make use of the fix for the real server
overcommit on master/backup fail-over.
Signed-off-by: Rumen G. Bogdanovski <rumen@voicecho.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes the problem with node overload on director fail-over.
Given the scenario: 2 nodes each accepting 3 connections at a time and 2
directors, director failover occurs when the nodes are fully loaded (6
connections to the cluster) in this case the new director will assign
another 6 connections to the cluster, If the same real servers exist
there.
The problem turned to be in not binding the inherited connections to
the real servers (destinations) on the backup director. Therefore:
"ipvsadm -l" reports 0 connections:
root@test2:~# ipvsadm -l
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP test2.local:5999 wlc
-> node473.local:5999 Route 1000 0 0
-> node484.local:5999 Route 1000 0 0
while "ipvs -lnc" is right
root@test2:~# ipvsadm -lnc
IPVS connection entries
pro expire state source virtual destination
TCP 14:56 ESTABLISHED 192.168.0.10:39164 192.168.0.222:5999
192.168.0.51:5999
TCP 14:59 ESTABLISHED 192.168.0.10:39165 192.168.0.222:5999
192.168.0.52:5999
So the patch I am sending fixes the problem by binding the received
connections to the appropriate service on the backup director, if it
exists, else the connection will be handled the old way. So if the
master and the backup directors are synchronized in terms of real
services there will be no problem with server over-committing since
new connections will not be created on the nonexistent real services
on the backup. However if the service is created later on the backup,
the binding will be performed when the next connection update is
received. With this patch the inherited connections will show as
inactive on the backup:
root@test2:~# ipvsadm -l
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP test2.local:5999 wlc
-> node473.local:5999 Route 1000 0 1
-> node484.local:5999 Route 1000 0 1
rumen@test2:~$ cat /proc/net/ip_vs
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP C0A800DE:176F wlc
-> C0A80033:176F Route 1000 0 1
-> C0A80032:176F Route 1000 0 1
Regards,
Rumen Bogdanovski
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Rumen G. Bogdanovski <rumen@voicecho.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
There are places that check for CONFIG_IP_MULTIPLE_TABLES
twice in the same file, but the internals of these #ifdefs
can be merged.
As a side effect - remove one ifdef from inside a function.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
"struct proto" currently uses an array stats[NR_CPUS] to track change on
'inuse' sockets per protocol.
If NR_CPUS is big, this means we use a big memory area for this.
Moreover, all this memory area is located on a single node on NUMA
machines, increasing memory pressure on the boot node.
In this patch, I tried to :
- Keep a fast !CONFIG_SMP implementation
- Keep a fast CONFIG_SMP implementation for often used protocols
(tcp,udp,raw,...)
- Introduce a NUMA efficient implementation
Some helper macros are defined in include/net/sock.h
These macros take into account CONFIG_SMP
If a "struct proto" is declared without using DEFINE_PROTO_INUSE /
REF_PROTO_INUSE
macros, it will automatically use a default implementation, using a
dynamically allocated percpu zone.
This default implementation will be NUMA efficient, but might use 32/64
bytes per possible cpu
because of current alloc_percpu() implementation.
However it still should be better than previous implementation based on
stats[NR_CPUS] field.
When a "struct proto" is changed to use the new macros, we use a single
static "int" percpu variable,
lowering the memory and cpu costs, still preserving NUMA efficiency.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Not architecture specific code should not #include <asm/scatterlist.h>.
This patch therefore either replaces them with
#include <linux/scatterlist.h> or simply removes them if they were
unused.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
When the CONFIG_NET_NS is n there's no need in refcounting
the initial net namespace. So relax this code by making a
stupid stubs for the "n" case.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Finally, the zero_it argument can be completely removed from
the callers and from the function prototype.
Besides, fix the checkpatch.pl warnings about using the
assignments inside if-s.
This patch is rather big, and it is a part of the previous one.
I splitted it wishing to make the patches more readable. Hope
this particular split helped.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The sock_copy() call is not used outside the sock.c file,
so just move it into a sock.c
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It is not safe to to place struct pernet_operations in a special section.
We need struct pernet_operations to last until we call unregister_pernet_subsys.
Which doesn't happen until module unload.
So marking struct pernet_operations is a disaster for modules in two ways.
- We discard it before we call the exit method it points to.
- Because I keep struct pernet_operations on a linked list discarding
it for compiled in code removes elements in the middle of a linked
list and does horrible things for linked insert.
So this looks safe assuming __exit_refok is not discarded
for modules.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes three needlessly global functions static.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sctp_update_copy_cksum() is no longer used.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reworked skb_clone looks uglier with the single ifdef
CONFIG_NET_CLS_ACT This patch introduces skb_act_clone which will
replace skb_clone in tc actions
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
UDP currently uses skb->dev->ifindex which may provide the wrong
information when the socket bound to a specific interface.
This patch makes inet_iif() accessible to UDP and makes UDP use it.
The scenario we are trying to fix is when a client is running on
the same system and the server and both client and server bind to
a non-loopback device.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Acked-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some are already declared in include/linux/netdevice.h, while
some others (xfrm ones) need to be declared.
The driver/net/rrunner.c just uses same extern as well, so
cleanup it also.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The tcp_minshall_update() function is called in exactly one place, and is
passed an unsigned integer for the mss_len argument. Make the sign of the
argument match the sign of the passed variable in order to eliminate an
unneeded implicit type cast and a mixed sign comparison in
tcp_minshall_update().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the Bluetooth 1.2 specification the Extended SCO feature for
better audio connections was introduced. So far the Bluetooth core
wasn't able to handle any eSCO connections correctly. This patch
adds simple eSCO support while keeping backward compatibility with
older devices.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
In case the remote entity tries to negogiate retransmission or flow
control mode, reject it and fall back to basic mode.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The Bluetooth 1.2 specification introduced a specific features mask
value to interoperate with newer versions of the specification. So far
this piece of information was never needed, but future extensions will
rely on it. This patch adds a generic way to retrieve this information
only once per connection setup.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
After the change to the L2CAP configuration parameter handling the
global conf_mtu variable is no longer needed and so remove it.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The Bluetooth HCI commands are divided into logical OGF groups for
easier identification of their purposes. While this still makes sense
for the written specification, its makes the code only more complex
and harder to read. So instead of using separate OGF and OCF values
to identify the commands, use a common 16-bit opcode that combines
both values. As a side effect this also reduces the complexity of
OGF and OCF calculations during command header parsing.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The task_struct->pid member is going to be deprecated, so start
using the helpers (task_pid_nr/task_pid_vnr/task_pid_nr_ns) in
the kernel.
The first thing to start with is the pid, printed to dmesg - in
this case we may safely use task_pid_nr(). Besides, printks produce
more (much more) than a half of all the explicit pid usage.
[akpm@linux-foundation.org: git-drm went and changed lots of stuff]
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Dave Airlie <airlied@linux.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is the largest patch in the set. Make all (I hope) the places where
the pid is shown to or get from user operate on the virtual pids.
The idea is:
- all in-kernel data structures must store either struct pid itself
or the pid's global nr, obtained with pid_nr() call;
- when seeking the task from kernel code with the stored id one
should use find_task_by_pid() call that works with global pids;
- when showing pid's numerical value to the user the virtual one
should be used, but however when one shows task's pid outside this
task's namespace the global one is to be used;
- when getting the pid from userspace one need to consider this as
the virtual one and use appropriate task/pid-searching functions.
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: nuther build fix]
[akpm@linux-foundation.org: yet nuther build fix]
[akpm@linux-foundation.org: remove unneeded casts]
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Paul Menage <menage@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch adds a new field to xfrm states called inner_mode. The existing
mode object is renamed to outer_mode.
This is the first part of an attempt to fix inter-family transforms. As it
is we always use the outer family when determining which mode to use. As a
result we may end up shoving IPv4 packets into netfilter6 and vice versa.
What we really want is to use the inner family for the first part of outbound
processing and the outer family for the second part. For inbound processing
we'd use the opposite pairing.
I've also added a check to prevent silly combinations such as transport mode
with inter-family transforms.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is convenient to have a pointer from xfrm_state to address-specific
functions such as the output function for a family. Currently the
address-specific policy code calls out to the xfrm state code to get
those pointers when we could get it in an easier way via the state
itself.
This patch adds an xfrm_state_afinfo to xfrm_mode (since they're
address-specific) and changes the policy code to use it. I've also
added an owner field to do reference counting on the module providing
the afinfo even though it isn't strictly necessary today since IPv6
can't be unloaded yet.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently BEET mode does not reinject the packet back into the stack
like tunnel mode does. Since BEET should behave just like tunnel mode
this is incorrect.
This patch fixes this by introducing a flags field to xfrm_mode that
tells the IPsec code whether it should terminate and reinject the packet
back into the stack.
It then sets the flag for BEET and tunnel mode.
I've also added a number of missing BEET checks elsewhere where we check
whether a given mode is a tunnel or not.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The type and mode maps are only used by SAs, not policies. So it makes
sense to move them from xfrm_policy.c into xfrm_state.c. This also allows
us to mark xfrm_get_type/xfrm_put_type/xfrm_get_mode/xfrm_put_mode as
static.
The only other change I've made in the move is to get rid of the casts
on the request_module call for types. They're unnecessary because C
will promote them to ints anyway.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently xfrm6_rcv_spi gets the nexthdr value itself from the packet.
This means that we need to fix up the value in case we have a 4-on-6
tunnel. Moving this logic into the caller simplifies things and allows
us to merge the code with IPv4.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch moves the tunnel parsing for IPv4 out of xfrm4_input and into
xfrm4_tunnel. This change is in line with what IPv6 does and will allow
us to merge the two input functions.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The proposed fix is to delay the reference counter decrement
until the quiescent state pass. This will give sk_clone() a
chance to get the reference on the cloned filter.
Regular sk_filter_uncharge can happen from the sk_free() only
and there's no need in delaying the put - the socket is dead
anyway and is to be release itself.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is done merely as a preparation for the fix.
The sk_filter_uncharge() unaccounts the filter memory and calls
the sk_filter_release(), which in turn decrements the refcount
anf frees the filter.
The latter function will be required separately.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since this callback is used to check for conflicts in
hashtable when inserting a newly created frag queue, we can
do the same by checking for matching the queue with the
argument, used to create one.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Here we need another callback ->match to check whether the
entry found in hash matches the key passed. The key used
is the same as the creation argument for inet_frag_create.
Yet again, this ->match is the same for netfilter and ipv6.
Running a frew steps forward - this callback will later
replace the ->equal one.
Since the inet_frag_find() uses the already consolidated
inet_frag_create() remove the xxx_frag_create from protocol
codes.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This one uses the xxx_frag_intern() and xxx_frag_alloc()
routines, which are already consolidated, so remove them
from protocol code (as promised).
The ->constructor callback is used to init the rest of
the frag queue and it is the same for netfilter and ipv6.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Just perform the kzalloc() allocation and setup common
fields in the inet_frag_queue(). Then return the result
to the caller to initialize the rest.
The inet_frag_alloc() may return NULL, so check the
return value before doing the container_of(). This looks
ugly, but the xxx_frag_alloc() will be removed soon.
The xxx_expire() timer callbacks are patches,
because the argument is now the inet_frag_queue, not
the protocol specific queue.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This routine checks for the existence of a given entry
in the hash table and inserts the new one if needed.
The ->equal callback is used to compare two frag_queue-s
together, but this one is temporary and will be removed
later. The netfilter code and the ipv6 one use the same
routine to compare frags.
The inet_frag_intern() always returns non-NULL pointer,
so convert the inet_frag_queue into protocol specific
one (with the container_of) without any checks.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
A sysctl method was added to enable and disable debugging levels. After
further review, it was decided that there are better approaches to doing this
and the sysctl methodology isn't really desirable. This patch removes the
sysctl code from 9p.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
This patch moves transport dynamic registration and matching to the net
module to prevent a bad Kconfig dependency between the net and fs 9p modules.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
The 9P2000 protocol requires the authentication and permission checks to be
done in the file server. For that reason every user that accesses the file
server tree has to authenticate and attach to the server separately.
Multiple users can share the same connection to the server.
Currently v9fs does a single attach and executes all I/O operations as a
single user. This makes using v9fs in multiuser environment unsafe as it
depends on the client doing the permission checking.
This patch improves the 9P2000 support by allowing every user to attach
separately. The patch defines three modes of access (new mount option
'access'):
- attach-per-user (access=user) (default mode for 9P2000.u)
If a user tries to access a file served by v9fs for the first time, v9fs
sends an attach command to the server (Tattach) specifying the user. If
the attach succeeds, the user can access the v9fs tree.
As there is no uname->uid (string->integer) mapping yet, this mode works
only with the 9P2000.u dialect.
- allow only one user to access the tree (access=<uid>)
Only the user with uid can access the v9fs tree. Other users that attempt
to access it will get EPERM error.
- do all operations as a single user (access=any) (default for 9P2000)
V9fs does a single attach and all operations are done as a single user.
If this mode is selected, the v9fs behavior is identical with the current
one.
Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
This patch abstracts out the interfaces to underlying transports so that
new transports can be added as modules. This should also allow kernel
configuration of transports without ifdef-hell.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
With all the users of the double pointers removed from the IPv6 input path,
this patch converts all occurances of sk_buff ** to sk_buff * in IPv6 input
handlers.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
These ones use the generic data types too, so move
them in one place.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The evictors collect some statistics for ipv4 and ipv6,
so make it return the number of evicted queues and account
them all at once in the caller.
The XXX_ADD_STATS_BH() macros are just for this case,
but maybe there are places in code, that can make use of
them as well.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
To make in possible we need to know the exact frag queue
size for inet_frags->mem management and two callbacks:
* to destoy the skb (optional, used in conntracks only)
* to free the queue itself (mandatory, but later I plan to
move the allocation and the destruction of frag_queues
into the common place, so this callback will most likely
be optional too).
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This code works with the generic data types as well, so
move this into inet_fragment.c
This move makes it possible to hide the secret_timer
management and the secret_rebuild routine completely in
the inet_fragment.c
Introduce the ->hashfn() callback in inet_frags() to get
the hashfun for a given inet_frag_queue() object.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since now all the xxx_frag_kill functions now work
with the generic inet_frag_queue data type, this can
be moved into a common place.
The xxx_unlink() code is moved as well.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some sysctl variables are used to tune the frag queues
management and it will be useful to work with them in
a common way in the future, so move them into one
structure, moreover they are the same for all the frag
management codes.
I don't place them in the existing inet_frags object,
introduced in the previous patch for two reasons:
1. to keep them in the __read_mostly section;
2. not to export the whole inet_frags objects outside.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are some objects that are common in all the places
which are used to keep track of frag queues, they are:
* hash table
* LRU list
* rw lock
* rnd number for hash function
* the number of queues
* the amount of memory occupied by queues
* secret timer
Move all this stuff into one structure (struct inet_frags)
to make it possible use them uniformly in the future. Like
with the previous patch this mostly consists of hunks like
- write_lock(&ipfrag_lock);
+ write_lock(&ip4_frags.lock);
To address the issue with exporting the number of queues and
the amount of memory occupied by queues outside the .c file
they are declared in, I introduce a couple of helpers.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce the struct inet_frag_queue in include/net/inet_frag.h
file and place there all the common fields from three structs:
* struct ipq in ipv4/ip_fragment.c
* struct nf_ct_frag6_queue in nf_conntrack_reasm.c
* struct frag_queue in ipv6/reassembly.c
After this, replace these fields on appropriate structures with
this structure instance and fix the users to use correct names
i.e. hunks like
- atomic_dec(&fq->refcnt);
+ atomic_dec(&fq->q.refcnt);
(these occupy most of the patch)
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
With all the users of the double pointers removed, this patch mops up by
finally replacing all occurances of sk_buff ** in the netfilter API by
sk_buff *.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes the IPVS-specific version of skb_make_writable and
replaces it with the netfilter one.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that ip_frag always returns the packet given to it on input, we can
change it to return an integer indicating error instead. This patch does
that and updates all its callers accordingly.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
As discussed before, this patch provides userland with a way to access
relevant options in Router Advertisements, after they are processed
and validated by the kernel. Extra options are processed in a generic
way; this patch only exports RDNSS options described in RFC5006, but
support to control which options are exported could be easily added.
A new rtnetlink message type is defined, to transport Neighbor
Discovery options, along with optional context information. At the
moment only the address of the router sending an RDNSS option is
included, but additional attributes may be later defined, if needed by
new use cases.
Signed-off-by: Pierre Ynard <linkfanel@yahoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
found via make randconfig build testing:
net/built-in.o: In function `init_p9':
mod.c:(.init.text+0x3b39): undefined reference to `p9_sysctl_register'
net/built-in.o: In function `exit_p9':
mod.c:(.exit.text+0x36b): undefined reference to `p9_sysctl_unregister'
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch make processing netlink user -> kernel messages synchronious.
This change was inspired by the talk with Alexey Kuznetsov about current
netlink messages processing. He says that he was badly wrong when introduced
asynchronious user -> kernel communication.
The call netlink_unicast is the only path to send message to the kernel
netlink socket. But, unfortunately, it is also used to send data to the
user.
Before this change the user message has been attached to the socket queue
and sk->sk_data_ready was called. The process has been blocked until all
pending messages were processed. The bad thing is that this processing
may occur in the arbitrary process context.
This patch changes nlk->data_ready callback to get 1 skb and force packet
processing right in the netlink_unicast.
Kernel -> user path in netlink_unicast remains untouched.
EINTR processing for in netlink_run_queue was changed. It forces rtnl_lock
drop, but the process remains in the cycle until the message will be fully
processed. So, there is no need to use this kludges now.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes a few typos in comments in include/net/netlink.h
Signed-off-by: Pierre Ynard <linkfanel@yahoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Expansion of original idea from Denis V. Lunev <den@openvz.org>
Add robustness and locking to the local_port_range sysctl.
1. Enforce that low < high when setting.
2. Use seqlock to ensure atomic update.
The locking might seem like overkill, but there are
cases where sysadmin might want to change value in the
middle of a DoS attack.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add port randomization rather than a simple fixed rover
for use with SCTP. This makes it act similar to TCP, UDP, DCCP
when allocating ports.
No longer need port_alloc_lock as well (suggestion by Brian Haley).
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes the duplicate ipv6_{auth,esp,comp}_hdr structures since
they're identical to the IPv4 versions. Duplicating them would only create
problems for ourselves later when we need to add things like extended
sequence numbers.
I've also added transport header type conversion headers for these types
which are now used by the transforms.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The IPv6 calling convention for x->mode->output is more general and could
help an eventual protocol-generic x->type->output implementation. This
patch adopts it for IPv4 as well and modifies the IPv4 type output functions
accordingly.
It also rewrites the IPv6 mac/transport header calculation to be based off
the network header where practical.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch moves some common code that conceptually belongs to the xfrm core
from af_key/xfrm_user into xfrm_alloc_spi.
In particular, the spin lock on the state is now taken inside xfrm_alloc_spi.
Previously it also protected the construction of the response PF_KEY/XFRM
messages to user-space. This is inconsistent as other identical constructions
are not protected by the state lock. This is bad because they in fact should
be protected but only in certain spots (so as not to hold the lock for too
long which may cause packet drops).
The SPI byte order conversion has also been moved.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the net namespaces many code leaved the __init section,
thus making the kernel occupy more memory than it did before.
Since we have a config option that prohibits the namespace
creation, the functions that initialize/finalize some netns
stuff are simply not needed and can be freed after the boot.
Currently, this is almost not noticeable, since few calls
are no longer in __init, but when the namespaces will be
merged it will be possible to free more code. I propose to
use the __net_init, __net_exit and __net_initdata "attributes"
for functions/variables that are not used if the CONFIG_NET_NS
is not set to save more space in memory.
The exiting functions cannot just reside in the __exit section,
as noticed by David, since the init section will have
references on it and the compilation will fail due to modpost
checks. These references can exist, since the init namespace
never dies and the exit callbacks are never called. So I
introduce the __exit_refok attribute just like it is already
done with the __init_refok.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that the only callers of xfrm_replay_notify are in xfrm, we can remove
the export.
This patch also removes xfrm_aevent_doreplay since it's now called in just
one spot.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The replay counter is one of only two remaining things in the output code
that requires a lock on the xfrm state (the other being the crypto). This
patch moves it into the generic xfrm_output so we can remove the lock from
the transforms themselves.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The functions xfrm_state_check and xfrm_state_check_space are only used by
the output code in xfrm_output.c so we can move them over.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Most of the code in xfrm4_output_one and xfrm6_output_one are identical so
this patch moves them into a common xfrm_output function which will live
in net/xfrm.
In fact this would seem to fix a bug as on IPv4 we never reset the network
header after a transform which may upset netfilter later on.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The keys are only used during initialisation so we don't need to carry them
in esp_data. Since we don't have to allocate them again, there is no need
to place a limit on the authentication key length anymore.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The keys are only used during initialisation so we don't need to carry them
in esp_data. Since we don't have to allocate them again, there is no need
to place a limit on the authentication key length anymore.
This patch also kills the unused auth.icv member.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
AF_IUCV socket programs may waste Linux storage, because af_iucv
allocates an skb whenever posted by the receive callback routine and
receives the message immediately.
Message receival is now postponed if data from previous callbacks has
not yet been transferred to the receiving socket program. Instead a
message handle is saved in a message queue as a reminder. Once
messages could be given to the receiving socket program, there is
an additional checking for entries in the message queue, followed
by skb allocation and message receival if applicable.
Signed-off-by: Ursula Braun <braunu@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Ursula Braun <braunu@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix a bunch of sparse warnings. Mostly about 0 used as
NULL pointer, and shadowed variable declarations.
One notable case was that hash size should have been unsigned.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds sta_notify callback and removes sta_table_notification
which was not used by any driver.
sta_notify() is essential for drivers that keeps notion of station
internally and need to be notified about removal or addition of a station
to the (I)BSS or assocation to an AP.
This version adds interface id to the parameter list
as suggested by Johannes Berg
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Many devices have LEDs to indicate the link status.
Export this functionality to drivers.
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This "algorithm" is used only internally and is not useful.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Michael Buesch <mb@bu3sch.de>
Acked-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Removes the management interface since it is only required
for hostapd/userspace MLME, will not be in the final tree
at least in this form and hostapd/userspace MLME currently
do not work against this tree anyway.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since I cannot convince the lazy driver authors (hello Michael)
to stop (ab)using the MGMT interface type internally in their
drivers, this patch introduces a new _INVALID type especially
for their use and changes all affected drivers to use it.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hook up the 93cx6 eeprom code to the ax88796 driver and modify the ax88796
driver to read out the mac address from the eeprom. We need this for the
ax88796 on certain SuperH boards. The pin configuration used to connect
the eeprom to the ax88796 on these boards is the same as pointed out by the
ax88796 datasheet, so we can probably reuse this code for multiple
platforms in the future.
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Cc: Ben Dooks <ben-linux@fluff.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Similar to the conntrack ID, the per-expectation ID is not needed
anymore, kill it.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the per-conntrack ID, its not necessary anymore for dumping.
For compatiblity reasons we send the address of the conntrack to
userspace as ID.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no struct nfattr anymore, rename functions to 'nlattr'.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Get rid of the duplicated rtnetlink macros and use the generic netlink
attribute functions. The old duplicated stuff is moved to a new header
file that exists just for userspace.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is not useful since we do not support probe response
offload to hardware at this time and beacons are set in
another way.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Stateless NAT is useful in controlled environments where restrictions are
placed on through traffic such that we don't need connection tracking to
correctly NAT protocol-specific data.
In particular, this is of interest when the number of flows or the number
of addresses being NATed is large, or if connection tracking information
has to be replicated and where it is not practical to do so.
Previously we had stateless NAT functionality which was integrated into
the IPv4 routing subsystem. This was a great solution as long as the NAT
worked on a subnet to subnet basis such that the number of NAT rules was
relatively small. The reason is that for SNAT the routing based system
had to perform a linear scan through the rules.
If the number of rules is large then major renovations would have take
place in the routing subsystem to make this practical.
For the time being, the least intrusive way of achieving this is to use
the u32 classifier written by Alexey Kuznetsov along with the actions
infrastructure implemented by Jamal Hadi Salim.
The following patch is an attempt at this problem by creating a new nat
action that can be invoked from u32 hash tables which would allow large
number of stateless NAT rules that can be used/updated in constant time.
The actual NAT code is mostly based on the previous stateless NAT code
written by Alexey. In future we might be able to utilise the protocol
NAT code from netfilter to improve support for other protocols.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The typedef is not required, we can just use "enum ieee80211_key_alg"
instead of "ieee80211_key_alg"
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
These inlines are generally useful, not just with mac80211.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch adds a lot more documentation (in kernel-doc format)
to include/net/mac80211.h
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Currently, hardware flags that drivers must set are not
documented well enough. Fix this.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch changes mac80211 to verify that VLAN interfaces
are valid and not bother drivers about them any more.
VLAN interfaces are now only valid when an AP interface
is up with the same MAC address, and are automatically
turned off when the AP interface is set down.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Jouni Malinen <j@w1.fi>
Signed-off-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Drivers are currently supposed to keep track of monitor
interfaces if they allow so-called "hard" monitor, and
they are also supposed to keep track of multicast etc.
This patch changes that, replaces the set_multicast_list()
callback with a new configure_filter() callback that takes
filter flags (FIF_*) instead of interface flags (IFF_*).
For a driver, this means it should open the filter as much
as necessary to get all frames requested by the filter flags.
Accordingly, the filter flags are named "positively", e.g.
FIF_ALLMULTI.
Multicast filtering is a bit special in that drivers that
have no multicast address filters need to allow multicast
frames through when either the FIF_ALLMULTI flag is set or
when the mc_count value is positive.
At the same time, drivers are no longer notified about
monitor interfaces at all, this means they now need to
implement the start() and stop() callbacks and the new
change_filter_flags() callback. Also, the start()/stop()
ordering changed, start() is now called *before* any
add_interface() as it really should be, and stop() after
any remove_interface().
The patch also changes the behaviour of setting the bssid
to multicast for scanning when IEEE80211_HW_NO_PROBE_FILTERING
is set; the IEEE80211_HW_NO_PROBE_FILTERING flag is removed
and the filter flag FIF_BCN_PRBRESP_PROMISC introduced.
This is a lot more efficient for hardware like b43 that
supports it and other hardware can still set the BSSID
to all-ones.
Driver modifications by Johannes Berg (b43 & iwlwifi), Michael Wu
(rtl8187, adm8211, and p54), Larry Finger (b43legacy), and
Ivo van Doorn (rt2x00).
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Denis V. Lunev <den@sw.ru> noticed that the locking rules
for the network namespace list are over complicated and broken.
In particular the current register_netdev_notifier currently
does not take any lock making the for_each_net iteration racy
with network namespace creation and destruction. Oops.
The fact that we need to use for_each_net in rtnl_unlock() when
the rtnetlink support becomes per network namespace makes designing
the proper locking tricky. In addition we need to be able to call
rtnl_lock() and rtnl_unlock() when we have the net_mutex held.
After thinking about it and looking at the alternatives carefully
it looks like the simplest and most maintainable solution is
to remove net_list_mutex altogether, and to use the rtnl_mutex instead.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since hardware header operations are part of the protocol class
not the device instance, make them into a separate object and
save memory.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add inline for common usage of hardware header creation, and
fix bug in IPV6 mcast where the assumption about negative return is
an errno. Negative return from hard_header means not enough space
was available,(ie -N bytes).
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes loopback_dev per network namespace. Adding
code to create a different loopback device for each network
namespace and adding the code to free a loopback device
when a network namespace exits.
This patch modifies all users the loopback_dev so they
access it as init_net.loopback_dev, keeping all of the
code compiling and working. A later pass will be needed to
update the users to use something other than the initial network
namespace.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch allows you to create a new network namespace
using sys_clone, or sys_unshare.
As the network namespace is still experimental and under development
clone and unshare support is only made available when CONFIG_NET_NS is
selected at compile time.
As this patch introduces network namespace support into code paths
that exist when the CONFIG_NET is not selected there are a few
additions made to net_namespace.h to allow a few more functions
to be used when the networking stack is not compiled in.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is unused.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add more mac80211 documentation.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The IEEE80211_CONF_SSID_HIDDEN setting is not useful for any driver
we have and should be a per-interface setting anyway. Remove it.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch allows drivers to indicate bad FCS/PLCP CRC to the stack and
have the stack drop packets like that except for monitor interfaces.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It seems I was actually able to hit this deadlock, on my quad G5 softmac
locks up more often than not. This fixes it by using an own workqueue
that can safely be flushed under RTNL.
Not sure if the patch is correct with the workqueue naming. And don't
think with the patch it doesn't continually lock up. It still does, just
doesn't invoke lockdep warnings all the time.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There's no reason to clear the sacktag skb hint when small part
of the rexmit queue changes. Account changes (if any) instead when
fragmenting/collapsing. RTO/FRTO do not touch SACKED_ACKED bits so
no need to discard SACK tag hint at all.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
ADD-IP spec requires AUTH. It is, in fact, dangerous without AUTH.
So, disable ADD-IP functionality if the peer claims to support
ADD-IP, but not AUTH.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add SCTP-AUTH API. The API implemented here was
agreed to between implementors at the 9th SCTP Interop.
It will be documented in the next revision of the
SCTP socket API spec.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements the receive path needed to process authenticated
chunks. Add ability to process the AUTH chunk and handle edge cases
for authenticated COOKIE-ECHO as well.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SCTP-AUTH, Section 6.2:
Endpoints MUST send all requested chunks authenticated where this has
been requested by the peer. The other chunks MAY be sent
authenticated or not. If endpoint pair shared keys are used, one of
them MUST be selected for authentication.
To send chunks in an authenticated way, the sender MUST include these
chunks after an AUTH chunk. This means that a sender MUST bundle
chunks in order to authenticate them.
If the endpoint has no endpoint pair shared key for the peer, it MUST
use Shared Key Identifier 0 with an empty endpoint pair shared key.
If there are multiple endpoint shared keys the sender selects one and
uses the corresponding Shared Key Identifier
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement processing for the CHUNKS, RANDOM, and HMAC parameters and
deal with how this parameters are effected by association restarts.
In particular, during unexpeted INIT processing, we need to reply with
parameters from the original INIT chunk. Also, after restart, we need
to update the old association with new peer parameters and change the
association shared keys.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements the internals operations of the AUTH, such as
key computation and storage. It also adds necessary variables to
the SCTP data structures.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Background: RFC 4293 deprecates existing individual, named ICMP
type counters to be replaced with the ICMPMsgStatsTable. This table
includes entries for both IPv4 and IPv6, and requires counting of all
ICMP types, whether or not the machine implements the type.
These patches "remove" (but not really) the existing counters, and
replace them with the ICMPMsgStats tables for v4 and v6.
It includes the named counters in the /proc places they were, but gets the
values for them from the new tables. It also counts packets generated
from raw socket output (e.g., OutEchoes, MLD queries, RA's from
radvd, etc).
Changes:
1) create icmpmsg_statistics mib
2) create icmpv6msg_statistics mib
3) modify existing counters to use these
4) modify /proc/net/snmp to add "IcmpMsg" with all ICMP types
listed by number for easy SNMP parsing
5) modify /proc/net/snmp printing for "Icmp" to get the named data
from new counters.
Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Background: RFC 4293 deprecates existing individual, named ICMP
type counters to be replaced with the ICMPMsgStatsTable. This table
includes entries for both IPv4 and IPv6, and requires counting of all
ICMP types, whether or not the machine implements the type.
These patches "remove" (but not really) the existing counters, and
replace them with the ICMPMsgStats tables for v4 and v6.
It includes the named counters in the /proc places they were, but gets the
values for them from the new tables. It also counts packets generated
from raw socket output (e.g., OutEchoes, MLD queries, RA's from
radvd, etc).
Changes:
1) create icmpmsg_statistics mib
2) create icmpv6msg_statistics mib
3) modify existing counters to use these
4) modify /proc/net/snmp to add "IcmpMsg" with all ICMP types
listed by number for easy SNMP parsing
5) modify /proc/net/snmp printing for "Icmp" to get the named data
from new counters.
[new to 2nd revision]
6) support per-interface ICMP stats
7) use common macro for per-device stat macros
Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I was looking at Patrick's fix to inet_diag and it occured
to me that we're using a pointer argument to return values
unnecessarily in netlink_run_queue. Changing it to return
the value will allow the compiler to generate better code
since the value won't have to be memory-backed.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
SCTP Supported Extenions parameter is specified in Section 4.2.7
of the ADD-IP draft (soon to be RFC). The parameter is
encoded as:
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Parameter Type = 0x8008 | Parameter Length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| CHUNK TYPE 1 | CHUNK TYPE 2 | CHUNK TYPE 3 | CHUNK TYPE 4 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| .... |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| CHUNK TYPE N | PAD | PAD | PAD |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
It contains a list of chunks that a particular SCTP extension
uses. Current extensions supported are Partial Reliability
(FWD-TSN) and ADD-IP (ASCONF and ASCONF-ACK).
When implementing new extensions (AUTH, PKT-DROP, etc..), new
chunks need to be added to this parameter. Parameter processing
would be modified to negotiate support for these new features.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch slightly cleanups FIB rules framework. rules_list as a pointer
on struct fib_rules_ops is useless. It is always assigned with a static
per/subsystem list in IPv4, IPv6 and DecNet.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
The TKIP mixing code was added for the benefit of Intel's ipw3945
chipset but that code ended up not using it. We have previously
identified many problems with this code and it crystallized that
library functions for mixing are likely to handle this in much
more generality and might allow b43 to take advantage of hardware
acceleration for TKIP.
Due to these reasons, remove the TKIP mixing for hardware
accelerated crypto operations.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Michael Buesch <mb@bu3sch.de>
Acked-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes the mac80211/driver interface rely only on the
IEEE80211_TXCTL_DO_NOT_ENCRYPT flag to signal to the driver whether
a frame should be encrypted or not, since mac80211 internally no
longer relies on HW_KEY_IDX_INVALID either this removes it, changes
the key index to be a u8 in all places and makes the full range of
the value available to drivers.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch formats some documentation in mac80211.h into kerneldoc
and also adds some more explanations for hardware crypto.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
No existing drivers use this callback, hence there's no telling
how it might be used. In fact, it is unlikely to be of much use
as-is because the default key index isn't something that the
driver can do much with without knowing which interface it was
for etc. And if it needs the key index for the transmitted frame,
it can get it by keeping a reference to the key_conf structure
and looking it up by hw_key_idx.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch reworks the various hardware crypto related
flags to make them more local, i.e. put them with each
key or each packet instead of into the hw struct.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes all mention of the atheros turbo modes that
can't possibly work properly anyway since in some places we don't
check for them when we should.
I have no idea what the iwlwifi drivers were doing with these but
it can't possibly have been correct.
Cc: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The problem: proc_net files remember which network namespace the are
against but do not remember hold a reference count (as that would pin
the network namespace). So we currently have a small window where
the reference count on a network namespace may be incremented when opening
a /proc file when it has already gone to zero.
To fix this introduce maybe_get_net and get_proc_net.
maybe_get_net increments the network namespace reference count only if it is
greater then zero, ensuring we don't increment a reference count after it
has gone to zero.
get_proc_net handles all of the magic to go from a proc inode to the network
namespace instance and call maybe_get_net on it.
PROC_NET the old accessor is removed so that we don't get confused and use
the wrong helper function.
Then I fix up the callers to use get_proc_net and handle the case case
where get_proc_net returns NULL. In that case I return -ENXIO because
effectively the network namespace has already gone away so the files
we are trying to access don't exist anymore.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Paul E. McKenney <paulmck@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When CONFIG_NET=no, init_net is unresolved because net_namespace.c
is not compiled and the include pull init_net definition.
This problem was very similar with the ipc namespace where the kernel
can be compiled with SYSV ipc out.
This patch fix that defining a macro which simply remove init_net
initialization from nsproxy namespace aggregator.
Compiled and booted on qemu-i386 with CONFIG_NET=no and CONFIG_NET=yes.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is done in order to, add support to changing the rate table to
use the upper-boundry L2T (length to time) value. Currently we use the
lower-boundry, which result in under-estimating the actual bandwidth
usage.
Extend the tc_ratespec struct, with two parameters: 1) "cell_align"
that allow adjusting the alignment of the rate table. 2) "overhead"
that allow adding a packet overhead before the lookup.
Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change L2T (length to time) macros, in all rate based schedulers, to
call a common function qdisc_l2t() that does the rate table lookup.
This function handles if the packet size lookup is larger than the
rate table, which often occurs with TSO enabled.
Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change allows the generic attribute interface to be used within
the netfilter subsystem where this flag was initially introduced.
The byte-order flag is yet unused, it's intended use is to
allow automatic byte order convertions for all atomic types.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes most of the generic device layer network
namespace safe. This patch makes dev_base_head a
network namespace variable, and then it picks up
a few associated variables. The functions:
dev_getbyhwaddr
dev_getfirsthwbytype
dev_get_by_flags
dev_get_by_name
__dev_get_by_name
dev_get_by_index
__dev_get_by_index
dev_ioctl
dev_ethtool
dev_load
wireless_process_ioctl
were modified to take a network namespace argument, and
deal with it.
vlan_ioctl_set and brioctl_set were modified so their
hooks will receive a network namespace argument.
So basically anthing in the core of the network stack that was
affected to by the change of dev_base was modified to handle
multiple network namespaces. The rest of the network stack was
simply modified to explicitly use &init_net the initial network
namespace. This can be fixed when those components of the network
stack are modified to handle multiple network namespaces.
For now the ifindex generator is left global.
Fundametally ifindex numbers are per namespace, or else
we will have corner case problems with migration when
we get that far.
At the same time there are assumptions in the network stack
that the ifindex of a network device won't change. Making
the ifindex number global seems a good compromise until
the network stack can cope with ifindex changes when
you change namespaces, and the like.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch passes in the namespace a new socket should be created in
and has the socket code do the appropriate reference counting. By
virtue of this all socket create methods are touched. In addition
the socket create methods are modified so that they will fail if
you attempt to create a socket in a non-default network namespace.
Failing if we attempt to create a socket outside of the default
network namespace ensures that as we incrementally make the network stack
network namespace aware we will not export functionality that someone
has not audited and made certain is network namespace safe.
Allowing us to partially enable network namespaces before all of the
exotic protocols are supported.
Any protocol layers I have missed will fail to compile because I now
pass an extra parameter into the socket creation code.
[ Integrated AF_IUCV build fixes from Andrew Morton... -DaveM ]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes /proc/net per network namespace. It modifies the global
variables proc_net and proc_net_stat to be per network namespace.
The proc_net file helpers are modified to take a network namespace argument,
and all of their callers are fixed to pass &init_net for that argument.
This ensures that all of the /proc/net files are only visible and
usable in the initial network namespace until the code behind them
has been updated to be handle multiple network namespaces.
Making /proc/net per namespace is necessary as at least some files
in /proc/net depend upon the set of network devices which is per
network namespace, and even more files in /proc/net have contents
that are relevant to a single network namespace.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sockets need to get a reference to their network namespace,
or possibly a simple hold if someone registers on the network
namespace notifier and will free the sockets when the namespace
is going to be destroyed.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is the basic infrastructure needed to support network
namespaces. This infrastructure is:
- Registration functions to support initializing per network
namespace data when a network namespaces is created or destroyed.
- struct net. The network namespace data structure.
This structure will grow as variables are made per network
namespace but this is the minimal starting point.
- Functions to grab a reference to the network namespace.
I provide both get/put functions that keep a network namespace
from being freed. And hold/release functions serve as weak references
and will warn if their count is not zero when the data structure
is freed. Useful for dealing with more complicated data structures
like the ipv4 route cache.
- A list of all of the network namespaces so we can iterate over them.
- A slab for the network namespace data structure allowing leaks
to be spotted.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch modifies the current ipsec audit layer
by breaking it up into purpose driven audit calls.
So far, the only audit calls made are when add/delete
an SA/policy. It had been discussed to give each
key manager it's own calls to do this, but I found
there to be much redundnacy since they did the exact
same things, except for how they got auid and sid, so I
combined them. The below audit calls can be made by any
key manager. Hopefully, this is ok.
Signed-off-by: Joy Latten <latten@austin.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The type of owner in sock_lock_t is currently (struct sock_iocb *),
presumably for historical reasons. It is never used as this type, only
tested as NULL or set to (void *)1. For clarity, this changes it to type
int, and renames to owned, to avoid any possible type casting errors.
Signed-off-by: John Heffner <jheffner@psc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
This moves all the key handling code out from ieee80211_ioctl.c
into key.c and also does the following changes including documentation
updates in mac80211.h:
1) Turn off hardware acceleration for keys when the interface
is down. This is necessary because otherwise monitor
interfaces could be decrypting frames for other interfaces
that are down at the moment. Also, it should go some way
towards better suspend/resume support, in any case the
routines used here could be used for that as well.
Additionally, this makes the driver interface nicer, keys
for a specific local MAC address are only ever present
while an interface with that MAC address is enabled.
2) Change driver set_key() callback interface to allow only
return values of -ENOSPC, -EOPNOTSUPP and 0, warn on all
other return values. This allows debugging the stack when
a driver notices it's handed a key while it is down.
3) Invert the flag meaning to KEY_FLAG_UPLOADED_TO_HARDWARE.
4) Remove REMOVE_ALL_KEYS command as it isn't used nor do we
want to use it, we'll use DISABLE_KEY for each key. It is
hard to use REMOVE_ALL_KEYS because we can handle multiple
virtual interfaces with different key configuration, so we'd
have to keep track of a lot of state for this and that isn't
worth it.
5) Warn when disabling a key fails, it musn't.
6) Remove IEEE80211_HW_NO_TKIP_WMM_HWACCEL in favour of per-key
IEEE80211_KEY_FLAG_WMM_STA to let driver sort it out itself.
7) Tell driver that a (non-WEP) key is used only for transmission
by using an all-zeroes station MAC address when configuring.
8) Change the set_key() callback to have access to the local MAC
address the key is being added for.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the default_wep_only stuff, this wasn't really done well
and no current driver actually cares.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch embeds the struct ieee80211_key_conf into struct ieee80211_key
and thus avoids allocations and having data present twice.
This required some more changes:
1) The removal of the IEEE80211_KEY_DEFAULT_TX_KEY key flag.
This flag isn't used by drivers nor should it be since
we have a set_key_idx() callback. Maybe that callback needs
to be extended to include the key conf, but only a driver that
requires it will tell.
2) The removal of the IEEE80211_KEY_DEFAULT_WEP_ONLY key flag.
This flag is global, so it shouldn't be passed in the key
conf structure. Pass it to the function instead.
Also, this patch removes the AID parameter to the set_key() callback
because it is currently unused and the hardware currently cannot know
about the AID anyway. I suspect this was used with some hardware that
actually selected the AID itself, but that functionality was removed.
Additionally, I've removed the ALG_NULL key algorithm since we have
ALG_NONE.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ioctls
* PRISM2_PARAM_RADAR_DETECT
* PRISM2_PARAM_SPECTRUM_MGMT
are not used by hostapd or wpa_supplicant,
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ioctls
* PRISM2_PARAM_STA_ANTENNA_SEL
* PRISM2_PARAM_TX_POWER_REDUCTION
* PRISM2_PARAM_DEFAULT_WEP_ONLY
are not used by hostapd or wpa_supplicant.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ioctls
* PRISM2_PARAM_ANTENNA_MODE
* PRISM2_PARAM_STAT_TIME
are not used by hostapd or wpa_supplicant.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When doing key selection for software decryption, mac80211 gets
a few things wrong: it always uses pairwise keys if configured,
even if the frame is addressed to a multicast address. Also, it
doesn't allow using a key index of zero if a pairwise key has
also been found.
This patch changes the key selection code to be (more) in line
with the 802.11 specification. I have confirmed that with this,
multicast frames are correctly decrypted and I've tested with
WEP as well.
While at it, I've cleaned up the semantics of the hardware flags
IEEE80211_HW_WEP_INCLUDE_IV and IEEE80211_HW_DEVICE_HIDES_WEP
and clarified them in the mac80211.h header; it is also now
allowed to set the IEEE80211_HW_DEVICE_HIDES_WEP option even if
it only applies to frames that have been decrypted by the hw,
unencrypted frames must be dropped but encrypted frames that
the hardware couldn't handle can be passed up unmodified.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Unused in drivers, userspace and mac80211.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The flag is never checked because drivers can simply call
ieee80211_beacon_get() regardless of setting this flag.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The callback isn't used so remove it.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hopefully captured all single statement cases under net/. I'm
not too sure if there is some policy about #includes that are
"guaranteed" (ie., in the current tree) to be available through
some other #included header, so I just added linux/kernel.h to
each changed file that didn't #include it previously.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
When XFRM policy and state are ready after TCP connection is started,
the traffic should be transformed immediately, however it does not
on IPv6 TCP.
It depends on a dst cache replacement policy with connected socket.
It seems that the replacement is always done for IPv4, however, on
IPv6 case it is done only when routing cookie is changed.
This patch fix that non-transformation dst can be changed to
transformation one.
This behavior is required by MIPv6 and improves IPv6 IPsec.
Fixes by Masahide NAKAMURA.
Signed-off-by: Noriaki TAKAMIYA <takamiya@po.ntts.co.jp>
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add v4mapped address inline to avoid calls to ipv6_addr_type().
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Makes caller side more obvious, there's no need to have
a wrapper for this oneliner!
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces autotuning to the sctp buffer management code
similar to the TCP. The buffer space can be grown if the advertised
receive window still has room. This might happen if small message
sizes are used, which is common in telecom environmens.
New tunables are introduced that provide limits to buffer growth
and memory pressure is entered if to much buffer spaces is used.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously code had IsReno/IsFack defined as macros that were
local to tcp_input.c though sack_ok field has user elsewhere too
for the same purpose. This changes them to static inlines as
preferred according the current coding style and unifies the
access to sack_ok across multiple files. Magic bitops of sack_ok
for FACK and DSACK are also abstracted to functions with
appropriate names.
Note:
- One sack_ok = 1 remains but that's self explanary, i.e., it
enables sack
- Couple of !IsReno cases are changed to tcp_is_sack
- There were no users for IsDSack => I dropped it
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
BUG_ON is an overkill. In fact, I was mislead by BUG_TRAP
severity (equals to WARN_ON) which is much lower than BUG_ON's
(that panics).
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously TCP had a transitional state during which reno
counted segments that are already below the current window into
sacked_out, which is now prevented. In addition, re-try now
the unconditional S+L skb catching.
This approach conservatively calls just remove_sack and leaves
reset_sack() calls alone. The best solution to the whole problem
would be to first calculate the new sacked_out fully (this patch
does not move reno_sack_reset calls from original sites and thus
does not implement this). However, that would require very
invasive change to fastretrans_alert (perhaps even slicing it to
two halves). Alternatively, all callers of tcp_packets_in_flight
(i.e., users that depend on sacked_out) should be postponed
until the new sacked_out has been calculated but it isn't any
simpler alternative.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Left_out was dropped a while ago, thus leaving verifying
consistency of the "left out" as only task for the function in
question. Thus make it's name more appropriate.
In addition, it is intentionally converted to #define instead
of static inline because the location of the invariant failure
is the most important thing to have if this ever triggers. I
think it would have been helpful e.g. in this case where the
location of the failure point had to be based on some quesswork:
http://lkml.org/lkml/2007/5/2/464
...Luckily the guesswork seems to have proved to be correct.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
tp->left_out got removed but nothing came to replace it back
then (users just did addition by themselves), so add function
for users now.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is easily calculable when needed and user are not that many
after all.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
No other users exist for tcp_ecn.h. Very few things remain in
tcp.h, for most TCP ECN functions callers reside within a
single .c file and can be placed there.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Veth stands for Virtual ETHernet. It is a simple tunnel driver
that works at the link layer and looks like a pair of ethernet
devices interconnected with each other.
Mainly it allows to communicate between network namespaces but
it can be used as is as well.
The newlink callback is organized that way to make it easy to
create the peer device in the separate namespace when we have
them in kernel.
This implementation uses another interface - the RTM_NRELINK
message introduced by Patric.
Bug fixes from Daniel Lezcano.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This routine gets the parsed rtnl attributes and creates a new
link with generic info (IFLA_LINKINFO policy). Its intention
is to help the drivers, that need to create several links at
once (like VETH).
This is nothing but a copy-paste-ed part of rtnl_newlink() function
that is responsible for creation of new device.
Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
ieee80211_get_radiotap_len() tries to dereference radiotap length without
taking care that it is completely unaligned and get_unaligned()
is required.
Signed-off-by: Andy Green <andy@warmcat.com>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
zd1211rw and bcm43xx are interested in being notified when ERP IE conditions
change, so that they can reprogram a register which affects how control frames
are transmitted.
This patch adds an interface similar to the one that can be found in softmac.
Signed-off-by: Daniel Drake <dsd@gentoo.org>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Similarly to CTS protection, whether short preambles are used for 802.11b
transmissions should be a per-subif setting, not device global.
For STAs, this patch makes short preamble handling automatic based on the ERP
IE. For APs, hostapd still uses the prism ioctls, but the write ioctl has been
restricted to AP-only subifs.
ieee80211_txrx_data.short_preamble (an unused field) was removed.
Unfortunately, some API changes were required for the following functions:
- ieee80211_generic_frame_duration
- ieee80211_rts_duration
- ieee80211_ctstoself_duration
- ieee80211_rts_get
- ieee80211_ctstoself_get
Affected drivers were updated accordingly.
Signed-off-by: Daniel Drake <dsd@gentoo.org>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
mac80211 informs the driver what the short and long retry values are through
set_retry_limit(), but when packets are being transmitted it did not inform the
driver which of the 2 retry limits should actually be used.
Instead it sends the actual value, but for drivers that can only set the retry limit
and the register and in the descriptor need to indicate which of the limits should
be used this is not really useful.
This patch will add a IEEE80211_TXCTL_LONG_RETRY_LIMIT flag to the
ieee80211_tx_control structure. By default the short retry limit should be
used but if the flag is set the long retry should be used.
This does not prevent the driver to ignore the request for "no retry" packets,
but at least those will be send out with the short retry limit. But there is no
perfect cure for this problem.. :(
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The sta_info code has some awkward locking which prevents some driver
callbacks from being allowed to sleep. This patch makes the locking more
focused so code that calls driver callbacks are allowed to sleep. It also
converts sta_lock to a rwlock.
Signed-off-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Based upon a report and initial patch by Peter Lieven.
tcp4_md5sig_key and tcp6_md5sig_key need to start with
the exact same members as tcp_md5sig_key. Because they
are both cast to that type by tcp_v{4,6}_md5_do_lookup().
Unfortunately tcp{4,6}_md5sig_key use a u16 for the key
length instead of a u8, which is what tcp_md5sig_key
uses. This just so happens to work by accident on
little-endian, but on big-endian it doesn't.
Instead of casting, just place tcp_md5sig_key as the first member of
the address-family specific structures, adjust the access sites, and
kill off the ugly casts.
Signed-off-by: David S. Miller <davem@davemloft.net>
It gets pointer to fastcall function, expects a pointer to normal
one and calls the sucker.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If ADDIP is enabled, when an ASCONF chunk is received with ASCONF
paramter length set to zero, this will cause infinite loop.
By the way, if an malformed ASCONF chunk is received, will cause
processing to access memory without verifying.
This is because of not check the validity of parameters in ASCONF chunk.
This patch fixed this.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
While processing OOTB chunks as well as chunks with an invalid
length of 0, it was possible to SCTP to get wedged inside an
infinite loop because we didn't catch the condition correctly,
or didn't mark the packet for discard correctly.
This work is based on original findings and work by
Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Since the sctp_sockaddr_entry is now RCU enabled as part of
the patch to synchronize sctp_localaddr_list, it makes sense to
change all handling of these entries to RCU. This includes the
sctp_bind_addrs structure and it's list of bound addresses.
This list is currently protected by an external rw_lock and that
looks like an overkill. There are only 2 writers to the list:
bind()/bindx() calls, and BH processing of ASCONF-ACK chunks.
These are already seriealized via the socket lock, so they will
not step on each other. These are also relatively rare, so we
should be good with RCU.
The readers are varied and they are easily converted to RCU.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Sridhar Samdurala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sctp_localaddr_list is modified dynamically via NETDEV_UP
and NETDEV_DOWN events, but there is not synchronization
between writer (even handler) and readers. As a result,
the readers can access an entry that has been freed and
crash the sytem.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Sridhar Samdurala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
PROTOCOL VIOLATION error cause in ABORT is bad encode when make abort
chunk. When SCTP encode ABORT chunk with PROTOCOL VIOLATION error cause,
it just add the error messages to PROTOCOL VIOLATION error cause, the
rest four bytes(struct sctp_paramhdr) is just add to the chunk, not
change the length of error cause. This cause the ABORT chunk to be a bad
format. The chunk is like this:
ABORT chunk
Chunk type: ABORT (6)
Chunk flags: 0x00
Chunk length: 72 (*1)
Protocol violation cause
Cause code: Protocol violation (0x000d)
Cause length: 62 (*2)
Cause information: 5468652063756D756C61746976652074736E2061636B2062...
Cause padding: 0000
[Needless] 00030010
Chunk Length(*1) = 72 but Cause length(*2) only 62, not include the
extend 4 bytes.
((72 - sizeof(chunk_hdr)) = 68) != (62 +3) / 4 * 4
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
When we recieve a FWD-TSN (meaning the peer has abandoned the data),
we need to clean up any partially received messages that may be
hanging out on the re-assembly or re-ordering queues. This is
a MUST requirement that was not properly done before.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com.>
Loading nf_nat causes the conntrack core to be loaded, but we need IPv4 as
well.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
As discovered by Evegniy Polyakov, if we try to sendmsg after
a connection reset, we can do incredibly stupid things.
The core issue is that inet_sendmsg() tries to autobind the
socket, but we should never do that for TCP. Instead we should
just go straight into TCP's sendmsg() code which will do all
of the necessary state and pending socket error checks.
TCP's sendpage already directly vectors to tcp_sendpage(), so this
merely brings sendmsg() in line with that.
Signed-off-by: David S. Miller <davem@davemloft.net>
A small fix to the SELinux/NetLabel glue code to ensure that the NetLabel
cache is utilized when possible. This was broken when the SELinux/NetLabel
glue code was reorganized in the last kernel release.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
sctp_chunk_cachep & sctp_bucket_cachep is used module global, so move it
to a header file.
Signed-off-by: Sebastian Siewior <sebastian@breakpoint.cc>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
The following code can now become static:
- struct unix_socket_table
- unix_table_lock
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
net/if_inet6.h includes linux/ipv6.h which also tries to include
net/if_inet6.h. Since the latter only needs it for forward
declarations, we can fix this by adding the declarations.
A number of files are implicitly including net/if_inet6.h through
linux/ipv6.h. They also use net/ipv6.h so this patch includes
net/if_inet6.h there.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
no code changes, just documenting existing types
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch changes the API for the callback that is done after an ACK is
received. It solves a couple of issues:
* Some congestion controls want higher resolution value of RTT
(controlled by TCP_CONG_RTT_SAMPLE flag). These don't really want a ktime, but
all compute a RTT in microseconds.
* Other congestion control could use RTT at jiffies resolution.
To keep API consistent the units should be the same for both cases, just the
resolution should change.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
no real bugs, just misannotations cropping up
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6:
SELinux: use SECINITSID_NETMSG instead of SECINITSID_UNLABELED for NetLabel
SELinux: enable dynamic activation/deactivation of NetLabel/SELinux enforcement
Create a new NetLabel KAPI interface, netlbl_enabled(), which reports on the
current runtime status of NetLabel based on the existing configuration. LSMs
that make use of NetLabel, i.e. SELinux, can use this new function to determine
if they should perform NetLabel access checks. This patch changes the
NetLabel/SELinux glue code such that SELinux only enforces NetLabel related
access checks when netlbl_enabled() returns true.
At present NetLabel is considered to be enabled when there is at least one
labeled protocol configuration present. The result is that by default NetLabel
is considered to be disabled, however, as soon as an administrator configured
a CIPSO DOI definition NetLabel is enabled and SELinux starts enforcing
NetLabel related access controls - including unlabeled packet controls.
This patch also tries to consolidate the multiple "#ifdef CONFIG_NETLABEL"
blocks into a single block to ease future review as recommended by Linus.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
Introduce API to dynamically register and unregister multicast groups.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Patrick McHardy <kaber@trash.net>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
XFRM expects xfrm_dst->u.next to be same pointer as dst->next, which
was broken by the dst_entry reordering in commit 1e19e02c~, causing
an oops in xfrm_bundle_ok when walking the bundle upwards.
Kill xfrm_dst->u.next and change the only user to use dst->next instead.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
None of the existing TCP congestion controls use the rtt value pased
in the ca_ops->cong_avoid interface. Which is lucky because seq_rtt
could have been -1 when handling a duplicate ack.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The OPEN_MAX constant is an arbitrary number with no useful relation to
anything. Nothing should be using it. SCM_MAX_FD is just an arbitrary
constant and it should be clear that its value is chosen in net/scm.h
and not actually derived from anything else meaningful in the system.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (53 commits)
[TCP]: Verify the presence of RETRANS bit when leaving FRTO
[IPV6]: Call inet6addr_chain notifiers on link down
[NET_SCHED]: Kill CONFIG_NET_CLS_POLICE
[NET_SCHED]: act_api: qdisc internal reclassify support
[NET_SCHED]: sch_dsmark: act_api support
[NET_SCHED]: sch_atm: act_api support
[NET_SCHED]: sch_atm: Lindent
[IPV6]: MSG_ERRQUEUE messages do not pass to connected raw sockets
[IPV4]: Cleanup call to __neigh_lookup()
[NET_SCHED]: Revert "avoid transmit softirq on watchdog wakeup" optimization
[NETFILTER]: nf_conntrack: UDPLITE support
[NETFILTER]: nf_conntrack: mark protocols __read_mostly
[NETFILTER]: x_tables: add connlimit match
[NETFILTER]: Lower *tables printk severity
[NETFILTER]: nf_conntrack: Don't track locally generated special ICMP error
[NETFILTER]: nf_conntrack: Introduces nf_ct_get_tuplepr and uses it
[NETFILTER]: nf_conntrack: make l3proto->prepare() generic and renames it
[NETFILTER]: nf_conntrack: Increment error count on parsing IPv4 header
[NET]: Add ethtool support for NETIF_F_IPV6_CSUM devices.
[AF_IUCV]: Add lock when updating accept_q
...
The NET_CLS_ACT option is now a full replacement for NET_CLS_POLICE,
remove the old code. The config option will be kept around to select
the equivalent NET_CLS_ACT options for a short time to allow easier
upgrades.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The behaviour of NET_CLS_POLICE for TC_POLICE_RECLASSIFY was to return
it to the qdisc, which could handle it internally or ignore it. With
NET_CLS_ACT however, tc_classify starts over at the first classifier
and never returns it to the qdisc. This makes it impossible to support
qdisc-internal reclassification, which in turn makes it impossible to
remove the old NET_CLS_POLICE code without breaking compatibility since
we have two qdiscs (CBQ and ATM) that support this.
This patch adds a tc_classify_compat function that handles
reclassification the old way and changes CBQ and ATM to use it.
This again is of course not fully backwards compatible with the previous
NET_CLS_ACT behaviour. Unfortunately there is no way to fully maintain
compatibility *and* support qdisc internal reclassification with
NET_CLS_ACT, but this seems like the better choice over keeping the two
incompatible options around forever.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Also remove two unnecessary EXPORT_SYMBOLs and move the
nf_conntrack_l3proto_ipv4 declaration to the correct file.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
nf_ct_get_tuple() requires the offset to transport header and that bothers
callers such as icmp[v6] l4proto modules. This introduces new function
to simplify them.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The icmp[v6] l4proto modules parse headers in ICMP[v6] error to get tuple.
But they have to find the offset to transport protocol header before that.
Their processings are almost same as prepare() of l3proto modules.
This makes prepare() more generic to simplify icmp[v6] l4proto module
later.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The accept_queue of an af_iucv socket will be corrupted, if
adding and deleting of entries in this queue occurs at the
same time (connect request from one client, while accept call
is processed for another client).
Solution: add locking when updating accept_q
Signed-off-by: Ursula Braun <braunu@de.ibm.com>
Acked-by: Frank Pavlic <fpavlic@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes the needlessly global __inet_twsk_kill() static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patchset moves non-filesystem interfaces of v9fs from fs/9p to net/9p.
It moves the transport, packet marshalling and connection layers to net/9p
leaving only the VFS related files in fs/9p. This work is being done in
preparation for in-kernel 9p servers as well as alternate 9p clients (other
than VFS).
Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
The semantics of not having an add_interface callback are not well
defined, this callback is required because otherwise you cannot obtain
the requested MAC address of the device. Change the documentation to
reflect this, add a note about having no MAC address at all, add a
warning that mac_addr in struct ieee80211_if_init_conf can be NULL and
finally verify that a few callbacks are assigned by way of BUG_ON()
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Remove ieee80211_set_aid_for_sta and associated code.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Generic code to walk through the fields in a radiotap header, accounting
for nasties like extended "field present" bitfields and alignment rules
Signed-off-by: Andy Green <andy@warmcat.com>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Throw out the old mark & sweep garbage collector and put in a
refcounting cycle detecting one.
The old one had a race with recvmsg, that resulted in false positives
and hence data loss. The old algorithm operated on all unix sockets
in the system, so any additional locking would have meant performance
problems for all users of these.
The new algorithm instead only operates on "in flight" sockets, which
are very rare, and the additional locking for these doesn't negatively
impact the vast majority of users.
In fact it's probable, that there weren't *any* heavy senders of
sockets over sockets, otherwise the above race would have been
discovered long ago.
The patch works OK with the app that exposed the race with the old
code. The garbage collection has also been verified to work in a few
simple cases.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Based on <draft-ietf-ipv6-deprecate-rh0-00.txt>.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
To better support and handle eSCO links in the future a bunch of
constants needs to be added and some basic routines need to be
updated. This is the initial step.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Convert DEBUGP to pr_debug and fix lots of non-compiling debug statements.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eliminate the last global list searched for every new connection.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
As a last step of preventing DoS by creating lots of expectations, this
patch introduces a global maximum and a sysctl to control it. The default
is initialized to 4 * the expectation hash table size, which results in
1/64 of the default maxmimum of conntracks.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch brings back the per-conntrack expectation list that was
removed around 2.6.10 to avoid walking all expectations on expectation
eviction and conntrack destruction.
As these were the last users of the global expectation list, this patch
also kills that.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently all expectations are kept on a global list that
- needs to be searched for every new conncetion
- needs to be walked for evicting expectations when a master connection
has reached its limit
- needs to be walked on connection destruction for connections that
have open expectations
This is obviously not good, especially when considering helpers like
H.323 that register *lots* of expectations and can set up permanent
expectations, but it also allows for an easy DoS against firewalls
using connection tracking helpers.
Use a hashtable for expectations to avoid incurring the search overhead
for every new connection. The default hash size is 1/256 of the conntrack
hash table size, this can be overriden using a module parameter.
This patch only introduces the hash table for expectation lookups and
keeps other users to reduce the noise, the following patches will get
rid of it completely.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since conntrack currently allows to use masks for every bit of both
helper and expectation tuples, we can't hash them and have to keep
them on two global lists that are searched for every new connection.
This patch removes the never used ability to use masks for the
destination part of the expectation tuple and completely removes
masks from helpers since the only reasonable choice is a full
match on l3num, protonum and src.u.all.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently there is a wild mix of nf_conntrack_expect_, nf_ct_exp_,
expect_, exp_, ...
Consistently use nf_ct_ as prefix for exported functions.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
All callers pass NULL, this also doesn't seem very useful for modules.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert conntrack hash to hlists to reduce its size and cache
footprint. Since the default hashsize to max. entries ratio
sucks (1:16), this patch doesn't reduce the amount of memory
used for the hash by default, but instead uses a better ratio
of 1:8, which results in the same max. entries value.
One thing worth noting is early_drop. It really should use LRU,
so it now has to iterate over the entire chain to find the last
unconfirmed entry. Since chains shouldn't be very long and the
entire operation is very rare this shouldn't be a problem.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This kills the global 'destroy' operation which was used by NAT.
Instead it uses the extension infrastructure so that multiple
extensions can register own operations.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now memory space for help and NAT are allocated by extension
infrastructure.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
I will split 'struct nf_nat_info' out from conntrack. So I cannot use
'offsetof' to get the pointer to conntrack from it.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Old space allocator of conntrack had problems about extensibility.
- It required slab cache per combination of extensions.
- It expected what extensions would be assigned, but it was impossible
to expect that completely, then we allocated bigger memory object than
really required.
- It needed to search helper twice due to lock issue.
Now basic informations of a connection are stored in 'struct nf_conn'.
And a storage for extension (helper, NAT) is allocated by kmalloc.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This cleanup fell out after adding L2TP support where a new encap_rcv
funcptr was added to struct udp_sock. Have XFRM use the new encap_rcv
funcptr, which allows us to move the XFRM encap code from udp.c into
xfrm4_input.c.
Make xfrm4_rcv_encap() static since it is no longer called externally.
Signed-off-by: James Chapman <jchapman@katalix.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
First IrDA configuration netlink layer implementation.
Currently, we only support the set/get mode commands.
Signed-off-by: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove stats_lock pointers from qdisc-internal structures, in all cases
it points to dev->queue_lock. The only case where it is necessary is for
top-level qdiscs, where it might also point to dev->ingress_lock in case
of the ingress qdisc. Also remove it from actions completely, it always
points to the actions internal lock.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This allows other in-kernel functions to do SAD lookups.
The only known user at the moment is pktgen.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is clean-up for XFRM type modules and adds aliases with its
protocol:
ESP, AH, IPCOMP, IPIP and IPv6 for IPsec
ROUTING and DSTOPTS for MIPv6
It is almost the same thing as XFRM mode alias, but it is added
new defines XFRM_PROTO_XXX for preprocessing since some protocols
are defined as enum.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Acked-by: Ingo Oeser <netdev@axxeo.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes MIPv6 loadable module named "mip6".
Here is a modprobe.conf(5) example to load it automatically
when user application uses XFRM state for MIPv6:
alias xfrm-type-10-43 mip6
alias xfrm-type-10-60 mip6
Some MIPv6 feature is not included by this modular, however,
it should not be affected to other features like either IPsec
or IPv6 with and without the patch.
We may discuss XFRM, MH (RAW socket) and ancillary data/sockopt
separately for future work.
Loadable features:
* MH receiving check (to send ICMP error back)
* RO header parsing and building (i.e. RH2 and HAO in DSTOPTS)
* XFRM policy/state database handling for RO
These are NOT covered as loadable:
* Home Address flags and its rule on source address selection
* XFRM sub policy (depends on its own kernel option)
* XFRM functions to receive RO as IPv6 extension header
* MH sending/receiving through raw socket if user application
opens it (since raw socket allows to do so)
* RH2 sending as ancillary data
* RH2 operation with setsockopt(2)
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kill unnecessary CONFIG_IPV6_MIP6.
o It is redundant for RAW socket to keep MH out with the config then
it can handle any protocol.
o Clean-up at AH.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a nested compat attribute type that can be used to convert
attributes that contain a structure to nested attributes in a
backwards compatible way.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add rtnetlink API for creating, changing and deleting software devices.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch changes the RFCOMM TTY release process so that the TTY is kept
on the list until it is really freed. A new device flag is used to keep
track of released TTYs.
Signed-off-by: Ville Tervo <ville.tervo@nokia.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch enhances TIPC's stream socket send routine so that
it avoids transmitting data in chunks that require fragmentation
and reassembly, thereby improving performance at both the
sending and receiving ends of the connection.
The "maximum packet size" hint that records MTU info allows
the socket to decide how big a chunk it should send; in the
event that the hint has become stale, fragmentation may still
occur, but the data will be passed correctly and the hint will
be updated in time for the following send. Note: The 66060 byte
pseudo-MTU used for intra-node connections requires the send
routine to perform an additional check to ensure it does not
exceed TIPC"s limit of 66000 bytes of user data per chunk.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Jon Paul Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Most drivers must handle fragmented HCI data packets and events. This
patch adds a generic function for their reassembly to the Bluetooth
core layer and thus allows to shrink the complexity of the drivers.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Support for the Asix AX88796 network controller, an
NE2000 compatible 10/100 ethernet device with internal
PHY.
The driver supports PHY settings via either ioctl() or
the ethtool driver ops.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Currently, if the socket is owned by the user, we drop the ICMP
message. As a result SCTP forgets that path MTU changed and
never adjusting it's estimate. This causes all subsequent
packets to be fragmented. With this patch, we'll flag the association
that it needs to udpate it's estimate based on the already updated
routing information.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Acked-by: Sridhar Samudrala <sri@us.ibm.com>
Introduce new function sctp_transport_update_pmtu that updates
the transports and destination caches view of the path mtu.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Acked-by: Sridhar Samudrala <sri@us.ibm.com>
From: G. Liakhovetski <gl@dsa-ac.de>
We need to switch to NRM _before_ sending the final packet otherwise
we might hit a race condition where we get the first packet from the
peer while we're still in LAP_XMIT_P.
Signed-off-by: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current NetLabel code has some redundant APIs which allow both
"struct socket" and "struct sock" types to be used; this may have made
sense at some point but it is wasteful now. Remove the functions that
operate on sockets and convert the callers. Not only does this make
the code smaller and more consistent but it pushes the locking burden
up to the caller which can be more intelligent about the locks. Also,
perform the same conversion (socket to sock) on the SELinux/NetLabel
glue code where it make sense.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently we check for permission before deleting entries from SAD and
SPD, (see security_xfrm_policy_delete() security_xfrm_state_delete())
However we are not checking for authorization when flushing the SPD and
the SAD completely. It was perhaps missed in the original security hooks
patch.
This patch adds a security check when flushing entries from the SAD and
SPD. It runs the entire database and checks each entry for a denial.
If the process attempting the flush is unable to remove all of the
entries a denial is logged the the flush function returns an error
without removing anything.
This is particularly useful when a process may need to create or delete
its own xfrm entries used for things like labeled networking but that
same process should not be able to delete other entries or flush the
entire database.
Signed-off-by: Joy Latten<latten@austin.ibm.com>
Signed-off-by: Eric Paris <eparis@parisplace.org>
Signed-off-by: James Morris <jmorris@namei.org>
This reverts changesets:
6aaf47fa48b7b5f487abde34ed91c4fc038410b4
There are still some correctness issues recently
discovered which do not have a known fix that doesn't
involve doing a full hash table scan on port bind.
So revert for now.
Signed-off-by: David S. Miller <davem@davemloft.net>
A time_wait socket inherits sk_bound_dev_if from the original socket,
but it is not used when sending ACK packets using ip_send_reply.
Fix by passing the oif to ip_send_reply in struct ip_reply_arg and
use it for output routing.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The unix_state_*() locking macros imply that there is some
rwlock kind of thing going on, but the implementation is
actually a spinlock which makes the code more confusing than
it needs to be.
So use plain unix_state_lock and unix_state_unlock.
Signed-off-by: David S. Miller <davem@davemloft.net>
tcp_out_of_resources() and tcp_close() perform the
same checking of number of orphan sockets. Move this
code into common place.
Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current IPSEC rule resolution behavior we have does not work for a
lot of people, even though technically it's an improvement from the
-EAGAIN buisness we had before.
Right now we'll block until the key manager resolves the route. That
works for simple cases, but many folks would rather packets get
silently dropped until the key manager resolves the IPSEC rules.
We can't tell these folks to "set the socket non-blocking" because
they don't have control over the non-block setting of things like the
sockets used to resolve DNS deep inside of the resolver libraries in
libc.
With that in mind I coded up the patch below with some help from
Herbert Xu which provides packet-drop behavior during larval state
resolution, controllable via sysctl and off by default.
This lays the framework to either:
1) Make this default at some point or...
2) Move this logic into xfrm{4,6}_policy.c and implement the
ARP-like resolution queue we've all been dreaming of.
The idea would be to queue packets to the policy, then
once the larval state is resolved by the key manager we
re-resolve the route and push the packets out. The
packets would timeout if the rule didn't get resolved
in a certain amount of time.
Signed-off-by: David S. Miller <davem@davemloft.net>
The L2CAP configuration parameter handling was missing the support
for rejecting unknown options. The capability to reject unknown
options is mandatory since the Bluetooth 1.2 specification. This
patch implements its and also simplifies the parameter parsing.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
These are also in include/net/netfilter/nf_conntrack_helper.h
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
nf_nat_rule_find, alloc_null_binding and alloc_null_binding_confirmed
do not use the argument 'info', which is actually ct->nat.info.
If they are necessary to access it again, we can use the argument 'ct'
instead.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
__udp_lib_port_inuse() cannot make direct references to
inet_sk(sk)->rcv_saddr as that is ipv4 specific state and
this code is used by ipv6 too.
Use an operations vector to solve this, and this also paves
the way for ipv6 support for non-wild saddr hashing in UDP.
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge all compat ioctl handling into compat_ioctl.c instead of splitting it
over compat.c and compat_ioctl.c. This also allows to get rid of ioctl32.h
Signed-off-by: Christoph Hellwig <hch@lst.de>
Looks-good-to: Andi Kleen <ak@suse.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The routines that interrogate the ieee80211_geo struct are missing a
channel to frequency entry. This patch adds it.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
During the INIT/COOKIE-ACK collision cases, it's possible to get
into a situation where the association id is not yet set at the time
of the user event generation. As a result, user events have an
association id set to 0 which will confuse applications.
This happens if we hit case B of duplicate cookie processing.
In the particular example found and provided by Oscar Isaula
<Oscar.Isaula@motorola.com>, flow looks like this:
A B
---- INIT-------> (lost)
<---------INIT------
---- INIT-ACK--->
<------ Cookie ECHO
When the Cookie Echo is received, we end up trying to update the
association that was created on A as a result of the (lost) INIT,
but that association doesn't have the ID set yet.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Aggregate the SPD info TLVs.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Aggregate the SAD info TLVs.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the inital implementation we missed to implement a skb backlog
queue . The result is that socket receive processing tossed packets.
Since AF_IUCV connections are working synchronously it leads to
connection hangs. Problems with read, close and select also occured.
Using a skb backlog queue is fixing all of these problems .
Signed-off-by: Jennifer Hunt <jenhunt@us.ibm.com>
Signed-off-by: Frank Pavlic <fpavlic@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1) struct ip6_flowlabel : moves 'users' field to avoid two 32bits
holes for 64bit arches. Shrinks by 8 bytes sizeof(struct
ip6_flowlabel)
2) ipv6_addr_cmp() and ipv6_addr_copy() dont need (void *) casts :
Compiler might take into account natural alignement of in6_addr
structs to emit better code for memcpy()/memcmp() Casts to (void *)
force byte accesses.
3) ipv6_addr_prefix() optimization :
Better to clear whole struct, as compiler can emit better code for
memset(addr, 0, 16) (2 stores on x86_64), and avoid some conditional
branches.
# size vmlinux.after vmlinux.before
text data bss dec hex filename
5262262 647612 557432 6467306 62aeea vmlinux.after
5262550 647612 557432 6467594 62b00a vmlinux.before
thats 288 bytes saved.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TCP has a transitional state when SACK is not in use during
which this invariant is temporarily broken. Without SACK,
tcp_clean_rtx_queue does not decrement sacked_out. Therefore
calls to tcp_sync_left_out before sacked_out is again
corrected by tcp_fastretrans_alert can trigger this trap as
sacked_out still has couple of segments that are already out
of window.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
__HAVE_ARCH_ADDR_SET seems unused these days, just get rid of it.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (21 commits)
[IPV4] SNMP: Support OutMcastPkts and OutBcastPkts
[IPV4] SNMP: Support InMcastPkts and InBcastPkts
[IPV4] SNMP: Support InTruncatedPkts
[IPV4] SNMP: Support InNoRoutes
[SNMP]: Add definitions for {In,Out}BcastPkts
[TCP] FRTO: RFC4138 allows Nagle override when new data must be sent
[TCP] FRTO: Delay skb available check until it's mandatory
[XFRM]: Restrict upper layer information by bundle.
[TCP]: Catch skb with S+L bugs earlier
[PATCH] INET : IPV4 UDP lookups converted to a 2 pass algo
[L2TP]: Add the ability to autoload a pppox protocol module.
[SKB]: Introduce skb_queue_walk_safe()
[AF_IUCV/IUCV]: smp_call_function deadlock
[IPV6]: Fix slab corruption running ip6sic
[TCP]: Update references in two old comments
[XFRM]: Export SPD info
[IPV6]: Track device renames in snmp6.
[SCTP]: Fix sctp_getsockopt_local_addrs_old() to use local storage.
[NET]: Remove NETIF_F_INTERNAL_STATS, default to internal stats.
[NETPOLL]: Remove CONFIG_NETPOLL_RX
...
This is a corner case where less than MSS sized new data thingie
is awaiting in the send queue. For F-RTO to work correctly, a
new data segment must be sent at certain point or F-RTO cannot
be used at all. RFC4138 allows overriding of Nagle at that
point.
Implementation uses frto_counter states 2 and 3 to distinguish
when Nagle override is needed.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
On MIPv6 usage, XFRM sub policy is enabled.
When main (IPsec) and sub (MIPv6) policy selectors have the same
address set but different upper layer information (i.e. protocol
number and its ports or type/code), multiple bundle should be created.
However, currently we have issue to use the same bundle created for
the first time with all flows covered by the case.
It is useful for the bundle to have the upper layer information
to be restructured correctly if it does not match with the flow.
1. Bundle was created by two policies
Selector from another policy is added to xfrm_dst.
If the flow does not match the selector, it goes to slow path to
restructure new bundle by single policy.
2. Bundle was created by one policy
Flow cache is added to xfrm_dst as originated one. If the flow does
not match the cache, it goes to slow path to try searching another
policy.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
SACKED_ACKED and LOST are mutually exclusive with SACK, thus
having their sum larger than packets_out is bug with SACK.
Eventually these bugs trigger traps in the tcp_clean_rtx_queue
with SACK but it's much more informative to do this here.
Non-SACK TCP, however, could get more than packets_out duplicate
ACKs which each increment sacked_out, so it makes sense to do
this kind of limitting for non-SACK TCP but not for SACK enabled
one. Perhaps the author had the opposite in mind but did the
logic accidently wrong way around? Anyway, the sacked_out
incrementer code for non-SACK already deals this issue before
calling sync_left_out so this trapping can be done
unconditionally.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Calling smp_call_function can lead to a deadlock if it is called
from tasklet context.
Fixing this deadlock requires to move the smp_call_function from the
tasklet context to a work queue. To do that queue the path pending
interrupts to a separate list and move the path cleanup out of
iucv_path_sever to iucv_path_connect and iucv_path_pending.
This creates a new requirement for iucv_path_connect: it may not be
called from tasklet context anymore.
Also fixed compile problem for CONFIG_HOTPLUG_CPU=n and
another one when walking the cpu_online mask. When doing this,
we must disable cpu hotplug.
Signed-off-by: Frank Pavlic <fpavlic@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With this patch you can use iproute2 in user space to efficiently see
how many policies exist in different directions.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
IEEE80211_RADIOTAP_FCS is obsolete and should not be used. It's no
longer defined. Remove it from the comment too.
Signed-off-by: Pavel Roskin <proski@gnu.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
After 13 years of use, it looks like my email address is finally going
to disappear. While this is likely to drop the amount of incoming spam
greatly ;-), it may also affect more appropriate messages, so let's
update my email address in various places. In addition, Host AP mailing
list is subscribers-only and linux-wireless can also be used for
discussing issues related to this driver which is now shown in
MAINTAINERS.
Signed-off-by: Jouni Malinen <j@w1.fi>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Document that all fields must be little endian. Use annotated types
even in the comments. Consistently use shorter type names (u8, s8).
Realign the comments.
Signed-off-by: Pavel Roskin <proski@gnu.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Add the Marvell Libertas 8388 802.11 USB driver.
Signed-off-by: Marcelo Tosatti <marcelo@kvack.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Fix miscellaneous networking compilation errors.
(*) Export ktime_add_ns() for modules.
(*) wext_proc_init() should have an ANSI declaration.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch cleans up the call paths from the core code into wext.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add an interface to the AF_RXRPC module so that the AFS filesystem module can
more easily make use of the services available. AFS still opens a socket but
then uses the action functions in lieu of sendmsg() and registers an intercept
functions to grab messages before they're queued on the socket Rx queue.
This permits AFS (or whatever) to:
(1) Avoid the overhead of using the recvmsg() call.
(2) Use different keys directly on individual client calls on one socket
rather than having to open a whole slew of sockets, one for each key it
might want to use.
(3) Avoid calling request_key() at the point of issue of a call or opening of
a socket. This is done instead by AFS at the point of open(), unlink() or
other VFS operation and the key handed through.
(4) Request the use of something other than GFP_KERNEL to allocate memory.
Furthermore:
(*) The socket buffer markings used by RxRPC are made available for AFS so
that it can interpret the cooked RxRPC messages itself.
(*) rxgen (un)marshalling abort codes are made available.
The following documentation for the kernel interface is added to
Documentation/networking/rxrpc.txt:
=========================
AF_RXRPC KERNEL INTERFACE
=========================
The AF_RXRPC module also provides an interface for use by in-kernel utilities
such as the AFS filesystem. This permits such a utility to:
(1) Use different keys directly on individual client calls on one socket
rather than having to open a whole slew of sockets, one for each key it
might want to use.
(2) Avoid having RxRPC call request_key() at the point of issue of a call or
opening of a socket. Instead the utility is responsible for requesting a
key at the appropriate point. AFS, for instance, would do this during VFS
operations such as open() or unlink(). The key is then handed through
when the call is initiated.
(3) Request the use of something other than GFP_KERNEL to allocate memory.
(4) Avoid the overhead of using the recvmsg() call. RxRPC messages can be
intercepted before they get put into the socket Rx queue and the socket
buffers manipulated directly.
To use the RxRPC facility, a kernel utility must still open an AF_RXRPC socket,
bind an addess as appropriate and listen if it's to be a server socket, but
then it passes this to the kernel interface functions.
The kernel interface functions are as follows:
(*) Begin a new client call.
struct rxrpc_call *
rxrpc_kernel_begin_call(struct socket *sock,
struct sockaddr_rxrpc *srx,
struct key *key,
unsigned long user_call_ID,
gfp_t gfp);
This allocates the infrastructure to make a new RxRPC call and assigns
call and connection numbers. The call will be made on the UDP port that
the socket is bound to. The call will go to the destination address of a
connected client socket unless an alternative is supplied (srx is
non-NULL).
If a key is supplied then this will be used to secure the call instead of
the key bound to the socket with the RXRPC_SECURITY_KEY sockopt. Calls
secured in this way will still share connections if at all possible.
The user_call_ID is equivalent to that supplied to sendmsg() in the
control data buffer. It is entirely feasible to use this to point to a
kernel data structure.
If this function is successful, an opaque reference to the RxRPC call is
returned. The caller now holds a reference on this and it must be
properly ended.
(*) End a client call.
void rxrpc_kernel_end_call(struct rxrpc_call *call);
This is used to end a previously begun call. The user_call_ID is expunged
from AF_RXRPC's knowledge and will not be seen again in association with
the specified call.
(*) Send data through a call.
int rxrpc_kernel_send_data(struct rxrpc_call *call, struct msghdr *msg,
size_t len);
This is used to supply either the request part of a client call or the
reply part of a server call. msg.msg_iovlen and msg.msg_iov specify the
data buffers to be used. msg_iov may not be NULL and must point
exclusively to in-kernel virtual addresses. msg.msg_flags may be given
MSG_MORE if there will be subsequent data sends for this call.
The msg must not specify a destination address, control data or any flags
other than MSG_MORE. len is the total amount of data to transmit.
(*) Abort a call.
void rxrpc_kernel_abort_call(struct rxrpc_call *call, u32 abort_code);
This is used to abort a call if it's still in an abortable state. The
abort code specified will be placed in the ABORT message sent.
(*) Intercept received RxRPC messages.
typedef void (*rxrpc_interceptor_t)(struct sock *sk,
unsigned long user_call_ID,
struct sk_buff *skb);
void
rxrpc_kernel_intercept_rx_messages(struct socket *sock,
rxrpc_interceptor_t interceptor);
This installs an interceptor function on the specified AF_RXRPC socket.
All messages that would otherwise wind up in the socket's Rx queue are
then diverted to this function. Note that care must be taken to process
the messages in the right order to maintain DATA message sequentiality.
The interceptor function itself is provided with the address of the socket
and handling the incoming message, the ID assigned by the kernel utility
to the call and the socket buffer containing the message.
The skb->mark field indicates the type of message:
MARK MEANING
=============================== =======================================
RXRPC_SKB_MARK_DATA Data message
RXRPC_SKB_MARK_FINAL_ACK Final ACK received for an incoming call
RXRPC_SKB_MARK_BUSY Client call rejected as server busy
RXRPC_SKB_MARK_REMOTE_ABORT Call aborted by peer
RXRPC_SKB_MARK_NET_ERROR Network error detected
RXRPC_SKB_MARK_LOCAL_ERROR Local error encountered
RXRPC_SKB_MARK_NEW_CALL New incoming call awaiting acceptance
The remote abort message can be probed with rxrpc_kernel_get_abort_code().
The two error messages can be probed with rxrpc_kernel_get_error_number().
A new call can be accepted with rxrpc_kernel_accept_call().
Data messages can have their contents extracted with the usual bunch of
socket buffer manipulation functions. A data message can be determined to
be the last one in a sequence with rxrpc_kernel_is_data_last(). When a
data message has been used up, rxrpc_kernel_data_delivered() should be
called on it..
Non-data messages should be handled to rxrpc_kernel_free_skb() to dispose
of. It is possible to get extra refs on all types of message for later
freeing, but this may pin the state of a call until the message is finally
freed.
(*) Accept an incoming call.
struct rxrpc_call *
rxrpc_kernel_accept_call(struct socket *sock,
unsigned long user_call_ID);
This is used to accept an incoming call and to assign it a call ID. This
function is similar to rxrpc_kernel_begin_call() and calls accepted must
be ended in the same way.
If this function is successful, an opaque reference to the RxRPC call is
returned. The caller now holds a reference on this and it must be
properly ended.
(*) Reject an incoming call.
int rxrpc_kernel_reject_call(struct socket *sock);
This is used to reject the first incoming call on the socket's queue with
a BUSY message. -ENODATA is returned if there were no incoming calls.
Other errors may be returned if the call had been aborted (-ECONNABORTED)
or had timed out (-ETIME).
(*) Record the delivery of a data message and free it.
void rxrpc_kernel_data_delivered(struct sk_buff *skb);
This is used to record a data message as having been delivered and to
update the ACK state for the call. The socket buffer will be freed.
(*) Free a message.
void rxrpc_kernel_free_skb(struct sk_buff *skb);
This is used to free a non-DATA socket buffer intercepted from an AF_RXRPC
socket.
(*) Determine if a data message is the last one on a call.
bool rxrpc_kernel_is_data_last(struct sk_buff *skb);
This is used to determine if a socket buffer holds the last data message
to be received for a call (true will be returned if it does, false
if not).
The data message will be part of the reply on a client call and the
request on an incoming call. In the latter case there will be more
messages, but in the former case there will not.
(*) Get the abort code from an abort message.
u32 rxrpc_kernel_get_abort_code(struct sk_buff *skb);
This is used to extract the abort code from a remote abort message.
(*) Get the error number from a local or network error message.
int rxrpc_kernel_get_error_number(struct sk_buff *skb);
This is used to extract the error number from a message indicating either
a local error occurred or a network error occurred.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Provide AF_RXRPC sockets that can be used to talk to AFS servers, or serve
answers to AFS clients. KerberosIV security is fully supported. The patches
and some example test programs can be found in:
http://people.redhat.com/~dhowells/rxrpc/
This will eventually replace the old implementation of kernel-only RxRPC
currently resident in net/rxrpc/.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- make the following needlessly global variables static:
- core/rtnetlink.c: struct rtnl_msg_handlers[]
- netfilter/nf_conntrack_proto.c: struct nf_ct_protos[]
- make the following needlessly global functions static:
- core/rtnetlink.c: rtnl_dump_all()
- netlink/af_netlink.c: netlink_queue_skip()
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
On a system with a lot of SAs, counting SAD entries chews useful
CPU time since you need to dump the whole SAD to user space;
i.e something like ip xfrm state ls | grep -i src | wc -l
I have seen taking literally minutes on a 40K SAs when the system
is swapping.
With this patch, some of the SAD info (that was already being tracked)
is exposed to user space. i.e you do:
ip xfrm state count
And you get the count; you can also pass -s to the command line and
get the hash info.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch moves the non-proc SNMP code into addrconf.c and reuses
IPv4 SNMP code where applicable.
As a result we can skip proc.o if /proc is disabled.
Note that I've made a number of functions static since they're only
used by addrconf.c for now. If they ever get used elsewhere we can
always remove the static.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch moves the SNMP code shared between IPv4/IPv6 from proc.c
into net/ipv4/af_inet.c. This makes sense because these functions
aren't specific to /proc.
As a result we can again skip proc.o if /proc is disabled.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a comment that was part of my rtnl locking patch for
cfg80211 but which I forgot for the merge.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Do some simple changes to make congestion control API faster/cleaner.
* use ktime_t rather than timeval
* merge rtt sampling into existing ack callback
this means one indirect call versus two per ack.
* use flags bits to store options/settings
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
As scheduled, this patch removes the pointless wext over netlink code.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch creates the core cfg80211 code along with some sysfs bits.
This is a stripped down version to allow mac80211 to function, but
doesn't include any configuration yet except for creating and removing
virtual interfaces.
This patch includes the nl80211 header file but it only contains the
interface types which the cfg80211 interface for creating virtual
interfaces relies on.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hint from David Miller <davem@davemloft.net>.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Because stats pointer may not be aligned for u64, use memcpy
to fill u64 values.
Issue reported by David Miller <davem@davemloft.net>.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
It is far too large to be an inline and not in any hot paths.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function is quite big and has several call sites and nothing
to collapse by compiler optimization on inlining.
Besides it's nicer to read in a in .c file.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
This also fixes memory leak in error path.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since we're now holding the rtnl during the entire dump operation, we
can remove qdisc_tree_lock, whose only purpose is to protect dump
callbacks from concurrent changes to the qdisc tree.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a transmitted packet is looped back directly, CHECKSUM_PARTIAL
maps to the semantics of CHECKSUM_UNNECESSARY. Therefore we should
treat it as such in the stack.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace the probing based MTU estimation, which usually takes 2-3 iterations
to find a fitting value and may underestimate the MTU, by an exact calculation.
Also fix underestimation of the XFRM trailer_len, which causes unnecessary
reallocations.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move generic skbuff stuff from XFRM code to generic code so that
AF_RXRPC can use it too.
The kdoc comments I've attached to the functions needs to be checked
by whoever wrote them as I had to make some guesses about the workings
of these functions.
Signed-off-By: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For consistency with other skb data accessors, reducing the number of direct
accesses to skb->data.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The results of FIB rules lookups are cached in the routing cache
except for IPv6 as no such cache exists. So far, it was the
responsibility of the user to flush the cache after modifying any
rules. This lead to many false bug reports due to misunderstanding
of this concept.
This patch automatically flushes the route cache after inserting
or deleting a rule.
Thanks to Muli Ben-Yehuda <muli@il.ibm.com> for catching a bug
in the previous patch.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a new rule action FR_ACT_GOTO which allows
to skip a set of rules by jumping to another rule. The rule
to jump to is specified via the FRA_GOTO attribute which
carries a rule preference.
Referring to a rule which doesn't exists is explicitely allowed.
Such goto rules are marked with the flag FIB_RULE_UNRESOLVED
and will act like a rule with a non-matching selector. The rule
will become functional as soon as its target is present.
The goto action enables performance optimizations by reducing
the average number of rules that have to be passed per lookup.
Example:
0: from all lookup local
40: not from all to 192.168.23.128 goto 32766
41: from all fwmark 0xa blackhole
42: from all fwmark 0xff blackhole
32766: from all lookup main
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The days are gone when this was not an issue, there are folks out
there with huge bot networks that can be used to attack the
established hash tables on remote systems.
So just like the routing cache and connection tracking
hash, use Jenkins hash with random secret input.
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces a new NLA_BINARY attribute policy type with the
verification of simply checking the maximum length of the payload.
It also fixes a small typo in the example.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
As stated in the sctp socket api draft:
sac_info: variable
If the sac_state is SCTP_COMM_LOST and an ABORT chunk was received
for this association, sac_info[] contains the complete ABORT chunk as
defined in the SCTP specification RFC2960 [RFC2960] section 3.3.7.
We now save received ABORT chunks into the sac_info field and pass that
to the user.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Parameters only take effect when a corresponding flag bit is set
and a value is specified. This means we need to check the flags
in addition to checking for non-zero value.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This option induces partial delivery to run as soon
as the specified amount of data has been accumulated on
the association. However, we give preference to fully
reassembled messages over PD messages. In any case,
window and buffer is freed up.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@.hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This option was introduced in draft-ietf-tsvwg-sctpsocket-13. It
prevents head-of-line blocking in the case of one-to-many endpoint.
Applications enabling this option really must enable SCTP_SNDRCV event
so that they would know where the data belongs. Based on an
earlier patch by Ivan Skytte Jørgensen.
Additionally, this functionality now permits multiple associations
on the same endpoint to enter Partial Delivery. Applications should
be extra careful, when using this functionality, to track EOR indicators.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Uninline tcf_destroy and add a helper function to destroy an entire filter
chain.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The error pointer argument in netlink message handlers is used
to signal the special case where processing has to be interrupted
because a dump was started but no error happened. Instead it is
simpler and more clear to return -EINTR and have netlink_run_queue()
deal with getting the queue right.
nfnetlink passed on this error pointer to its subsystem handlers
but only uses it to signal the start of a netlink dump. Therefore
it can be removed there as well.
This patch also cleans up the error handling in the affected
message handlers to be consistent since it had to be touched anyway.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implements a unified, protocol independant rules dumping function
which is capable of both, dumping a specific protocol family or
all of them. This speeds up dumping as less lookups are required.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a new interface to register rtnetlink message
handlers replacing the exported rtnl_links[] array which
required many message handlers to be exported unnecessarly.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
So that it is also an offset from skb->head, reduces its size from 8 to 4 bytes
on 64bit architectures, allowing us to combine the 4 bytes hole left by the
layer headers conversion, reducing struct sk_buff size to 256 bytes, i.e. 4
64byte cachelines, and since the sk_buff slab cache is SLAB_HWCACHE_ALIGN...
:-)
Many calculations that previously required that skb->{transport,network,
mac}_header be first converted to a pointer now can be done directly, being
meaningful as offsets or pointers.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that all packet schedulers have been converted to hrtimers most users
of PSCHED_JIFFIE2US and PSCHED_US2JIFFIE are gone. The remaining users use
it to convert external time units to packet scheduler clock ticks, so use
PSCHED_TICKS_PER_SEC instead.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Get rid of the manual clock source selection mess and use ktime. Also
use a scalar representation, which allows to clean up pkt_sched.h a bit
more and results in less ktime_to_ns() calls in most cases.
The PSCHED_US2JIFFIE/PSCHED_JIFFIE2US macros are implemented quite
inefficient by this patch, following patches will convert all qdiscs
to hrtimers and get rid of them entirely.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove ugly special-casing of nf_conntrack_l4proto_generic, all it
wants is its sysctl tables registered, so do that explicitly in an
init function and move the remaining protocol initialization and
cleanup code to nf_conntrack_proto.c as well.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the obsolete IPv4 only connection tracking/NAT as scheduled in
feature-removal-schedule.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the places where we need a pointer to the transport header, it is
still legal to touch skb->h.raw directly if just adding to,
subtracting from or setting it to another layer header.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the quite common 'skb->h.raw - skb->data' sequence.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now the skb->nh union has just one member, .raw, i.e. it is just like the
skb->mac union, strange, no? I'm just leaving it like that till the transport
layer is done with, when we'll rename skb->mac.raw to skb->mac_header (or
->mac_header_offset?), ditto for ->{h,nh}.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the common sequence "skb->nh.iph->ihl * 4", removing a good number of open
coded skb->nh.iph uses, now to go after the rest...
Just out of curiosity, here are the idioms found to get the same result:
skb->nh.iph->ihl << 2
skb->nh.iph->ihl<<2
skb->nh.iph->ihl * 4
skb->nh.iph->ihl*4
(skb->nh.iph)->ihl * sizeof(u32)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the places where we need a pointer to the network header, it is still legal
to touch skb->nh.raw directly if just adding to, subtracting from or setting it
to another layer header.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the common, open coded 'skb->nh.raw = skb->data' operation, so that we can
later turn skb->nh.raw into a offset, reducing the size of struct sk_buff in
64bit land while possibly keeping it as a pointer on 32bit.
This one touches just the most simple case, next will handle the slightly more
"complex" cases.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We'll have skb_reset_network_header soon.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the common, open coded 'skb->mac.raw = skb->data' operation, so that we can
later turn skb->mac.raw into a offset, reducing the size of struct sk_buff in
64bit land while possibly keeping it as a pointer on 32bit.
This one touches just the most simple case, next will handle the slightly more
"complex" cases.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that network timestamps use ktime_t infrastructure, we can add a new
SOL_SOCKET sockopt SO_TIMESTAMPNS.
This command is similar to SO_TIMESTAMP, but permits transmission of
a 'timespec struct' instead of a 'timeval struct' control message.
(nanosecond resolution instead of microsecond)
Control message is labelled SCM_TIMESTAMPNS instead of SCM_TIMESTAMP
A socket cannot mix SO_TIMESTAMP and SO_TIMESTAMPNS : the two modes are
mutually exclusive.
sock_recv_timestamp() became too big to be fully inlined so I added a
__sock_recv_timestamp() helper function.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
CC: linux-arch@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Covert network warning messages from a compile time to runtime choice.
Removes kernel config option and replaces it with new /proc/sys/net/core/warnings.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now network timestamps use ktime_t infrastructure, we can add a new
ioctl() SIOCGSTAMPNS command to get timestamps in 'struct timespec'.
User programs can thus access to nanosecond resolution.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
CC: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This allows the write queue implementation to be changed,
for example, to one which allows fast interval searching.
Signed-off-by: David S. Miller <davem@davemloft.net>